  1. Oct 2025
    1. Reviewer #3 (Public review):

      Summary:

      The authors performed wide-field and 2-photon imaging in vivo in awake head-fixed mice, to compare receptive fields and tonotopic organization in thalamocortical recipient (TR) neurons vs corticothalamic (CT) neurons of mouse auditory cortex. TR neurons were found in all cortical layers while CT neurons were restricted to layer 6. The TR neurons at nominal depths of 200-400 microns have a remarkable degree of tonotopy (as good as, if not better than, tonotopic maps reported by multiunit recordings). In contrast, CT neurons were very heterogeneous in terms of their best frequency (BF), even when focusing on the low vs high frequency regions of primary auditory cortex. CT neurons also had wider tuning.

      Strengths:

      This is a thorough examination using modern methods, helping to resolve a question in the field with projection-specific mapping.

      Weaknesses:

      There are some limitations due to the methods, and it's unclear what the importance of these responses is outside of behavioral context or when measured at single timepoints, given the plasticity, context-dependence, and receptive field 'drift' that can occur in cortex.

      (1) Probably the biggest conceptual difficulty I have with the paper is comparing these results to past studies mapping auditory cortex topography, mainly due to differences in methods. Conventionally, tonotopic organization is observed for characteristic frequency maps (not best frequency maps), as tuning precision degrades and best frequency can shift as sound intensity increases. The authors used six attenuation levels (30-80 dB SPL) and report that the background noise of the 2-photon scope is <30 dB SPL, which seems very quiet. The authors should at least describe the sound-proofing they used to get the noise level that low, and some sense of noise across the 2-40 kHz frequency range would be nice as a supplementary figure. It also remains unclear just what the 2-photon dF/F response represents in terms of spikes. Classic mapping using single-unit or multi-unit electrodes might be sensitive to single spikes (as might be emitted at characteristic frequency), but this might not be as obvious for Ca2+ imaging. This isn't a concern for the internal comparison here between TR and CT cells as conditions are similar, but is a concern for relating the tonotopy or lack thereof reported here to other studies.

      (2) It seems a bit peculiar that while 2721 CT neurons (N=10 mice) were imaged, fewer than half as many TR cells were imaged (n=1041 cells from N=5 mice). I would have expected there to be many more TR neurons even mouse for mouse (normalizing by number of neurons per mouse), but perhaps the authors were just interested in a comparison data set and were not being as thorough or complete with the TR imaging?

      (3) The authors' definitions of neuronal response types in the methods need more quantitative detail. The authors state: "'Irregular' neurons exhibited spontaneous activity with highly variable responses to sound stimulation. 'Tuned' neurons were responsive neurons that demonstrated significant selectivity for certain stimuli. 'Silent' neurons were defined as those that remained completely inactive during our recording period (> 30 min). For tuned neurons, the best frequency (BF) was defined as the sound frequency associated with the highest response averaged across all sound levels." The authors need to define their thresholds for 'highly variable', 'significant', and 'completely inactive'. Is best frequency the most significant response, the global maximum (even if another stimulus evokes a response of very similar amplitude), or something else?

      Comments on revisions:

      I think the authors misunderstood my point about sound level and characteristic frequency vs best frequency tonotopic maps. Yes, many studies of cortical responses present stimuli at higher intensities than the characteristic frequencies, but as tuning curves widen with sound level, the macroscopic tonotopic organization of primary auditory cortex breaks down at higher intensities. This is why most of the classic studies of tonotopy (e.g., from the Merzenich lab) generated maps of characteristic frequency. As I mentioned before, this isn't so much of an issue for the authors' comparisons of TR vs CT organization in their own study, but in general, this makes it difficult to compare aspects of spatially-organized tonotopy from imaging studies with the older electrophysiological 'truer' tonotopic maps. That said, this means that CT cells also might be tonotopically organized if the authors had been able to look at lower intensity tuning properties.
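
      To illustrate the distinction drawn above, the following is a minimal sketch, not the authors' analysis. It computes best frequency as quoted from the methods (highest response averaged across all sound levels) and characteristic frequency under one common convention (the frequency that first evokes a supra-threshold response at the lowest sound level). The function names, the response values, and the 0.2 threshold are all hypothetical and chosen only for illustration.

```python
def best_frequency(freqs, responses):
    """BF: frequency with the highest response averaged across all sound levels.

    responses[i][j] is the response at sound level i and frequency j.
    """
    n_levels = len(responses)
    mean_by_freq = [sum(row[j] for row in responses) / n_levels
                    for j in range(len(freqs))]
    return freqs[max(range(len(freqs)), key=mean_by_freq.__getitem__)]


def characteristic_frequency(freqs, levels, responses, threshold):
    """CF: strongest frequency at the lowest level that reaches `threshold`.

    Returns (frequency, level), or None if no level reaches the threshold.
    The threshold is an assumption; the review asks the authors to define theirs.
    """
    for level, row in sorted(zip(levels, responses), key=lambda p: p[0]):
        j = max(range(len(freqs)), key=row.__getitem__)
        if row[j] >= threshold:
            return freqs[j], level
    return None


# Hypothetical tuning data (dF/F) for one neuron: rows are levels, columns are frequencies.
freqs_khz = [4, 8, 16, 32]
levels_db = [30, 50, 70]
responses = [
    [0.0, 0.3, 0.05, 0.0],  # 30 dB SPL: narrow tuning around 8 kHz
    [0.1, 0.5, 0.6, 0.1],   # 50 dB SPL: tuning widens
    [0.3, 0.6, 1.0, 0.5],   # 70 dB SPL: peak has shifted to 16 kHz
]

bf = best_frequency(freqs_khz, responses)
cf = characteristic_frequency(freqs_khz, levels_db, responses, threshold=0.2)
print(bf, cf)
```

      In this made-up example the two measures disagree (BF at 16 kHz, CF at 8 kHz), which is the reason a BF map and a CF map of the same cells need not show the same tonotopy.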

    1. eLife Assessment

      This study presents a valuable assessment of and solid evidence for increased similarity in visual appearance combined with increased chemical differences between two butterfly species in sympatry compared with differences between three populations of one of the two species in allopatry. The similarity in visual appearance hints to an evolutionary response to shared predators (but alternative explanations are possible). Thus, the difference in chemical signaling likely helps to avoid between-species mating in sympatry.

    2. Joint Public Review:

      Summary:

      Ledamoisel et al. examined the evolution of visual and chemical signals in closely related Morpho butterfly species to understand their role in species coexistence. Using an integrative, state-of-the-art approach combining spectrophotometry, visual modeling, and behavioral mate choice experiments, they quantified differences in wing iridescence and assessed its influence on mate preference in allopatry and sympatry. They also performed chemical analyses to determine whether sympatric species exhibit divergent chemical cues that may facilitate species recognition and mate discrimination. The authors found iridescent coloration to be similar in sympatric Morpho species. Furthermore, male mate choice experiments revealed that in sympatry, males fail to discriminate conspecific females based on coloration, reinforcing the idea that visual signal convergence is primarily driven by predation pressure. In contrast, the divergence of chemical signals among sympatric species suggests their potential role in facilitating species recognition and mate discrimination. The authors conclude that interactions between ecological pressures and signal evolution may shape species coexistence.

      Strengths:

      The study is well-designed and integrates multiple methodological approaches to provide a thorough assessment of signal evolution in the studied species. We appreciate the authors' careful consideration of multiple selective pressures and their combined influence on signal divergence and convergence. Additionally, the inclusion of both visual and chemical signals adds an interesting and valuable dimension to the study, enhancing its importance. Beyond butterflies, this research broadens our understanding of multimodal communication and signal evolution in the context of species coexistence.

      Reviewing Editor comment:

      The authors have improved their submission after revisions and responded to the previous concerns of the reviewers.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, Ledamoisel et al. examined the evolution of visual and chemical signals in closely related Morpho butterfly species to understand their role in species coexistence. Using an integrative, state-of-the-art approach combining spectrophotometry, visual modeling, and behavioral mate choice experiments, they quantified differences in wing iridescence and assessed its influence on mate preference in allopatry and sympatry. They also performed chemical analyses to determine whether sympatric species exhibit divergent chemical cues that may facilitate species recognition and mate discrimination. The authors found iridescent coloration to be similar in sympatric Morpho species. Furthermore, male mate choice experiments revealed that in sympatry, males fail to discriminate conspecific females based on coloration, reinforcing the idea that visual signal convergence is primarily driven by predation pressure. In contrast, the divergence of chemical signals among sympatric species suggests their potential role in facilitating species recognition and mate discrimination. The authors conclude that interactions between ecological pressures and signal evolution may shape species coexistence.

      Strengths:

      The study is well-designed and integrates multiple methodological approaches to provide a thorough assessment of signal evolution in the studied species. I appreciate the authors' careful consideration of multiple selective pressures and their combined influence on signal divergence and convergence. Additionally, the inclusion of both visual and chemical signals adds an interesting and valuable dimension to the study, enhancing its importance. Beyond butterflies, this research broadens our understanding of multimodal communication and signal evolution in the context of species coexistence.

      Weaknesses:

      (1) The broader significance of the findings needs to be better articulated. While the authors emphasize that comparing adaptive traits in sympatry and allopatry provides insights into selective processes shaping reproductive isolation and coexistence, it is unclear what key conceptual or theoretical questions are being addressed. Are these patterns expected under certain evolutionary scenarios? Have they been empirically demonstrated in other systems? The authors should explicitly state the overarching research question, incorporate some predictions, and better contextualize their findings within the existing literature. If the results challenge or support previous work, that should be highlighted to strengthen the study's importance in a broader context.

      We thank the reviewer for their valuable feedback. We understand that the framing of the results and the discussion may fail to convey the broader significance of our findings. In the first version of the manuscript, we framed our manuscript around the processes shaping reproductive isolation and co-existence in sympatry, but now realize that this question was too broad with regard to our results. We thus strictly focused on outlining the importance of ecological interactions in the evolution of traits in sympatric species. In the revised version of the manuscript, we rewrote the first paragraph of the introduction to introduce context regarding the effect of ecological interactions on trait evolution (lines 43-60). We then explicitly introduce the theoretical question investigated in our paper (i.e. “we investigate how ecological interactions in sympatry can constrain natural and sexual selection shaping trait evolution”, lines 62-63) and our predictions regarding the evolution of traits in sympatry vs. allopatry (lines 74-80). We also added predictions regarding our experiments on Morpho at the end of the introduction (lines 146-157). As a result, the discussion is now better aligned with the introduction, by discussing the putative effect of predation and mate choice on the evolution of wing iridescence in Morpho.

      (2) The motivation for studying visual signals and mate choice in allopatric populations (i.e., at the intraspecific level) is not well articulated, leaving their role in the broader narrative unclear. In particular, the rationale behind experiments 1, 2, and 3 is not well defined, as the authors have not made a strong case for the need for these intraspecific comparisons in the introduction. This issue is further compounded by the authors' primary focus on signal evolution in sympatry throughout both the results and the discussion. For instance, the divergence of iridescence in allopatry is a potentially interesting result. But the authors have not discussed its implications.

      We now clearly state in the introduction our motivation for studying visual signals and mate choice in allopatric populations (lines 74-80, lines 146-157). We argue that intraspecific comparisons help identify whether visual cues can be used in mate recognition between phylogenetically close subspecies, between which visual resemblance is expected to be higher than between closely related species (tetrad experiment, and experiment 1). As M. h. bristowi and M. h. theodorus have different wing patterns, we also used this comparison to identify the traits involved in male mate preference within a species, testing the importance of iridescent color (experiment 2) or iridescent patterning (experiment 3). The results of those experiments can then be used to assess whether these traits are used in species recognition between sympatric species. See also our answers to recommendations 11 and 15 from reviewer #1.

      Overall, given that the primary conclusions are based on results and analyses in sympatry, the role of allopatric populations in shaping these conclusions needs to be better integrated and justified. Without a stronger link between the comparative framework and the study's key takeaways, the use of allopatric populations feels somewhat peripheral rather than central to the study's aim. Since the primary conclusions remain valid even without the allopatric comparisons, their inclusion requires a clearer rationale.

      To make a stronger case for the use of the allopatric population in our manuscript, we strengthened the justification behind the study of intraspecific allopatric populations vs. interspecific sympatric populations, as the iridescence measurements and the mate choice experiments in allopatric populations can serve as a baseline in studying how species interactions can shape the evolution of traits and mate recognition when compared to sympatric populations. Following your major comment #1, we rewrote the introduction to include a justification of the need to study allopatric vs. sympatric populations (lines 74-80), and also further highlighted the need to study iridescence in sympatric species to fully understand the trait evolution of sympatric species in the discussion (lines 339-343).

      (3) While the authors demonstrate that iridescence is indistinguishable to predators in sympatry, they overstate the role of predation in driving convergence. The present study does not experimentally demonstrate that iridescence in this species has a confusion effect or contributes to evasive mimicry. Alternatively, convergence could result from other selective forces, such as signal efficacy due to environmental conditions, rather than being solely driven by predation.

      We acknowledge that our study does not directly demonstrate that iridescence contributes to evasive mimicry. We toned down the interpretation of the results in the discussion and now state that predation is not the only selective pressure that could have promoted convergent evolution of iridescence in sympatric species, as iridescence is a trait that could be involved in thermoregulation (lines 346-353) and camouflage (lines 363-369), for example. We made sure to mention that convergence in iridescent signals in sympatry provides only indirect support for the evasive mimicry hypothesis, and that further research is still needed, including direct predation experiments, to show that this convergence is indeed triggered by predation (lines 391-396).

      Reviewer #2 (Public review):

      This study presents an investigation of the visual and chemical properties and mating behaviour in Morpho butterflies, aimed at addressing the nature of divergence between closely related species in sympatry. The study species consist of three subspecies of Morpho helenor (bristowi, theodorus, and helenor), and the congeneric Morpho achilles achilles. The authors postulate that whereas the iridescent blue signals of all (sub)species should function as a predator reduction signal (similar to aposematism) and therefore exhibit convergence, the same signals should indicate divergence if used as a mating signal, particularly in sympatric populations. They also assess chemical profiles among the species to assess the potential utility of scent in mediating species/sex discrimination.

      The authors first used reflectance spectrometry to calculate hue, brightness, and chroma, plus two measures of "iridescence" (perhaps better phrased as angular dependence) in each (sub)species. This indicated the ubiquitous presence of sexual dimorphism in brightness (males brighter), which also appears to be the case for iridescence (Figure 3A-B). Analysis of these data also indicated that whereas there is evidence for divergence among subspecies in allopatry, the same evidence is lacking for species in sympatry (P = 0.084). This was supported further by visual modelling, which showed that both conspecifics and birds should be (theoretically) capable of perceiving the colour difference among allopatric populations of M. helenor, whereas the same is not true for the sympatric species.

      The authors then conducted mate choice trials, first using live individuals and second using female dummies. The live experiments indicated the presence of assortative mating among the two subspecies of M. helenor (bristowi and theodorus). The dummy presentations indicated (a) bristowi males prefer conspecific wings, whereas theodorus have no preference, (b) bristowi males prefer the con(sub)specific colour pattern, (c) theodorus prefer the con(sub)specific iridescence when the pattern is manipulated to be similar among female dummies. A fourth experiment, using sympatric M. achilles and M. helenor, indicated no preference for conspecific female dummies. Finally, chemical analysis indicated substantial differences between these two species in putative pheromone compounds, and especially so in the males.

      The authors conclude that the similarity of iridescence among species in sympatry is suggestive of convergence upon a common anti-predation signal. Despite some behavioural evidence in favour of colour (iridescence)-based mate discrimination, chemical differences between M. achilles and M. helenor are posed as more likely to function for species isolation than visual differences.

      Overall, I enjoyed reading this manuscript, which presents a valiant attempt at studying visual, chemical and behavioural divergence in this iconic group of butterflies.

      Major comments

      My only major comment concerns the authors' favoured explanation of aposematism (or evasive mimicry) for convergence among species, which is based upon the you-can't-catch-me hypothesis first presented by Young 1971. Although there is supporting work showing that iridescent-like stimuli are more difficult to precisely localize by a range of viewers, most of the evidence as applied to the Morpho system is circumstantial, and I'm not certain that there is widespread acceptance of this hypothesis. Given that the present study deals with closely-related (sub)species, one alternative explanation - a "null" hypothesis of sorts - is for a lack of divergence (from a common starting point) as opposed to evolutionary convergence per se. In other words, two subspecies are likely to retain ancestral character states unless there is selection that causes them to diverge. I feel that the manuscript would benefit from a discussion of this alternative, if not others. Signalling to predators could very well be involved in constraining the extent of convergence, but this seems a little premature to state as an up-front conclusion of this work. There is also the result of a *dorsal* wing manipulation by Vieira-Silva et al. 2024 which seems difficult to reconcile in light of this explanation. Whereas this paper is cited by the authors, a more nuanced discussion of their experimental results would seem appropriate here.

      We thank the reviewer for their constructive comments on our manuscript. We appreciate the reviewer’s concern regarding the way iridescence convergence between sympatric species is discussed in our manuscript, which aligns with similar concerns raised by Reviewer 1. Indeed, the you-can't-catch-me hypothesis has not yet been empirically tested in Morpho; it is currently a working hypothesis supported only by indirect lines of evidence.

      Among the 30 known Morpho species, iridescence is most likely the ancestral character, notably because iridescence is a trait shared by a majority of Morpho (we now mention this in the introduction, lines 108-110). In this paper, we thus did not aim to identify the evolutionary forces involved in the appearance of iridescence in this group, but rather wanted to understand to what extent ecological interactions can impact the diversification (or not) of this trait. As such, the dorsal manipulations performed in Vieira-Silva et al. 2024, showing that iridescence in Morpho may have an effect similar to that of crypsis, do not affect our working hypothesis. Instead, we use Vieira-Silva et al. 2024 to discuss the potential anti-predator effect of iridescence, which could promote convergent evolution of iridescent patterns.

      In the main text, we now clearly mention our null hypothesis: under a scenario of neutral evolution of iridescence, we would expect that the divergence in wing coloration between two M. helenor subspecies would be lower than between two different Morpho species (M. helenor and M. achilles) and showed that our results sharply differ from this null expectation.

      We then improved the discussion by adding alternative hypotheses potentially explaining the convergent iridescent signal detected in sympatric species: we discussed the expected effect under neutral evolution (lines 339-343), but also added alternative hypotheses regarding the diversification of iridescence due to camouflage (lines 363-369), predator evasion (lines 373-377) and thermoregulation (lines 346-353).

      Reviewer #3 (Public review):

      The authors investigated differences in iridescence wing colouration of allopatric (geographically separated) and sympatric (coexisting) Morpho butterfly (sub)species. Their aim was to assess if iridescence wing colouration of Morpho (sub)species converged or diverged depending on coexistence and if iridescence wing colouration was involved in mating behaviour and reproductive isolation. The authors hypothesize that iridescence wing colouration of different (sub)species should converge in sympatry and diverge in allopatry. In sympatry, iridescence wing colouration can act as an effective antipredator defence with shared benefits if multiple (sub)species share the same colouration. However, shared wing colouration can have potential costs in terms of reproductive interference since wing colouration is often involved in mate recognition. If the benefits of a shared antipredator defence outweigh the costs of reproductive interference, iridescence wing colouration will show convergence and alternative mate recognition strategies might evolve, such as chemical mate recognition. In allopatry, iridescence wing colouration is expected to diverge due to adaptation to different local conditions and no alternative mate recognition is expected.

      Strengths:

      (1) Using allopatric and sympatric (sub)species that are closely related is a powerful way to test evolutionary hypotheses.

      (2) By clearly defining iridescence and measuring colour spectra from a variety of angles, applying different methods, a very comprehensive dataset of iridescence wing colouration is achieved.

      (3) By experimentally manipulating wing coloration patterns, the authors show visual mate recognition for M. h. bristowi and could, in theory, separate different visual aspects of colouration (patterns VS iridescence strength).

      (4) Measurements of chemical profiles to investigate alternative mate recognition strategies in case of convergence of visual signals.

      Weaknesses:

      In my opinion, studies should be judged on the methods and data included, and not on additional measurements that could have been taken or additional treatments/species that should be included, since in most ecological and evolutionary studies, more measurements or treatments/species can always be included. However, studies do need to ensure appropriate replication and appropriate measurements to test their hypothesis AND support their conclusions. The current study failed to ensure appropriate replication, and in various cases, the results do not support the conclusions.

      First, when using allopatric and sympatric (sub)species pairs to test evolutionary hypotheses, replication is important. Ideally, multiple allopatric and sympatric (sub)species pairs are compared to avoid outlier (sub)species or pairs that lead to biased conclusions. Unfortunately, the current study compares 1 allopatric and 1 sympatric (sub)species pair, hence having poor (no) replication on the level of allopatric and sympatric (sub)species pairs.

      We would like to thank the reviewer for their constructive feedback. We agree that replication is important to test evolutionary hypotheses and that our study lacks replication for allopatric and sympatric Morpho populations. Ideally, one would require several allopatric and sympatric replicates to draw conclusions about the effect of species interaction on trait evolution. Our study is a preliminary attempt at answering this question, covering a few Morpho populations but proposing a broad assessment of iridescence and mate preference for those populations. We clearly mentioned in the discussion that investigating multiple populations is needed to test whether the trend we observed in this paper can be generalized (lines 388-392).

      Second, chemical profiles were only measured for sympatric species and not for allopatric (sub)species, which limits the interpretation of these data. The allopatric (sub)species could have been measured as a non-coexistence "control". If coexistence and convergence in wing colouration drive the evolution of alternative mate recognition signals, such alternative signals should not evolve/diverge for allopatric (sub)species where wing colouration is still a reliable mate recognition cue. More importantly, no details are provided on the quantification of butterfly chemical profiles, which is essential to understand such data. It is unclear how the chemical profiles were quantified and what data (concentrations, ratios, proportions) were used to perform NMDS and generate Figure 5 and the associated statistical tests.

      We recognize that having the chemical profiles of the genitalia of the Morpho from the allopatric populations would have made a stronger case in favor of reinforcement acting on the divergence of the chemical compounds found on the genitalia of the sympatric Morpho species. Due to limited access to the biological material needed at the time of the chromatography, we could not test for lower divergence in the chemical profiles of allopatric Morpho butterflies. We made sure to mention this limitation in the discussion (lines 457-461). 

      We already stated in the methods that we compiled the area under the peak of each component found in the chromatograms of our samples and that we performed all the statistical analyses on this dataset. To make it clearer, we mention in the new version of the manuscript that the area under the peak of each component provides a measure of the concentration of that component (in the methods, lines 720, 723, 733). We also added some clarifications to the legend of Figure 5.

      Third, throughout the discussion, the authors mention that their results support natural selection by predators on iridescent wing colouration, without measuring natural selection by predators or any other measure related to predation. It is unclear at this point which predators prey on any of the butterfly species.

      We made sure to mention in the introduction (lines 132-136) and in the discussion (lines 373-377) that previous predation experiments performed on Morpho and other butterflies showed evidence that birds are likely predators of these species. These observations led us to test for the putative effect of predation on the evolution of their color pattern, without directly testing predation rates. We made sure this information is transparent in the revised manuscript, and now specify that assessing wing convergence is only an indirect way of testing the escape mimicry hypothesis (lines 393-396).

      To continue on the interpretation of the data related to selection on specific traits by specific selection agents: This study did not measure any form of selection or any selection agent. Hence, it is not known if iridescent wing colouration is actually under selection by predators and/or mates, if maybe other selection agents are involved or if these traits converge due to genetic correlations with other traits under selection. For example, iridescent colouration in ground beetles functions as an antipredator defence but also in thermo- and water regulation. None of these issues are recognized or discussed.

      The lack of discussion of alternative selective pressures involved in the evolution of iridescence was pointed out by all reviewers. We thus modified the text to account for this comment, and no longer limit our discussion to the putative effects of predation. We now specifically discuss alternative hypotheses, including crypsis (lines 362-369) and thermoregulation (lines 346-353).

      Finally, some of the results are weakly supported by statistics or questionable methodology.

      Most notably, the perception of the iridescence coloration of allopatric subspecies by bird visual systems. Although for females, means and errors (not indicated what exactly, SD, SE or CI) are clearly above the 1 JND line, for males, means are only slightly above this line and errors or CIs clearly overlap with the 1 JND line. Since there is no additional statistical support, higher means but overlap of SD, SE or CI with the baseline provides weak statistical support for differences.

      We thank the reviewer for raising interpretation issues concerning the chromatic distances of allopatric Morpho subspecies measured with a bird vision model. We made sure to be nuanced in the description of this graph in the results section (lines 208-212). Note that this addition does not change our main conclusion that Morpho and predator visual models better discriminate iridescence differences between allopatric subspecies than between sympatric species.

      We now also clearly mention in the figure’s legend that the error bars represent the confidence intervals obtained after performing a bootstrap analysis, in addition to the mention of the nature of the error bars already mentioned in the methods (line 580).
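
      The point about error bars overlapping the 1 JND line can be made concrete with a minimal sketch of a percentile bootstrap. This is a simplification and not the authors' procedure: it resamples a list of chromatic distances directly, and the function names, the sample values, and the 2000 resamples are hypothetical. A distance is treated as discriminable only if the lower confidence bound lies above the threshold.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def exceeds_threshold(values, threshold=1.0, **kwargs):
    """True only if the whole confidence interval lies above `threshold` (in JND)."""
    lower, _ = bootstrap_ci(values, **kwargs)
    return lower > threshold


# Hypothetical chromatic distances in JND units, for illustration only.
clearly_above = [2.0, 2.5, 3.0, 2.2, 2.8]   # every value is above 1 JND
straddling = [0.5, 0.5, 1.6, 1.6]           # mean above 1 JND, interval overlaps it

print(exceeds_threshold(clearly_above), exceeds_threshold(straddling))
```

      Under this criterion a mean slightly above 1 JND whose interval overlaps the line, as described for males, would not count as evidence of discriminability.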

      Regarding the assortative mating experiment, the results are clearly driven by M. bristowi. For M. theodorus, females mate equally often with conspecifics (6 times) as with M. bristowi (5 times). For males, the ratio is slightly better (6 vs 3), but with such low numbers, I doubt this is statistically testable. Overall low mating for M. bristowi could indicate suboptimal experimental conditions, and hence results should be interpreted with care.

      We recognize that the tetrad experiment results are mainly driven by M. bristowi’s behavior, as already mentioned in the results (lines 231-232), but we now also mention it in the discussion (lines 401-402). This experiment would have benefited from more replicates, but the limited access to live males and virgin females for both subspecies was a limiting factor. Fisher’s exact test, used to assess assortative mating, is specifically appropriate for small sample sizes. We recognize that the sample size is not ideal; however, it is still statistically testable.
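
      For readers unfamiliar with the test, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 mating table, written from the hypergeometric distribution using only the standard library. The function name and the example counts are hypothetical; the full tetrad counts are not given in this exchange, so the table below is not the authors' data.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in (prob(x) for x in range(x_min, x_max + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: rows are male subspecies, columns are female subspecies.
example = [[3, 1],
           [1, 3]]
print(fisher_exact_two_sided(example))  # 34/70, about 0.486
```

      The test remains valid at any sample size because it enumerates tables exactly, but with counts this small only a strongly assortative pattern can reach significance, which is the reviewer's concern about power.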

      Regarding the wing manipulation experiment, M. theodorus does not show a preference when dummies with non-modified wings are presented and prefers non-modified dummies over modified dummies. This is acknowledged by the authors but not further discussed. Certainly, some control treatment for wing modification could have been added.

      The use of controls to account for the effect of wing modification and of the odor of the permanent marker was already mentioned in the methods (lines 636-639). Following your recommendation and comments from the other reviewers, we now mention the use of this control in the results (lines 278-283). We also address a potential issue that could have resulted in the rejection of these modified dummies by live males: we cannot be sure whether butterflies perceive these modifications as equivalent to natural coloration (lines 281-282). An additional control could have been used, adding black ink on the black dorsal parts of the pattern to assess its potential visual effect. The constraints on sampling unfortunately did not allow us to add another treatment.

      Overall, the fact that certain measurements only provide evidence for 1 of the 2 (sub)species (assortative mating, wing manipulation) or one sex of one of the species (bird visual systems) means overall interpretation and overgeneralization of the results to both allopatric or sympatric species should be done with care, and such nuances should ideally be discussed.

      The aim of the authors, "to investigate the antagonistic effects of selective pressures generated by mate recognition and shared predation" has not been achieved, and the conclusions regarding this aim are not supported by the results. Nevertheless, the iridescence colour measurements are solid, and some of the behavioural experiments and chemical profile measurements seem to yield interesting results. The study would benefit from less overinterpretation of the results in the framework of predation and more careful consideration of methodological difficulties, statistical insecurities, and nuances in the results.

      Overall, we would like to thank all reviewers for their thorough assessment of our work. We understand that the imbalance between mate choice data, visual model data and chemical data only gives us a partial assessment of species recognition in Morpho butterflies, thus requiring more precision in the interpretation and the discussion of our results. We made sure to add balanced interpretations in our discussion, by mentioning the lack of replicates for allopatric and sympatric populations (lines 391-392), and the lack of chemical characterization of allopatric species (lines 458361, see previous comments) and by being more transparent on methodological limitations that we failed to convey in the first version of our manuscript. We brought nuance to our discussion and also discussed alternative hypotheses to predation to explain the convergence of iridescence found in sympatry.

      Reviewing Editor Comments:

      While all reviewers acknowledge the value of your data, they converge in their recommendations to tone down the evolutionary interpretations. Ideally, to test your main hypothesis, you would need several species pairs, or if only one, as in your case, replicated sympatric and allopatric sites for both species. Furthermore, your more specific hypotheses about convergence (vs. nondivergence), response to predators (vs. other environmental variables), and avoiding interspecific mating in sympatry (vs. not avoiding it in allopatry) would require appropriate alternative treatments/controls. We therefore recommend that you focus on those statements that you can support with your experiments and data, and introduce these statements in the introduction with reference to the appropriate literature.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 25: This stated aim seems a bit off. The authors did not sensu stricto quantify 'how shared adaptive traits may shape genetic divergence' in this study. I suggest rewriting or deleting this whole sentence altogether. The study's aim is already clear in lines 29-34.

      We deleted the mention of the characterization of genetic divergence, since this study did not focus on any genetic analysis.

      (2) Line 34: The authors here state that they compared allopatric vs sympatric populations. This is strictly not true for M. achilles. Further, the results after this sentence focus solely on divergence/convergence in sympatry, nothing at the intraspecific level and implications of the findings.

      We now mention that we tested allopatric vs. sympatric species of M. helenor only (lines 28-29). We also mention that the behavioral experiments were based on intraspecific comparisons, and discuss the implications of this result in the discussion.

      (3) Line 35: 'convergence driven by predation': this is a strong statement and cannot be directly inferred from the present set of experiments. Consider toning it down.

      We added nuance to this statement by rephrasing it as “suggesting that predation may favor local resemblance” (lines 32-33).

      (4) Line 36: Replace 'behavioral results' with 'behavioral experiments' or something similar.

      Corrected.

      (5) Line 45-49: These opening statements need some citations.

      We provided references for the first few lines, by citing terHorst et al 2018 (line 44) underlining the importance of species interactions in trait evolution, and Blomberg et al 2003 (line 45) showing that closely-related species tend to resemble each other by quantifying the phylogenetic signal of various traits.

      (6) Line 83, 165: 'visual effect', not sure what the authors are referring to. Please rewrite.

      We defined “visual effect” as the way wing color patterns could be perceived by predators or mates. We removed mentions of “visual effect” and directly used its definition instead.

      (7) Line 105 onwards: This section of the introduction could benefit from more concise writing. The authors might consider reducing the number of specific examples and instead offering broader general statements, supported by citations from multiple studies.

      We reduced the number of examples given in this paragraph and used general statements supported by multiple citations as examples (lines 102-119).

      (8) Line 108-110: This sentence seems to be redundant with the previous one.

      We merged this sentence with the previous one to improve clarity (lines 103-105).

      (9) Line 140: 'with chemical defenses': include citations here.

      We added citations of Joron et al 1999 and Merrill et al 2014, which document the evolution of convergent wing patterns (mimicry) in butterfly species with chemical defenses.

      (10) Line 149: This is a bit of a stretch. Note that genetic divergence could be influenced by many other things, not only the processes that the authors examined.

      We agree with the reviewer that the study of the convergent vs. divergent evolution of visual cues is not enough to fully understand the mechanisms allowing genetic divergence between species. Because this paper does not focus on characterizing genetic divergence, we removed it from the manuscript to avoid oversimplification.

      (11) Line 151: Again. Here, the author's primary focus seems to be at an interspecific level. One is left to wonder about the need for comparisons at the intraspecific level in M. helenor and the implications. Please clarify.

      At the end of the introduction (lines 146-157), we specifically highlighted the importance of intraspecific comparisons. While studying the effect of sympatry on the evolution of the iridescent color pattern, we use this intraspecific comparison as a baseline to account for convergence or divergence of iridescence in a sympatric interspecific pair of Morpho, because under neutral evolution two subspecies are expected to be more similar than two different species (this assumption has been clarified in lines 147-148). We also used intraspecific mate choice to test for the use of visual cues in mate recognition (experiment 1) and to test what type of signal could be perceived by Morphos (the iridescent coloration or the iridescent pattern, experiments 2 and 3). These results help contextualize the interspecific mate choice experiments, which focused on determining whether visual cues could also be used in species recognition. Since we show that iridescent coloration is important in mate recognition at the intraspecific scale, this helps explain why species recognition is low at the interspecific scale, given the wing color convergence between M. helenor and M. achilles.

      (12) Line 154: 'signals on mate preferences'.

      Corrected.

      (13) Line 189: 'At the intraspecific level', maybe in the brackets include 'allopatric populations' just so the results are in a similar format as in the color contrast section below.

      We added details to make clearer that the intraspecific level is studied between allopatric Morpho populations (line 189).

      (14) Line 189-192: Please rearrange the figure (current B as A and vice versa) or present the results in order as in the figure (interspecific first and then intraspecific level).

      We rearranged Figure 3 so that the intraspecific comparison (allopatric population) appears as A and the interspecific level (sympatric population) appears as B, to follow the order of presentation in the main text.

      (15) Line 232: The motivation behind experiments 1, 2, and 3 is unclear. The authors have not made a strong point in the introduction about the need for these comparisons at an intraspecific level. Given that the authors are focused on divergence/convergence at an interspecific level, this set of experiments seems to be irrelevant to the present study. The implications of these findings are also not discussed.

      We added the motivation for experiments 1, 2, and 3 in the introduction (lines 151-154) by stating that those experiments were used to assess whether blue color could indeed be used as a mating cue in Morpho helenor (experiment 1) and to try to understand what part of the visual signal is important in mate choice in Morpho helenor: the wing pattern (experiment 2) or the iridescent coloration (experiment 3). Although the motivation for these experiments was not detailed in our manuscript, we already discussed the implications of the results of experiments 1, 2 and 3 in the discussion by stating that visual cues can take many forms and that considering both color AND pattern is important in understanding visual cues (lines 408-416). We carefully reworked this new version to make it more straightforward.

      (16) Line 260: Insert 'wild-type' before model to ensure similar wording as in the previous section.

      Corrected.

      (17) Line 286: Insert 'sympatric' after mimetic.

      Corrected.

      (18) Line 307: Include a reference to the figures or table where these results are presented.

      We now mention in the main text that the different proportions of beta-ocimene found between male M. helenor and M. achilles are shown in Table S2.

      (19) Line 343: These inferences are speculative. Add a line here, something like 'although this warrants further research in this species'.

      We detailed what additional experiments are needed in lines 388-396.

      (20) Line 357: The authors have not discussed their results on iridescence divergence in allopatric populations (line 190) and its implications.

      We now made clear in the beginning of the discussion that the divergence of iridescence in allopatric populations is used as a baseline to test for convergent iridescence between species (lines 339-343).

      (21) Line 361 onwards: This first paragraph is a bit confusing, as the results mainly focus on allopatry, while the title refers to sympatry.

      To avoid confusion between the title and the content of the discussion, we divided the last part of the discussion into two different parts. As the first paragraph mainly focuses on allopatry, we isolated it and titled it “Iridescent color patterns can be used as mate recognition cues in M. helenor” (line 498). The next paragraph of the discussion, focusing on the sympatric Morpho populations, has been titled “Evolution of visual and olfactory cues in mimetic sister-species living in sympatry” (line 418).

      (22) Line 383: visual cues 'as' poor species.

      Corrected.

      (23) Line 405: Why females here and not males? This is again confusing since the authors tested for male mate choice in the main experiments. Some background information on sex-specific mate choice in the methods might help.

      In this specific sentence, we talk about performing mate choice experiments to test for the discrimination of olfactory cues by females (and not males) because we found a high divergence in the chemical compounds found on male genitalia. Although female chemical compounds could also be used as a cue by males in mate recognition, olfactory mate choice is often driven by female choice in butterflies. We recognize that this perspective does not line up with the mate choice presented in our results section, which focused on male mate choice based on visual cues for ecological reasons (Morpho males, but not females, tend to be attracted to bright blue colorations) and technical reasons (in cages, females tend to hide away from the males or male dummies, and this behavior is not compatible with experiments involving flying around false males). In the discussion, we made sure to specify that the perspective we cite here is about testing the implications of divergence in male olfactory cues (line 454). We also added our motivation for investigating male (and not female) mate choice based on visual cues in the methods (lines 613-618) and in the results (lines 219-223).

      (24) Line 417: This inference is speculative. Consider toning it down.

      We rewrote the sentence: “We find evidence of converging iridescent patterns in sympatry suggesting that predation could play a major role in the evolution of iridescence. Further work is nevertheless needed to directly test this hypothesis and establish the importance of evasive mimicry in Morpho” (lines 465-468).

      (25) Line 429: 'Convergent trait evolution leads to mutualistic interactions enhancing coexistence'. Careful here. It is not very evident how convergent trait evolution (iridescence) is mutualistic in this case, as there is no experimental evidence for evasive mimicry yet. Consider rewording or toning this sentence down.

      We agree with the reviewer and removed this statement, only keeping the end of the sentence: “Altogether, this study addresses how convergence in one trait as a result of biotic interactions may alter selection on traits in other sensory modalities, resulting in a complex mosaic of biodiversity” (lines 479-481).

      (26) Line 442: Since the samples come from a breeding farm, I have a few questions. How are the authors sure about the location where the specimens were collected? How long have they been kept in captivity? Have they been subjected to any artificial selection? More details are needed here.

      Since M. helenor bristowi and M. helenor theodorus are only found in the wild in West and East Ecuador respectively, those M. helenor subspecies can only be collected in those two allopatric populations. Their phenotype is directly linked to their geographic distribution, which is how we confirmed their collection location. The M. h. theodorus we used in this study were caught in East Ecuador in Tena, and the M. h. bristowi were caught in West Ecuador in Pedro Vicente Maldonado. We received pupae from the breeding farm, meaning that the Morpho used for the experiments were raised in captivity since their date of emergence. Upon emergence, they were transferred into cages for 4 to 5 days to wait for sexual maturity before performing the tetrad and mate choice experiments. This information was added to the methods (lines 490-496).

      (27) Line 476: Include some citations supporting this statement.

      We now cite Bennett and Théry (2007), reviewing avian color vision, and Briscoe (2008), characterizing the sensitivity of the photoreceptors found in the eyes of butterflies. Both citations show that the 300-700 nm range is seen by avian and butterfly visual systems.

      (28) Line 480 onwards: Please clarify if the analysis used only one value (mean?) per species, sex, angle of measurement, and locality or included data from multiple individuals.

      The analyses of both colorimetric variables and global iridescence were performed using iridescence data from multiple individuals (10 males and 10 females from M. h. bristowi, M. h. theodorus, M. h. helenor and M. a. achilles), for which we measured iridescence at 21 angles of illumination. Sample sizes are mentioned in lines 507, 515, 540-542.

      (29) Line 510: Is there a specific reason that authors did not investigate achromatic contrasts? Provide some justification here. Or include the results of achromatic contrasts in the supplement.

      We added the achromatic results in the supplement and in the results (lines 200-204). For both the avian visual model and the Morpho visual model, the confidence intervals always overlapped with the JND threshold, showing that neither birds nor butterflies could theoretically discriminate the wing reflectance brightness in allopatric and sympatric populations.

      (30) Line 552 onwards: I may have missed it. It is not entirely clear why the authors focused on male mate choice rather than female preference for visual cues. The authors should explicitly justify this choice and cite previous studies demonstrating that male mate choice, rather than female preference, is important in this species. This should be stated in the results section as well.

      We added a paragraph in the methods (lines 613-618) to describe the ecological and technical reasons that led us to test only male mate choice using visual cues (also see our response to recommendation #23).

      (31) Line 537 onwards: What was the criterion used to score that mating had occurred? Why first mating and not how long they were mating? Please add these details.

      We stopped the experiment as soon as a male/female pair was formed by joining their genitalia (we added this information in the methods, lines 599-600). Since the tetrad experiment involves the interaction of two males and two females from different subspecies, we considered that mate choice happened before the formation of any couple, and is not necessarily dependent on how long they mate. For instance, we witnessed avoidance behaviors from females that systematically hid their genitalia and refused to join their abdomen to some males, while being very ‘open’ to others (but we did not quantify it).

      (32) Line 571: The authors used a black permanent marker to modify wing patterns but did not validate whether butterflies perceive these modifications as equivalent to natural coloration. It is possible that the alterations introduced unintended visual cues and may explain why most males rejected the dummies (line 267). The authors should acknowledge this limitation here.

      We now acknowledge this limitation in the methods (lines 638-639) and in the results section (lines 278-283).

      (33) Line 591: Insert 'above' after protocol.

      Corrected.

      (34) Line 605: If the authors included random effects in their model, then it should be generalized linear mixed model (GLMM) and not GLM as they wrote.

      We indeed included a random effect in our model accounting for male ID and trial number, we thus replaced “GLM” by “GLMM” in the manuscript.

      (35) Line 615: This set of analyses does not seem to account for pseudo-replication, as the data were recorded from the same male more than once (Line 583). Please clarify and redo the analysis with the GLMM framework

      We ran new analyses using the GLMM framework: we used a binomial GLMM to test whether individuals preferentially interacted with dummy 1 vs. dummy 2 while accounting for pseudoreplication. The previously detected tendencies hold true with these new analyses, except for the visual mate discrimination of M. achilles: we now find statistical evidence that M. achilles tends to approach its conspecifics more often during the mate choice experiment, although the signal is weak (lines 297-307). Indeed, while we previously concluded that both species in sympatry (M. helenor and M. achilles) could not discriminate their conspecific mates, we now emphasize that M. achilles is somewhat sensitive to some visual signals. However, its estimated probability of approaching a conspecific is only 0.54, which is low compared to the estimated probability of approaching (0.61) or touching (0.84) a con-subspecific for M. bristowi. We thus concluded that even though some visual cues could be relevant for mate recognition, they are less reliable for male choice in sympatric populations, where color patterns are more convergent, than in allopatric populations. We thus updated Figure 4 and Figures S8 and S9, which now show the probability of approaching or touching a conspecific or con-subspecific with the updated p-values retrieved from the GLMM analyses. We also updated the results (lines 297-307) and the discussion (lines 430-438) to bring nuance to our previous results.
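      To put the reported probabilities in context: in a binomial model, fixed effects are estimated on a link scale and converted back to probabilities, and "no preference" corresponds to a probability of 0.5. The sketch below assumes a logit link (the usual default for binomial models; the response does not state which link was used) and simply converts the three probabilities quoted above into odds and log-odds. It does not refit the model.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Probability corresponding to a log-odds value x."""
    return 1.0 / (1.0 + math.exp(-x))


# Probabilities quoted in the response; labels are shorthand for illustration.
reported = {
    "M. achilles, approach conspecific": 0.54,
    "M. bristowi, approach con-subspecific": 0.61,
    "M. bristowi, touch con-subspecific": 0.84,
}

for label, p in reported.items():
    odds = p / (1.0 - p)
    # A log-odds of 0 (odds of 1) would mean no preference between dummies.
    print(f"{label}: p={p:.2f}, odds={odds:.2f}, log-odds={logit(p):.2f}")
```

      On this scale, 0.54 sits close to the no-preference value, whereas 0.84 corresponds to odds of roughly five to one, which is consistent with the response's reading that the M. achilles signal is weak.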

      (36) Line 963: Figure 3D. Is there a particular reason for comparing allopatric populations only within Ecuador rather than between Ecuador and French Guiana for M. helenor? Please clarify.

      We aimed at comparing the putative discrimination of blue coloration using visual models vs. what the butterflies actually discriminate using mate choice experiments. Since we only performed mate choice experiments involving M. h. bristowi x M. h. theodorus (allopatric populations within Ecuador) and M. h. helenor x M. a. achilles (sympatric population from Ecuador), we only looked at those comparisons using visual models. We added this clarification (lines 559-560).

      (37) Line 980: Are these predicted probabilities or just mean proportions as written in line 614? Then the label should be changed to 'Proportion of approaches' or something similar.

      Following our answer to recommendation #35, the points now represent the probability of touching a conspecific in the graph for each male, for every trial of every male tested. We corrected the legend of the figure. 

      Reviewer #2 (Recommendations for the authors):

      (1) Line 25: "...therefore facilitating co-existence in sympathy".

      Corrected.

      (2) Line 28: "contrasting" instead of contrasted.

      Corrected.

      (3) Line 33: begin a new sentence at the colon.

      Corrected.

      (4) Line 49: the phrase "habitat filtering" is unclear and should perhaps be defined or qualified.

      We replaced “habitat filtering” with its definition and cited Keddy (1992), describing the community assembly rules and defining habitat filtering (line 46).

      (5) Line 52: remove "even".

      Corrected.

      (6) Line 53: divergent suites may also result because traits are often constrained by genetic architecture (multivariate genetic covariances). This is discussed at length and specifically in relation to ornamental coloration by Kemp et al. 2023

      We rewrote the introduction and focused on only reviewing the ecological interactions promoting trait divergence in sympatric species, and did not mention genetics in this paper.

      (7) Line 87: (and throughout) refer to "colouration" or "colour pattern" rather than "colourations".

      Corrected.

      (8) Line 151: Remove "To do so,".

      Corrected.

      (9) Line 191: I would like to see the degrees of freedom for this test.

      We added the F-statistic=2.09 and the degrees of freedom df=1 of this test, and for all the following tests.

      (10) Line 201: (and throughout) replace "on" with "of".

      Corrected.

      (11) Line 205: modelling the visual properties of the wings allows one to infer what is theoretically visible/distinguishable. The modelling is useful but not necessarily definitive of vision/behaviour per se under different conditions in the wild. I therefore think it is appropriate to phrase the wording around the modelling approach more carefully. Perhaps refer to "theoretical" or "inferred" discriminability, or state (e.g.) that species should/should not be capable of perceiving differences based on the modelling data. You do this well in your wording of lines 207-209. This need not apply in the discussion because you're then dealing with the combination of modelling results and behaviour (mating trials).

      We agree with the reviewer that visual modelling only allows one to infer what is theoretically discriminated by the butterflies, and that the wording of our sentence was confusing. We therefore modified the sentence to reflect these points: “Morpho butterflies and predators can theoretically visually perceive the difference in the blue coloration between different subspecies of M. helenor…… using both bird and Morpho visual models” (lines 206-209).

      (12) Line 222: Either the chi-square test or Fisher's exact test should be sufficient (why report both?)

      The Chi-square test relies on large-sample assumptions (expected counts>5) whereas Fisher’s exact test does not and is valid even with small or unbalanced sample sizes. Since the M. bristowi female/M. h. theodorus male pairing only occurred 3 times, we do not meet the primary assumptions to apply a Chi-square test, although it is significant. We used a Fisher’s test to confirm the results. Using both and finding that both tests are significant shows that the results are robust, although they may appear redundant. To simplify, we removed the results of the Chi-square test and only kept the Fisher’s test in the methodology and the results.
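      The contrast between the two tests can be sketched in a few lines. The code below computes the expected counts that the Chi-square approximation relies on, and a two-sided Fisher's exact p-value from the hypergeometric distribution, for a 2x2 table. The table is a hypothetical textbook-style example, not the tetrad data, and the function names are illustrative.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Hypothetical counts (NOT the study's mating data).
table = [[3, 1], [1, 3]]
expected = expected_counts(table)
small_expected = any(e < 5 for row in expected for e in row)
p_value = fisher_exact_two_sided(table)  # 34/70 for this table
print(f"expected={expected}, any expected < 5: {small_expected}, "
      f"Fisher p={p_value:.3f}")
```

      When any expected count is small, as in this toy table, the Chi-square approximation is unreliable and the exact test is the safer choice, which matches the authors' decision to report only Fisher's test.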

      (13) Line 224 (and throughout): Degrees of freedom should be provided for statistical tests.

      We reported the statistic value and the degrees of freedom for all mentions of the statistical tests in the main text, except for the Fisher test, which is an exact test and does not rely on an asymptotic distribution such as the Chi-squared distribution.

      (14) Lines 266-267: This sentence has interest, but it is rather vague at present. Wouldn't your controls account for the effect of manipulation? This could be explained further.

      During our mate choice experiments, all Morpho female dummies used for the experiments were painted with black markers, either on their dorsal blue band to modify their blue iridescent phenotype, or on their ventral side, thus controlling for the effect of manipulation. However, we cannot rule out that the modification of the dorsal blue iridescence could have had a “repulsive” effect on males for several reasons. For example, depending on the visual discrimination of darker colors by Morphos, the painted black band could have a slightly different color compared to the dark “brown” usually surrounding their blue iridescent patterns. We now explain this in the results (lines 278-283) and in the methodology (lines 638-639).

      (15) Line 316: I'm not certain that the similarity is best described as "striking", given a P-value of 0.084 for this contrast

      We agree with the reviewer and removed this adjective for this line.

      (16) Lines 387-390: This sentence is puzzling because, theoretically speaking, we should expect selection on visual preference to be heightened (not relaxed) in sympatry if colouration is included among the traits used in mate selection. I'm not certain I have understood the meaning here.

      We would like to thank the reviewer for pointing out this typo. If shared predatory pressures favor convergent evolution of color pattern, then the visual signals become less reliable for species recognition. As a result, sexual selection on visual preference is heightened and becomes stronger, favoring the evolution of alternative cues used to discriminate conspecific mates. We changed the sentence and now write “the convergent evolution of iridescent wing patterns… may have negatively impacted visual discrimination and favored the evolution of divergent olfactory cues” (lines 457-458).

      (17) Line 529: Mating experiments. Given that these are quite large butterflies, I wondered whether a 3x3x2m cage would be sufficient in size to allow the expression of male courtship. A brief description of the courtship behaviour in these species or Morphos generally would be a useful addition to the paper.

      A cage this size was enough for the males to express a flight behavior similar to what can be seen in nature, while also being able to see the females (live females or dummies). We tried to perform mating experiments in a larger cage (7 m x 5 m x 3 m) but the trials were not conclusive because males did not find the dummies, depending on where they were flying in the cage. A 3 m x 3 m x 2 m cage is a good compromise, maximizing interactions while still allowing enough space to fly. We now describe Morpho male behavior and female behavior in the methods (lines 613-618).

      (18) Line 546: Why are both tests needed (chi-square AND Fisher's exact)?

      As in our answer to recommendation #12, we used both tests to show the robustness of the statistical results. We only kept the Fisher’s test results to simplify the results.

    1. eLife Assessment

      This study presents important information about the role of mu opioid receptors in neurotransmission between the medial habenula and the interpeduncular nucleus. The authors provide convincing evidence that mu opioid receptor activation has differential effects on transmission from substance P neurons and cholinergic neurons, and that blockade of potassium channels can unmask a nicotinic cholinergic synaptic response. This work will be of high interest to those studying this brain region, and potentially to the larger neuroscience community studying motivated behavior.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors demonstrate for the first time that opioid signaling has opposing effects on the same target neuron depending on the source of the input. Further, the authors provide evidence to support the role of potassium channels in regulating a brake on glutamatergic and cholinergic signaling, with the latter finding being developmentally regulated and responsive to opioid treatment. This evidence solves a conundrum regarding cholinergic signaling in the interpeduncular nucleus that evaded elucidation for many years.

      Strengths:

      This manuscript provides 3 novel and important findings that significantly advance our understanding of the medial habenula-interpeduncular circuitry:

      (1) Mu opioid receptor (mOR) activation reduces postsynaptic glutamatergic currents elicited from substance P neurons while simultaneously enhancing postsynaptic glutamatergic currents from cholinergic neurons, with the latter being developmentally regulated.

      (2) Substance P neurons from the mHb provide functional input to the rostral nucleus of the IPN, in addition to the previously characterized lateral nuclei.

      (3) Potassium channels (Kv1.2) provide a brake on neurotransmission in the IPN.

      The findings here suggest that the authors have identified a novel mechanism for the normal function of neurotransmission in the IPN, so it would be expected to be observable in almost any animal. In the revised manuscript, the authors put forth significant effort to increase the n, thus increasing the confidence in the observations.

      There are also significant sex differences in nAChR expression in the IPN that might not be functionally apparent using the low n presented here. In the revised manuscript, the authors increased the n, and provided data to the reviewers that no significant sex differences were apparent, although there was a trend. Future studies should examine sex differences in detail.

      There are also some particularly novel observations that are presented but not followed up on, and this creates a somewhat disjointed story. For example, in Figure 2, the authors identify neurons in which no response is elicited by light stimulation of ChAT-neurons, but application of DAMGO (mOR agonist) un-silences these neurons. Are there baseline differences in the electrophysiological or morphological properties of these "silent" neurons compared to the responsive neurons? In the revised manuscript, the authors directly tested this with new experiments in SST+ neurons in the IPN, demonstrating convincingly that mOR activation unsilences these neurons.

      With the revisions, the authors have addressed the reviewers' concerns and significantly improved the manuscript. I find no further weaknesses.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Chittajallu and colleagues present compelling evidence that mu opioid receptor (MOR) activation can potentiate synaptic neurotransmission in a medial habenula to interpeduncular nucleus (mHb-IPN) subcircuit. While projections from mHb tachykinin 1 (Tac1) neurons onto lateral IPN neurons show a canonical opioid-induced synaptic depression in glutamate release, excitatory neurotransmission in mHb choline acetyltransferase (ChAT) projections to the rostral IPN is potentiated by opioids. This function emerges around age P27 in mice, when MOR expression in the IPN peaks.

      Strengths:

      Carefully executed electrophysiological experiments with appropriate controls. Interesting description of a neurodevelopmental change in the effects of opioids on mHb-IPN signaling.

      Weaknesses:

      A minor concern is that the genetic strategy used to target the mHb-IPN pathway (constitutive ChR2 expression in all ChAT+ and Tac1+ neurons) might not specifically target this projection. Future studies are needed to examine the precise mechanism whereby MOR signaling can potentiate glutamatergic neurotransmission in ChAT+ MHb-IPN projections.

    4. Reviewer #3 (Public review):

      Summary:

      Here the authors describe the role of mORs in synaptic glutamate release from substance P and cholinergic neurons in the medial habenula to interpeduncular nucleus (IPN) circuit in adult mice. They show that mOR activation reduces evoked glutamate release from substance P neurons yet increases evoked glutamate release and ACh release from cholinergic neurons. Unlike glutamate release, ACh release is only detected when potassium channels are blocked with 4-AP or dendrotoxin. The authors also report a previously unidentified glutamatergic input to IPR from SP neurons and describe the developmental timing of mOR facilitation in adolescent mice.

      Strengths:

      - The experiments provide new insight into the role of mORs in controlling evoked glutamate release in a circuit with high levels of mORs and established roles in relevant behaviors.

      - The experiments are rigorous, and the results are clear cut. The conclusions are supported by the data.

      - The findings will be of interest to those working in the field of synaptic transmission and those interested in the function of the medial habenula or interpeduncular nucleus, as well as those seeking to understand the role of opioids on normal and pathological behaviors.

      Weaknesses:

      - The mechanistic underpinnings of these interesting and novel results are not pursued.

    1. eLife Assessment

      This important study elucidates the role of the exocyst component EXOC6A at distinct stages of ciliogenesis, which advances our understanding of ciliary membrane remodeling and cilium formation. The authors provide solid evidence that EXOC6A interacts with myosin-Va and is dynamically recruited via dynein-, microtubule-, and actin-dependent mechanisms, to support proper formation of the ciliary membrane. The study will be of interest to cell biologists and other researchers interested in vesicular trafficking, organellar membrane dynamics, and ciliogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Lin et al. examines the role of EXOC6A in ciliogenesis and its relationship with the interactor myosin-Va using a range of approaches based on the RPE1 cell line model. They establish its spatio-temporal organization at centrioles, the forming ciliary vesicle and ciliary sheath using ExM, various super-resolution techniques, and EM, including correlative light and electron microscopy. They also perform live imaging analyses and functional studies using RNAi and knockout. They establish a role of EXOC6A together with myosin-Va in Golgi-derived, microtubule- and actin-based vesicle trafficking to and from the ciliary vesicle and sheath membranes. Defects in these functions impair robust ciliary shaft and axoneme formation due to defective transition zone assembly.

      Strengths:

      The study provides very high-quality data that support the conclusions. In particular, the imaging data is compelling. It also integrates all findings in a model that shows how EXOC6A participates in multiple stages of ciliogenesis and how it cooperates with other factors.

      Weaknesses:

      The precise role of EXOC6A remains somewhat unclear. While it is described as a component of the exocyst, the authors do not address its molecular functions and whether it indeed works as part of the exocyst complex during ciliogenesis.

    3. Reviewer #2 (Public review):

      Summary:

      The molecular mechanisms underlying ciliogenesis are not well understood. Previously, work from the same group (Wu et al., 2018) identified myosin-Va as an important protein in transporting preciliary vesicles to the mother vesicles, allowing for initiation of ciliogenesis. The exocyst complex has previously been implicated in ciliogenesis and protein trafficking to cilia. Here, Lin et al. investigate the role of exocyst complex protein EXOC6A in cilia formation. The authors find that EXOC6A localizes to preciliary vesicles, ciliary vesicles, and the ciliary sheath. EXOC6A colocalizes with Myo-Va in the ciliary vesicle and the ciliary sheath, and both proteins are removed from fully assembled cilia. EXOC6A is not required for Myo-Va localization, but Myo-Va and EHD1 are required for EXOC6A to localize in ciliary vesicles. The authors propose that EXOC6A vesicles continually remodel the cilium: FRAP analysis demonstrates that EXOC6A is a dynamic protein, and live imaging shows that EXOC6A fuses with and buds off from the ciliary membrane. Loss of EXOC6A reduces, but does not eliminate, the number of cilia formed in cells. Any cilia that are still present are structurally abnormal, with either bent morphologies or the absence of some transition zone proteins. Overall, the analyses and imaging are well done, and the conclusions are well supported by the data. The work will be of interest to cell biologists, especially those interested in centrosomes and cilia.

      Strengths:

      The TEM micrographs are of excellent quality. The quality of the imaging overall is very good, especially considering that these are dynamic processes occurring in a small region of the cell. The data analysis is well done and the quantifications are very helpful. The manuscript is well-written and the final figure is especially helpful in understanding the model.

      Weaknesses:

      Additional information about the functional and mechanistic roles of EXOC6A would improve the manuscript greatly.

    4. Reviewer #3 (Public review):

      Summary:

      Lin et al. report on the dynamic localization of EXOC6A and Myo-Va at pre-ciliary vesicles, ciliary vesicles, and ciliary sheath membrane during ciliogenesis using three-dimensional structured illumination microscopy and ultrastructure expansion microscopy. The authors further confirm the interaction of EXOC6A and Myo-Va by co-immunoprecipitation experiments and demonstrate the requirement of EHD1 for the formation of EXOC6A-labeled ciliary vesicles. Additional experiments using gene silencing by siRNA and pharmacological tools identified the involvement of dynein, microtubules, and actin in the transport mechanism of EXOC6A-labeled vesicles to the centriole, as they have previously reported for Myo-Va. Notably, loss of EXOC6A severely disrupts ciliogenesis, with the majority of cells becoming arrested at the ciliary vesicle (CV) stage, highlighting the involvement of EXOC6A at later stages of ciliogenesis. As the authors observe dynamic EXOC6A-positive vesicle release and fusion with the ciliary sheath, this suggests a role in membrane and potentially membrane protein delivery to the growing cilium past the ciliary vesicle stage. While CEP290 localization at the forming cilium appears normal, the recruitment of other transition zone components, exemplified by several MKS and NPHP module components, was also impaired in EXOC6A-deficient cells.

      Strengths:

      (1) By applying different microscopy approaches, the study provides deeper insight into the spatial and temporal localization of EXOC6A and Myo-Va during ciliogenesis.

      (2) The combination of complementary siRNA and pharmacological tools targeting different components strengthens the conclusions.

      (3) This study reveals a new function of EXOC6A in delivering membrane and membrane proteins during ciliogenesis, both to the ciliary vesicle as well as to the ciliary sheath.

      (4) The overall data quality is high. The investigation of EXOC6A at different time points during ciliogenesis is well schematized and explained.

      Weaknesses:

      (1) Since many conclusions are based on EXOC6A immunostaining, it would strengthen the study to validate antibody specificity by demonstrating the absence of staining in EXOC6A-deficient cells.

      (2) While the authors generated an EXOC6A-deficient cell line, off-target effects can be clone-specific. Validating key experiments in a second independent knockout clone or rescuing the phenotype of the existing clone by re-expressing EXOC6A would ensure that the observed phenotypes are due to EXOC6A loss rather than unintended off-target effects.

      (3) Some experimental details are lacking from the materials and methods section. No information on how the co-immunoprecipitation experiments have been performed can be found. The concentrations of pharmacological agents should be provided to allow proper interpretation of the results, as higher or lower doses can produce nonspecific effects. For example, the concentrations of ciliobrevin and nocodazole used to treat RPE1 cells are not specified and should be included. More precise settings for the FRAP experiments would help others reproduce the presented data. Some details for the siRNA-based knockdowns, such as incubation times, can only be found in the figure legends.

      Taken together, the authors achieved their goal of elucidating the role of EXOC6A in ciliogenesis, demonstrating its involvement in vesicle trafficking and membrane remodeling in both early and late stages of ciliogenesis. Their findings are supported by experimental evidence. This work is likely to have an impact on the field by expanding our understanding of the molecular machinery underlying cilia biogenesis, particularly the coordination between the exocyst complex and cytoskeletal transport systems. The methods and data presented offer valuable tools for dissecting vesicle dynamics and cilium formation, providing a foundation for future research into ciliary dysfunction and related diseases. By connecting vesicle trafficking to structural maturation of an organelle, the study adds important context to the broader description of cellular architecture and organelle biogenesis.

    1. eLife Assessment

      This valuable study investigates the role of HIF1a signaling in epicardial activation and neonatal heart regeneration in mice. Using a combination of genetic and pharmacological approaches, the authors demonstrate that stabilization of HIF1a enhances epicardial activation and extends the regenerative capacity of the heart beyond the typical neonatal window following myocardial infarction. The main conclusion is well supported by solid data, although some minor concerns regarding experimental interpretation require further clarification to ensure accuracy.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts.

      Strengths:

      The study presents convincing genetic and pharmacological approaches to the role of hypoxia signaling in enhancing the regenerative potential of the epicardium.

      Weaknesses:

      The major weakness remains the lack of convincing evidence demonstrating the role of hypoxia signaling in EMT modulation in the epicardial cells. The authors claimed that EMT assays adopted in this study are based on similar previous studies. Surprisingly, two of the references provided correspond to their own research group (PMID: 17108969, PMID: 19235142), limiting the credit for such claims, and the other two (PMID: 27023710, PMID: 12297106) report assessment of cell migration but not EMT. Thus, EMT remains to be convincingly demonstrated.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Gamen et al. investigated the roles of hypoxia and HIF1a signaling in regulating epicardial function during cardiac development and neonatal heart regeneration. The authors identified hypoxic regions in the epicardium during development and demonstrated that genetic and pharmacological stabilization of HIF1a during neonatal heart injury prolonged epicardial activation, preserved myocardium, enhanced infarct resolution, and maintained cardiac function beyond the normal postnatal regenerative window.

      Strengths:

      HIF1a signaling was manipulated in an epicardium-specific manner using appropriate genetic tools.

      Weaknesses:

      Some conclusions still need clarification.

      Comments on revisions:

      (1) The authors' comment on the partial overlap of HP1 and HIF1a IF signals (HIF1a is highly unstable ... broader regions of hypoxia) is reasonable and would help readers interpret the data if included in the text describing Fig. 1.

      (2) The conclusion regarding WT1+ cells in Fig. 2a and b remains unclear. Both panels display larger and smaller magenta cells, and when all are taken into account, the overall number does not appear substantially different. Additional clarification is needed on how the quantification was performed.

      (3) Regarding Figure 6-figure supplement 1c, it seems difficult to conclude the endothelial identity of WT1+ cells based on EMCN staining, as the markers do not overlap. The authors note that WT1 is upregulated in endothelial cells, but this has been reported in the context of injury, which differs from the context of the present study involving Molidustat.

    4. Reviewer #3 (Public review):

      Summary:

      The authors' research here was to understand the role of hypoxia and hypoxia-induced transcription factor Hif-1a in the epicardium. The authors noted that hypoxia was prevalent in the embryonic heart and this persisted into neonatal stages until postnatal day 7 (P7). Hypoxic regions in the heart were noted in the outer layer of the heart and expression of Hif-1a coincided with the epicardial gene WT1. It has been documented that at P7, the mouse heart cannot regenerate after myocardial infarction and the authors speculated that the change in epicardial hypoxic conditions could play a role in regeneration. The authors then used genetic and pharmacological tools to increase the activity of Hif genes in the heart and noted that there was a significant improvement in cardiac function when Hif-1a was active in the epicardium. The authors speculated that the presence of Hif-1a improved cell survival.

      Strengths:

      A focus on hypoxia and its effects on the epicardium in development and after myocardial infarction. This study outlines the potential to extend the regenerative time window in neonatal mammalian hearts.

      Weaknesses:

      While the observations of improved cardiac function are clear, the exact mechanism of how increased Hif-1a activity causes these effects is not completely revealed. The authors mention improved myocardium survival, but do not include studies to demonstrate this.

      There is an indication that fibrosis is decreased in hearts where Hif activity is prolonged, but there are no studies to link hypoxia and fibrosis.

      Comments on revisions:

      In the manuscript revision, the authors address my comments. They explain that differences between genetic disruption of Phd2 and chemical inactivation could be due to the dosing and drug half-life of Molidustat. The other comments are addressed by explaining that they have analyzed enough heart sections and hearts to come to their conclusions. The authors also state they cannot generate more numbers for this study; therefore, I accept their conclusions as stated.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates the role of HIF1a signalling in epicardial activation and neonatal heart regeneration in mice. Through a combination of genetic and pharmacological approaches, the authors show that stabilization of HIF1a enhances epicardial activation and extends the regenerative capacity of the heart beyond the typical neonatal window following myocardial infarction (MI). However, several aspects of the study remain incomplete and would benefit from further clarification and additional experimental support to solidify the conclusions.

      We reveal herein prolonged epicardial activation following myocardial infarction (MI) beyond post-natal days 1-7 (P1-P7) by genetic or pharmacological stabilisation of HIF-signalling. This extends the so-called “regenerative window” during an adult-like response to injury, leading to enhanced survived myocardium and functional improvement of the heart, even against a backdrop of persistent, albeit reduced, fibrosis. The epicardium is known to enhance cardiomyocyte proliferation and myocardial growth during heart development via trophic growth factor (for example, IGF-1, FGF, VEGF, TGFβ and BMP) signalling (reviewed in PMID:29592950) and epicardium-derived cell-conditioned medium reduces infarct size and improves heart function (PMID: 21505261). Further experiments, outside of the scope of the current study, are required to determine whether activated neonatal epicardium elicits similar paracrine support to sustain the myocardium and heart function after injury beyond P7 into adulthood.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts.

      Strengths:

      The study presents convincing genetic and pharmacological approaches to the role of hypoxia signaling in enhancing the regenerative potential of the epicardium.

      Weaknesses:

      The major weakness is the lack of convincing evidence demonstrating the role of hypoxia signaling in EMT modulation in epicardial cells. Additionally, novel experimental approaches should be performed to allow for the translation of these findings to the clinical arena.

      We respectfully disagree that we have not convincingly demonstrated a role for HIF-signalling in promoting epicardial EMT. We adopt epicardial explant assays utilising a well characterised ex vivo protocol previously described for studying EMT in embryonic, neonatal and adult epicardium (PMID: 27023710, PMID: 12297106; PMID: 17108969, PMID: 19235142). These assays demonstrate in WT1<sup>CreERT2</sup>;Phd2<sup>fl/fl</sup> explants enhanced cobblestone to spindle-like change in cell morphology, increased cell migration, appearance of stress fibres and an up-regulation of the mesenchymal marker alpha-smooth muscle actin (αSMA); all parameters associated with EMT. In addition, our in vivo analyses of Wt1<sup>CreERT2</sup>;Phd2<sup>fl/fl</sup> hearts, in response to neonatal injury, reveal elevated numbers of WT1+ epicardial cells within the sub-epicardial region and underlying myocardium as is associated with active EMT and subsequent migration from the epicardium.

      Reviewer #2 (Public review):

      Summary:

      In this study, Gamen et al. investigated the roles of hypoxia and HIF1a signaling in regulating epicardial function during cardiac development and neonatal heart regeneration. They found that WT1<sup>+</sup> epicardial cells become hypoxic and begin expressing HIF1a from mid-gestation onward. During development, epicardial HIF1a signaling regulates WT1 expression and promotes coronary vasculature formation. In the postnatal heart, genetic and pharmacological upregulation of HIF1a sustained epicardial activation and improved regenerative outcomes.

      Strengths:

      HIF1a signaling was manipulated in an epicardium-specific manner using appropriate genetic tools.

      Weaknesses:

      There appears to be a discrepancy between some of the conclusions and the provided histological data. Additionally, the study does not offer mechanistic insight into the functional recovery observed.

      We respectfully disagree with the comment that our histological data does not support our conclusions and expand on this in the response to specific reviewer comments. We agree that further mechanistic experiments outside of the scope of the current study are required to identify precisely how activated neonatal epicardium results in increased healthy myocardium after injury beyond post-natal day 7 (P7).

      Reviewer #3 (Public review):

      Summary:

      The authors' research here was to understand the role of hypoxia and hypoxia-induced transcription factor Hif-1a in the epicardium. The authors noted that hypoxia was prevalent in the embryonic heart, and this persisted into neonatal stages until postnatal day 7 (P7). Hypoxic regions in the heart were noted in the outer layer of the heart, and expression of Hif-1a coincided with the epicardial gene WT1. It has been documented that at P7, the mouse heart cannot regenerate after myocardial infarction, and the authors speculated that the change in epicardial hypoxic conditions could play a role in regeneration. The authors then used genetic and pharmacological tools to increase the activity of Hif genes in the heart and noted that there was a significant improvement in cardiac function when Hif-1a was active in the epicardium. The authors speculated that the presence of Hif-1a improved cell survival.

      Strengths:

      A focus on hypoxia and its effects on the epicardium in development and after myocardial infarction. This study outlines the potential to extend the regenerative time window in neonatal mammalian hearts.

      We thank the reviewer for this positive endorsement and recognition of the importance of mechanistic insight into how to extend the window of neonatal heart regeneration.

      Weaknesses:

      While the observations of improved cardiac function are clear, the exact mechanism of how increased Hif-1a activity causes these effects is not completely revealed. The authors mention improved myocardium survival, but do not include studies to demonstrate this.

      We report an increase in healthy myocardium arising from prolonged activation of the epicardium during the neonatal window and following injury at post-natal day 7 (P7). We speculate this recapitulates the role of the epicardium during heart development which is known to be a source of trophic growth factors that can enhance myocardial growth. Further experiments are required, out-of-scope of this study, to define a mechanistic link between HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      There is an indication that fibrosis is decreased in hearts where Hif activity is prolonged, but there are no studies to link hypoxia and fibrosis.

      We believe the decreased fibrosis is a natural consequence of the increase in survived myocardium arising from the activated epicardium. There is strong precedent here following injury at post-natal day 1 (P1) in which fibrosis is evident early-on but is resolved over time with growth of the myocardium in the regenerating heart (PMID: 23248315).

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Address issues related to image quality, colocalization, sample labeling, appropriate controls, and quantification - particularly in Figures 1, 2, 6, and Supplementary Figure 9. Increase sample size as noted by reviewers.

      The issues of co-localisation and sample labelling have been addressed under response to reviewers. We are unable to increase sample numbers but have clarified the number of regions per section and numbers of sections per heart analysed where appropriate.

      (2) Clarify the effects of epicardial HIF1a activation on neovascularization.

      We have removed reference in the abstract to an effect on neovascularisation.

      (3) Extend assessments of epicardial hypoxia and HIF1a expression to earlier embryonic stages, when epicardial EMT is more active.

      Our earliest timepoint of E12.5 marks the onset of epicardial EMT and E13.5 is the stage with the most significant mobilisation of epicardium-derived cells (EPDCs) into the sub-epicardial region and underlying myocardium (PMID: 32359445). In the same study, E11.5 lineage tracing of epicardial cells is restricted to the outer layer of the heart; thus, our timepoints are representative in capturing both the onset and progression of in vivo EMT.

      (4) Strengthen EMT assays and mechanistic modeling. Provide evidence from physiologically relevant models, as current 2D culture assays do not adequately support conclusions about EMT. Include additional EMT markers and quantification where appropriate.

      We respectfully disagree that epicardial explants are not a valid assay for assessing EMT. As noted under responses to reviewers, such primary explants have been widely described elsewhere (PMID: 27023710, PMID: 12297106; PMID: 17108969, PMID: 19235142) and enable documentation of multiple parameters that are associated with active EMT, including an assessment of the extent of cell migration, cobblestone (epithelial) to spindle-like (mesenchymal) cell morphologies, stress fibre formation and expression of alpha-smooth muscle actin as a mesenchymal marker. We support our findings in explants by revealing reduced WT1+ epicardium-derived cells (EPDCs) in the sub-epicardial region and underlying myocardium of WT1<sup>CreERT2/+</sup>;Hif1a<sup>fl/fl</sup> embryonic hearts (data in Figure 2) indicative of impaired epicardial EMT and migration of EPDCs and in vivo following neonatal MI with pharmacological inhibition of PHD2, where we observe the reciprocal phenotype of increased numbers of epicardium-derived cells emerging from the outer epicardial layer (data in Figure 6).

      (5) Strengthen mechanistic insights into the role of epicardial cells in the functional recovery observed in MI hearts.

      We agree that further experiments are required, out-of-scope of this study, to define a mechanistic link between HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      Reviewer #1 (Recommendations for the authors):

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts. The study is potentially interesting, but it presents several major caveats.

      (1) One of the critical points reported in the early stages of this study is the early co-localization of Wt1, the hypoxic reporter (HP1), and HIF signaling pathway master regulators (i.e., HIF1a and HIF1b) during embryonic development. Figure 1 is meant to report such findings. However, unfortunately, I hardly see any co-localization at all in the Wt1+ epicardial cells for HP1, although some co-localization is seen for HIF1 and 2 alpha; none of these data are quantified. Thus, it is hard to believe such co-localization.

      We respectfully disagree with this comment. We highlight cells in Figure 1 that are co-stained for WT1+ and HP1. In addition, we identify HIF1-α and HIF2-α positive cells which either reside within the epicardium, as the outer cell layer, or within the underlying sub-epicardial region, respectively.

      (2) The authors claimed that they have analyzed the expression of the hypoxic reporter, as well as Wt1 and the HIF signaling pathway master regulators (i.e., HIF1a and HIF1b), in the AV groove, as compared to the apex, in embryonic hearts ranging from E12.5 to E18.5 (Figure 1). Unfortunately, all images provided that are tagged as AV groove are rather misleading. They do not represent the AV groove but part of the right ventricular free wall. If the authors want to refer to the AV groove, AV cushions should be visible underneath.

      We have removed specific reference to the AV groove and refer to the highlighted regions as the “Base” of the heart.

      (3) The authors analyzed the hypoxic condition of the developing heart from E12.5 to E18.5. However, it remains unclear why the authors only explored the hypoxic conditions from E12.5 onwards, since epicardial EMT mainly occurs earlier than this time point, i.e., E10.5 onwards. Therefore, the hypoxic conditions should also be explored at this earlier time point.

      We respectfully disagree with the reviewer and refer to the comment above regarding the fact that E12.5 marks the onset of epicardial EMT and E13.5 is the stage with the most significant mobilisation of epicardium-derived cells (EPDCs) into the sub-epicardial region and underlying myocardium (PMID: 32359445).

      (4) The authors reported a conditional mouse model of HIF1alpha deletion by using the Wt1CreERT2 driver. Curiously, Wt1 is dependent on hypoxia signaling (i.e., HIF1a). Therefore, it is unclear whether a negative feedback loop between the deletion of Hif1alpha and the activation of the Cre driver might have functional consequences. Convincing evidence should be provided that such crosstalk does not interfere with Hif1alpha inactivation, and therefore, appropriate controls should be run in parallel.

      We discount a negative feedback loop in this instance based on the fact we have utilised heterozygous mice for the WT1<sup>CreERT2/+</sup> line and observe a consistent and reproducible phenotype for the developing hearts on a Wt1<sup>CreERT2/+</sup>;Hif1a<sup>fl/fl</sup> background and following injury in Wt1<sup>CreERT2/+</sup>;Phd2<sup>fl/fl</sup> mice. Collectively this indicates that the WT1-CreERT2 driver is active in the context of diminishing HIF-1α and Phd2, respectively. In addition, we have carried out parallel experiments using epicardial explants derived from R26R-CreERT2;Phd2<sup>fl/fl</sup> (Figure 3) to circumvent any potential confounding issues; the results of which are consistent with increased epicardial EMT in support of our overall hypothesis.

      (5) On Figure 2a-f the authors reported that epicardial cells are diminished in Wt1CreERT2Hif1alpha mice as compared to controls. I am very sorry, but I do not see any difference. Furthermore, it is unclear to me how the authors quantified such differences, i.e., what marker signal did they use and how it was performed (Figure 2c and d)?

      We respectfully disagree with the reviewer and draw attention to the single channel panels of WT1+ staining in Figure 2, which show clear differences between numbers of epicardial cells in the mutant mice compared to controls (comparing magenta cells in panel a versus panel b). Quantification was carried out for numbers of WT1+ cells residing within the PDPN-positive epicardium (and underlying PDPN-negative myocardium) across multiple images from multiple sections and multiple hearts.

      (6) On Figure 2g, the authors reported differences in total vessel length. Are they referring to impaired microvasculature development? Or is this analysis also including major coronary vessels? What about the major coronary vessels and trees, is there any affection?

      This analysis refers to the microvasculature and not the major coronary arteries or coronary trees.

      (7) The authors reported that there might be some differences in EMT markers, but unfortunately, all of them are analyzed on 2D cultures, where no substrate for EMT is present, i.e., an underlying ECM bed. Thus, the authors cannot claim that EMT is altered. Additional experiments using either collagen substrate and/or Matrigel are required to fully demonstrate that EMT is impaired. Furthermore, quantitative analyses of such differences should be provided.

      The 2D cultures are epicardial explants from mutant versus wild type hearts and represent a widely adopted previously published ex-vivo assay for investigating epicardial EMT across embryonic to adult stages (PMID: 27023710, PMID: 12297106; PMID: 17108969, PMID: 19235142); including an assessment of the extent of migration and cobblestone (epithelial) to spindle-like (mesenchymal) cell morphologies, stress fibre formation and expression of alpha-smooth muscle actin as a mesenchymal marker. We do not understand the comment regarding an “underlying ECM bed” as the cells exhibit EMT routinely on tissue culture plastic and will deposit their own ECM during the culture time course and in response to EMT/cell migration. In terms of quantification this was carried out for scratch assay experiments, as a proxy for EMT and emergent mesenchymal cell migration, as presented in Figure 3i, j with significant enhanced scratch closure and cell migration following Molidustat treatment.

      (8) The description of data provided on Supplementary Figure 5 is spurious and should be removed. A note in the discussion might be sufficient.

      We respectfully disagree. The ChIP-seq data, in what is now Figure 2-figure supplement 3, highlights a HIF-1α binding site within the Wt1 locus, suggesting putative upstream regulation of WT1 by HIF-1α. Thus this provides a potential explanation as to how HIF-1α may activate the epicardium through up-regulation of Wt1/WT1.

      (9) On Figure 3, the authors further illustrate the change of EMT markers using ex vivo cardiac explants. They reported increased expression of Snai2 that, although statistically significant, is most likely of no biological relevance (increase of only 20% at transcript level). What about Snai1, Prrx1, and other EMT promoters? Are they also induced? As previously stated, these 2D cultures do not provide supporting evidence that EMT is occurring, thus 3D gel assays should be performed in which Z-axis analyses will provide evidence on the different migratory behaviour of those cells.

      We respectfully suggest that a 20% change in Snai2 expression is biologically meaningful with respect to EMT. This in turn is supported by associated cell migration, reduced ZO-1 expression, increased stress fibres and increased alpha-SMA as a mesenchymal marker; all properties associated with active EMT. Other suggested markers have not been validated as formally required for EMT, for example Snai1 (PMID: 23097346). The migratory capacity of targeted versus epicardial cells was assessed by combined explant and scratch assay experiments.

      (10) The description of single-cell analyses is very incomplete. Which mice were used for these analyses, wildtype control, or hypoxic mice? Please provide a clearer description of the samples used. Additionally, the entire rationale of these analyses is dubious. Doing single-cell analyses to analyze a couple or three markers in a very small cell population is rather ridiculous. qPCR might be far more appropriate and convincing, or a bulk RNAseq analysis of isolated epicardial cells.

      The single-cell analyses represent an unbiased assessment of different pathways in epicardial cells (identified bioinformatically) between intact P1 and P7 stages in wild type (control) hearts, with a focus on hypoxia-related gene expression and HIF-dependent pathways. They were not designed to analyse a small number of genes, but rather global differences in the hypoxic states between P1 and P7 hearts. Selected genes (Vegfa, Pdk3, Egln1 (Phd2)) were analysed to highlight the key differences in hypoxic signalling across the regenerative window. The fact the hearts were uninjured/intact is clarified in the text and legends for Figure 4 and now Figure 4-figure supplement 1.

      (11) The analyses provided in Figure 5 are very interesting and their findings are very relevant. However, I would think that the complementary experimental approach should also be done, i.e, MI followed by activation with tamoxifen, since that situation would be more realistic in the clinical setting.

      Tamoxifen causes respiratory failure in neonates with MI, so the two cannot be combined at the same time or soon after surgery. Moreover, tamoxifen takes significant time to take effect on targeted gene down-regulation which may negate sufficient activation of the epicardium following injury.

      The experiments in Figure 5 were designed to demonstrate that prolonged heart regeneration could be elicited in a cell-specific (epicardial-specific) manner via a genetic approach. The pharmacological experiments in Figure 6 are complementary in this regard by demonstrating equivalent effects with drug (Molidustat) delivery to reduce PHD2 and stabilise HIF post-MI.

      (12) In Figure 6, expression of Wt1 is highly prominent in P7 controls, mainly restricted to the epicardial lining while in the experimental setting, such Wt1 expression is broadly distributed on the subepicardial space, nicely demonstrating epicardial activation. However, it is very surprising to see such Wt1 expression in controls, something that is not expected, as compared to the data reported in Figure 4g. Could the authors please reconcile these findings?

      Figure 6 represents the injury setting and Figure 4g the intact setting (as clarified above, in the text and revised figure legends). Hence in the latter WT1 expression is significantly reduced in the P7 heart, as anticipated. With injury at P7 we anticipate activation of WT1 in control hearts, albeit restricted to the epicardial layer (as occurs in adult hearts, PMID: 21505261). In contrast, following Molidustat-treatment of P7 hearts post-MI we observe extensive epicardial expansion into the sub-epicardial region and EPDC migration into the underlying myocardium (Figure 6b).

      Reviewer #2 (Recommendations for the authors):

      The role of hypoxia and HIF1a signaling in epicardial activation is an important topic, and the genetic approaches employed in this study are appropriate. However, several aspects of the study remain unclear and would benefit from further clarification or explanation by the authors:

      (1) The authors detected hypoxic regions using an anti-pimonidazole fluorescence-conjugated monoclonal antibody (HP1). The data would become more compelling if negative and positive controls were provided.

      We believe the HP1 staining is compelling in the images shown and is consistent with hypoxic regions of the developing heart. We reveal HP1 staining at cellular resolution with neighbouring cells positive and negative for the HP1 signal in the apex of the heart and within the epicardium and sub-epicardial regions at E12.5 (Figure 1a) and diminished/altered hypoxic/HP1 regional signal through subsequent developmental stages at E14.5-18.5 (Figure 1a-d).

      (2) Many HIF1a-positive cells in the AV groove region do not appear to overlap with HP1 staining (Figure 1a). Providing a low-magnification image of HIF1α expression would be helpful to better assess the extent of overlap with HP1 staining

      HIF-1 is highly unstable and hence detection of HIF-1+ cells will likely only sample of cells compared to HP1 which is a surrogate for broader regions of hypoxia.

      (3) Although the authors conclude that epicardial HIF1a deletion results in a significant reduction of WT1⁺ cells in both the epicardium and myocardium (Figure 2a-d), the provided images are not sufficiently clear to fully support this interpretation. Providing additional evidence to support this conclusion would be helpful.

      We respectfully disagree with the reviewer and draw attention to the single channel panels of WT1+ staining which show clear differences between numbers of epicardial cells in the mutant mice compared to controls (Figure 2a versus 2b; magenta WT1+ staining).

      (4) Similar to the point raised above, the authors' conclusion regarding the increased expression of WT1 following Molidustat treatment does not appear to be fully supported by the provided images (Figure 6b-f). Immunofluorescence staining for WT1 does not clearly demonstrate epicardial expression in the remote zone of either the control or Molidustat-treated hearts. In addition, while an increase of WT1<sup>+</sup> cells is observed in the infarct zone of the Molidustat-treated heart, it is somewhat unexpected that such expansion is not evident in the corresponding region of the control heart, given that epicardial cells typically expand near the infarct area. Clarification on these points would be helpful.

      Figure 6b reveals WT1 expression in controls (upper panel set) that is reactivated proximal to the infarct region, given that WT1 is not normally expressed in adult epicardium, but remains restricted to the epicardial layer (as occurs in injured adult mouse hearts, PMID: 21505261). This contrasts with what is observed in the Molidustat-treated P7 hearts post-MI, where we observe epicardial expansion and migration of WT1+ cells into the underlying myocardium (Figure 6b, lower panel set, infarct zone).

      (5) The authors conclude that WT1<sup>+</sup> cells in the myocardial tissue exhibit endothelial identity based on the colocalization of WT1 and EMCN signals (Supplementary Figure 9c). However, this interpretation is difficult to assess, as WT1 is a nuclear marker and EMCN is a membrane protein, which makes precise colocalization challenging to confirm with confidence. Additional supporting evidence may be necessary to substantiate this conclusion.

      WT1 is known to be up regulated in endothelial cells in response to injury as shown previously in several studies (for example, PMID: 25681586). Here we show clear co-localisation of nuclear WT1 and cytoplasmic Endomucin (EMCN) in what is now Figure 6- figure supplement 1c and would encourage the reviewer and readers to magnify the image by zooming-in on the relevant co-stained panel.

      (6) The authors conclude that activation of epicardial HIF1a signaling has no effect on neovascularization in postnatal MI hearts (Figure 5c). However, the abstract states: "Finally, a combination of genetic and pharmacological stabilisation of HIF ... increased vascularisation, augmented infarct resolution and preserved function beyond the 7-day regenerative window" (Lines 38-41). Clarification regarding this apparent discrepancy would be appreciated.

      The abstract has been altered to remove the statement of increased vascularisation.

      (7) The study appears somewhat incomplete, as it lacks mechanistic insight into the functional recovery observed following epicardial Phd2 deletion and Molidustat treatment in postnatal MI hearts. Although the authors suggest a potential paracrine role of the epicardium in protecting cardiomyocytes from apoptosis, this hypothesis has not been experimentally addressed. Incorporating such analysis would help to reinforce the study's conclusions.

      Further experiments are required, which are out-of-scope of this study, to define a mechanistic link between the genetic or pharmacological stabilisation of HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      Other points:

      (1) Providing single-channel images for Figures 1a-d and 6g would be helpful for clarity and interpretation.

      We believe the combined channel views of co-staining for two markers on a background of DAPI staining to pin-point cell nuclei, are informative and support our conclusions.

      (2) Have the authors considered using AngioTool to quantify the number of vessels in Figure 5b-c?

      AngioTool™ was used to quantify the vessels, as we have used previously (PMID: 33462113), and this is now added to the methods and legend of Figure 2.

      Reviewer #3 (Recommendations for the authors):

      There are several areas where the manuscript can be improved, such that its conclusions can be solidified.

      (1) The authors highlight a point where blocking Phd2 can enhance survival of cardiac tissue, but did not report on survival markers. They surmised that apoptosis could be decreased in Phd2 mutant or Molidustat treatment but did not show this. The authors should determine if apoptosis is decreased in the myocardium and epicardium.

      We show evidence of increased levels of healthy myocardium in the genetic and pharmacological models of stabilised HIF-signalling. We exclude increased cardiac hypertrophy or increased cardiomyocyte proliferation as causative, so suggest as a reasonable alternative enhanced survival, albeit this need not necessarily be via an apoptotic pathway given the incidence of necrotic cell death during MI. We are unable to generate new surgeries and mutant/treated heart samples to analyse for apoptotic markers at this stage.

      (2) There appears to be no difference in cardiomyocyte proliferation in Molidustat-treated animals, but the experiment was only performed on 2 to 3 animals. This is too small a sample size to conclude from these results. The authors should increase the sample size to make this assertion.

      We respectfully disagree that we are unable to conclude no effect on cardiomyocyte proliferation. We analysed multiple heart regions per section, for EdU+/cTnT+ colocalised signals across several sections per heart, set against a consistency of effect on other parameters in hearts treated with Molidustat. We are unable to generate more P7 heart surgeries +/- Molidustat and +/- EdU at this stage.

      (3) It is curious as to how, after myocardial infarction, the fibrotic scar tissue is decreased in the Phd2 deletion but not as profound in Molidustat-treated mice at d21. Can the authors speculate why the difference exists and how this decrease arises? For example, are there decreased pro-inflammatory signals in Phd2 deleted mice? Is there decreased collagen deposition and ECM gene expression? Do macrophage recruitment into the infarct zone differ between mutant/treated vs WT?

      The representative images in Figure 6k reveal a trend towards reduced fibrosis with Molidustat treatment (Figure 6l), but across all hearts analysed this was not as significant as observed in the epicardial-specific deletion injured hearts (Figure 5g, h). This may be due to the relatively short half-life of Molidustat (approximately 4-10 hours, PMID: 32248614), the dosing regimen for the drug and/or the fact that it was not specifically delivered/targeted to the epicardium.

      (4) The magnified images in Figure 1 do not match the boxes in the whole heart images. It is unclear what the white boxes signify.

      The white boxes have been removed from Figure 1. The magnified image panels are from serial heart sections and this is now clarified in the Figure 1 legend.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of how the glycocalyx of cells provides a non-specific barrier for the interaction of viruses with cell-surface receptors. Using both in vitro experiments and in vivo manipulations, the authors provide compelling evidence that the glycocalyx acts mainly by serving as an energy barrier. The work will be of broad interest to virologists and the cell biology community that studies host-pathogen interactions.

    2. Joint Public Review:

      This manuscript tests the notion that bulky membrane glycoproteins suppress viral infection through non-specific interactions. Using a suite of biochemical, biophysical, and computational methods in multiple contexts (ex vivo, in vitro, and in silico), the authors collect compelling evidence supporting the notion that (1) a wide range of surface glycoproteins erect an energy barrier that hinders the virus from forming the stable adhesive interface needed for fusion and uptake and (2) the total amount of glycan, independent of its molecular identity, additively enhances the suppression.

      As a functional assay the authors focus on viral infection starting from the assumption that a physical boundary modulated by overexpressing a protein-of-interest could prevent viral entry and subsequent infection. Here they find that glycan content (measured using the PNA lectin) of the overexpressed protein and total molecular weight, that includes amino acid weight and the glycan weight, is negatively correlated with viral infection. They continue to demonstrate that it is in effect the total glycan content, using a variety of lectin labelling, that is responsible for reduced infection in cells. Because the authors do not find a loss in virus binding this allows them to hypothesize that the glycan content presents a barrier for the stable membrane-membrane contact between virus and cell. They subsequently set out to determine the effective radius of the proteins at the membrane and demonstrate that on a supported lipid bilayer the glycosylated proteins do not transition from the mushroom to the brush regime at the densities used. Finally, using Super Resolution microscopy they find that above an effective radius of 5 nm proteins are excluded from the virus-cell interface.

      The experimental design does not present major concerns and the results provide insight on a biophysical mechanism according to which, repulsion forces between branched glycan chains of highly glycosylated proteins exert a kinetic energy barrier that limits the formation of a membrane/viral interface required for infection.

      In their revised manuscript and rebuttal, the authors address several general and specific concerns that were raised about their first submission. The revised manuscript now makes the evidence supporting their claims compelling.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Review

      GENERAL QUESTIONS:

      (1) For many enveloped viruses, the attachment factors - paradoxically - are also surface glycoproteins, often complexed with a distinct fusion protein. The authors note here that the glycoproteins do not inhibit the initial binding, but only limit the stability of the adhesive interface needed for subsequent membrane fusion and viral uptake. How these antagonistic tendencies might play out should be discussed.

      When the surface density of receptor molecules for a virus with glycans increases, the density of free glycans not bound to the virus increases along with the amount of virus adsorbed. However, if the total amount of glycans is considered to be a function of the receptor density, the reaction may become more complicated. This complication may also be affected by the prolonged infection. If the receptor density on the cell surface is high, the infection inhibitory effect of glycans may not be obtained in a system in which a high concentration of virus is supplied from the outside world for a long time. This is because once viruses have entered the cell, they accumulate inside the cell, and viral infection is affected by the total accumulated amount, which is the integration of the number of viruses that have entered over time. This distinction indicates that the virus entry reaction and the total amount of infection in the cell must be considered separately. This is an important point, but it was not clearly mentioned in the original manuscript.

      Our experiments were conducted under conditions that clearly allowed us to detect the virus-inhibiting function of glycans without being affected by the above points. In order to clarify these points, we will revise this article as follows, referring to an experiment that is somewhat related to this discussion (the Adenovirus infection experiment into HEK293T cells shown in Figure S1F).

      (Page-3, Introduction)

      While there are known examples of glycans that function as viral receptors (Thompson et al., 2019), these results demonstrate that a variety of glycoproteins negatively regulate viral infection in a wide range of systems. All of these results suggest that bulky membrane glycoproteins nonspecifically inhibit viral infection.

      (Page 20, Discussion)

      When the virus receptor is a glycoprotein or glycan itself, the inhibition of virus infection by glycans becomes more complex because the total amount of glycans is also a function of the receptor density. It is also important to note that the total amount of infection into a cell is the time integral of virus entry. Even if the probability of virus entry is significantly reduced by glycans, the cumulative number of virus entries may increase if high concentrations of virus continue to be supplied from outside the cell for a long period of time. In the case of Adenovirus, which continues to amplify in HEK293T cells after infection, we showed that MUC1 on the cell surface has an inhibitory effect on long-term cumulative infection (Supplementary Figure 1F). However, such an accumulation effect may be case-by-case depending on the virus cell system, and may be more pronounced when the cell surface density of virus receptor molecules is high. As a result, if the virus receptor molecule is a glycan or glycoprotein and infection continues for a long period of time, the infection inhibition effect may not be observed despite an apparent increase in the total amount of glycans in the cell. In any case, our results clarified the factor of virus entry inhibition dependent on the total amount of glycans because appropriate conditions were set.
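
      The time-integral argument above can be illustrated with a minimal numerical sketch. It assumes a constant per-virion entry probability and a constant external supply rate; the function name and every parameter value are hypothetical and are not taken from the manuscript.

```python
def cumulative_entries(entry_probability, supply_rate, duration, dt=0.1):
    """Approximate the time integral of virus entry with a Riemann sum.

    entry_probability: hypothetical chance that a supplied virion enters
    supply_rate: hypothetical number of virions reaching the cell per unit time
    duration: total exposure time (same time unit as supply_rate)
    """
    steps = int(round(duration / dt))
    total = 0.0
    for _ in range(steps):
        # Entries in one time step = probability x virions supplied in that step
        total += entry_probability * supply_rate * dt
    return total


# Assumed numbers: glycans cut the entry probability tenfold, but the
# exposure lasts ten times longer, so the cumulative entries are equal.
short_unprotected = cumulative_entries(0.10, 100.0, 1.0)
long_protected = cumulative_entries(0.01, 100.0, 10.0)
```

      Under these assumed values the two cumulative totals coincide, which is one way to see why the entry step and the accumulated infection may need to be considered separately.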

      (2) Unlike polymers tethered to a solid surface undergoing mushroom-to-brush transition in a density-dependent manner, the glycoproteins at the cell surface are of course mobile (presumably in a density-dependent manner). They can thus redistribute in spatial patterns, which serve to minimize the free energy. I suggest the authors explicitly address how these considerations influence the in vitro reconstitution assays seeking to assess the glycosylation-dependent protein packing.

      We performed additional experiments using lipid bilayers that had lost fluidity, and found that there is no significant difference in protein binding between fluid and nonfluid bilayers. The redistribution of molecules due to molecular fluidity may play some roles but not in our experimental systems. It suggests that glycoproteins can generate intermolecular repulsion even in fluid conditions such as cell membranes, just as they do on the solid phase. This experiment was also very useful because it allowed us to compare our results in the fluid bilayer with solid-state measurements of saturation molecular density and the brush transition. This comparison gave us confidence that in the reconstituted membrane system, even at saturation density, the membrane proteins are not as stretched as they are in the condensed brush state. We have therefore added a new paragraph and a new figure (Supplementary Fig. 5B) to discuss this issue, as follows:

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for both b-MUC1 and g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as in the condensed brush state, in which the contribution of resistance to molecular extension f<sub>el</sub> is expected to be small relative to the overall free energy of the system.
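
      To put the quoted brush transition density in perspective, the following sketch converts a surface density into a mean intermolecular spacing and compares it with R<sub>F</sub>. It assumes molecules sit on a square lattice, so that the spacing is the inverse square root of the density; this estimate and the function name are illustrative assumptions rather than the authors' calculation.

```python
import math


def mean_spacing_nm(density_per_um2):
    """Mean centre-to-centre spacing in nm, assuming a square lattice."""
    # 1 um = 1000 nm, and spacing = 1 / sqrt(density)
    return 1000.0 / math.sqrt(density_per_um2)


brush_density = 65000.0   # per um^2, value quoted from Wu et al., 2002
flory_radius_nm = 15.0    # R_F quoted for 17 kDa polyacrylamide

spacing = mean_spacing_nm(brush_density)   # roughly 3.9 nm
crowded = spacing < flory_radius_nm        # the d < R_F condition
```

      With the quoted values the spacing is well below R<sub>F</sub>, consistent with a regime in which neighbouring chains overlap.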

      (3) The discussion of the role of excluded volume in steric repulsion between glycoprotein needs clarification. As presented, it's unclear what the role of "excluded volume" effects is in driving steric repulsion? Do the authors imply depletion forces? Or the volume unavailable due to stochastic configurations of gaussian chains? How does the formalism apply to branched membrane glycoproteins is not immediately obvious.

      Regarding the excluded volume due to steric repulsion between glycoproteins, we considered the volume that cannot be used by glycans as Gaussian chains branching from the main chain. We would like to expand on this point by adding several papers that make similar arguments. I'm glad you brought this up because we hadn't considered depletion forces - the excluded volume between glycoproteins should generate a depletion force, but in this case we believe this force will not have a significant effect on viruses that are larger than the glycoproteins. We also attempted to clarify the discussion in this section by focusing on intermolecular repulsion, and restructured paragraphs, which are also related to General Question 2 and Specific Question 2. The relevant part has been revised as follows. (page 15~page16):

      To compare the packing of proteins with different molecular weights and R<sub>F</sub>, These were smaller than the coverage of molecules at hexagonal close packing, which is ~90.7%. In contrast, the coverage of b-CD43 and b-MUC1 at saturated binding was estimated to be greater than 100% under this normalization standard, indicating that the mean projected sizes of these molecules in the surface direction were smaller than those expected from their R<sub>F</sub>. Thus, it is clear that glycosylation reduces the saturation density of membrane proteins, regardless of molecular size.

      Highly glycosylated proteins resisted densification, indicating that some intermolecular repulsion is occurring. In the framework of polymer brush theory, the intermolecular repulsion of densely packed highly glycosylated proteins is due to an increase in either f<sub>el</sub>, f<sub>int</sub> (d<R<sub>F</sub>), or both (Hansen et al., 2003; Wu et al., 2002). The term of intermolecular interaction, f<sub>int</sub>, is regulated by intermolecular steric repulsion, which occurs when neighboring molecules cannot approach the excluded volume created by the stochastic configuration of the polymer chain (Attili et al., 2012; Faivre et al., 2018; Kreussling and Ullman, 1954; Kuo et al., 2018; Paturej et al., 2016). The magnitude of this steric repulsion depends largely on R<sub>F</sub> in dilute solutions, but the molecular structure may also affect it when molecules are densified on a surface. In other words, the glycans protruding between molecules can cause steric inhibition between neighboring proteins (Figure 5D). Such intermolecular repulsion due to branched side chains occurs only when the molecules are in close proximity and sterically interact on a two-dimensional surface, but not in dilute solution, and does not occur in unbranched polymers such as underglycosylated proteins (Figure 5D). Based on the above, we propose the following model for membrane proteins: Only when the membrane proteins are glycosylated does strong steric repulsion occur between neighboring molecules during the densification process, suppressing densification.

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for both b-MUC1 and g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as in the condensed brush state, in which the contribution of resistance to molecular extension f<sub>el</sub> is expected to be small relative to the overall free energy of the system.

      Note that this does not mean that glycoproteins cannot form condensed brush structures: in fact, highly glycosylated molecules (e.g., MUC1) can form brush structures in cells when such proteins are expressed at very high densities (Shurer et al., 2019). In these cells, ………. Such membrane deformation results in the increase of total surface area to reduce the density of glycoproteins, indicating that there is strong intermolecular repulsion between glycoproteins. In any case, the free energy of the system is determined by the balance between protein binding and insertion into the membrane, protein deformation, and repulsive forces between proteins, which determine the density of proteins depending on the configuration of the system. Thus, although strong intermolecular repulsions were prominently observed in our simplified system, this may not be the case in other systems. ……
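
      The ~90.7% coverage quoted above for hexagonal close packing is the standard packing fraction of equal discs on a plane, π/(2√3). The sketch below reproduces that number and shows one possible way to express coverage from a surface density. It assumes each molecule is treated as a disc of radius R<sub>F</sub>, which is an illustrative normalization and may differ from the one used in the manuscript; the function name and the example density are hypothetical.

```python
import math

# Packing fraction of equal discs in a hexagonal lattice: pi / (2 * sqrt(3))
HCP_COVERAGE = math.pi / (2.0 * math.sqrt(3.0))


def coverage_fraction(density_per_um2, radius_nm):
    """Fraction of the surface covered, treating each molecule as a disc.

    Values above 1.0 mean the molecules occupy less area than a disc of
    the given radius would suggest.
    """
    radius_um = radius_nm / 1000.0
    return density_per_um2 * math.pi * radius_um ** 2


# Hypothetical example: 1000 molecules per um^2 with a 15 nm disc radius
example_coverage = coverage_fraction(1000.0, 15.0)
```

      Under this normalization, a coverage above 100% indicates a projected molecular size smaller than the assumed disc, which matches the interpretation given for b-CD43 and b-MUC1.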

      (4) The authors showed that glycoprotein expression inversely correlated with viral infection and link viral entry inhibition to steric hindrance caused by the glycoprotein. Alternative explanations would be that the glycoprotein expression (a) reroutes endocytosed viral particles or (b) lowers cellular endocytic rates and via either mechanism reduce viral infection. The authors should provide evidence that these alternatives are not occurring in their system. They could for example experimentally test whether non-specific endocytosis is still operational at similar levels, measured with fluid-phase markers such as 10kDa dextrans.

      The results of the experiment suggested by the reviewer are shown in the new Supplementary Figure 3B. (This results in generation of a new Supplementary Figure 3, and previous Supplementary Figures 4-5 are now renumbered as Supplementary Figures 5-6). Endocytosis of 10KDa dextran was attenuated by the expression of several large-sized molecules, but was not affected by the expression of many other glycoproteins that have the ability to inhibit infection. These results were clearly different from the results in which virus infection was inhibited more by the amount of glycan than by molecular weight. Therefore, it was found that many glycoproteins inhibit virus infection through processes other than endocytosis. Based on the above, we have added the following to the manuscript: (p9 New paragraph:)

      We also investigated the effect of membrane glycoproteins on membrane trafficking, another process involved in viral infection. Expression of MUC1 with higher number of tandem repeats reduced the dextran transport in the fluid phase, while expression of multiple membrane glycoproteins that have infection inhibitory effects, including truncated MUC1 molecules, showed no effect on fluid phase endocytosis, indicating a molecular weight-dependent effect (Supplementary Figure 3B). The molecular weight-dependent inhibition of endocytosis may be due to factors such as steric inhibition of the approach of dextran molecules or a reduction in the transportable volume within the endosome. In any case, it is clear that many low molecular weight glycoproteins inhibit infection by disturbing processes other than endocytosis. Based on the above, we focus on the effect of glycoproteins on the formation of the interface between the virus and the cell membrane.

      (5) The authors approach their system with the goal of generalizing the cell membrane (the cumulative effect of all cell membrane molecules on viral entry), but what about the inverse? How does the nature of the molecule seeking entry affect the interface? For example, a lipid nanoparticle vs a virus with a short virus-cell distance vs a virus with a large virus-cell distance?

      Thank you for your interesting comment. If the molecular size of the ligand is large, it should affect virus adsorption and molecular exclusion from the interface. In lipid nanoparticle applications, controlling this parameter may contribute to efficiency. In addition, a related discussion is the influence of virus shell molecules that are not bound to the receptor. I will revise the text based on the above.

      Discussion (as a new paragraph after the paragraph added in Q1):

      In this study, we attempted to generalize the surface structure on the cell side, but the surface structure on the virus side may also have an effect. The efficiency of virus adsorption and the efficiency of cell membrane protein exclusion from the interface will change depending on the molecular length of the receptor-ligand, although receptor priming also has an effect. In addition, free ligands of the viral envelope or other coexisting glycoproteins may also have an effect as they are also required for exclusion from the virus-cell interface. In fact, there are reports that expression of CD43 and PSGL-1 on the virus surface reduces virus infection efficiency (Murakami et al., 2020). Such interface structure may be one of the factors that determine the infection efficiency that differs depending on the virus strain. More generally, modification of the surface structure may be effective for designing materials such as lipid nanoparticles that construct the interface with cell.

      SPECIFIC QUESTIONS:

      (1) The proposed mechanism indicates that glycosylation status does not produce an effect in the "trapping" of virus, but in later stages of the formation of the virus/membrane interface due to the high energetic costs of displacing highly glycosylated molecules at the vicinity of the virus/membrane interface. It is suggested to present a correlation between the levels of glycans in the Calu-3 cell monolayers and the number of viral particles bound to cell surface at different pulse times. Results may be quantified following the same method as shown in Figure 2 for the correlation between glycosylation levels and viral infection (in this case the resulting output could be number of viral particles bound as a function of glycan content).

      The results of this experiment are now shown as Supplementary Figure 2F and 2G. We compared the amount of virus bound after incubation for 10 minutes or for 3 hours, as in the infection experiment, but no negative correlation was found between the total amount of glycans on the surface of the Calu-3 monolayer and the amount of virus bound. Interestingly, a slight positive correlation was detected, which may be due to concentrated expression of virus receptors in glycan-enriched cells. This result shows that glycoproteins do not strongly inhibit virus binding. We will amend the text as follows (see also Q6).

      (Page 10)

      Glycans could be one of the biochemical substances ……We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). Similarly, on the two-dimensional culture surface of Calu-3 cells, no negative correlation was observed between the number of viruses bound and the total amount of glycans on the cell surface (Supplementary Figure 2F-G). The slight positive correlation between bound virus and glycans may be due to higher expression levels of viral receptors in glycan-rich cells. ….

      (2) The use of the purified glycosylated and non-glycosylated ectodomains of MUC1 and CD-43 to establish a relationship between glycosylation and protein density into lipid bilayers on silica beads is an elegant approach. An assessment of the impact of glycosylation in the structural conformation of both proteins, for instance determining the Flory radius of the glycosylated and non-glycosylated ectodomains by the FRET-FLIM approach used in Figure 4 would serve to further support the hypothesis of the article.

      Unfortunately, the proposed experiment did not provide a strong enough FRET signal for analysis. This was due in part to the difficulty in constructing a bead-coated bilayer incorporating PlasMem Bright Red, which established a good FRET pair in cell experiments. We also tried other fluorescent molecules, but were unable to obtain a strong and stable FRET signal. Another reason may be that the curvature of the beads is larger than that of the cells, making it difficult to obtain a sufficient cumulative FRET effect from multiple membrane dyes. We plan to improve the experimental system in the future.

      On the other hand, we also found that in this system, the signal changes were very subtle, making it difficult to detect molecular conformational changes using FRET. After reconsidering general questions (2) and (3), we speculated that the molecular density in the experiment, even at saturation binding, was below or at most equivalent to the brush transition point. In other words, proteins on the bead-coated bilayer may not be significantly extended in the vertical direction. Therefore, the conformational changes of these proteins may not be large enough to be detected by the FRET assay. We updated Figure 3C and Figure 5D (model description) to better reflect the above discussion and introduced the following discussion in the manuscript.

      (page11)

      We introduced the framework of conventional polymer brush theory to study the structure of virus-cell interfaces containing proteins……. Numerous experimental measurements of the formation of polymer brushes have also been reported (Overney et al., 1996; Wu et al., 2002; Zhao and Brittain, 2000). In these measurements, the transition to a brush typically occurs at a density higher than that required to pack a surface with hemispherical polymers of diameter R<sub>F</sub>. This is the point at which the energy loss due to repulsive forces between adjacent molecules (f<sub>int</sub>) exceeds the energy required to stretch the polymer perpendicularly into a brush (f<sub>el</sub>).

      (page16)

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for either b-MUC1 or g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these results, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as in the condensed brush state; the contribution of resistance to molecular extension, f<sub>el</sub>, is therefore expected to be small relative to the overall free energy of the system.
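
      As a rough arithmetic check on the comparison above, the following sketch computes the surface density at which polymers of diameter R<sub>F</sub> would just touch on a flat surface and compares it with the 65000 µm<sup>-2</sup> brush transition density quoted from Wu et al. (2002). The two footprint choices (one molecule per R<sub>F</sub> × R<sub>F</sub> square, or per circle of diameter R<sub>F</sub>) are simplifying assumptions introduced here for illustration, and the function name is hypothetical; this is not the authors' calculation.

```python
import math

NM2_PER_UM2 = 1e6  # 1 µm^2 = 10^6 nm^2


def packing_density_per_um2(r_f_nm: float, footprint: str = "circle") -> float:
    """Density (molecules per µm^2) at which polymers of diameter r_f_nm just touch.

    footprint="square": one molecule per r_f x r_f square (assumption).
    footprint="circle": one molecule per circle of diameter r_f (assumption).
    """
    if footprint == "square":
        area_nm2 = r_f_nm ** 2
    elif footprint == "circle":
        area_nm2 = math.pi * (r_f_nm / 2.0) ** 2
    else:
        raise ValueError("footprint must be 'square' or 'circle'")
    return NM2_PER_UM2 / area_nm2


R_F_NM = 15.0               # R_F quoted in the text for 17 kDa polyacrylamide
BRUSH_TRANSITION = 65000.0  # µm^-2, value quoted in the text (Wu et al., 2002)

square_density = packing_density_per_um2(R_F_NM, "square")
circle_density = packing_density_per_um2(R_F_NM, "circle")

# Ratio of the quoted transition density to each simple packing estimate
ratio_square = BRUSH_TRANSITION / square_density
ratio_circle = BRUSH_TRANSITION / circle_density
print(f"square footprint: {square_density:.0f} per µm^2, ratio {ratio_square:.1f}")
print(f"circle footprint: {circle_density:.0f} per µm^2, ratio {ratio_circle:.1f}")
```

      Under either footprint assumption, the quoted transition density is more than ten times the simple packing estimate, in line with the statement that the brush transition occurs at a density higher than that required to pack the surface with polymers of diameter R<sub>F</sub>.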

      Note that this does not mean that glycoproteins cannot form condensed brush structures: in fact, highly glycosylated molecules (e.g., MUC1) can form brush structures in cells when such proteins are expressed at very high densities (Shurer et al., 2019). In these cells, ………. Such membrane deformation results in an increase in total surface area that reduces the density of glycoproteins, indicating that there is strong intermolecular repulsion between glycoproteins. In any case, the free energy of the system is determined by the balance between protein binding and insertion into the membrane, protein deformation, and repulsive forces between proteins, which determine the density of proteins depending on the configuration of the system. Thus, although strong intermolecular repulsions were prominently observed in our simplified system, this may not be the case in other systems. ……

      (3) The MUC1 glycoprotein is reported to have a dramatic effect in reducing viral infection shown in Fig 1F. On the contrary, in a different experiment shown in Fig2D and Fig2H MUC1 has almost no effect in reducing viral infection. It is not clear how these two findings can be compatible.

      The immunostaining results show that the density of MUC1 molecules is very low in the experimental system in Figure 2 (Figure 2C), which is supported by the SC-RNASeq data (as shown in Supplementary Figure 2A, MUC1 is not listed as a top molecule). In other words, the MUC1 expression level in this experiment is too low to inhibit virus infection. In addition, the Pearson correlation coefficient represents the strength of the linear relationship between two variables, so it is not the most appropriate indicator of correlation with the MUC1 expression level, which varies little (Figure 2D, 2F). In fact, even TOS analysis, which can detect correlation by focusing on the cells with the highest expression level, cannot detect the correlation (Figure 2H). Therefore, the MUC1 data in Figure 2D, 2F, and 2H will be annotated and corrected in the figure legend.

      Figure2 Legend:

      MUC1 has a small mean expression level and variance, and is more affected by measurement noise than other molecules when calculating the Pearson correlation coefficient (Figure 2C-2F). In addition, the number of cells in which expression can be detected is small, so no significant correlation was detected by TOS analysis (Figure 2H).
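
      The noise sensitivity described in this legend can be illustrated with the classical attenuation relation. As a simplification introduced here (not a model taken from the manuscript), assume the measured expression is x<sub>obs</sub> = x + ε, where the measurement noise ε is independent of both the true expression x and the infection signal y. Then:

```latex
\operatorname{corr}(x_{\mathrm{obs}}, y)
  = \frac{\operatorname{Cov}(x, y)}
         {\sqrt{\bigl(\operatorname{Var}(x) + \operatorname{Var}(\varepsilon)\bigr)\,\operatorname{Var}(y)}}
  = \operatorname{corr}(x, y)\,
    \sqrt{\frac{\operatorname{Var}(x)}{\operatorname{Var}(x) + \operatorname{Var}(\varepsilon)}}
```

      Under this assumption, when the variance of the true expression is small relative to the noise variance, the square-root factor approaches zero and the observed correlation is suppressed regardless of the underlying relationship.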

      (4) Why is there a shift in the use of the glycan marker? How does this affect the conclusions? For the infection correlation relating protein expression with glycan content the PNA-lectin was used together with flow cytometry. For imaging the infection and correlating with glycan content the SSA-lectin is used.

      For each cell line, we selected the lectin that could be measured over the widest dynamic range. This lectin is thought to recognize the predominant glycan species in the cell line (Fig. S1C, Fig. 2D). In our model, we believe that viral infection inhibition is not specific to the type of sugar, but is highly dependent on the total amount of glycans. If this hypothesis is correct, the reason we used different lectins in each experiment is simply to select the lectin that recognizes the most predominant glycan species that is most convenient for predicting the total amount of glycans in cells. This hypothesis is consistent with our observations, where the total amount of glycans estimated by different lectins could explain the infection inhibition in a similar way in the experiments in Figures 1 and 2, and the TOS analysis in Figure 2 showed that minor glycans also have an infection inhibitory effect. On the other hand, it is of course possible to predict the total amount of glycans more accurately by obtaining as much information on glycans as possible (related to Q5). Based on the above discussion, the manuscript will be revised as follows.

      Page5

      Using HEK293T cell lines exogenously expressing genes of these proteins tagged with fluorescent markers, their glycosylation was measured by binding of a lectin from Arachis hypogaea (PNA), and the number of these proteins in the cells was measured simultaneously. PNA was used for the measurement because it has a wider dynamic range than other lectins (Supplementary Figure 1C). This suggests that GalNAc recognized by PNA is predominantly present on glycans of HEK293T cells, especially on the termini of glycans that are amenable to lectin binding, compared to other saccharides. …

      page9  

      Our findings suggest that membrane glycoproteins nonspecifically inhibit viral infection, and we hypothesize that their inhibitory function is also nonspecific depending on the type of glycan. Our hypothesis is consistent with the observations in the TOS analysis. Although minor saccharide species in the system (such as GlcNAc and GalNAc recognized by DSA, WGA, or PNA) showed anticolocalization with infection, their scores were much lower than those of major saccharide species. This suggests that all major and minor saccharide species have an infection inhibitory effect, but cells enriched with minor type glycans are only partially present in the system, and the contribution of these cells to virus inhibition is also partial. It is also consistent with the observation that the amount of GalNAc recognized by PNA determines the virus infection inhibition in HEK 293T cells (Figure 1). Therefore, we believe that our assay using a single type of predominantly expressed lectin is still useful for estimating the total glycan content. Nevertheless, the virus infection rate may show a better correlation with a more accurately estimated total glycan in each cell. For example, the use of multiple lectins with appropriate calibration to integrate multiple signals to simultaneously detect a wider range of saccharide species would allow for more accurate estimation. It should be noted that the amount of bound lectin does not necessarily measure the overall glycan composition but likely reflects the sugar population at the free end of the glycan chain to which the lectin binds most.

      (5) The authors in several instances comment on the relevance and importance of the total glycan content. Nevertheless, these conclusions are often drawn when using only one glycan-binding lectin. In fact, the anti-correlation with viral infection is distinct for the various lectins (Fig 2D and Fig 2H). Would it make more sense to use a combination of lectins to get a full glycan spectrum?

      As stated in the answer to Q4, we believe that we were able to detect the infection-suppressing effect of the total glycan amount by using the measurement value of the major component glycan as an approximation. However, as you pointed out, if we could accurately measure the minor glycan components and add up their values, we believe that we could measure the total glycan amount more accurately. In order to measure multiple glycans simultaneously and with high accuracy, some kind of biochemical calibration may be necessary to compare the measurements of lectin-glycan pairs with different binding constants. We believe that these are very useful techniques, and would like to consider them as a future challenge. The corrections listed in Q4 are shown below.

      (Page 9)

      Nevertheless, the virus infection rate may show a better correlation with a more accurately estimated total glycan in each cell. For example, the use of multiple lectins with appropriate calibration to integrate multiple signals to simultaneously detect a wider range of glycans would allow for more accurate estimation. …….

      (6) Fig 3A shows virus binding to HEK cells upon MUC1 expression. Please provide the surface expression of the MUC1 so that the data can be compared to Fig 1F. Nevertheless, it is not clear why the authors used MUC expression as a parameter to assess virus binding. Alternatively, more conclusive data supporting the hypothesis would be the absence of a correlation between total glycan content and virus binding capacity.

      The relationship between the expression level of MUC1 in each cell and the amount of virus binding is shown in Supplementary Figure 3A. There is no correlation between the two. In HEK293T cells, many glycans are modified with MUC1, so MUC1 was used as the indicator for analysis (Supplementary Figure 1C). As you pointed out, it is better to use the amount of glycan as an indicator, so we analyzed the relationship between the amount of bound virus and the amount of glycan on the surface of the Calu-3 monolayer (Supplementary Figure 2F, 2G, introduced in the answer to Specific (Q1)). In any case, no correlation was found between virus binding and surface glycans. We will correct the manuscript as follows.

      (page 9)

      Glycans could be one of the biochemical substances that link the intracellular molecular composition and macroscopic steric forces at the cell surface. To clarify this connection, we further investigated the mechanism by which membrane glycoproteins inhibit viral infection. First, we measured viral binding to cells to determine which step of infection is inhibited. We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). Similarly, on the two-dimensional culture surface of Calu-3 cells, no correlation was observed between the number of viruses bound and the total amount of glycans on the cell surface (Supplementary Figure 2F-G). These results indicate that glycoproteins do not inhibit virus binding to cells, but rather inhibit the steps required for subsequent virus internalization.

      (7) While the use of the Flory model could provide a simplification for a (disordered) flexible structure such as MUC1, where the number of amino acids equals N in the Flory model, this generalisation will not hold for all the proteins. Because folding will dramatically change the effective polypeptide chain-length and reduce available positioning of the amino acids, something the authors clearly measured (Fig 4G), this generalisation is not correct. In fact, the generalisation does not seem to be required because the authors provide an estimation for the effective Flory radius using their FRET approach

      Current theories generalizing the Flory model to proteins are incomplete, and it is certainly not possible to accurately estimate the size of individual molecules undergoing different folding. However, we found such a generalized model to be useful in understanding the overall properties of membrane proteins. In our experiments, we were indeed able to obtain the R<sub>F</sub> values of some individual molecules by FRET measurements. However, this modeling made it possible to estimate the distribution range of R<sub>F</sub>, including for larger proteins that cannot be measured by FRET. For example, from our results, we can estimate that the upper limit of R<sub>F</sub> for the longest membrane proteins is about 10.5 nm, assuming that the proteins follow the Flory model in all respects except for the shortening of the effective length due to folding. These analyses are useful for physical modeling of nonspecific phenomena, as in our case.

      In order to discuss the balance between such theoretical validity and the convenience of practical handling, we revise the manuscript as follows.

      (page 13) 

      This shift in ν indicates that glycosylation increases the size of the protein at equilibrium, but the change in R<sub>F</sub> is slight, e.g., a 1.3-fold increase for one of the longest ectodomains with N = 4000 when these values of ν are applied. This calculation also gives a rough estimate of the upper limit of the R<sub>F</sub> of the extracellular domains of all membrane proteins in the human genome (approximately 10.5 nm). Physically, this change in ν by glycosylation may be caused by the increased intramolecular exclusion induced sterically between glycan chains. These estimated values of ν are much smaller than the value of 0.6 for polymers in good solvents, possibly due to protein folding or anchoring effects on the membrane. In fact, the ν of an intrinsically disordered protein in solution has been reported to be close to 0.6 (Riback et al., 2019; Tesei et al., 2024). Overall, these analyses using the Flory model provide information on the size distribution of membrane proteins and the influence of glycans, although the model cannot predict the exact size of each protein due to its specific folding.
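
      To make the scaling argument concrete, the sketch below uses the Flory form R<sub>F</sub> = a·N<sup>ν</sup> and asks what shift in ν corresponds to the 1.3-fold change in R<sub>F</sub> at N = 4000 quoted above. It assumes the prefactor a is unchanged by glycosylation, which is an assumption made here for illustration. The values `A_NM` and `NU_BASE` are hypothetical placeholders, not the fitted values from the manuscript; they are used only to show that the fold change does not depend on them.

```python
import math


def flory_radius(a_nm: float, n: int, nu: float) -> float:
    """Flory scaling R_F = a * N**nu (a in nm, N = number of residues)."""
    return a_nm * n ** nu


def exponent_shift(fold_change: float, n: int) -> float:
    """Shift in nu that yields the given fold change in R_F at chain length n.

    From R_F'/R_F = N**(nu' - nu), assuming the same prefactor a.
    """
    return math.log(fold_change) / math.log(n)


N = 4000     # chain length quoted in the text
FOLD = 1.3   # fold increase in R_F quoted in the text

delta_nu = exponent_shift(FOLD, N)
print(f"implied shift in nu: {delta_nu:.4f}")

# Hypothetical placeholders (NOT values from the manuscript)
A_NM = 0.4
NU_BASE = 0.35

ratio = flory_radius(A_NM, N, NU_BASE + delta_nu) / flory_radius(A_NM, N, NU_BASE)
print(f"fold change recovered: {ratio:.3f}")
```

      Because the fold change equals N raised to the shift in ν, a shift of only about 0.03 is enough to give a 1.3-fold change at N = 4000, which illustrates why a measurable shift in ν translates into only a slight change in R<sub>F</sub>.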

      MINOR COMMENTS/EDITS:

      (1) In Figures 2A and 2C, as well as Supplemental Figure 2C, the fluorescent images indicate that GFP expression differs among the various groups. Ideally, these should be at the same GFP expression level, as the glycan and antibody staining occurred post-viral infection. For instance, ACE2 is a well-known positive control and should enhance SARS-CoV-2 infection. Yet, based on the findings presented in Supplemental Figure 2C, ACE2 appears to correlate with the lowest infection rate. The relationship between the infection rate and key glycoproteins needs clearer quantification.

      We measured the virus inhibition effect specific to each molecule using a cell line expressing low levels of viral receptors and glycoproteins (Fig. 1). On the other hand, the system in Fig. 2 contains diverse viral receptors and glycoproteins and has not been genetically manipulated. (We apologize that there was a typo in our description of the experiment, which will be corrected, as shown below.) The variation in infection rate between samples was caused by multiple factors but was not related to the molecule for which the correlation was measured. The receptor-based normalization used in the experiment in Fig. 1 cannot be applied to the system in Fig. 2 due to the complexity of the gene expression profile. Therefore, instead of such parameter-based normalization, we applied Pearson correlation and TOS analysis. In the calculation of Pearson correlation, intensities are normalized. TOS analysis allows the analysis of colocalization between the groups with the highest fluorescence intensity. Therefore, in both cases of variation in overall infection rate and variation in the distribution of infected populations, samples with large variations can be reasonably compared by Pearson correlation and TOS analysis, respectively. We extend the discussion on statistics and revise the manuscript as follows.

      (page 8-9)

      To test this hypothesis, we infected a monolayer of epithelial cells endogenously expressing highly heterogeneous populations of glycoproteins with SARS-CoV-2-PP, and measured viral infection from cell to cell visually by microscope imaging. …

      Pearson correlation is effective for comparing samples with varying scales of data because it normalizes the data values by the mean and variance. However, as observed in our experiments, this may not be the case when the distribution of data within a sample varies between samples. In addition, as has already been reported, the distribution of infected cells often deviates significantly from the normal distribution of data that is the premise of Pearson correlation (Russell et al., 2018) (Figure 2B). To further analyze data in such nonlinear situations, we applied the threshold overlap score (TOS) analysis (Figure 2G-H, Supplementary Figure 2E). This is one statistical method for analyzing nonlinear correlations, and is specialized for colocalization analysis in dual color images (Sheng et al., 2016). TOS analysis involves segmentation of the data based on signal intensity, as in other nonlinear statistics (Reshef et al., 2011). The computed TOS matrix indicates whether the number of objects classified in each region is higher or lower than expected for uniformly distributed data, which reflects co-localization or anti-localization in dual-color imaging data. For example, calculated TOS matrices show strong anti-localization for infection and glycosylation when both signals are high (Figure 2GH). This confirms that high infection is very unlikely to occur in cells that express high levels of glycans. The TOS analysis also yielded better anti-localization scores for some of the individual membrane proteins, especially those that are heterogeneously distributed across cells (Figure 2H). This suggests that TOS analysis can highlight the inhibitory function of molecules that are sparsely expressed among cells, reaffirming that high expression of a single type of glycoprotein can create an infection-protective surface in a single cell and that such infection inhibition is not protein-specific. 
In contrast, for more uniformly distributed proteins such as the viral receptor ACE2, TOS analysis and Pearson correlation showed similar trends, although the two are mathematically different (Figure 2D, 2H). Because glycoprotein expression levels and virus-derived GFP levels were treated symmetrically in these statistical calculations, the same logic can be applied when considering the heterogeneity of infection levels among cells. Therefore, it is expected that TOS analysis can reasonably compare samples with different virus infection level distributions by focusing on cells with high infection levels in all samples.
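
      The two statistics contrasted above can be sketched on toy data. The first function is the standard Pearson coefficient, which is unchanged by rescaling either signal. The second is a deliberately simplified observed-versus-expected overlap of the top fractions of two signals; it is not the published TOS normalization of Sheng et al. (2016), only an illustration of the thresholding idea. The function names and the ten-cell data set are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_fraction_overlap_ratio(x, y, frac_x, frac_y):
    """Observed / expected count of cells in the top frac_x of x AND top frac_y of y.

    Expected count assumes the two rankings are independent.
    A ratio below 1 suggests anti-localization, above 1 co-localization.
    Simplified illustration only; not the published TOS score.
    """
    n = len(x)
    kx = max(1, round(frac_x * n))
    ky = max(1, round(frac_y * n))
    top_x = set(sorted(range(n), key=lambda i: x[i], reverse=True)[:kx])
    top_y = set(sorted(range(n), key=lambda i: y[i], reverse=True)[:ky])
    observed = len(top_x & top_y)
    expected = kx * ky / n
    return observed / expected


# Hypothetical toy data: ten cells, infection confined to low-glycan cells
glycan = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
infection = [5, 0, 6, 0, 0, 1, 0, 0, 0, 0]

r = pearson(glycan, infection)
r_rescaled = pearson([10 * g + 3 for g in glycan], infection)  # same r after rescaling
overlap = top_fraction_overlap_ratio(glycan, infection, 0.3, 0.3)
print(f"Pearson r = {r:.3f}, after rescaling = {r_rescaled:.3f}, top-30% overlap ratio = {overlap}")
```

      In this toy example none of the three most infected cells is among the three highest-glycan cells, so the overlap ratio is zero even though most cells have no infection signal at all, the kind of skewed distribution for which a linear coefficient is less informative.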

      (2) For clarity, the authors should consider separating introductory and interpretive remarks from the presentation of results. These seem to get mixed up. The introduction section could be expanded to include more details about glycoproteins, their relevance to viral infection, and explanations of N- and O-glycosylation.

      Following the suggestion, (1) we added an explanation of the relationship between glycoproteins and viral infection, and N-glycosylation and O-glycosylation to the Introduction section, and (2) moved the introductory parts in the Results section to the Introduction section, as follows.

      (1; page3)

      While there are known examples of glycans that function as viral receptors (Thompson et al., 2019), these results demonstrate that a variety of glycoproteins negatively regulate viral infection in a wide range of systems. These glycoprotein groups have no common amino acid sequences or domains. The glycans modified by these proteins include both the N-type, which binds to asparagine, and the O-type, which binds to serine and threonine. Furthermore, there have been no reports of infection-suppressing effects according to the specific monosaccharide type in the glycan. All of these results suggest that bulky membrane glycoproteins nonspecifically inhibit viral infection.

      (2 : Page 4-5)

      To confirm that glycans are a general chemical factor of steric repulsion, an extensive list of glycoproteins on the cell membrane surface would be useful. The wider the range of proteins to be measured, the better. Therefore, we collect information on glycoproteins in the genome and compile them into a list that is easy to use for various purposes. Then, by analyzing sample molecules selected from this list, it may be possible to infer the effect of the entire glycoprotein population on the steric inhibition of virus infection, despite the complexity and diversity of the glycome (Dworkin et al., 2022; Huang et al., 2021; Moremen et al., 2012; Rademacher et al., 1988). Elucidation of the mechanism by which glycans regulate steric repulsion will also be useful to quantitatively discuss the relationship between steric repulsion and intracellular molecular composition. For this purpose, we apply the theories of polymer physics and interface chemistry.

      Results

      List of membrane glycoproteins in human genome and their inhibitory effect on virus infection

      To test the hypothesis that glycans contribute to steric repulsion at the cell surface, we first generate a list of glycoproteins in the human genome and then measure the glycan content and inhibitory effect on viral infection of test proteins selected from the list (Figure 1A). To compile the list of glycoproteins, we ….

      (3) In the sentence, "glycoproteins expressed lower than CD44 or other membrane proteins including ERBB2 did not exhibit any such correlation, although ERBB2 expressed ~4 folds higher amount than CD44 and shared ~7% among all membrane proteins," it is unclear which protein has a higher expression level: CD44 or ERBB2? Furthermore, the use of the word "although" needs clarification.

      Corrected as follows:

      (page 8)

      ……showed a weak inverse correlation with viral infection; even such a weak correlation was not observed with other proteins, including ERBB2, which is approximately four-fold more highly expressed than CD44

      (4) In Supplementary Figure 5, please provide an explanation of the data in the figure legend, particularly what the green and red signals represent.

      Corrected as follows:

      STORM images of all analyzed cells, expressing designated proteins. The detected spots of SNAP-Surface Alexa 647 bound to each membrane protein are shown in red, and the spots of CF568-conjugated anti-mouse IgG secondary antibody that recognizes Spike on SARS-CoV2-PP are shown in green. For each cell, a pair of images is shown: a two-color composite image and a CF568-only image. Numbers on axes are coordinates in nanometers.

      (5) It would be good to see a comprehensive demonstration of the exact method for estimation of membrane protein density (in the SI), since this is an integral part of many of the analyses in this paper. The method is detailed in the Methods section in text and is generally acceptable, but this methodology can vary quite widely and would be more convincing with calibration data provided.

      We added flow cytometry and fluorometer data for calibration (Supplementary Figure 1L,M) and introduced a sentence explaining the procedure for obtaining the values used for calibration as follows:

      (page 54)

      …….Liposome standards containing fluorescent molecules (0.01–0.75 mol% perylene (Sigma), 0.1–1.25 mol% Bodipy FL (Thermo), and 0.005–0.1% DiD) as well as DOPC (Avanti Polar Lipids) were measured by flow cytometry (Supplementary Figure 1L). In parallel, the fluorescence signals of these liposomes, of known concentrations of recombinant mTagBFP2, AcGFP, and TagRFP-657 proteins, and of SNAP-Surface 488 and Alexa 647 dyes (New England Biolabs) were measured with a fluorometer in the same excitation and emission ranges as in the flow cytometry assays (Supplementary Figure 1M). The ratio of the integrated fluorescence intensities of two dyes of interest over this range is used to convert the signals measured in flow cytometry. The additional information needed for calibration is the size difference between liposomes and cells. The average diameter of the liposomes was measured to be 130 nm, and the diameter of HEK 293T cells is estimated to be 13 µm (Furlan et al., 2014; Kaizuka et al., 2021b; Ushiyama et al., 2015). From these data, the signal from cells acquired by flow cytometry can be calibrated to molecular surface density. For example, the Alexa 647 signal acquired by flow cytometry can be converted to the signal of the same concentration of DiD dye using the fluorometer data, but the density of the dye is unknown at this point. This converted DiD signal can then be calibrated to a density on liposomes, rather than on cells, using the liposome flow cytometry data. Finally, after adjusting for the size difference between liposomes and cells, the molecular surface density on cells is determined. By going through one cycle of these procedures, we could obtain a calibration unit, such as 1 flow cytometry signal for a cell in the designated illumination and detection setting = 0.0272 mTagBFP2 µm<sup>-2</sup> on the cell surface.
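
      The calibration chain described in this passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' actual procedure: the function names, the area per lipid (0.7 nm², a typical value for a fluid phosphatidylcholine bilayer), the treatment of liposomes and cells as smooth spheres, and all numbers in the usage example are hypothetical. Only the two diameters (130 nm and 13 µm) come from the text.

```python
import math

LIPOSOME_DIAMETER_NM = 130.0  # from the text
CELL_DIAMETER_UM = 13.0       # from the text
AREA_PER_LIPID_NM2 = 0.7      # assumption: typical fluid PC bilayer value


def sphere_area(diameter):
    """Surface area of a sphere, in the squared unit of the diameter."""
    return math.pi * diameter ** 2


def dye_molecules_per_liposome(mol_percent):
    """Dye molecules in one liposome at the given mol% (two leaflets,
    inner-leaflet curvature ignored)."""
    lipids = 2.0 * sphere_area(LIPOSOME_DIAMETER_NM) / AREA_PER_LIPID_NM2
    return lipids * mol_percent / 100.0


def surface_density_per_um2(cell_signal, dye_ratio, liposome_slope):
    """Convert a flow cytometry signal from a cell into molecules per µm².

    cell_signal:    flow cytometry signal of the label on the cell
    dye_ratio:      fluorometer intensity of the reference membrane dye
                    divided by that of the label, at equal concentration
    liposome_slope: flow cytometry signal per mol% of the reference dye
                    in liposomes (slope of the linear fit)
    """
    # Step 1: express the cell signal as an equivalent reference-dye signal.
    equivalent_signal = cell_signal * dye_ratio
    # Step 2: signal contributed by one reference-dye molecule.
    signal_per_molecule = liposome_slope / dye_molecules_per_liposome(1.0)
    # Step 3: molecules on the cell, divided by the cell surface area.
    molecules_on_cell = equivalent_signal / signal_per_molecule
    return molecules_on_cell / sphere_area(CELL_DIAMETER_UM)


# Hypothetical inputs, for illustration only.
print(surface_density_per_um2(cell_signal=1000.0, dye_ratio=0.5,
                              liposome_slope=200.0))
```

      Because every step is a multiplication by a constant, the whole chain collapses into a single conversion factor per dye and instrument setting, which is consistent with the single calibration unit quoted in the passage.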

      (Figure legend, Supporting Figure 1: )

      … L. Flow cytometry measurements for liposomes containing serially diluted dye-conjugated lipids and fluorescent membrane-incorporating molecules (Bodipy-FL, perylene, and DiD) at the indicated mol%. The linear fits shown were used for calibration. M. Fluorescence emission spectra for equimolar molecules (50 µM for green and far-red channels, and 100 µM for blue channel), excited at 405 nm, 488 nm, and 638 nm, respectively. Membrane dyes were measured as incorporated in liposomes. Purified recombinant mTagBFP2 was used.

      (6) Fig 2A: The figure legend should describe the microscopy method for a quick and easy reference.

      Corrected as follows:

      (Figure legend, Figure 2)

      A. Maximum projection of Z-stack images taken at 1 µm intervals with a confocal microscope. SARS-CoV2-PP-infected, air-liquid interface (ALI)-cultured Calu-3 cell monolayers were chemically fixed, stained with Alexa Fluor 647-labeled Neu5AC-specific lectin from Sambucus sieboldiana (SSA), and imaged for SSA binding and for GFP expressed from the infecting virus.

      (7) Fig 2B: what is the color bar supposed to represent? Is it the pixel density per a particular value? Units and additional description are required. In addition, these are "arbitrary units" of fluorescence, but you should tell us if they've been normalized and, if so, how. They must have been normalized, since the values are between 0 and 1, but then why does the scale bar for SSA only go to 0.5?

      The color bar shows the number of pixels represented by each dot, i.e., the scale of the density scatter plot. The scale on the X-axis was incorrect. All these issues have been fixed in this revision, in the figure and in the legend, as follows:

      (Figure legend, Figure 2)

      B. Density scatter plot of normalized fluorescence intensities in all pixels in Figure 2A in both GFP and SSA channels. Color indicates the pixel density.  
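
      For readers unfamiliar with this plot type, the following minimal Python sketch shows one way a per-pixel density value can be computed. It assumes min-max normalization and a fixed square binning; the legend does not state which normalization or binning was used, so both choices and the function names are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def bin_index(value, bins):
    """Index of the bin holding a normalized value; 1.0 goes in the last bin."""
    return min(int(value * bins), bins - 1)


def pixel_density(gfp, ssa, bins=50):
    """For every pixel, return how many pixels share its 2D intensity bin.

    gfp, ssa: per-pixel intensities of the two channels, in the same order.
    The returned counts are what a color bar labeled "pixel density" encodes.
    """
    g = min_max_normalize(gfp)
    s = min_max_normalize(ssa)
    keys = [(bin_index(a, bins), bin_index(b, bins)) for a, b in zip(g, s)]
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return [counts[key] for key in keys]
```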

      (8) Fig 3D has a typo: this should most likely be "grafted polymer."

      (9) Fig 3E has a suspected typo: in the text, the author uses the word "exclusion" instead of "extrusion." The former makes more sense in this context.

      (10) Fig 5A has a typo: "Suppoorted" instead of Supported Lipid Bilayer.

      (11) Fig 7E-F has a suspected typo: Again, this should most likely be the word "exclusion" instead of "extrusion."

      Thank you so much for pointing out these mistakes, I have corrected them all as suggested.

      (12) Which other molecules are referred to, on page 6 (middle), that do not have an inhibitory effect? Please specify.

      We specified the molecules that have inhibitory effects, and revised as follows: 

      These proteins include those previously reported (MUC1, CD43) as well as those not yet reported (CD44, SDC1, CD164, F174B, CD24, PODXL) (Delaveris et al., 2020; Murakami et al., 2020). In contrast, other molecules (VCAM-1, EPHB1, TMEM123, etc.) showed little inhibitory effect on infection within the density range we used.

      (13) Fig 2 B: the color LUT is not labelled nor explained.

      Corrected as described in (7)

      (14) Please provide the scale bars for figures Fig 2A, C, E and Suppl Fig 2C, D.

      Corrected. 

      (15) Please provide the name for the example of a 200 aa protein that is meant to inhibit viral infection but is not bigger than ACE2. Also providing the densities in Fig 3A would help to correlate the data to Fig 1F.

      Corrected as follows: 

      (page 10)

      We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein (mean density ~50 µm<sup>-2</sup>) that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). …..

      In our measurements, a protein with extracellular domain of ~200 amino acids (e.g., CD164, 138 aa) at a density of ~100 µm<sup>-2</sup> showed significant inhibition in viral infection. This molecule is shorter than the receptor ACE2 (722 aa),

      (16) In the experiments conducted in HEK cells expressing the different glycoproteins studied, it is mentioned that the results of infection were normalised by the amount of ACE2 expression. Is the expression of the receptor homogeneous in the experiments conducted in Figure 2? Clarify in the methods whether the expression of the receptor has been quantified and somehow used to correct the intensity values of GFP used to determine infection.

      As also explained for Q1, the system in Fig. 2 contains diverse viral receptors and glycoproteins, and the receptor-based normalization used in the experiment in Fig. 1 cannot be applied. Instead, we applied Pearson correlation and TOS analysis. In the calculation of Pearson correlation, intensities are normalized. TOS analysis allows the analysis of colocalization between the groups with the highest fluorescence intensity. Therefore, in both cases of variation in overall infection rate and variation in the distribution of infected populations, samples with large variations can be reasonably compared by Pearson correlation and TOS analysis, respectively. We extend the discussion on statistics and revise the manuscript as follows.

      (page 8-9)

      Pearson correlation is effective for comparing samples with varying scales of data because it normalizes the data values by the mean and variance. However, as observed in our experiments, this may not be the case when the distribution of data within a sample varies between samples. In addition, as has already been reported, the distribution of infected cells often deviates significantly from the normal distribution of data that is the premise of Pearson correlation (Russell et al., 2018) (Figure 2B). To further analyze data in such nonlinear situations, we applied the threshold overlap score (TOS) analysis (Figure 2G-H, Supplementary Figure 2E). TOS is a statistical method for analyzing nonlinear correlations, and is specialized for colocalization analysis in dual-color images (Sheng et al., 2016). TOS analysis involves segmentation of the data based on signal intensity, as in other nonlinear statistics (Reshef et al., 2011). The computed TOS matrix indicates whether the number of objects classified in each region is higher or lower than expected for uniformly distributed data, which reflects co-localization or anti-localization in dual-color imaging data. For example, calculated TOS matrices show strong anti-localization for infection and glycosylation when both signals are high (Figure 2G-H). This confirms that high infection is very unlikely to occur in cells that express high levels of glycans. The TOS analysis also yielded better anti-localization scores for some of the individual membrane proteins, especially those that are heterogeneously distributed across cells (Figure 2H). This suggests that TOS analysis can highlight the inhibitory function of molecules that are sparsely expressed among cells, reaffirming that high expression of a single type of glycoprotein can create an infection-protective surface in a single cell and that such infection inhibition is not protein-specific.

      In contrast, for more uniformly distributed proteins such as the viral receptor ACE2, TOS analysis and Pearson correlation showed similar trends, although the two are mathematically different (Figure 2D, 2H). Because glycoprotein expression levels and virus-derived GFP levels were treated symmetrically in these statistical calculations, the same logic can be applied when considering the heterogeneity of infection levels among cells. Therefore, it is expected that TOS analysis can reasonably compare samples with different virus infection level distributions by focusing on cells with high infection levels in all samples.
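
      The difference between the two statistics can be illustrated with a short Python sketch. It computes the Pearson coefficient and, for one pair of thresholds, the ratio of observed to expected overlap between the top fractions of two channels. This ratio is the quantity that a threshold-overlap analysis starts from; the published TOS applies a further rescaling and scans many threshold pairs to build a matrix, neither of which is reproduced here. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def top_fraction_mask(values, fraction):
    """Boolean mask selecting the brightest `fraction` of the values."""
    k = max(1, round(len(values) * fraction))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    selected = set(order[:k])
    return [i in selected for i in range(len(values))]


def overlap_ratio(x, y, fraction_x, fraction_y):
    """Observed overlap of the two top fractions divided by the overlap
    expected if the two channels were independent.

    A ratio above 1 suggests co-localization, below 1 anti-localization.
    """
    mask_x = top_fraction_mask(x, fraction_x)
    mask_y = top_fraction_mask(y, fraction_y)
    n = len(x)
    observed = sum(1 for a, b in zip(mask_x, mask_y) if a and b) / n
    expected = (sum(mask_x) / n) * (sum(mask_y) / n)
    return observed / expected
```

      Restricting the comparison to the brightest cells in each channel is what lets this kind of score pick up anti-localization that is confined to a small, highly expressing subpopulation, a case in which a global Pearson coefficient can stay close to zero.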

      (17) Can you provide additional details about the method of thresholding to eliminate "background" localisations in STORM?

      Method section was corrected as follows: 

      (page 59)

      …Viral protein spots not close to cell membranes were eliminated by thresholding with nearby spot density for cell protein. Specifically, the entire image was pixelated with a 0.5µm square box and all viral protein signals within the box that had no membrane protein signals were removed. Also, viral protein spots only sparsely located were eliminated by thresholding with nearby spot density for viral protein. This thresholding process removed any detected viral protein spot that did not have more than 100 other viral protein spots within 1µm.
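
      The two filtering steps can be written down as the following Python sketch. It is a brute-force illustration, not the authors' code: the function names are hypothetical, coordinates are assumed to be in micrometers (STORM coordinates in nanometers would need to be divided by 1000 first), and it assumes that neighbors in the second step are counted among the spots that survived the first step, which the text does not specify.

```python
import math


def box_of(point, box_um):
    """Grid box (column, row) that contains an (x, y) point."""
    x, y = point
    return (math.floor(x / box_um), math.floor(y / box_um))


def filter_viral_spots(viral, membrane, box_um=0.5, radius_um=1.0,
                       min_neighbors=100):
    """Remove background viral-protein localizations.

    viral, membrane: lists of (x, y) localizations in micrometers.
    Step 1 drops viral spots in boxes that contain no membrane-protein spot.
    Step 2 drops viral spots that do not have more than `min_neighbors`
    other viral spots within `radius_um`.
    """
    occupied = {box_of(p, box_um) for p in membrane}
    near_membrane = [p for p in viral if box_of(p, box_um) in occupied]

    kept = []
    for i, (x, y) in enumerate(near_membrane):
        neighbors = 0
        for j, (u, v) in enumerate(near_membrane):
            if i != j and math.hypot(x - u, y - v) <= radius_um:
                neighbors += 1
        if neighbors > min_neighbors:
            kept.append((x, y))
    return kept
```

      The pairwise loop scales quadratically with the number of localizations; for full STORM datasets a spatial index such as a k-d tree would be the usual replacement.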

      (18) The article says "It was shown that the number of bound lectins correlated with the amount of glycans, not with number of proteins (Figure 1E)". Figure 1E correlates experimental PNA/mol with predicted glycosylation sites, not with the number of expressed proteins. Correct sentence with the right Figure reference.

      As you pointed out, the meaning of this sentence was not clear. We have amended it as follows to clarify our intention:

      (page 8)

      Since a wide range of glycoproteins inhibit viral infection, it is possible that all types of glycoproteins have an additive effect for this function. ……. In this cell line, this inverse correlation was most pronounced when quantifying N-acetylneuraminic acid (Neu5AC, recognized by lectins SSA and MAL) compared with other types of glycans, while some other glycans also showed weak correlations (Supplementary Figure 2C). These results showed that the level of viral infection in a cell is anticorrelated with the amount of total glycans on the cell surface. As the amount of glycans is determined by the total glycocalyx population, the infection-inhibitory effect can be additive across glycoprotein populations, as we hypothesized.

      If the inhibitory effect is nonspecific and additive, the contribution of each protein is likely to be less significant. To confirm this, we also measured the correlation between the density of each glycoprotein and viral infection. CD44, which was shown to…….. Our results demonstrate that total glycan content is a better indicator than individual glycoprotein expression for assessing the infection-inhibitory effect generated by the cell membrane glycocalyx. These results are consistent with our hypothesis regarding the additive nature of the nonspecific inhibitory effects of each glycoprotein.

    1. eLife Assessment

      Endothelial cell-specific loss of TGF-beta signaling in mice leads to CNS vascular defects, specifically impairing retinal development and promoting immune cell infiltration. The data are solid, showing that loss of TGF-beta signaling triggers vascular inflammation and attracts immune cells specific to CNS vasculature. These findings are important, highlighting TGF-beta's role in maintaining vascular-immune homeostasis and its therapeutic potential in neurovascular inflammatory diseases.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript analyses the effects of deleting the TgfbR1 and TgfbR2 receptors from endothelial cells at postnatal stages on vascular development and blood-retina barrier maturation in the retina. The authors find that deletion of these receptors affects vascular development in the retina but, importantly, it also affects the infiltration of immune cells across the vessels in the retina. The findings demonstrate that Tgf-beta signaling through TgfbR1/R2 heterodimers regulates primarily the immune phenotypes of endothelial cells in addition to regulating vascular development, but has minor effects on BRB maturation. The data provided by the authors provide solid support for their conclusions.

      Strengths:

      (1) The manuscript uses a variety of elegant genetic studies in mice to analyze the role of TgfbR1 and TgfbR2 receptors in endothelial cells at postnatal stages of vascular development and blood-retina barrier maturation in the retina.

      (2) The authors provide a nice comparison of the vascular phenotypes in endothelial-specific knockout of TgfbR1 and TgfbR2 in the retina (and to a lesser degree in the brain) with those from Ndp KO mice (loss of Ndp/Fzd4 signaling) or loss of VEGF-A signaling to dissect the specific roles of Tgf-beta signaling for vascular development in the retina.

      (3) The snRNAseq data of vessel segments from the brains of WT versus TgfbR1 -iECKO mice provides a nice analysis of pathways and transcripts that are regulated by Tgf-beta signaling in endothelial cells.

      Weaknesses (Original Submission):

      (1) The authors claim that choroidal neovascular tuft phenotypes are similar in TgfbR1 KO and TgfbR2 KO mice. However, the phenotypes look more severe in the TgfbR1 KO rather than TgfbR2 KO mice. Can the authors show a quantitative comparison of the number of choroidal neovascular tufts per whole eye cross-section in both genotypes?

      (2) In the analysis of Sulfo-NHS-Biotin leakage in the retina to assess blood-retina barrier maturation, the authors claim that there is increased vascular leakage in the TgfbR1 KO mice. However, it does not seem like Sulfo-NHS-biotin is leaking outside the vessels. Therefore, it cannot be increased vascular permeability. Can the authors provide a detailed quantification of the leakage phenotype?

      (3) The immune cell phenotyping by snRNAseq seems premature as the number of cells is very small. The authors should sort for CD45+ cells and perform single cell RNA sequencing.

      (4) The analysis of BBB leakage phenotype in TgfbR1 KO mice needs to be more detailed and include some tracers in addition to serum IgG leakage.

      (5) A previous study (Zarkada et al., 2021, Developmental Cell) showed that EC-deletion of Alk5 affects the D tip cells. The phenotypes of those mice look very similar to those shown for TgfbR1 KO mice. Are D tip cells lost in these mutants by snRNAseq?

      Comments on revisions:

      The authors have addressed the major weaknesses that I raised with the original submission adequately in the revised manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The authors meticulously characterized EC-specific Tgfbr1, Tgfbr2, or double knockout in the retina, demonstrating through convincing immunostaining data that loss of TGF-β signaling disrupts retinal angiogenesis and choroidal neovascularization. Compared to other genetic models (Fzd4 KO, Ndp KO, VEGF KO), the Tgfbr1/2 KO retina exhibits the most severe immune cell infiltration. The authors proposed that TGF-β signaling loss triggers vascular inflammation, attracting immune cells - a phenotype specific to CNS vasculature, as non-CNS organs remain unaffected.

      Strengths:

      The immunostaining results presented are clear and robust. The authors performed well-controlled analyses against relevant mouse models. snRNA-seq corroborates immune cell leakage in the retina and vascular inflammation in the brain.

      Comments on revisions:

      The authors have revised the manuscript and addressed all my questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Weaknesses: 

      (1) The authors claim that choroidal neovascular tuft phenotypes are similar in TgfbR1 KO and TgfbR2 KO mice. However, the phenotypes look more severe in the TgfbR1 KO rather than TgfbR2 KO mice. Can the authors show a quantitative comparison of the number of choroidal neovascular tufts per whole eye cross-section in both genotypes?

      Thank you for asking about this. Each VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retina exhibits multiple zones of choroidal neovascularization. The examples in Figure 1 and Figure 1 – Figure supplements 1 and 2 are mostly from retinas with loss of TGFBR1, but we could have chosen similar examples from retinas with loss of TGFBR2. The quantification in the original version of Figure 1 – Figure supplement 1 panel C had a labeling error. It actually showed the quantification of choroidal neovascularization (CNV) in the sum of both VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, not only in VE-cad-CreER;TGFBR1 CKO/- retinas as originally labeled. The point that it made is that CNV is seen with loss of TGF-beta signaling but not in control retinas or retinas with loss of Norrin signaling. We have now updated that plot by separating the data points for VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, so that they can be compared to each other. The result shows ~2.5-fold more CNV in VE-cad-CreER;TGFBR2 CKO/- retinas compared to VE-cad-CreER;TGFBR1 CKO/-. We think it likely that a more extensive sampling would show little or no difference between these two genotypes – but the data is what it is. This is now described in the Results section.

      We have also added a panel D to Figure 1- Figure supplement 1, which shows a retina flatmount analysis of CNV.  This is done by mounting the retina with the photoreceptor side up so that the outer retina can be optimally imaged. 

      (2) In the analysis of Sulfo-NHS-Biotin leakage in the retina to assess blood-retina barrier maturation, the authors claim that there is increased vascular leakage in the TgfbR1 KO mice. However, it does not seem like Sulfo-NHS-biotin is leaking outside the vessels. Therefore, it cannot be increased vascular permeability. Can the authors provide a detailed quantification of the leakage phenotype?

      Thank you for raising this point. Your comment prompted us to look at this question in greater depth with more experiments. We have expanded Figure 2 to show and quantify a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO and we have expanded the associated part of the Results section (Figure 2C and D). In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in or around the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VE-cad-CreER;TGFBR1 CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma. The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is minimally or not at all leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      (3) The immune cell phenotyping by snRNAseq is premature, as the number of cells is very small. The authors should sort for CD45+ cells and perform single-cell RNA sequencing. 

      Thank you for raising this point. For the revised manuscript, we have performed additional snRNAseq analyses using the same tissue processing protocol as for our original snRNAseq data. We have opted to homogenize the tissue and prepare nuclei (our original method) rather than dissociate the tissue and FACS-sort for CD45+ cells because the nuclear isolation approach is unbiased – we assume that nuclei from all cell types are present after tissue homogenization. By contrast, we cannot be certain that CD45 FACS will capture the full range of immune cells since some cells may not express CD45, may express CD45 at a low level, or may be tightly adherent to other cells, such as vascular endothelial cells. Additionally, by following the original protocol, we can combine the original snRNAseq dataset and the new snRNAseq dataset. In the revised manuscript we present the snRNAseq data from the combination of the original and the more recent snRNAseq datasets (revised Figure 4; N=628 immune cell nuclei). The new analysis comes to the same conclusions as the original analysis: the immune cell infiltrate in the mutant retinas is composed of a wide variety of immune cells.

      (4) The analysis of BBB leakage phenotype in TgfbR1 KO mice needs to be more detailed and include tracers as well as serum IgG leakage. 

      As described in our response to query 2, we have conducted additional experiments to look at vascular leakage in control, VE-cad-CreER;TGFBR1 CKO/-, and NdpKO retinas. We have also looked at Sulfo-NHS-biotin leakage in the VE-cad-CreER;TGFBR1 CKO/- brain, and it is indistinguishable from WT controls. Since Sulfo-NHS-biotin is a low MW tracer (<1,000 Da), this implies that loss of TGF-beta signaling does not increase non-specific diffusion of either low or high MW molecules. Therefore, the elevated levels of IgG in the brain parenchyma in young VE-cad-CreER;TGFBR1 CKO/- mice (Figure 8A) likely represent specific transport of IgG across the BBB. Such transport is known to occur via Fc receptors expressed on vascular endothelial cells, although it is normally greater in the brain-to-blood direction than in the blood-to-brain direction. For example, see Lafrance-Vanasse et al (2025) Leveraging neonatal Fc receptor (FcRn) to enhance antibody transport across the blood brain barrier. Nat Commun. 16:4143. This is now described in greater detail in the Results section.

      (5) A previous study (Zarkada et al., 2021, Developmental Cell) showed that EC-deletion of Alk5 affects the D tip cells. The phenotypes of those mice look very similar to those shown for TgfbR1 KO mice. Are D-tip cells lost in these mutants by snRNAseq?

      Please note: Alk5 is another name for TGFBR1. This is noted in the second sentence of paragraph 4 of the Introduction. The reviewer is correct: there are a lot of similarities because these are exactly the same KO mice. Also, Zarkada et al. and we used the same VE-cad-CreER to recombine the CKO allele. The proposed snRNAseq analysis would serve as an independent check on the diving (D) tip vs stalk cell analyses published in Zarkada et al (2021) Specialized endothelial tip cells guide neuroretina vascularization and blood-retina-barrier formation. Dev Cell 56:2237-2251. We have not gone in this direction because the question of tip vs. stalk cells and of subtypes of tip cells in WT vs. mutant retinas is beyond our focus on choroidal neovascularization and the role of immune cells and vascular inflammation. The proposed snRNAseq analysis would also require a major effort since tip cells are rare and must be harvested from large numbers of early postnatal retinas followed by FACS enrichment for vascular endothelial cells. Finally, we have no reason to doubt the results of Zarkada et al.

      Reviewer #2 (Public review): 

      Summary:

      The authors meticulously characterized EC-specific Tgfbr1, Tgfbr2, or double knockout in the retina, demonstrating through convincing immunostaining data that loss of TGF-β signaling disrupts retinal angiogenesis and choroidal neovascularization. Compared to other genetic models (Fzd4 KO, Ndp KO, VEGF KO), the Tgfbr1/2 KO retina exhibits the most severe immune cell infiltration. The authors proposed that TGF-β signaling loss triggers vascular inflammation, attracting immune cells - a phenotype specific to CNS vasculature, as non-CNS organs remain unaffected. 

      Strengths: 

      The immunostaining results presented are clear and robust. The authors performed well-controlled analyses against relevant mouse models. snRNA-seq corroborates immune cell leakage in the retina and vascular inflammation in the brain. 

      Weaknesses: 

      The causal link between TGF-β loss, vascular inflammation, and immune infiltration remains unresolved. The authors' model posits that EC-specific TGF-β loss directly causes inflammation, which recruits immune cells. However, an alternative explanation is plausible: Tgfbr1/2 KO-induced developmental defects (e.g., leaky vessels) permit immune extravasation, subsequently triggering inflammation. The vein-specific upregulation of ICAM1 staining and the lack of immune infiltration phenotypes in non-CNS tissues support the alternative model. Late-stage induction of Tgfbr1/2 KO (avoiding developmental confounders) could clarify TGF-β's role in retinal angiogenesis versus anti-inflammation.

      Thank you for raising this point. Your comment prompted us to look at this question in greater depth with more experiments. We have expanded Figure 2 to show and quantify a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO and we have expanded the associated part of the Results section (Figure 2C and D). In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in or around the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VE-cad-CreER;TGFBR1 CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma. The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is minimally or not at all leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      In the revised manuscript, we have expanded the Discussion section to address the two alternative hypotheses raised by the reviewer. Here are the relevant data in a nutshell: (1) vascular leakage into the parenchyma, as measured with sulfo-NHS-biotin, in TGFBR1 endothelial CKO retinas is far less than in NdpKO retinas, where nearly all ECs convert to a fenestration+ (PLVAP+) phenotype and there is leakage of sulfo-NHS-biotin, (2) ICAM1 in ECs in TGFBR1 endothelial CKO retinas increases several-fold more than in NdpKO or Frizzled4KO retinas, (3) TGFBR1 endothelial CKO retinas have more infiltrating immune cells than NdpKO or Frizzled4KO retinas, and (4) in TGFBR1 endothelial CKO retinas large numbers of immune cells are observed within and adjacent to blood vessels. We think that the simplest explanation for these data is that loss of TGF-beta signaling in ECs causes an endothelial inflammatory state with enhanced immune cell extravasation. That said, the case for this model is not water-tight, and there could be less direct mechanisms at play. In particular, this model does not explain why the inflammatory phenotype is limited to CNS (and especially retinal) vasculature.

      Regarding the last sentence of the reviewer’s comment (“Late stage induction…”), we have tried activating CreER recombination at different ages and we observe a large reduction in the inflammatory phenotype when recombination is initiated after vascular development is complete.   This observation suggests that the vascular developmental/anatomic defect – and perhaps the resulting retinal hypoxia response – is required for the inflammatory phenotype.  In the revised manuscript we have expanded the Results and Discussion sections to describe this observation.

      Reviewer #1 (Recommendations for the authors): 

      Suggestions for experiments: 

      (1) The authors need to show a quantitative comparison of the number of choroidal neovascular tufts per whole eye cross-section in both genotypes (TgfbR1 and TgfbR2 KO mice).

      Thank you for raising this point. The quantification in the original version of Figure 1 – Figure supplement 1 panel C was mislabeled. It quantifies choroidal neovascularization (CNV) in both VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, not VE-cad-CreER;TGFBR1 CKO/- retinas only as originally labeled. The point it makes is that CNV is seen with loss of TGF-beta signaling but not in control retinas or retinas with loss of Norrin signaling. We have now corrected that plot by separating the data points for VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, so that they can be compared to each other. The result shows ~2.5-fold more CNV in VE-cad-CreER;TGFBR2 CKO/- retinas compared to VE-cad-CreER;TGFBR1 CKO/-. This is now described in the Results section.

      (2) In the analysis of Sulfo-NHS-Biotin leakage in the retina to assess blood-retina barrier maturation, the authors should provide a detailed quantification of the leakage phenotype outside the vessels into the CNS parenchyma, both in the retina and brain, in TgfbR1 KO mice.

      Thank you for raising this point. There is no detectable Sulfo-NHS-biotin leakage into the brain parenchyma in VE-cad-CreER;TGFBR1 CKO/- mice. We have expanded Figure 2 to show and quantify the data for retinal vascular leakage (Figure 2C and D). The data show that in VE-cad-CreER;TGFBR1 CKO/- mice there is accumulation of Sulfo-NHS-biotin in the vascular tufts but minimal accumulation elsewhere in the retinal vasculature and minimal leakage of Sulfo-NHS-biotin into the retinal parenchyma.

      (3) The immune cell phenotyping by snRNAseq is premature, as the number of cells is very small. The authors should sort for CD45+ cells and perform single-cell RNA sequencing to ascertain these preliminary data. 

      Thank you for raising this point. We have performed additional snRNAseq analyses using the same tissue processing protocol as for our original snRNAseq data to increase the numbers of cells. We have opted to homogenize the tissue and prepare nuclei (our original method) rather than dissociating the cells and FACS-sorting for CD45+ cells because the nuclear isolation approach is unbiased – we assume that nuclei from all cell types are present. By contrast, we cannot be certain that CD45 FACS will capture the full range of immune cells, since some cells may not express CD45, may express CD45 at a low level, or may be tightly adherent to other cells, such as vascular endothelial cells. Additionally, by following the original protocol, we can combine the original snRNAseq dataset and the new snRNAseq dataset. In the revised manuscript we present the snRNAseq data from the combination of the original and the more recent snRNAseq datasets (revised Figure 4; N=628 immune cell nuclei). The new analysis comes to the same conclusion as in the original submission, namely that the immune cell infiltrate in the mutant retinas is composed of a wide variety of immune cells. The Results section has been expanded to describe these new data and analysis.

      (4) The analysis of BBB leakage phenotype in TgfbR1 KO mice needs to be more detailed and include tracers as well as serum IgG leakage. 

      Sulfo-NHS biotin leakage in the VE-cad-CreER;TGFBR1 CKO/- brain is minimal, and it is indistinguishable from WT controls.  Since Sulfo-NHS biotin is a low MW tracer (<1 kDa), this implies that loss of TGF-beta signaling does not increase non-specific diffusion of either low or high MW molecules.  Therefore, the elevated levels of IgG in the brain parenchyma in young VE-cad-CreER;TGFBR1 CKO/- mice (Figure 8A) likely represent specific transport of IgG across the BBB.  Such transport is known to occur via Fc receptors expressed on vascular endothelial cells, although it is normally greater in the brain-to-blood direction than in the blood-to-brain direction.  For example, see Lafrance-Vanasse et al (2025) Leveraging neonatal Fc receptor (FcRn) to enhance antibody transport across the blood brain barrier.  Nat Commun. 16:4143.  This is now described in greater detail in the Results section.

      (5) The authors should perform a more detailed RNAseq analysis of tip and stalk cells in TgfbR1 KO mice to determine whether D tip cells are lost in these mutants by snRNAseq.

      The proposed snRNAseq analysis would serve as an independent check on the diving (D) tip vs stalk cell analyses published by Zarkada et al, who analyzed the same VE-cad-CreER;TGFBR1 CKO/- mutant mice, although they refer to the TGFBR1 gene by its alternate name ALK5 [Zarkada et al (2021) Specialized endothelial tip cells guide neuroretina vascularization and blood-retina-barrier formation. Dev Cell 56:2237-2251].  We have not gone in this direction because the question of tip vs. stalk cells and of subtypes of tip cells in WT vs. mutant retinas is beyond our focus on choroidal neovascularization and the role of immune cells and vascular inflammation.  The proposed snRNAseq analysis would also require a major effort since tip cells are rare and must be harvested from large numbers of early postnatal retinas followed by FACS enrichment for vascular endothelial cells.

      Suggestions for improving the manuscript:  

      (6) The statement that ECs acquire properties of immune cells (Page 2, Line 90) is incorrect. Endothelial cells may acquire characteristics of antigen presenting cells. 

      Thank you for that correction.  Based on the review from Amersfoort et al (2022) (Amersfoort J, Eelen G, Carmeliet P. (2022) Immunomodulation by endothelial cells - partnering up with the immune system? Nat Rev Immunol 22:576-588) and the articles cited in it, we have changed the sentence to “Although vascular endothelial cells (ECs) are not generally considered to be part of the immune system, in some locations and under some conditions they acquire properties characteristic of immune cells, including secretion of cytokines, surface display of co-stimulatory or co-inhibitory receptors, and antigen presentation in association with MHC class II proteins (Pober and Sessa, 2014; Amersfoort et al., 2022).”  

      (7) The statement in Page 3, Line 100-101 [In CNS ECs, quiescence is maintained in part by the actions of astrocyte-derived Sonic Hedgehog, with the result that few immune cells other than resident microglia are found within the CNS (Alvarez et al., 2011).] is incomplete. Wnt signaling also suppresses the expression of leukocyte adhesion molecules from endothelial cells and therefore helps with immune cell quiescence. 

      Thank you for raising that point.  We have expanded that sentence to include Wnt signaling in CNS endothelial cells, as described in the following reference: Lengfeld JE, Lutz SE, Smith JR, Diaconu C, Scott C, Kofman SB, Choi C, Walsh CM, Raine CS, Agalliu I, Agalliu D. (2017) Endothelial Wnt/beta-catenin signaling reduces immune cell infiltration in multiple sclerosis. Proc Natl Acad Sci USA 114:E1168-E1177.

      (8) It may be beneficial for the reader to separate the results of the vascular phenotypes related to choroidal neovascularization compared to retinal vascular development. 

      Thank you for this suggestion.  The two topics are partly overlapping: choroidal neovascularization is described in Figure 1, and retinal development is described in Figures 1 and 2.  The challenge is that some of the same images illustrate both phenotypes, as in Figure 1, so the topics cannot be easily separated.

      (9) In addition to comparing the phenotypes in Tgfb signaling mutant mice with Wnt signaling and VEGF-A signaling mutants, the authors should compare and contrast their data with those found in Alk5 KO mice, as there are a lot of similarities. 

      The reviewer has alerted us to a nomenclature challenge which we will try to resolve in the introduction: Alk5 is just another name for TGFBR1.  The reviewer is correct: there are a lot of similarities between the present study and that of Zarkada et al (2021) because both use the same TGFBR1(=Alk5) CKO mice.

      Reviewer #2 (Recommendations for the authors): 

      Figure 2 

      For 2B, the authors should clarify whether the two regions shown in the Tgfbr1 KO retina (P14) represent central vs. peripheral areas, as phenotype severity varies. 

      For 2C, does the uneven biotin accumulation reflect developmental gradients (e.g., central-peripheral maturation timing)? 

      Thank you for raising these points.  Regarding Figure 2B, these images are all from the mid-peripheral retina, where the phenotype is moderately severe.  This is now noted in the figure legend.

      Regarding Figure 2C, the reviewer is correct that the pattern of Sulfo-NHS-biotin is uneven in VEcadCreER;Tgfbr1CKO/- retinas – it accumulates only in the tufts.  We have expanded Figure 2C to show a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO retinas, and we have expanded the associated part of the Results section.  In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma.   The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is not leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      Figure 6 

      The claim that PECAM1+ rings on veins reflect EC-immune cell binding is uncertain, as PECAM1 is also known to be expressed by immune cells. The complete correlation of PECAM1 and CD45 staining signals suggests that a subset of immune cells upregulates PECAM1. The VEcadCreER;Tgfbr1 flox/-; SUN1:GFP reporter would be helpful to delineate EC-immune cell proximity. Super-resolution imaging with Z-stacks could also resolve spatial relationships (luminal vs. abluminal immune cell adhesion).

      Thank you for this comment.  The reviewer is correct that, at the resolution of these images, we cannot determine whether the PECAM1 immunostaining signal is derived from ECs, from leukocytes, or from both.  This is now stated in the Results section.  The PECAM1-rich endothelial ring structure associated with leukocyte extravasation has been characterized in various publications, for example in (1) Carman CV, Springer TA. (2004) A transmigratory cup in leukocyte diapedesis both through individual vascular endothelial cells and between them. J Cell Biol 167:377-388 and (2) Mamdouh Z, Mikhailov A, Muller WA. (2009) Transcellular migration of leukocytes is mediated by the endothelial lateral border recycling compartment. J Exp Med 206:2795-2808.  The ring structures visualized in Figure 6D by PECAM1 immunostaining conform to the ring structures described in these and other papers.  In showing these structures, our point is simply that they likely represent sites of leukocyte extravasation.  This is now clarified in the text.  We have also added some additional references on leukocyte extravasation and the ring structures.

      Figure 7 

      A time-course analysis of ICAM1 would strengthen the mechanistic model. Does ICAM1 upregulation precede immune infiltration (supporting inflammation as the primary defect)? Given that immune cells appear by P14 (per snRNA-seq), is ICAM1 elevated earlier? 

      This is an interesting idea, but based on what is known about leukocyte adhesion and extravasation we predict that there will not be a clean temporal separation between ICAM1 induction and leukocyte adhesion/infiltration.  That is, if the proinflammatory state causes an increase in the number of leukocytes, then as ICAM1 levels increase, leukocyte adhesion would also increase.  Similarly, if the presence of leukocytes increases the pro-inflammatory state, then as the number of leukocytes increases, the levels of ICAM1 would be predicted to increase.  Thus, we think that a time course analysis is unlikely to provide a definitive conclusion.

      Figure 8-SF1 

      In brain slices, a transient pan-IgG accumulation suggests a self-resolving defect in the BBB. However, this BBB impairment appears to be spatiotemporally distinct from ICAM1 upregulation. ICAM1 staining is restricted to the lesion site, aligning with immune cell-driven inflammation. 

      Thank you for raising these points.  The reviewer is correct that these observations don’t fit together in a clear way.  There does not appear to be a general increase in brain vascular permeability in VE-cad-CreER;TGFBR1 CKO/- mice, as shown by sulfo-NHS-biotin.  However, there is a large and transient increase in IgG in the brain parenchyma, suggestive of a general vascular alteration, and – as the reviewer correctly notes – it is not accompanied by a generalized increase in ICAM1 vascular immunostaining.  At this point, we don’t have any real insight into the mechanistic basis of the transient IgG increase.

      Thank you for handling this manuscript.

    1. eLife Assessment

      This cleverly designed and potentially important work supports our understanding regarding how and whether social behaviours promoting egalitarianism can be learned, even when implementing these norms entails a cost for oneself. However, the evidence supporting the major claims is currently incomplete, with the major limitation being whether participants truly learn egalitarianism from a teacher or instead exhibit reduced guilt across time that is reduced when observing others behaving more selfishly. With a strengthening of the supporting evidence, this work will be of interest to a wide range of fields, including cognitive psychology/neuroscience, neuroeconomics, and social psychology, as well as policy making.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. addressed the question of whether advantageous and disadvantageous inequality aversion can be vicariously learned and generalized. Using an adapted version of the ultimatum game (UG), in three phases, participants first gave their own preference (baseline phase), then interacted with a "teacher" to learn their preference (learning phase), and finally were tested again on their own (transfer phase). The key measure is whether participants exhibited similar choice preference (i.e., rejection rate and fairness rating) influenced by the learning phase, by contrasting their transfer phase and baseline phase. Through a series of statistical and computational modeling analyses, the authors reported that both advantageous and disadvantageous inequality aversion can indeed be learned (Study 1), and even be generalised (Study 2).

      Strengths:

      This study is very interesting: it directly adapts the lab's previous work on the observational learning effect on disadvantageous inequality aversion to test both advantageous and disadvantageous inequality aversion. Social transmission of action, emotion, and attitude has started to be looked at recently, hence this research is timely. The use of computational modeling is mostly appropriate and motivated. Study 2, which examined vicarious inequality aversion in conditions where feedback was never provided, is interesting and important for strengthening the reported effects. Both studies have proper justifications to determine the sample size.

      Weaknesses:

      Despite the strengths, a few conceptual aspects and analytical decisions have to be explained, justified, or clarified.

      INTRODUCTION/CONCEPTUALIZATION

      (1) Two terms are used interchangeably in this work, but should not be: vicarious/observational learning vs preference learning. For vicarious learning, individuals observe others' actions (and optionally also the corresponding consequences resulting directly from those actions), whereas, for preference learning, individuals predict, or act on behalf of, the others' actions, and then receive feedback on whether that prediction is correct or not. For the current work, it seems that the experiment is more about preference learning and prediction, and less so about vicarious learning. But the intro and setup are framed heavily around vicarious learning, and later the use of vicarious learning and preference learning is rather mixed in the text. I think the authors should either tone down the focus on vicarious learning, or discuss how the two differ. Some of the references here may be helpful: Charpentier et al., Neuron, 2020; Olsson et al., Nature Reviews Neuroscience, 2020; Zhang & Glascher, Science Advances, 2020

      EXPERIMENTAL DESIGN

      (2) For each offer type, the experiment "added a uniformly distributed noise in the range of (-10 ,10)". I wonder what this looks like. Are the offers only integers, such as 25:75, or do they include decimal points? More importantly, is it possible for either the 70:30 or the 90:10 option, after adding the noise, to have generated an 80:20 split shown to the participants? If so, for the analyses later, when participants saw the 80:20 split, which condition did this trial belong to? 70:30 or 90:10? And is such noise added only to the learning phase, or also to the baseline/transfer phases? This requires some clarification.

      (3) For the offer conditions (90:10, 70:30, 50:50, 30:70, 10:90) - are they randomized? If so, how is it done? Is it randomized within each participant, and/or also across participants (such that each participant experienced different trial sequences)? This is important, as the order, especially in the learning phase, can largely impact the preference learning of the participants.
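
      The ambiguity raised in comment (2) can be illustrated with a minimal sketch. It assumes that the noise is an integer added to the proposer's share, which is exactly the detail the comment asks the authors to clarify; the function names are hypothetical and are not taken from the manuscript. Under these assumptions, whether an 80:20 split can arise from both the 70:30 and the 90:10 offer types depends only on whether the noise range is open or closed.

```python
def reachable_shares(base, half_width=10, inclusive=False):
    """Integer shares reachable from `base` after adding uniform noise.

    Assumes the noise is an integer in (-half_width, half_width);
    set inclusive=True to treat the range as closed at both ends.
    """
    lo = base - half_width if inclusive else base - half_width + 1
    hi = base + half_width if inclusive else base + half_width - 1
    return set(range(lo, hi + 1))


def ambiguous_shares(base_a, base_b, **kwargs):
    """Shares that could have been generated from either offer type."""
    return reachable_shares(base_a, **kwargs) & reachable_shares(base_b, **kwargs)


open_overlap = ambiguous_shares(70, 90)
closed_overlap = ambiguous_shares(70, 90, inclusive=True)
print(sorted(open_overlap))    # open range: prints []
print(sorted(closed_overlap))  # closed range: prints [80]
```

      If the range is open, as the quoted "(-10 ,10)" suggests, no displayed split is ambiguous; if the bounds are included, an 80:20 split could belong to either condition and the trial labels used in the analysis would need to be stated.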

      STATISTICAL ANALYSIS & COMPUTATIONAL MODELING

      (4) In Study 1 DI offer types (90:10, 70:30), the rejection rate for DI-AI averse looks consistently higher than that for DI averse (ie, blue line is above the yellow line). Is this significant? If so, how come? Since this is a between-subject design, I would not anticipate such a result (especially for the baseline). Also, for the LME results (eg, Table S3), only interactions were reported but not the main results.

      (5) I do not particularly find this analysis appealing: "we examined whether participants' changes in rejection rates between Transfer and Baseline, could be explained by the degree to which they vicariously learned, defined as the change in punishment rates between the first and last 5 trials of the Learning phase." Naturally, participants' behavior in the first 5 trials in the learning phase will be similar to those in the baseline; and their behavior in the last 5 trials in the learning phase would echo those at the transfer phase. I think it would be stronger to link the preference learning results to the change between baseline and transfer phase, eg, by looking at the difference between alpha (beta) at the end of the learning phase and the initial alpha (beta).

      (6) I wonder if data from the baseline and transfer phases can also be modeled, using a simple Fehr-Schmidt model? This way, the change in alpha/beta can also be examined between the baseline and transfer phase.

      (7) I quite liked Study 2 that tests the generalization effect, and I expected to see an adapted computational modeling to directly reflect this idea. Indeed, the authors wrote "[...] given that this model [...] assumes the sort of generalization of preferences between offer types [...]". But where exactly did the preference learning model assume the generalization? In the methods, the modeling seems to be only about Study 1; did the authors adapt their model to accommodate Study 2? The authors also ran simulations for the learning phase in Study 2 (Figure 6); how was the preference updated (if at all) for offers (90:10 and 10:90) where feedback was not given? Extending/unpacking the computational modeling results for Study 2 will be very helpful for the paper.
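
      For readers unfamiliar with the model named in comment (6), the following is a minimal sketch of the standard two-player Fehr-Schmidt utility applied to a responder's accept/reject choice. The logistic choice rule and the inverse-temperature parameter `inv_temp` are assumptions made for illustration, and the function names are hypothetical; this is not the authors' implementation.

```python
import math


def fs_utility(self_pay, other_pay, alpha, beta):
    """Fehr-Schmidt utility for a two-player split.

    alpha weights disadvantageous inequity (envy);
    beta weights advantageous inequity (guilt).
    """
    envy = max(other_pay - self_pay, 0)
    guilt = max(self_pay - other_pay, 0)
    return self_pay - alpha * envy - beta * guilt


def p_reject(self_pay, other_pay, alpha, beta, inv_temp=0.1):
    """Probability of rejecting, assuming a logistic choice rule.

    Rejection leaves both players with 0, so its utility is 0.
    inv_temp is a hypothetical inverse-temperature parameter.
    """
    u_accept = fs_utility(self_pay, other_pay, alpha, beta)
    return 1.0 / (1.0 + math.exp(inv_temp * u_accept))


# Responder would receive 10 and the proposer 90: envy favours rejection.
print(p_reject(10, 90, alpha=0.5, beta=0.0))
# Responder would receive 90 and the proposer 10: guilt favours rejection.
print(p_reject(90, 10, alpha=0.0, beta=1.5))
```

      Fitting alpha and beta separately to the baseline and transfer choices would, under these assumptions, give per-phase estimates whose difference could be compared with the model-free change in rejection rates.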

      Comments on revisions:

      I kept my original public review, so that future readers can see the progress and development of the manuscript.

      The authors have largely addressed my original questions/concerns, and I have two outstanding comments.

      (a) Related to my original comment #6, where I suggested applying the F-S model also to the baseline and transfer phases. The authors were inclined not to do it, but in fact later in comment #7 and in the manuscript they opted to use a more complex F-S-based model for their learning phase. I agree that the rejection rate is indeed a clear indication, but for completeness, it'd be more consistent and compelling if the paper followed a model-free (model-agnostic) and model-based approach in all phases of the experiment.

      (b) Related to my original comment #4, I appreciate that the authors have provided more details of their LMM models. But I don't think it is accurate regardless. First, the offer levels (50:50, 30:70, 10:90) should not be coded as pure categorical levels. In fact, they have an ordinal meaning; a single ordinal predictor with three levels should be used. This also avoids the excessive number of interactions the authors have pointed out.

      Second, running a model with only interactions without main effects is flawed. All textbooks on stats emphasize that without the presence of the main effects, the interpretation of interaction only is biased.

      So these LMMs needs to be revised before the manuscript eventually gets to a version of record.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates whether individuals can learn to adopt egalitarian norms that incur a personal monetary cost, such as rejecting offers that benefit them more than the giver (advantageous inequitable offers). While these behaviors are uncommon, two experiments aim to demonstrate that individuals can learn to reject such offers by observing a "teacher" who follows these norms. The authors use computational modelling to argue that learners adopt these norms through a sophisticated process, inferring the latent structure of the teacher's preferences, akin to theory of mind.

      Strengths:

      This paper is well-written and tackles an important topic relevant to social norms, morality, and justice. The findings are promising (though further control conditions are necessary to support the conclusions). The study is well-situated in the literature, with a clever experimental design and a computational approach that may offer insights into latent cognitive processes. In the revision, the authors clarified some questions related to the initial submission.

      Weaknesses:

      Despite these strengths, I remain unconvinced that the current evidence supports the paper's central claims. Below, I outline several issues that, in my view, limit the strength of the conclusions.

      (1) Experimental Design and Missing Control Condition:

      The authors set out to test whether observing a "teacher" who is averse to advantageous inequity (Adv-I) will affect observers' own rejection of Adv-I offers. However, I think the design of the task lacks an important control condition needed to address this question. At present, participants are assigned to one of two teachers: DIS or DIS+ADV. Behavioral differences between these groups can only reveal relative differences in influence; they cannot establish whether (and how) either teacher independently affects participants' own behavior. For example, a significant difference between conditions can emerge even if participants are only affected by the DIS teacher and are not affected at all by the DIS+ADV teacher. What is crucially missing here is a no-teacher control condition, which can then be compared with each teacher condition separately. This control condition would also control for pure temporal effects unrelated to teacher influence (e.g., increasing Adv-I rejections due to guilt build-up).

      While this criticism applies to both experiments, it is especially apparent in Experiment 2. As shown in Figure 4, the interaction for 10:90 offers reflects a decrease in rejection rates following the DIS teacher, with no significant change following the DIS+ADV teacher. Ignoring temporal effects, this pattern suggests that participants may be learning NOT to reject from the DIS teacher, rather than learning to reject from the DIS+ADV teacher. On this basis, I do not see convincing evidence that participants' own choices were shaped by observing Adv-I rejections.

      In the Discussion, the authors write that "We found that participants' own Adv-I-averse preferences shifted towards the preferences of the Teacher they just observed, and the strength of these contagion effects related to the degree of behavior change participants exhibited on behalf of the Teachers, suggesting that they internalized, at least somewhat, these inequity preferences." However, there is no evidence that directly links the degree of behaviour change (on the teacher's behalf) to contagion effects (own behavioural change). I think there was a relevant analysis in the original version, but it was removed from the current version.

      (2) Modelling Efforts: The modelling approach is underdeveloped. The identification of the "best model" lacks transparency, as no model-recovery results are provided. Additionally, behavioural fits for the losing models are not shown, leaving readers in the dark about where these models fail. Readers would benefit from seeing qualitative/behavioural patterns that favour the winning model. Moreover, the reinforcement learning (RL) models used are overly simplistic, treating actions as independent when they are likely inversely related. For example, the feedback that the teacher would have rejected an offer provides evidence that rejection is "correct" but also that acceptance is "an error," and the latter is not incorporated into the modelling. In other words, offers are modelled as two-armed bandits (where separate values are learned for reject and accept actions), but the situation is effectively a one-armed bandit (if one action is correct, the other is mistaken). It is unclear to what extent this limitation affects the current RL formulations. Can the authors justify/explain their reasoning for including these specific variants? The manuscript only states Q-values for reject actions, but what are the Q-values for accept actions? This is unclear.

      In Experiment 2, only the preferred model is capable of generalization, so it is perhaps unsurprising that this model "wins." However, this does not strongly support the proposed learning mechanism, lacking a comparison with simpler generalizing mechanisms (see following comments).
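
      The distinction drawn in comment (2) between treating the two actions as independent and treating them as inversely related can be sketched as follows. This is an illustration under stated assumptions (binary feedback coded 1 when the choice matches the teacher and 0 otherwise, a single learning rate `lr`, hypothetical function names), not a reconstruction of the manuscript's RL variants.

```python
def update_independent(q, action, reward, lr=0.3):
    """Two-armed-bandit style: only the chosen action's value changes."""
    q = dict(q)
    q[action] += lr * (reward - q[action])
    return q


def update_coupled(q, action, reward, lr=0.3):
    """Feedback on one action is also counter-evidence for the other.

    Assumes binary feedback: 1 if the choice matches the teacher, else 0.
    """
    q = update_independent(q, action, reward, lr)
    other = "accept" if action == "reject" else "reject"
    q[other] += lr * ((1 - reward) - q[other])
    return q


start = {"reject": 0.5, "accept": 0.5}
# The learner rejects and is told the teacher would also have rejected.
q_independent = update_independent(start, "reject", 1)
q_coupled = update_coupled(start, "reject", 1)
print(q_independent)
print(q_coupled)
```

      In the independent variant the value of accepting is untouched by the feedback, whereas in the coupled variant it moves down by the same amount that the value of rejecting moves up, which is the information the comment argues is being discarded.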

      (3) Conceptual Leap in Modelling Interpretation: The distinction between simple RL models and preference-inference models seems to hinge on the ability to generalize learning from one offer to another. Whereas in the RL models, learning occurs independently for each offer (hence no cross-offer generalization), preference inference allows for generalization between different offers. However, the paper does not explore "model-free" RL models that allow generalization based on the similarity of features of the offers (e.g., payment for the receiver, payment for the offer-giver, who benefits more). Such models are more parsimonious and could explain the results without invoking a theory of mind or any modelling of the teacher. In such model versions, a learner acquires a functional form that allows prediction of the teacher's feedback based on offer features (e.g., linear or quadratic weighting). Because feedback for an offer modulates the parameters of this function (feature weights), generalization occurs without necessarily evoking any sophisticated model of the other person. This leaves open the possibility that RL models could perform just as well or even outperform the preference learning model, casting doubt on the authors' conclusions.

      Of note: even the behaviourists knew that when Little Albert was taught to fear rats, this fear generalized to rabbits. This could occur simply because rabbits are somewhat similar to rats. But this doesn't mean Little Albert had a sophisticated model of animals that he used to infer how they behave.

      In their rebuttal letter, the authors acknowledge these possibilities, but the manuscript still does not explore or address alternative mechanisms.
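
      The kind of feature-based learner described in comment (3) can be sketched in a few lines. The sketch assumes three hand-picked features (a bias term plus disadvantageous and advantageous inequity), a logistic link, a deterministic hypothetical teacher, and an error-driven weight update; none of these choices come from the manuscript. It shows only that feedback on intermediate offers can change predictions for extreme offers that never received feedback, without any explicit model of the teacher.

```python
import math


def features(self_pay, other_pay):
    """Offer features: bias, disadvantageous and advantageous inequity."""
    diff = (other_pay - self_pay) / 100.0
    return [1.0, max(diff, 0.0), max(-diff, 0.0)]


def predict(weights, x):
    """Predicted probability that the teacher rejects the offer."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def delta_update(weights, x, teacher_rejected, lr=0.5):
    """One error-driven update of the feature weights."""
    error = teacher_rejected - predict(weights, x)
    return [w + lr * error * xi for w, xi in zip(weights, x)]


# Hypothetical feedback, given as ((self_pay, other_pay), teacher_rejected):
# the teacher rejects both unequal offers and accepts the equal split.
training = [((30, 70), 1), ((50, 50), 0), ((70, 30), 1)]
weights = [0.0, 0.0, 0.0]
for _ in range(2000):
    for (self_pay, other_pay), rejected in training:
        weights = delta_update(weights, features(self_pay, other_pay), rejected)

# Extreme offers that never received feedback during training.
p_unseen_dis = predict(weights, features(10, 90))
p_unseen_adv = predict(weights, features(90, 10))
print(p_unseen_dis, p_unseen_adv)
```

      Because the weights are shared across offers, the predicted rejection probability for the untrained extreme offers rises along with that of the trained unequal offers. Comparing such a learner against the preference-inference model would be one way to test whether the latter is needed to explain the generalization in Experiment 2.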

      (4) Limitations of the Preference-Inference Model: The preference-inference model struggles to capture key aspects of the data, such as the increase in rejection rates for 70:30 DI offers during the learning phase (e.g., Fig. 3A, AI+DI blue group). This is puzzling. Thinking about this, I realized the model makes quite strong, unintuitive predictions which are not examined. For example, if a subject begins the learning phase rejecting the 70:30 offer more than 50% of the time (meaning the starting guilt parameter is higher than 1.5), then, over learning, the tendency to reject will decrease to below 50% (the guilt parameter will be pulled down below 1.5). This is despite the fact that the teacher rejects 75% of the offers. In other words, as learning continues, learners will diverge from the teacher. On the other hand, if a participant begins learning by tending to accept this offer (guilt < 1.5), then during learning, they can increase their rejection rate but never above 50%. Thus, one can never fully converge on the teacher. I think this relates to the model's failure in accounting for the pattern mentioned above. I wonder if individuals actually abide by these strict predictions. In any case, these issues raise questions about the validity of the model as a representation of how individuals learn to align with a teacher's preferences (given that the model doesn't really allow for such an alignment).

      In their rebuttal letter, the authors acknowledged these anomalies and stated that they were able to build a better model (where anomalies are mitigated, though not fully eliminated). But they still report the current model and do not develop/discuss alternatives. A more principled model may be a Bayesian model where participants learn a belief distribution (rather than point estimates) regarding the teacher's parameters.

      (5) Statistical Analysis: The authors state in their rebuttal letter that they used the most flexible random effect structure in mixed-effects models. But this seems not to be the case in the model reported in Table SI3 (the very same model was used for other analyses too). Indeed, here it seems only intercepts are random effects. This left me confused about which models were used.

    1. eLife Assessment

      This important study provides solid evidence for new insights into the role of Type-1 nNOS interneurons in driving neuronal network activity and controlling vascular network dynamics in awake, head-fixed mice. The authors use an original strategy based on the ablation of Type-1 nNOS interneurons with local injection of saporin conjugated to a substance P analogue into the somatosensory cortex. They show that ablation of type I nNOS neurons has surprisingly little effect on neurovascular coupling, although it alters neural activity and vascular dynamics.

    2. Reviewer #1 (Public review):

      Turner et al. present an original approach to investigate the role of Type-1 nNOS interneurons in driving neuronal network activity and in controlling vascular network dynamics in awake head-fixed mice. Selective activation or suppression of Type-1 nNOS interneurons has previously been achieved using chemogenetics, optogenetics, or local pharmacology. Here, the authors took advantage of the fact that Type-1 nNOS interneurons are the only cortical cells that express the tachykinin receptor 1 to ablate them with a local injection of saporin conjugated to substance P (SP-SAP). SP-SAP causes cell death in 90 % of Type-1 nNOS interneurons without affecting microglia, astrocytes and neurons. The authors report that the ablation has no major effects on sleep or behavior. Refining the analysis by scoring neural and hemodynamic signals with electrode recordings, calcium signal imaging and wide field optical imaging, they observe that Type-1 nNOS interneuron ablation does not change the various phases of the sleep/wake cycle. However, it does reduce low-frequency neural activity, irrespective of the classification of arousal state. Analyzing neurovascular coupling using multiple approaches, they report small changes in resting-state neural-hemodynamic correlations across arousal states, primarily mediated by changes in neural activity. Finally, they show that Type-1 nNOS interneurons play a role in controlling interhemispheric coherence and vasomotion.

      In conclusion, these results are interesting, use state-of-the-art methods and are well supported by the data and their analysis. I have only a few comments on the stimulus-evoked haemodynamic responses that can be easily addressed:

      Comments on revisions:

      As I mentioned in my initial review, this study is important. In my opinion, it could be published as is. Nonetheless, I am still somewhat dissatisfied with the authors' responses to my earlier comments. I understand that the same animals were not used for both stimulation paradigms, which is unfortunate. Nonetheless, I would have appreciated it if the authors had provided a couple of experiments illustrating GCaMP7 signals during brief stimulation in their reply to the reviewers. I am still unconvinced by the authors' suggestion that the GCaMP7 signal would remain stable during removal of the vascular undershoot. Since the absence of the undershoot is notable, I anticipate that a significant part of the initial response to prolonged stimulation is influenced by processes that occur during the 0.1-second stimulation, processes that may involve a change in the bulk neuronal response.

      In short, the data could support or refute the following statement: "Loss of type-I nNOS neurons drove minimal changes in the vasodilation elicited by brief stimulation..."

    3. Reviewer #2 (Public review):

      Summary:

      This important study by Turner et al. examines the functional role of a sparse but unique population of neurons in the cortex that express nitric oxide synthase (Nos1). To do this, they pharmacologically ablate these neurons in a focal region of whisker-related primary somatosensory (S1) cortex using a saporin-Substance P conjugate. Using widefield and 2-photon microscopy, as well as field recordings, they examine the impact of this cell-specific lesion on blood flow dynamics and neuronal population activity. Within primary somatosensory cortex after Nos1 ablation, they find changes in neural activity patterns, decreased delta band power, reduced sensory-evoked changes in blood flow (specifically, elimination of the sustained blood flow change after stimulation) and decreased vasomotion.

      Strengths:

      This was a technically challenging study and the experiments were executed in an expert manner. The manuscript was well written and I appreciated the cartoon summary diagrams included in each figure. The analysis was rigorous and appropriate. Their discovery that Nos1 neurons can have significant effects on blood flow dynamics and neural activity is quite novel and should seed many follow-up mechanistic experiments to explain this phenomenon. The conclusions were justified by the convincing data presented.

      Weaknesses:

      I did not find any major flaws with the study. I originally noted some potential issues with the authors' characterization of the lesion and its extent, but that has been resolved in the revised manuscript.

      Comments on revisions:

      The authors have thoughtfully addressed the relatively minor concerns I had originally raised. Congratulations to the authors for producing this important paper.

    1. eLife Assessment

      This paper addresses a significant question regarding the low overlap between genetic discoveries for human complex diseases and those for gene expression by emphasizing the contribution of cell-type-specific chromatin accessibility QTLs. The analyses supporting the main claims are convincing, and the key conclusions are valuable and of interest to readers in the fields of human genetics and functional genomics.

    2. Reviewer #1 (Public review):

      Most human traits and common diseases are polygenic, influenced by numerous genetic variants across the genome. These variants are typically non-coding and likely function through gene regulatory mechanisms. To identify their target genes, one strategy is to examine if these variants are also found among genetic variants with detectable effects on gene expression levels, known as eQTLs. Surprisingly, this strategy has had limited success, and most disease variants are not identified as eQTLs, a puzzling observation recently referred to as "missing regulation".

      In this work, Jeong and Bulyk aimed to better understand the reasons behind the gap between disease-associated variants and eQTLs. They focused on immune-related diseases and used lymphoblastoid cell lines (LCLs) as a surrogate for the cell types mediating the genetic effects. Their main hypothesis is that some variants without eQTL evidence might be identifiable by studying other molecular intermediates along the path from genotype to phenotype. They specifically focused on variants that affect chromatin accessibility, known as caQTLs, as a potential marker of regulatory activity.

      The authors present data analyses supporting this hypothesis: several disease-associated variants are explained by caQTLs but not eQTLs. They further show that although caQTLs and eQTLs likely have largely overlapping underlying genetic variants, some variants are discovered only through one of these mapping strategies. Notably, they demonstrate that eQTL mapping is underpowered for gene-distal variants with small effects on gene expression, whereas caQTL mapping is not dependent on the distance to genes. Additionally, for some disease variants with caQTLs but no corresponding eQTLs in LCLs, they identify eQTLs in other cell types.

      Altogether, Jeong and Bulyk convincingly demonstrate that for immune-related diseases, discovering the missing disease-eQTLs requires both larger eQTL studies and a broader range of cell types in expression assays. It remains to be seen what fractions of the missing disease-eQTLs will be discovered with either strategy and whether these results can be extended to other diseases or traits.

      It should be noted that the problem of "missing regulation" has been investigated and discussed in several recent papers, notably Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; Mostafavi et al., Nat. Genet. 2023. The results reported by Jeong and Bulyk are not unexpected in light of this previous work (all of which they cite), but they add valuable empirical evidence that mostly aligns with the model and discussions presented in Mostafavi et al.

    3. Reviewer #2 (Public review):

      eQTLs have emerged as a method for interpreting GWAS signals. However, some GWAS signals are difficult to explain with eQTLs. In this paper, the authors demonstrated that caQTLs can explain these signals. This suggests that for GWAS signals to translate into disease phenotypes, they must be accessible within the chromatin.

      However, fundamentally, caQTLs, like GWAS, have the limitation of not being able to determine which genes mediate the influence on disease phenotypes. This limitation is consistent with the constraints observed in this study.

      (1) Reproducibility / Methods. The concrete numbers provided in the authors' response (e.g., 20 YRI LCL ATAC‑seq samples used only for peak discovery; caQTL mapping restricted to 100 GBR LCLs; 99,320 ATAC peaks tested vs 14,872 genes for eQTL; 373 European RNA‑seq samples, with clarification of overlap) do not appear to be reflected in the Methods. These specifics should be incorporated directly into the Methods sections.

      (2) Experimental evidence demonstrating transcription factor binding at representative caQTL peaks would strengthen causal interpretation of these loci.

      (3) Tissue/cell‑type specificity of caQTLs: Prior work supports that chromatin‑level effects are broadly shared across cellular states, whereas expression effects are more context‑specific; thus, caQTLs are generally less "state‑specific" than eQTLs. However, this does not imply equivalence across distinct cell types: caQTLs derived from different cell types may yield different results, particularly where accessibility is cell‑type restricted.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Most human traits and common diseases are polygenic, influenced by numerous genetic variants across the genome. These variants are typically non-coding and likely function through gene regulatory mechanisms. To identify their target genes, one strategy is to examine if these variants are also found among genetic variants with detectable effects on gene expression levels, known as eQTLs. Surprisingly, this strategy has had limited success, and most disease variants are not identified as eQTLs, a puzzling observation recently referred to as "missing regulation". 

      In this work, Jeong and Bulyk aimed to better understand the reasons behind the gap between disease-associated variants and eQTLs. They focused on immune-related diseases and used lymphoblastoid cell lines (LCLs) as a surrogate for the cell types mediating the genetic effects. Their main hypothesis is that some variants without eQTL evidence might be identifiable by studying other molecular intermediates along the path from genotype to phenotype. They specifically focused on variants that affect chromatin accessibility, known as caQTLs, as a potential marker of regulatory activity. 

      The authors present data analyses supporting this hypothesis: several disease-associated variants are explained by caQTLs but not eQTLs. They further show that although caQTLs and eQTLs likely have largely overlapping underlying genetic variants, some variants are discovered only through one of these mapping strategies. Notably, they demonstrate that eQTL mapping is underpowered for gene-distal variants with small effects on gene expression, whereas caQTL mapping is not dependent on the distance to genes. Additionally, for some disease variants with caQTLs but no corresponding eQTLs in LCLs, they identify eQTLs in other cell types. 

      Altogether, Jeong and Bulyk convincingly demonstrate that for immune-related diseases, discovering the missing disease-eQTLs requires both larger eQTL studies and a broader range of cell types in expression assays. It remains to be seen what fractions of the missing disease-eQTLs will be discovered with either strategy and whether these results can be extended to other diseases or traits.

      We thank the reviewer for their accurate summary of our study and positive review of our findings for immune-related diseases.

      It should be noted that the problem of "missing regulation" has been investigated and discussed in several recent papers, notably Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; Mostafavi et al., Nat. Genet. 2023. The results reported by Jeong and Bulyk are not unexpected in light of this previous work (all of which they cite), but they add valuable empirical evidence that mostly aligns with the model and discussions presented in Mostafavi et al. 

      We thank the reviewer for their positive review of our results and manuscript. As Reviewer #1 noted, whether our and others' observation extends to other diseases or traits is an open question. For instance, Figure 2b in Mostafavi et al., Nat. Genet. (2023) demonstrated that there was a spectrum of depletion of eQTLs and enrichment of GWAS signals in constrained genes across various tissues and traits, respectively. Therefore, gene expression constraint may play a larger or smaller role in different diseases or traits. That immune cell types and cell states are extremely diverse (Schmiedel et al., Cell (2018) and Calderon et al., Nat. Genet. (2019), just to name a few) likely adds to the complexity of gene regulation that contributes to immune-mediated disease.

      Reviewer #2 (Public Review): 

      Summary: 

      eQTLs have emerged as a method for interpreting GWAS signals. However, some GWAS signals are difficult to explain with eQTLs. In this paper, the authors demonstrated that caQTLs can explain these signals. This suggests that for GWAS signals to translate into disease phenotypes, they must be accessible within the chromatin.

      However, fundamentally, caQTLs, like GWAS, have the limitation of not being able to determine which genes mediate the influence on disease phenotypes. This limitation is consistent with the constraints observed in this study. 

      We thank the reviewer for their accurate summary of our results.

      (1) For reproducibility, details are necessary in the method section.

      Details about adding YRI samples in ATAC-seq: For example, how many samples are there, and what is used among public data? There is LCL-derived iPSC and differentiated iPSC (cardiomyocytes) data, not LCL itself. How does this differ from LCL, and what is the rationale for including this data despite the differences?

      Banovich et al., Genome Research (2018) (PMID: 29208628), who generated data using LCL-derived iPSCs and differentiated iPSCs (cardiomyocytes), also generated ATAC-seq data from 20 YRI LCL samples. We analyzed those data to identify open chromatin regions (i.e., ATAC-seq peaks) in LCLs and merged the regions with open chromatin regions identified with 100 GBR LCL samples from two studies by Kumasaka et al. (Nature Genetics (2016) PMID: 26656845 and Nature Genetics (2019) PMID: 30478436). However, we restricted the caQTL analysis to only the 100 GBR samples because of possible ancestry effects and batch effects. We attempted caQTL analysis with the 20 YRI samples as well, but the result was noisy, likely due to smaller sample size and lower read depth of the ATAC-seq data.

      caQTL is described as having better power than eQTL despite having fewer samples. How does the number of ATAC peaks tested in caQTL mapping compare to the number of genes tested in eQTL mapping?

      The number of ATAC peaks used in caQTL (99,320) is ~6.7 times greater than the number of genes (14,872) used in the eQTL analysis. Therefore, there is a higher chance of detecting a significant caQTL signal and a significant colocalization signal than there is for eQTLs. However, we reasoned that since distal eQTLs are more easily detected as caQTLs and since increasing the sample size of eQTLs through meta-analysis uncovered additional eQTL colocalization at loci with caQTL colocalization only, colocalized caQTLs are likely capturing disease-relevant regulatory effects.
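
      The arithmetic behind this answer can be checked with a short sketch. The feature counts are taken from the response above; the per-feature significance level `alpha_per_feature` is a hypothetical value chosen only for illustration and is not the threshold used in the study.

```python
# Feature counts quoted in the author response.
n_atac_peaks = 99_320   # ATAC-seq peaks tested in caQTL mapping
n_genes = 14_872        # genes tested in eQTL mapping

# Ratio of tested features (the response reports roughly 6.7).
ratio = n_atac_peaks / n_genes

# Hypothetical per-feature significance level, for illustration only.
alpha_per_feature = 1e-5

# Under a global null, the expected number of chance hits grows
# linearly with the number of features tested.
expected_null_hits_caqtl = n_atac_peaks * alpha_per_feature
expected_null_hits_eqtl = n_genes * alpha_per_feature

print(f"peaks per gene: {ratio:.2f}")
print(f"expected chance hits (caQTL): {expected_null_hits_caqtl:.3f}")
print(f"expected chance hits (eQTL):  {expected_null_hits_eqtl:.3f}")
```

      At any fixed per-feature threshold, the larger number of tested peaks gives proportionally more opportunities for a significant signal, which is why the response turns to distal-eQTL detection and meta-analysis as independent support.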

      Details about RNA expression data: In the method section, it states that raw data (ERP001942) was accessed, and in data availability, processed data (E-GEUV-1) was used. These need to be consistent.

      Thank you for pointing this out. We used the processed data from Expression Atlas (https://www.ebi.ac.uk/gxa/experiments/E-GEUV-1/Results), and that's what we meant by "We downloaded RNA expression level data of the LCL samples from the Expression Atlas." We have revised the “RNA expression data preparation” section in our manuscript to make the text clearer.

      How many samples were used (the text states 373, but how was it reduced from the original 465, and the total genotype is said to be 493 samples while ATAC has n=100; what are the 20 others?), and it mentions European samples, but does this exclude YRI?

      We thank the reviewer for pointing out these points of confusion. Our reported count of 493 samples included YRI samples with RNA-seq data or ATAC-seq data that we ultimately did not use for QTL analyses. There were 373 European samples with RNA-seq data that we used for eQTL analysis, and 100 GBR samples (including some that overlap with the 373 European samples) that we used for caQTL analysis. We have revised the text to clarify these points.

      (2) Experimental results determining which TFs might bind to the representative signals of caQTL are required.

      We agree that caQTL colocalization is just the start of elucidating the regulatory mechanism of a GWAS locus. Determining which TFs are bound and which TFs' binding is altered would be necessary to describe the causal regulatory mechanism. For this, we utilized the Cistrome database to search for TFs whose binding overlaps the colocalized caQTL peaks. We present the results of this analysis in Supplementary Table 3 and Supplementary Figure 4, both of which we have added in our revised manuscript. Overall, protein factors associated with active transcription, such as POLR2A, and several immune cell TFs, including RUNX3, SPI1, and RELA, were frequently detected in those peaks. Detecting these factors in most peaks supports the likelihood that the colocalized caQTL peaks are active cis-regulatory elements. These results are consistent with our observation of enriched caQTL-mediated heritability in regions with active histone marks (Figure 1).

      (3) It is stated that caQTL is less tissue-specific compared to eQTL; would caQTL performed with ATAC-seq results from different cell types, yield similar results?

      We thank the reviewer for the question. Calderon et al. (PMID: 31570894) observed that "most effects on allelic imbalance (of ATAC-seq) were shared regardless of lineage or condition". Yet, there were regions where a different cell type or state would show inaccessibility (Figure 4d in Calderon et al.). Thus, we expect that ATAC-seq results from different cell types (e.g., T cells, B cells, monocytes, etc.) would lead to additional caQTLs showing colocalization at cell-type-specific open chromatin. However, if a region is accessible in both cell types, caQTL may be detected in both. Moreover, Alasoo et al., Nature Genetics (2018) (PMID: 29379200) observed that “many disease-risk variants affect chromatin structure in a broad range of cellular states, but their effects on expression are highly context specific.” In both studies, the authors investigated immune cell types, and there could be different observations in non-immune cell types and other diseases and traits.

      Reviewer #1 (Recommendations For The Authors): 

      I think it would strengthen the paper to explore gene-level differences in the discovery of caQTLs and eQTLs. For example, complex disease-relevant genes, on average, have more/longer regulatory domains (as shown by Wang and Goldstein, AJHG 2020; Mostafavi et al., Nat. Genet. 2023). Therefore, it is plausible that for such genes, caQTLs are much more easily discoverable than eQTLs due to (i) a larger mutational target size for caQTLs, and (ii) dispersion of expression heritability across multiple domains, which hampers the discovery of eQTLs but not caQTLs, which are studied independently of other domains in the region. In other words, discovered caQTLs and eQTLs likely vary in terms of their distance to genes (as the authors report), as well as their target genes.

      We thank the reviewer for the suggestion to explore gene-level differences. We expect the effects of complex disease-relevant genes having, on average, more or longer regulatory domains to explain our observations. We agree on both of your points: many more regulatory elements are captured as accessible regions than there are expressed genes, and genes often have multiple independent eQTLs, leading to dispersion of heritability. The gene-level trend that we described was the distance of the regulatory element from the genes. Additional analyses would be a relevant future direction.

      Also considering gene-level analysis, Mostafavi et al. show that the types of biases they report for eQTLs also apply to other molecular QTLs. It would be valuable to compare GWAS hits with versus without caQTL colocalization. Similarly, it would be insightful to compare GWAS hits with both colocalized caQTLs and eQTLs to GWAS hits with colocalized caQTLs but no eQTLs in any of the cell types. 

      We thank the reviewer for the comment. Investigating potential biases in the colocalized caQTLs would be useful, but we considered it beyond the scope of this work. In terms of biological factors, we demonstrated through mediated heritability analyses that more accessible chromatin (based on ATAC-seq read coverage) and regions with active histone marks were enriched for autoimmune disease associations (Figure 1). Furthermore, as greater distance of the regulatory variant from the transcription start site significantly reduced the cis-heritability, we would expect that distance would play a major role, similar to Mostafavi et al.’s conclusions.

      I don't think the argument for the role of natural selection contributing to the "missing regulation" is presented accurately. Specifically, large eQTLs acting on top trait-relevant genes are under stronger selection and thus, on average, segregate at lower frequencies. This makes them difficult to discover in eQTL assays. However, if not lost, they contribute as much, if not more, to trait heritability than weaker eQTLs at the same gene because their larger effects compensate for their lower frequency. At the most extreme, selection should have a "flattening" effect (e.g., see Simons et al., PLOS Biol 2018; O'Connor et al., AJHG 2019): weak and strong eQTLs at the same gene are expected to contribute equally to heritability. Therefore, the statement "Consequently, only weak eQTL variants, often in regions distal to the gene's promoter, may remain and affect traits" is not correct. If this turns out to be empirically true, other models, such as pleiotropic selection, need to explain it. 

      We thank the reviewer for the correction. We agree with the comment and have revised the sentences in the introduction accordingly.

      It is worth speculating why caQTLs may be more consistent across cell types than cis-eQTLs. Additionally, readers may infer from the paper that the focus should shift from eQTLs to caQTLs, which may not be the authors' intention. Perhaps these approaches are complementary: caQTLs can help with TSS-distal disease variants, while finding the target gene and regulatory context is more straightforward with eQTL colocalization. Addressing these points in the discussion will be helpful.

      We appreciate the reviewer's suggestion to clarify the advantages of incorporating cis-eQTLs and caQTLs. Our argument is exactly as you put it, and we added a paragraph on this in the Discussion.

      I believe the authors could do more to contextualize their findings within the existing literature on the subject, particularly Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; and Mostafavi et al., Nat. Genet. 2023. For instance, Umans et al. suggest that "if most standard eQTLs are generally benign, increasing sample size and adding more tissue types in an effort to identify even more standard eQTLs may not help us to explain many more disease risk mutations". Conversely, Mostafavi et al. argue for a multipronged approach, which appears more aligned with the authors' conclusions.

      We followed the reviewer’s suggestion to place our work in the context of existing literature on this topic. Moreover, we clarified what our recommendations for future data generation are.

      I thought Figures 1C-D were unclear. 

      We added a sentence in the figure legend describing that stronger and more significant enrichment indicates that mediated heritability is concentrated in that subset.

      Reviewer #2 (Recommendations For The Authors): 

      Complete workflow figures for caQTL calling and eQTL calling are required. 

      To improve clarity of the caQTL and eQTL calling workflow, we added Supplementary Figure 1.

    1. eLife Assessment

      This study reports important findings about the nature of feedback to primary visual cortex (V1) during object recognition. The state-of-the-art functional MRI evidence for the main claims is solid, and the combination of high-resolution fMRI with MEG yields significant insight into neural mechanisms. The findings presented here are relevant to a number of scientific fields such as object recognition, categorisation and predictive coding.

    2. Reviewer #1 (Public review):

      This study examines the spatiotemporal properties of feedback signals in the human brain during an object discrimination task. Using 7T fMRI and MEG, the authors show that task-relevant object category information can be decoded from both deep and superficial layers of V1, originating from occipito-temporal and posterior parietal cortices. In contrast, task-irrelevant category feedback does not appear in V1, even when the same objects are foveally presented. Low-level orientation information, however, is decodable from V1 regardless of task relevance and is supported by recurrence with occipito-temporal regions. These findings suggest that category decoding in V1 depends on task-driven feedback rather than feedforward visual features.

      Strengths

      This study leverages two advanced neuroimaging modalities attempting to connect object recognition across cortical layer and whole-brain levels. The revised manuscript strengthens the connection between the fMRI and MEG components. It also demonstrates that a peripheral object discrimination task is effective for isolating feedforward and feedback signals using 7T fMRI. It is particularly notable that no low-level features were fed back to V1's superficial layers in the peripheral object discrimination task. The authors further show that high- and low-level feedback to the foveal V1 are comparable in strength, supporting the idea that the superficial layer in V1 selectively represents task-relevant content.

      Weaknesses

      One alternative explanation for the absence of task-irrelevant category decoding in the foveal task is that feedback enhancement may be required to decode complex features from V1 (compared to a coarse orientation feature). It would be informative to test whether the findings hold if the categorical boundary were defined by a low-level feature other than orientation (e.g., frequency; see Ester, Sprague and Serences, 2020).

      I would like to echo the concerns raised by the other reviewer regarding multiple comparisons correction. It is important to apply correction procedures, especially given the number of statistical tests performed across brain regions where strict a priori hypotheses are unlikely. In the case of cluster-based statistics, the manuscript should clearly specify both the cluster-forming threshold and the significance threshold used for comparing true cluster masses to the shuffled distribution.
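
      To make the two thresholds concrete, the following is a minimal, self-contained sketch of a one-sided cluster-mass sign-flip permutation test over a time series of decoding effects (e.g., accuracy minus chance). It is an illustration under stated assumptions, not the procedure used in the manuscript: the function names, the cluster-forming value `cluster_forming_t=2.0`, the number of permutations, and the simulated data are all hypothetical.

```python
import math
import random


def t_stat(values):
    """One-sample t statistic of `values` against zero."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n)


def max_cluster_mass(t_values, cluster_forming_t):
    """Largest summed t over contiguous runs exceeding the cluster-forming threshold."""
    best = mass = 0.0
    for t in t_values:
        mass = mass + t if t > cluster_forming_t else 0.0
        best = max(best, mass)
    return best


def cluster_permutation_p(data, cluster_forming_t=2.0, n_perm=500, seed=0):
    """data[s][i] is the effect for subject s at time point i.

    Returns the observed maximum cluster mass and its permutation p-value.
    The p-value is then compared with the significance threshold (e.g., 0.05),
    which is distinct from the cluster-forming threshold.
    """
    rng = random.Random(seed)
    n_time = len(data[0])

    def mass_of(dataset):
        t_values = [t_stat([subj[i] for subj in dataset]) for i in range(n_time)]
        return max_cluster_mass(t_values, cluster_forming_t)

    observed = mass_of(data)
    null = []
    for _ in range(n_perm):
        # Flip the sign of each subject's whole time course at random.
        flipped = []
        for subj in data:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * v for v in subj])
        null.append(mass_of(flipped))
    p_value = (1 + sum(m >= observed for m in null)) / (1 + n_perm)
    return observed, p_value


if __name__ == "__main__":
    # Simulated data: 12 subjects, 30 time points, an effect at points 10-19.
    sim = random.Random(1)
    demo = [[sim.gauss(2.0 if 10 <= i < 20 else 0.0, 1.0) for i in range(30)]
            for _ in range(12)]
    mass, p = cluster_permutation_p(demo)
    print(f"max cluster mass = {mass:.2f}, permutation p = {p:.4f}")
```

      Reporting both `cluster_forming_t` and the p-value cutoff matters because the first determines which time points can join a cluster, while the second determines which clusters are declared significant.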

      Conclusion

      Overall, the results support the study's conclusions. This work addresses a timely question in object categorization and predictive coding: specifically, how feedback signals vary in content and timing across cortical layers.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports high-resolution functional MRI data and MEG data revealing additional mechanistic information about an established paradigm studying how foveal regions of primary visual cortex (V1) are involved in processing peripheral visual stimuli. Because of the retinotopic organization of V1, peripheral stimuli should not evoke responses in the regions of V1 that represent stimuli in the center of the visual field (the fovea). However, functional MRI responses in foveal regions do reflect the characteristics of peripheral visual stimuli, a surprising finding first reported in 2008. The present study uses fMRI data with sub-millimeter resolution to study how responses at different depths in the foveal gray matter do or don't reflect peripheral object characteristics during two different tasks: one in which observers needed to make detailed judgments about object identity, and one in which observers needed to make coarser judgments about object orientation. fMRI results reveal interesting and informative patterns in these two conditions. A follow-on MEG study yields information about the timing of these responses. Put together, the findings settle some questions in the field and add new information about the nature of visual feedback to V1.

      Strengths:

      (1) Rigorous and appropriate use of "laminar fMRI" techniques.

      (2) The introduction does an excellent job of contextualizing the work.

      (3) Control experiments and analyses are designed and implemented well.

      Weaknesses:

      (1) The use of the term "low order" to describe object orientation is potentially confusing. During review, the authors considered this issue and responded that they would continue with the use of the term low-order to describe object orientation because a low-pass spatial frequency filter would provide object orientation information. This is certainly a reasonable perspective; nonetheless, this reviewer thinks spatial frequencies that low are not readily represented by neurons in early visual cortex and it is common to use "low-order" to refer to features extracted in early visual areas, so I think this causes confusion.

      (2) The methods contain a nice description of the procedure for "correcting the vascular-related signals". I'm guessing this is the step that removed, e.g., 22% of foveal voxels (previous paragraph), but it's not entirely clear whether the voxel selection described under "correcting the vascular-related signals" is the same processing step referred to in the previous paragraph as "a portion of voxels was removed based on large vein distribution".

      (3) It is quite difficult to perform laminar analyses across multiple visual areas because distortion compensation is not perfect and registration of functional to anatomical data will always be a bit better in some places and a bit worse in others. An ideal manuscript would include some images showing registration quality in V1, LOC, and IPS regions for a few different participants, or include some kind of quality metric indicating the confidence in depth assignments in different regions.

      (4) For the decoding analysis, it would be helpful to have more information about how samples were defined for each condition -- were the beta values for entire blocks used as samples for each condition, or were separate timepoints during a block used in the SVM as repeated samples for each condition?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1.1) The authors argue that low-level features in a feedback format could be decoded only from deep layers of V1 (and not superficial layers) during a perceptual categorization task. However, previous studies (Bergmann et al., 2024; Iamshchinina et al., 2021) demonstrated that low-level features in the form of feedback can be decoded from both superficial and deep layers. While this result could be due to the perceptual task or the highly predictable orientation feature (orientation was kept the same throughout the experimental block), an alternative explanation is a weaker representation of orientation in the feedback (even before splitting by layers there is only a trend towards significance; also, Granger causality for orientation information in the MEG part is lower than that for category in the peripheral categorization task), because it is orthogonal to the task demand. It would be helpful if the authors added a statistical comparison of the strength of category and orientation representations in each layer and across the layers.

      We agree that the strength of feedback information is related to task demand. Specifically, we would like to highlight the relationship between task demand and feedback information in the superficial layer. Previous studies have shown that foveal feedback information is observed only when the task requires the identity information of the peripheral objects (Williams et al., 2008; Fan et al., 2016; Yu and Shim, 2016). In this study, we found that the deep layer represented both orientation and categorical feedback information, while the superficial layer only represented categorical information. This suggests that feedback information in the superficial layer may be related to (or enhanced by) the task demands. In other words, if the experimental design required participants to discriminate orientation rather than object identity, we would expect stronger orientation information in foveal V1 and significant decoding performance of orientation feedback information in the superficial layer of foveal V1. This assumption is consistent with the anatomical connections of the superficial layer, which not only receives feedback connections but also sends outputs to higher-level regions for further processing. This is also consistent with Iamshchinina et al.’s observation that, when orientation information had to be mentally rotated and reported (i.e., task-relevant), it was observed in both the superficial and deep layers of V1. Bergmann et al. observed illusory color information in the superficial layer of V1, which may reflect a combination of lateral propagation and feedback mechanisms in the superficial layer that support visual filling-in phenomena. 
      We have revised the discussion in the manuscript: In other words, if the experimental design required participants to discriminate orientation rather than object identity, we would expect stronger orientation information in foveal V1 and significant decoding performance of orientation feedback information in the superficial layer of foveal V1. Recent studies (Iamshchinina et al., 2021; Bergmann et al., 2024) have also highlighted the relationship between feedback information and neural representations in V1 superficial layer.

      To further demonstrate the laminar profiles of low- and high-order information, we have re-analyzed the data and added finer-scale laminar profiles with statistical comparisons in the revised manuscript. The results again showed significant decoding of both category and orientation information in the deep layer, and significant decoding of category information only in the superficial layer.

      (1.2) The authors argue that category feedback is not driven by low-level confounding features embedded in the stimuli. They demonstrate the ability to decode orientations, particularly well represented by V1, in the absence of category discrimination. However, the orientation is not a category-discriminating feature in this task. It could be that the category-discriminating features cannot be as well decoded from V1 activity patterns as orientations. Also, there are a number of these category discriminating features and it is unclear if it is a variation in their representational strength or merely the absence of the task-driven enhancement that preempts category decoding in V1 during the foveal task. In other words, I am not sure whether, if orientation was a category-specific feature (sharpies are always horizontal and smoothies are vertical), there would still be no category decoding.

      The low-order features mentioned in the manuscript refer to visual information encoded intrinsically in V1, independent of task demands. In the foveal experiment, the task is to discriminate the color of the fixation point, which is unrelated to the category or orientation of the object stimuli. The results showed that only orientation information could be decoded from foveal V1. This indicates that low-order information, such as orientation, is strongly and automatically encoded in V1, even when it is irrelevant to the task. Meanwhile, category information could not be decoded, indicating that category information relies on feedback signals driven by attention to the objects or by a task on the objects, both of which are absent in the fixation task. Other evidence indicates that category feedback is not driven by low-level features intrinsically encoded in V1. First, the laminar profiles of these two types of feedback information differ considerably (see response to 1.1). Second, only category feedback information was correlated with behavioral performance (MEG experiment). These findings demonstrate that category feedback information is task-driven and differs from the automatically encoded low-order information in foveal V1. The reviewer was uncertain whether, “if orientation was a category-specific feature (sharpies are always horizontal and smoothies are vertical), there would still be no category decoding”. Our data showed that orientation could be automatically decoded in V1, regardless of task demand. Thus, if orientation was a category-specific feature in the foveal task (i.e., sharpies are always horizontal and smoothies are always vertical), category decoding would be successful in V1. However, in this scenario, orientation and the other shape features would not be independent, which would prevent us from determining whether non-orientation shape features could be decoded in V1.

      Reviewer #2 (Public review):

      (2.1) While not necessarily a weakness, I do not fully agree with the description of the 2 kinds of feedback information as "low-order" and "high-order". I understand the motivation to do this - orientation is typically considered a low-level visual feature. But when it's the orientation of an entire object, not a single edge, orientation can only be defined after the elements of the object are grouped. Also, the discrimination between spikies and smoothies requires detecting the orientations of particular edges that form the identifying features. To my mind, it would make more sense to refer to discrimination of object orientation as "coarse" feature discrimination, and orientation of object identity as "fine" feature discrimination. Thus, the sentence on line 83, for example, would read "Interestingly, feedback with fine and coarse feature information exhibits different laminar profiles.".

      We agree that object orientation (invariant to object category or identity) is defined on a larger spatial scale than local orientation features such as local edges; in this sense, object orientation is a coarse feature. In contrast, the category-defining information is mainly contributed by local shape information (i.e., little cubes vs. bumps), which is finer-scale information. One way to look at this difference is that object orientation information is mainly carried by low spatial frequencies and will survive low-pass filtering, hence “coarse”, while object category information would largely be lost if the objects underwent low-pass spatial filtering.

      We believe the labels “low-order” and “high-order” are consistent with the typical use of these terms in the literature, referring to features intrinsically encoded in early visual cortex vs. in high-level object-sensitive cortical regions. The more important aspect of our results is their differential engagement in feedforward vs. feedback processing, with low-order features automatically represented in the early visual cortex during feedforward processing while high-order features are represented due to feedback processing. Results from the foveal fMRI experiment (Exp. 2) strongly support this assumption: when objects were presented at the fovea and the task was a fixation color task irrelevant to object information, foveal V1 could only represent orientation information, not category information. Notably, there was a dramatic difference in decoding performance in foveal V1 between Exp. 1 and Exp. 2, which ruled out the argument that both orientation and category information were driven by local edge information represented in V1.

      (2.2) Figure 2 and text on lines 185, and 186: it is difficult to interpret/understand the findings in foveal ROIs for the foveal control task without knowing how big the ROI was. Foveal regions of V1 are grossly expanded by cortical magnification, such that the central half-degree can occupy several centimeters across the cortical surface. Without information on the spatial extent of the foveal ROI compared to the object size, we can't know whether the ROI included voxels whose population receptive fields were expected to include the edges of the objects.

      The ROI of foveal V1 was defined using data from independent localizer runs. In each localizer run, flashing checkerboards of the same size as the objects in the task runs were presented at the fovea or in the periphery. The ROI of foveal V1 was identified as the voxels responsive to the foveal checkerboards. In other words, the ROI of foveal V1 included the voxels whose population receptive fields covered the entire object in the foveal visual field.

      We included a figure in the revised manuscript comparing the activation maps induced by the foveal object stimulus in the task runs with the ROI coverage defined by the localizer runs. 

      (2.3) Line 143 and ROI section of the methods: in order for the reader to understand how robust the responses and analyses are, voxel counts should be provided for the ROIs that were defined, as well as for the number (fraction) of voxels excluded due to either high beta weights or low signal intensity (lines 505-511).

      In the revised manuscript, we have included the number of voxels in each ROI and the criteria for voxel selection:

      For each ROI, the number of voxels depended on the size of the activated region, as estimated from the localizer data. The numbers are as follows: foveal V1, 2185 ± 389; peripheral V1, 1294 ± 215; LOC, 3451 ± 863; and pIPS, 5154 ± 1517. To avoid the signals of large vessels, a portion of voxels was removed based on the distribution of large vessels: foveal V1, 22.5% ± 6.6%; peripheral V1, 6.8% ± 3.9%; LOC, 16.1% ± 8.1%; and pIPS, 5.1% ± 3.2%. For the decoding analysis, the top 500 responsive voxels in each ROI were selected to balance the voxel numbers across different ROIs for training and testing the decoder.
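      The voxel-balancing step described above can be illustrated with a minimal sketch. It assumes each voxel has a single scalar responsiveness score (for example a localizer t value; the response does not say which measure was used), and the function name `top_k_voxels` is hypothetical rather than taken from the authors' code.

```python
def top_k_voxels(responses, k=500):
    """Return the indices of the k most responsive voxels, highest first.

    `responses` holds one responsiveness score per voxel (assumed measure).
    If an ROI has fewer than k voxels, all of them are returned.
    """
    order = sorted(range(len(responses)), key=lambda i: responses[i], reverse=True)
    return order[:k]


# Toy example with four voxels and k = 2 (illustrative values only).
selected = top_k_voxels([0.1, 2.0, 1.5, -0.3], k=2)
print(selected)
```

      Applying the same `k` to every ROI gives the decoder the same number of features in each region, so differences in decoding performance are less likely to reflect ROI size alone.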

      (2.4) I wasn't able to find mention of how multiple-comparisons corrections were performed for either the MEG or fMRI data (except for one Holm-Bonferroni correction in Figure S1), so it's unclear whether the reported p-values are corrected.

      For the fMRI results, there is strong evidence showing that feedback information is sent to the foveal V1 during a peripheral object task (Williams et al., 2008; Fan et al., 2016; Yu and Shim, 2016). In addition, anatomical and functional evidence shows that the superficial and deep layers of V1 receive feedback information during visual processing. Therefore, in the current study, we specifically examined two types of feedback information in the superficial and deep layers of foveal V1, and did not apply multiple-comparison correction to the decoding results.

      Regarding the MEG results, since we did not have a strong prior about when feedback information would arrive in the foveal V1, a cluster-based permutation method was used to correct for multiple comparisons in each time course. Specifically, for each time point, the sign of the effect for each participant was randomly flipped 50000 times to obtain the null hypothesis distribution for each time point. Clusters were defined as continuous significant time points in the real and flipped time series, and the effects in each cluster were summed to create a cluster-based effect. The most significant cluster-based effect in each flipped time series was then used to generate the corrected null hypothesis distribution.
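      The procedure described above can be sketched as follows. This is a simplified illustration and not the authors' code: it assumes the effect is tested against zero, flips the sign of each participant's whole time course, defines point-wise significance with a fixed t threshold (the response instead uses the permutation distribution at each time point), and defaults to 1,000 permutations where the authors used 50,000. All function names and the toy data are hypothetical.

```python
import math
import random


def t_values(data):
    """One-sample t statistic against zero at each time point.

    `data` is a list of participants, each a list of effects over time.
    """
    n = len(data)
    out = []
    for t in range(len(data[0])):
        col = [row[t] for row in data]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        # Guard against zero variance; treated as "no effect" in this sketch.
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def cluster_masses(tvals, threshold):
    """Return (start, end, summed t) for each run of contiguous supra-threshold points."""
    clusters, start, mass = [], None, 0.0
    for i, t in enumerate(tvals):
        if t > threshold:
            if start is None:
                start, mass = i, 0.0
            mass += t
        elif start is not None:
            clusters.append((start, i - 1, mass))
            start = None
    if start is not None:
        clusters.append((start, len(tvals) - 1, mass))
    return clusters


def cluster_permutation_test(data, threshold=2.0, n_perm=1000, seed=0):
    """Return (start, end, mass, corrected p) for each observed cluster."""
    rng = random.Random(seed)
    observed = cluster_masses(t_values(data), threshold)
    null_max = []
    for _ in range(n_perm):
        # Randomly flip the sign of each participant's time course.
        flipped = []
        for row in data:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * x for x in row])
        masses = [m for _, _, m in cluster_masses(t_values(flipped), threshold)]
        # Keep only the largest cluster per permutation (0 if none).
        null_max.append(max(masses, default=0.0))
    return [
        (s, e, m, sum(nm >= m for nm in null_max) / n_perm)
        for s, e, m in observed
    ]


# Toy data: 12 participants, 20 time points, a true effect at points 8 to 12.
toy_rng = random.Random(1)
toy = [
    [toy_rng.gauss(0.0, 1.0) + (3.0 if 8 <= t <= 12 else 0.0) for t in range(20)]
    for _ in range(12)
]
results = cluster_permutation_test(toy)
for start, end, mass, p in results:
    print(f"cluster {start}-{end}: mass={mass:.1f}, corrected p={p:.3f}")
```

      Because each observed cluster is compared with the largest cluster of every permuted time series, the resulting p-values are corrected across the whole time course, not per time point.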

      We included these clarifications in Significance testing part of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      It would be helpful if the authors could elaborate more on the fMRI decoding results in higher-order visual areas in the Discussion (there are recent studies also investigating higher-order visual areas (Carricarte et al., 2024) and associative areas (Degutis et al., 2024)) and relate it to the MEG information transmission results between the areas overlapping with the regions recorded in the fMRI part of the study.

      We have discussed the fMRI decoding results in the LOC and IPS in the revised manuscript: 

      In the current study, fMRI signals from early visual cortex and two high-level brain regions (LOC and pIPS) were recorded. Neural dynamics of these regions were extracted from MEG signals. Decoding analyses based on fMRI and MEG signals consistently showed that object category information could be decoded from both regions. These findings raise an important question:  Further Granger causality analysis indicates that the feedback information in foveal V1 was mainly driven by signals from the LOC. Layer-specific analysis showed that category information could be decoded in the middle and superficial layers of the LOC. A reasonable interpretation of this result is that feedforward information from the early visual cortex was received by the LOC’s middle layer, then the category information was generated and fed back to foveal V1 through the LOC’s superficial layer. A recent study (Carricarte et al., 2024) found that, in object selective regions in temporal cortex, the deep layer showed the strongest fMRI responses during an imagery task. Together, the results suggest that the deep and superficial layers correspond to different feedback mechanisms. It is worth noting that other cortical regions may also generate feedback signals to the early visual cortex. The current study did not have simultaneously recorded fMRI signals from the prefrontal cortex, but it has been shown that feedback signals can be traced back to the prefrontal cortex during complex cognitive tasks, such as working memory (Finn et al., 2019; Degutis et al., 2024). Further fMRI studies with submillimeter resolution and whole-brain coverage are needed to test other potential feedback pathways during object processing.

      The behavioral performance seems quite low (67%), could authors explain the reasons for it?

      We purposely designed the object stimuli to be difficult to distinguish. Some of our pilot data showed that the more involved the participants were in the peripheral object task, the easier the foveal feedback information was to decode. It is reasonable to assume that if the peripheral objects were easily distinguishable, the feedback mechanism may not be fully recruited during object processing. Furthermore, since we were decoding category and orientation information rather than identity information, the difficulty of distinguishing two objects from the same category and with the same orientation would not affect the decoding of category and orientation information in the neural signals.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 52: the meaning of the sentence starting with "However, ..." is not entirely clear. Maybe the word "while" is missing after the first comma?

      (2) Line 224. If I'm understanding the rationale for the MEG analysis correctly, it was not possible to localize foveal regions, but the cross-location decoding analysis was used to approximate the strength and timing of feedback information. If this is the case, "neural representations in the foveal region" were not extracted.

      (3) Figure 4. The key information is too small to see. The lines indicating where decoding performance was significant are quite thin but very important, and the text next to them indicating onset times of significant decoding is in such a small font size I needed to zoom in to 300% to read it (yes, my eyes are getting old and tired). Increasing the font size used to represent key information would be nice.

      (4) Figure 4 caption. Line 270 describes the line color in the plots as yellow, but that color is decidedly orange to my eye.

      (5) Line 340/341: Papers that define and describe feedback-receptive fields seem important to cite here:

      Keller, A. J., Roth, M. M., & Scanziani, M. (2020). Feedback generates a second receptive field in neurons of the visual cortex. Nature, 582(7813), 545-549.

      Kirchberger, L., Mukherjee, S., Self, M. W., & Roelfsema, P. R. (2023). Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Science advances, 9(3), eadd2498.

      (6) Lines 346-350: this sentence seems to have some missing or misused words, because the syntax isn't intact.

      (7) Line 367: supports should be support.

      We thank the reviewers for the comments and have corrected them in the manuscript.

    1. eLife Assessment

      This important study identifies a plant-derived metabolite, betulin, as an effective natural insecticide against aphids and uncovers its specific molecular target. The evidence is compelling, combining greenhouse and field efficacy trials with rigorous molecular, genetic, and electrophysiological approaches that converge on a conserved binding site in the aphid GABA receptor. While additional work is needed to fully assess potential off-target effects and ecological safety, the study provides a strong mechanistic foundation. These findings will be of interest to researchers in plant biology, chemical ecology, and sustainable pest management.

    2. Reviewer #1 (Public review):

      Wang, Junxiu et al. investigated the underlying molecular mechanisms of the insecticidal activity of betulin against the peach aphid, Myzus persicae. There are two important findings described in this manuscript: (a) betulin inhibits the gene expression of GABA receptor in the aphid, and (b) betulin binds to the GABA receptor protein, acting as an inhibitor. The first finding is supported by RNA-Seq and RNAi, and the second one is convinced with MST and electrophysiological assays. Further investigations on the betulin binding site on the receptor protein provided a fundamental discovery that T228 is the key amino acid residue for its affinity, thereby acting as an inhibitor, backed up by site-directed mutagenesis of the heterologously-expressed receptor in E. coli and by CRISPR-genome editing in Drosophila.

      Comments on revisions:

      All of my review comments have been addressed, and the manuscript has been revised accordingly.

    3. Reviewer #2 (Public review):

      Summary:

      This important study shows that betulin from wild peach trees disrupts neural signaling in aphids by targeting a conserved site in the insect GABA receptor. The authors present a nicely integrated set of molecular, physiological, and genetic experiments to establish the compound's species-specific mode of action. While the mechanistic evidence is solid, the manuscript would benefit from a broader discussion of evolutionary conservation and potential off-target ecological effects.

      Strengths:

      The main strengths of the study lie in its mechanistic clarity and experimental rigor. The identification of a betulin-binding single threonine residue was supported by (1) site-directed mutagenesis and (2) functional assays. These experiments strongly support the specificity of action. Furthermore, the use of comparative analyses between aphids and fruit flies demonstrates an important effort to explore species specificity, and the integration of quantitative data further enhances the robustness of the conclusions.

      Comments on revisions:

      The revision satisfactorily addresses my concerns on evolutionary context, methodological clarity, and ecological risk.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Wang, Junxiu et al. investigated the underlying molecular mechanisms of the insecticidal activity of betulin against the peach aphid, Myzus persicae. There are two important findings described in this manuscript: (a) betulin inhibits the gene expression of GABA receptor in the aphid, and (b) betulin binds to the GABA receptor protein, acting as an inhibitor. The first finding is supported by RNA-Seq and RNAi, and the second one is convinced with MST and electrophysiological assays. Further investigations on the betulin binding site on the receptor protein provided a fundamental discovery that T228 is the key amino acid residue for its affinity, thereby acting as an inhibitor, backed up by site-directed mutagenesis of the heterologously-expressed receptor in E. coli and by CRISPR-genome editing in Drosophila.

      Although the manuscript does have strengths in principle, the weaknesses do exist: the manuscript would benefit from more comprehensive analyses to fully support its key claims in the manuscript. In particular:

      (1) The Western blotting results in Figure 5A & B appear to support the claim that betulin inhibits GABR gene expression (L26), as a decrease in target protein levels is often indicative of suppressed gene expression. The result description for Figure 5A & B is found in L312-L316, within Section 3.6 ("Responses of MpGABR to betulin"), where MST and voltage-clamp assays are also presented. It seems the observed decrease in MpGABR protein content is due to gene downregulation, rather than a direct receptor protein-betulin interaction. However, this interpretation lacks discussion or analysis in either the corresponding results section or the Discussion. In contrast, Figures 5C-F are specifically designed to illustrate protein-betulin interactions. Presenting Figure 5A & B alongside these panels might lead to confusion, as they support distinct claims (gene expression vs. protein binding/inhibition). Therefore, I recommend moving Figure 5A & B either to the end of Figure 3 or to a separate figure altogether to improve clarity and logical flow. A minor point in the Western blotting experiment is that although GAPDH was used as a reference protein, there is no explanation in the corresponding M&M section.

      We thank the reviewer for the concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses.

      (a) According to your suggestion, the original Figure 5A and B have been inserted into Figure 3, following Figure 3D. The original Figure 3E-I has been moved to a new figure to illustrate the RNAi assay.

      (b) “GAPDH was used as a reference protein” has been added to the M&M section, see Line 209.

      (2) The description of the electrophysiological recording experiment is unclear regarding the use of GABA. I didn't realize that GABA, the true ligand of the GABA receptor, was used in this inhibition experiment until I reached the Results section (L321), which states, "In the presence of only GABA, a fast inward current was generated." Crucially, no details are provided on the experiment itself, including how GABA was applied (e.g., concentration, duration, whether GABA was treated, followed by betulin, or vice versa). This information is essential for reproducibility. Please ensure these details are thoroughly described in the corresponding M&M section.

      We thank the reviewer for the valuable comments.

      (a) Detailed information on how to apply GABA has been added to the corresponding M&M section (Lines 260-263): After 3 days of incubation, the oocytes were used for electrophysiological recording. GABA was dissolved in 1 × Ringer's solution to prepare 100 µM GABA solution. Subsequently, the 100 µM GABA solutions containing different concentrations of betulin (0, 5, 10, 20, 40, 80, 160, 320 µM) were used to perfuse the oocytes.

      (b) Additionally, we also checked other contents of M&M section to ensure that sufficient detail has been supplied.

      (3) The phylogenetic analysis, particularly concerning Figures 4 and 6B, needs significant attention for clarity and representativeness. First, your claim that MpGABR is only closely related to CAI6365831.1 (L305-L310) is inconsistent with the provided phylogenetic tree, which shows MpGABR as equally close to Metopolophium dirhodum (XP_060864885.1) and Acyrthosiphon pisum (XP_008183008.2). Therefore, singling out only Macrosiphum euphorbiae (CAI6365831.1) is not supported by the data. Second, the representation of various insect orders is insufficient. All 11 sequences in the Hemiptera category (in both Figure 4 and Figure 6B) are exclusively from the Aphididae family. This small subset cannot represent the highly diverse Order Hemiptera. Consequently, statements like "only THR228 was conserved in Hemiptera" (L338), "The results of the sequence alignment revealed that only THR228 was conserved in Hemiptera" (L430), or "THR228... is highly conserved in Hemiptera" (L486) are not adequately supported. Third, similar concerns apply to the Diptera order, which includes 10 Drosophila and 2 mosquito samples (not diverse or representative enough), and likely to other orders as well. Thereby, the Figure 6B alignment should be revised accordingly to reflect a more accurate representation or to clarify the scope of the analysis. Fourth, there's a discrepancy in the phylogenetic method used: the M&M section (L156) states that MEGA7, ClustalW, and the neighbor-joining method were used, while the Figure 4 caption mentions that MEGA X, MUSCLE, and the Maximum likelihood method were employed. This inconsistency needs to be clarified and made consistent throughout the manuscript. Fifth, I have significant concerns about the phylogenetic tree itself (Figure 4). A small glitch was observed at the Danaus plexippus node, which raises suspicion regarding potential manipulation after tree construction. 
      More critically, the tree, especially within Coleoptera, does not appear to be clearly resolved. I am highly concerned about whether all included sequences are true GABR orthologs or if the dataset includes partial or related sequences that could distort the phylogeny. Finally, for Figure 6B, both protein (XP_) and nucleotide (XM_) sequences were mixed. I recommend using the protein sequences instead of nucleotide sequences in this figure panel, as protein sequences are more directly informative.

      We thank the reviewer for the careful reading and valuable comments.

      (a) Firstly, according to your comments, the phylogenetic analysis has been re-performed with more representative species from each Order (Fig. 5 and Fig. 7B). The results revealed that only THR228 was conserved across 11 species in the Aphididae family of Hemiptera. Therefore, expressions like "only THR228 was conserved in Hemiptera" have been revised to “among the four residues, only THR228 was conserved across 11 species in the Aphididae family of Hemiptera” (Line 106, Line 369, Line 477, and Lines 563-564).

      (b) We have modified the description of Fig. 5 (the original Fig. 4): MpGABR (XP_022173711.1) was found to be genetically closely related to CAI6365831.1 from Macrosiphum euphorbiae, XP_008183008.2 from Acyrthosiphon pisum, and XP_060864885.1 from Metopolophium dirhodum (Fig. 5 and Table S6). See Lines 342-346.

      (c) Phylogenetic analysis was performed using MEGA7 with multiple amino acid sequence alignment (ClustalW) and the neighbor-joining method. We have revised the Fig. 5 (the original Fig. 4) caption to make it accurate and consistent throughout the manuscript.

      (d) We apologize for the small glitch at the Danaus plexippus node. After the phylogenetic tree was constructed, it was imported into Adobe Illustrator for coloring and classification annotation. An operational error during image resizing may have produced the glitch. In addition, the unclear clustering of Coleoptera may have been due to improper adjustment of the distance (in pixels) between branches and nodes. Again, thank you for your careful reading. We have rebuilt the phylogenetic tree.

      (e) Based on your suggestion, the sequence IDs have been unified as protein sequence IDs (Fig. 5, Fig. 7B and Table S6).
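      The conservation claim in (a) amounts to checking a single column of a multiple sequence alignment. The following minimal sketch shows one way to do this, assuming the sequences are already aligned (same length, gaps written as "-"). The function names, the species labels, and the short toy sequences are hypothetical and are not the sequences analyzed in the manuscript.

```python
def column_for_residue(aligned_ref, residue_number):
    """Map a 1-based residue number in the ungapped reference to its alignment column."""
    count = 0
    for col, ch in enumerate(aligned_ref):
        if ch != "-":
            count += 1
            if count == residue_number:
                return col
    raise ValueError("residue number exceeds reference length")


def is_conserved(alignment, ref_name, residue_number):
    """Return (conserved?, residue per sequence) at a reference residue position."""
    col = column_for_residue(alignment[ref_name], residue_number)
    residues = {name: seq[col] for name, seq in alignment.items()}
    return len(set(residues.values())) == 1, residues


# Toy alignment (hypothetical sequences); residue 3 of "ref" is T.
toy_alignment = {
    "ref": "MK-TLA",
    "sp_a": "MKQTLA",
    "sp_b": "MR-TVA",
}
conserved, residues = is_conserved(toy_alignment, "ref", 3)
print(conserved, residues)
```

      Mapping the residue number through the gapped reference matters because a position such as 228 in the reference protein generally does not sit at alignment column 228 once gaps are introduced.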

      (4) The Discussion section requires significant revision to provide a more insightful and interpretative analysis of the results. Currently, much of the section primarily restates findings rather than offering deeper discussion. For instance, L409-L419 restate the results, followed by the short sentence "Collectively, these results suggest that betulin may have insecticidal effects on aphids by inhibiting MpGABR expression". It could be further expanded to make it beneficial to elaborate on proposed mechanisms by which gene expression might be suppressed, including any potential transcription factors involved. In contrast, while L422-L442 also initially summarize results, the subsequent paragraph (L445-L472) effectively discusses the potential mechanisms of inhibitory action and how mortality is triggered, which is a good model for other parts of the section. However, all the discussion ends up with a short statement, "implying that betulin acts as a CA of MpGABR" (L472), which appears to be a leap. The inference that betulin acts as a competitive antagonist (CA) is solely based on the location of its extracellular binding site, which does not exactly overlap with the GABA binding site. It needs stronger justification or actually requires further experimental validation. The authors should consider rephrasing this statement to acknowledge the need for additional studies to definitively confirm this mechanism of action.

      We appreciate the reviewer's careful reading and valuable feedback, which will certainly enhance the quality of our manuscript.

      (a) Possible reasons for the effect of betulin on MpGABR expression have been discussed in our manuscript (Lines 455-466): The regulation of gene expression is sophisticated and delicate (Pope and Medzhitov 2018). The regulatory network controlling GABR expression remains unclear. In adult rats, epileptic seizures have been reported to increase the levels of brain-derived neurotrophic factor (BDNF), which in turn prompts the transcription factors CREB and ICER to reduce the gene expression of the GABR α1 subunit (Lund et al. 2008). In Drosophila, it has been demonstrated that WIDE AWAKE, which regulates the onset of sleep, interacts with the GABR and upregulates its expression level (Liu et al. 2014). In the Drosophila brain, the circular RNA circ_sxc was found to inhibit the expression of miR-87-3p through sponge adsorption, thereby regulating the expression of neurotransmitter receptor ligand proteins, including GABR, and ensuring the normal function of synaptic signal transmission in brain neurons (Li et al. 2024). However, it remains unclear how betulin reduces the expression of MpGABR, and further research is needed.

      (b) In the Discussion section, we acknowledged the need for further research to ultimately confirm the mechanism by which betulin competes with GABA for binding to MpGABR (Lines 532-535): Although the mechanism by which betulin competes with GABA for binding to MpGABR requires further experimental validation, our work may have provided a novel target for developing insecticides.

      (c) In addition, we have added a discussion of the sensitivity of the GABA receptor to betulin to the Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function have primarily focused on transmembrane regions. For instance, based on mutational research and a Drosophila GABR modeling approach, multiple key amino acids were identified as insecticide targets in the transmembrane domain (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study indicates that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, betulin is speculated to be a specific insecticidal substance evolved by plants in response to aphid infestation. Further verification is needed to determine whether betulin is toxic to other insect species.

      (d) Furthermore, the potential ecological risks of deploying betulin as a bioinsecticide are now discussed in our manuscript (Lines 538-553): The development of bioinsecticides should focus not only on the toxic effects of active substances on target organisms, but also on their influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin has specific toxicity to aphids, previous studies have reported that betulin and its derivatives had effects on Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether betulin has other insecticidal mechanisms or off-target effects. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before applying betulin in the field, it is necessary to fully verify and consider whether betulin has any impact on farmers' health. Furthermore, will betulin leave residues or diffuse during field application? Will long-term application promote the evolution of resistance in aphids or other insects? These issues also need further experimental verification. In summary, before any field application, further research is needed on the environmental behavior, degradation process, and safety of betulin.

      Reviewer #2 (Public review):

      Summary:

      This important study shows that betulin from wild peach trees disrupts neural signaling in aphids by targeting a conserved site in the insect GABA receptor. The authors present a nicely integrated set of molecular, physiological, and genetic experiments to establish the compound's species-specific mode of action. While the mechanistic evidence is solid, the manuscript would benefit from a broader discussion of evolutionary conservation and potential off-target ecological effects.

      Strengths:

      The main strengths of the study lie in its mechanistic clarity and experimental rigor. The identification of a betulin-binding single threonine residue was supported by (1) site-directed mutagenesis and (2) functional assays. These experiments strongly support the specificity of action. Furthermore, the use of comparative analyses between aphids and fruit flies demonstrates an important effort to explore species specificity, and the integration of quantitative data further enhances the robustness of the conclusions.

      Weaknesses:

      There are several important limitations that need to be addressed. The manuscript does not explore whether the observed sensitivity to betulin reflects a broadly conserved feature of GABA receptors across animal lineages or a more lineage-specific adaptation. This evolutionary context is crucial for understanding the broader significance of the findings.

      In addition, while the compound's aphicidal effect is well established, the potential for off-target effects in non-target organisms - especially vertebrates - remains unaddressed, despite prior evidence that betulin interacts with mammalian GABAa receptors. There is little discussion on the ecological or environmental safety of exogenous betulin application, such as persistence, degradation, or exposure risks.

      We sincerely thank the reviewer for the time and effort dedicated to our manuscript's detailed review and assessment. The revision suggestions were constructive, and we have provided a point-by-point response to address them.

      (a) A brief introduction to the evolutionary conservation of GABA receptors has been added to the Introduction (Lines 90-98): Previous study has proposed that vertebrate and human GABR genes maintain a broad and conservative gene clustering pattern, while in invertebrates, this pattern is missing, indicating that these gene clusters formed early in vertebrate evolution and were established after diverging from invertebrates. Notably, invertebrates each possess a unique GABR gene pair, which are homologous with human GABR α and β subunits, suggesting that the existing GABR gene cluster evolved from an ancestral α - β subunit gene pair (Tsang et al. 2006). During the coevolution of plants and insects, the duplications and amino acid substitutions in GABR may be beneficial for the adaptation to insecticides and terpenoid compounds (Guo et al. 2023).

      (b) We have added the discussion of the sensitivity of GABA receptor to betulin in Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function has primarily focused on transmembrane regions. For instance, based on the mutational research and Drosophila GABR modeling approach, multiple key amino acids were identified as insecticide targets in the transmembrane domain (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in the transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study signified that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, betulin is speculated to be a specific insecticidal substance evolved by plants in response to aphid infestation. Besides, further verification is needed to determine whether betulin is toxic to other insect species.

      (c) The discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-553): The development of bioinsecticides should not only focus on the toxic effects of active substance on target organisms, but also on their influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin has specific toxicity to aphids, previous studies have reported that betulin and its derivatives had effects on Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether there are other insecticidal mechanisms or off target effects of betulin. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before applying betulin in the field, it is necessary to fully verify and consider whether betulin has any impact on farmers' health. Furthermore, will betulin cause residue or diffusion in the process of field application? Will long-term application promote the evolution of resistance to aphids or other insects? These issues also need further experimental verification. In summary, before any field application, further research is needed on the environmental behavior, degradation process, and safety of betulin.

      Reviewer #1 (Recommendations for the authors):

      (1) L28 Provide the full name of MST.

      Thanks for your suggestion. The full name of MST, microscale thermophoresis, has been supplied.

      (2) L87 in the Order Hemiptera.

      Thanks for your suggestion. Corrected.

      (3) L99 "Leaf bioassay" would be better to differentiate the greenhouse and field bioassays.

      Thanks for your suggestion. Corrected.

      (4) L104 It should be 7 doses, including the "0 mg/mL" control.

      Thanks for your suggestion. Corrected.

      (5) L104 Since the LC50 of pymetrozine is 1.0612 mg/mL, a wider range of doses should have been tested compared to the dose range of betulin.

      Thanks for your comment.

      (a) Firstly, seven doses (0, 0.0625, 0.125, 0.25, 0.5, 1, and 2 mg/mL) were set to calculate the LC50 of betulin and pymetrozine. The LC50 values of betulin and pymetrozine are 0.1641 and 1.0612 mg/mL, respectively; both fall within the set range, indicating that the dose range is reasonable and that the LC50 values are reliable.

      (b) To compare the control effects of betulin and pymetrozine against M. persicae, aphids were treated with each compound at its LC50 (betulin, 0.1641 mg/mL; pymetrozine, 1.0612 mg/mL).
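      The range argument in (a) can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the six non-zero doses form a twofold series starting at 0.0625 mg/mL (which matches the values listed above), and the helper name `within_tested_range` is hypothetical, not taken from the manuscript.

```python
# Doses quoted in the response, in mg/mL: a zero control plus a twofold series.
lowest_dose = 0.0625
nonzero_doses = [lowest_dose * 2 ** i for i in range(6)]
doses = [0.0] + nonzero_doses

# LC50 values quoted in the response, in mg/mL.
lc50 = {"betulin": 0.1641, "pymetrozine": 1.0612}


def within_tested_range(value, tested):
    """Return True if value lies between the lowest and highest non-zero dose."""
    positive = [d for d in tested if d > 0]
    return min(positive) <= value <= max(positive)


for compound, value in lc50.items():
    print(compound, value, within_tested_range(value, doses))
```

      This only confirms that both estimates are interpolated rather than extrapolated. It does not say how well the dose-response curve is resolved around the pymetrozine LC50, which is the point the reviewer raised.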

      (6) L109 Greenhouse and field bioassays.

      Thanks for your suggestion. Corrected.

      (7) L112 Tween-80 and acetone in L103. Keep the order consistent throughout the manuscript.

      Thanks for your suggestion. Corrected.

      (8) L122 Mortality was recorded at 1, 5, 9, and 14 days after treatment. Revise the other similar mistakes throughout the manuscript (e.g. L250, L254, L255, L256, L259, etc.).

      Thanks for your suggestion. Corrected.

      (9) L126 apterous instead of wingless (keep a consistent expression).

      Thanks for your suggestion. Corrected.

      (10) L138 Primer Premier?

      Thanks for your comment. Corrected.

      (11) L141 Add RPS18 primers in Table S2.

      Thanks for your comment. Corrected.

      (12) L155 MEGA7 vs. MEGAX (as described in the Figure 4 caption).

      Thanks for your comment. Corrected.

      (13) L156 NJ method vs. ML method (as described in the Figure 4 caption).

      Thanks for your comment. Corrected.

      (14) L157 2.7. RNAi assay (Remove "In vitro" and re-number the following M&M sections accordingly).

      Thanks for your comment. Corrected.

      (15) L163 Add dsGFP primers in Table S2.

      Thanks for your comment. Corrected.

      (16) L166 apterous instead of wingless (keep a consistent expression).

      Thanks for your comment. Corrected.

      (17) L172 Add the source of pET-B2M vector.

      pET-B2M vector was obtained from BGI (Shenzhen, China), which has been added in our manuscript (Line 194).

      (18) L195 coding sequence instead of cDNA.

      Thanks for your comment. Corrected.

      (19) L198 the mutations of R224A ...

      Thanks for your comment. Corrected.

      (20) L199 TYR), or T228R ...

      Thanks for your comment. Corrected.

      (21) L211 and 90 ng.

      Thanks for your comment. Corrected.

      (22) L213 genomic DNA instead of gDNA, because gDNA may be confused in the context of sgRNA.

      Thanks for your suggestion. Corrected.

      (23) L253 (Fig. 1A-B).

      Thanks for your comment. Corrected.

      (24) L268 Explain why these 15 DEGs were selected for qRT-PCR.

      Thanks for your comment. These 15 DEGs were randomly selected to serve as representative DEGs with different expression levels. The reason for selecting these 15 DEGs has been added to the manuscript (Lines 295-296).

      (25) L287 What about GABRB? It has a TM domain.

      GABRB refers to “gamma-aminobutyric acid receptor subunit beta-like” annotated on NCBI. Theoretically, it should contain four transmembrane structural domains, while it has only one, indicating that it is incomplete.

      (26) L297 Add dsGFP as another control group.

      Thanks for your comment. Corrected.

      (27) L299 increased by 30.44% (Remove a comma).

      Thanks for your comment. Corrected.

      (28) L308 XM_022318019.1 (or protein accession number with XP_).

      Thanks for your comment. Corrected.

      (29) L338 that THR228 was conserved only in Hemiptera.

      Thanks for your comment. Since our original intention was to emphasize that THR228 is the only conserved residue among the four key amino acid residues, we retained the expression "only THR228" after careful consideration.

      (30) L342 or T228R.

      Thanks for your comment. Corrected.

      (31) L382 Is pyrhidone a general name for pymetrozine?

      Thanks for your comment. Corrected.

      (32) L450 Remove "and so on".

      Thanks for your comment. Corrected.

      (33) Figure 1D: Remove "Environment friendly". Replace the plant pot image on the right side with the one sprayed with pymetrozine, like the one in Figure 1F.

      Thanks for your comment. 

      (a) "Environment friendly" in Figure 1D has been removed.

      (b) We have attempted to modify the Figure 1D according to your suggestion. However, the modified Figure 1D is similar to Figure 1F and appears monotonous. Therefore, we have retained the original framework of Figure 1D.

      (34) Figure 2E 111036117 and 111041856 are in different IDs (XM_). I suggest keeping GeneID in Figure 2E and Table S2, as shown in Table S4.

      Thanks for your comment. Corrected.

      (35) Figure 2H: Add unit of the heatmap values. Or just add the title (e.g., expression level) on top of the bar.

      Thanks for your comment. Corrected.

      (36) Figure 3A: Add "aa" next to 700.

      Thanks for your comment. Corrected.

      (37) Figure 3E-G: Revise the tick marks on Y-axis: 0.0, 0.5, 1.0, and 1.5.

      Thanks for your comment. Corrected.

      (38) Figure 5C: Remove "1" and move "WT" up to the position where "1" was.

      Thanks for your comment. Corrected.

      (39) Figure 5D: Revise the tick marks on the Y-axis: 0.0, 0.5, 1.0, and 1.5.

      Thanks for your comment. Corrected.

      (40) Figure 5E: Remove the decimal. (e.g. 5 uM, 10 uM, 20 uM, etc.).

      Thanks for your comment. Corrected.

      (41) Figure 6B: What are the numbers next to the amino acid sequences? Provide the information in the figure caption.

      Thanks for your comment. The numbers next to the amino acid sequences indicate the position of the last residue of the key amino acids; this information has been added to the figure caption.

      (42) Figure 6D: Revise the tick marks on the Y-axis: 0.0, 0.5, 1.0, and 1.5. The X-axis title should be betulin (see Figure 5D). In the figure caption at the 5th row from the top, R244A should be R224A.

      Thanks for your comment. Corrected.

      (43) Figure 7E: R122T (not R1272T).

      Thanks for your comment. Corrected.

      (44) Supplementary Figure 1: It should be Figure S1. Add dsGFP in the figure caption.

      Thanks for your comment. Corrected.

      (45) Figure S2: What are the two pink bars and the other bars in brown or blue? Add an appropriate explanation in the figure caption.

      Thanks for your comment. Corrected.

      (46) Table S1: r square?

      Thanks for your comment. It is “r square”, and this has been corrected.

      (47) Table S2: (a) Add horizontal lines to separate qPCR, RNAi, cloning, and heterologous expression from each other (b) Replace XM_022318017.1 and XM_022318019.1 with their corresponding GeneIDs, as shown in Table S4. (c) AK340444.1 is a sequence from another aphid (Acyrthosiphon pisum)-Revise it. (d) In the cloning primers, place MpGABR first, followed by MpGABRAP and MpGABRB, as shown in the manuscript and Table S5. (e) Also, in the cloning primers, MpGABRB and MpGABRAP use reverse primers without stop codon, while MpGABR uses stop codon (TCA = TGA in reverse)-Revise it accordingly. Otherwise, provide the reason.

      Thanks for your comment. Corrected.
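      Point (e) of comment (47) rests on the fact that a reverse primer is written as the reverse complement of the coding strand, so a retained TGA stop codon appears as TCA at the 5' end of the primer. A minimal sketch of that check follows. It assumes the primer begins exactly at the stop codon with no 5' overhang (for example, a restriction site), and the primer strings in the usage lines are hypothetical, not those in Table S2.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
STOP_CODONS = ("TAA", "TAG", "TGA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def reverse_primer_keeps_stop(primer):
    """True if the first three bases of a reverse primer encode a stop codon.

    Assumes the primer starts at the stop codon, with no 5' overhang.
    """
    stops_in_primer_orientation = {reverse_complement(c) for c in STOP_CODONS}
    return primer.upper()[:3] in stops_in_primer_orientation


# Hypothetical primers, for illustration only.
print(reverse_complement("TGA"))
print(reverse_primer_keeps_stop("TCAGGGAAACCC"))
print(reverse_primer_keeps_stop("GGGAAACCCTCA"))
```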

      (48) Table S3: (a) Add "Drosophila melanogaster" and the target sequence ID in the table caption. Is it KF881792.1, as shown in Table S6? (b) Align the sequences to the left side. 

      Thanks for your comment. 

      (a) The GenBank number of target sequence is KF881792.1 (Drosophila melanogaster). We have added this information in the Table S3 note.

      (b) It has been adjusted according to your suggestion.

      (49) Table S5: (a) Replace the accession numbers with GeneID, as shown in Table S4. AK340444.1 is a sequence from another aphid (Acyrthosiphon pisum). (b) Coding sequences with the stop codon are 2082, 357, and 753, respectively, while the sequences without the stop codon are 2079, 354, and 750, respectively. The lengths of the deduced amino acid sequences are 693, 118, and 250. Revise accordingly.

      Thanks for your comment. Corrected.
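      The lengths given in comment (49) follow from simple codon arithmetic: dropping the stop codon removes 3 nucleotides, and each remaining codon encodes one amino acid. The sketch below reproduces that calculation. The assignment of the three lengths to MpGABR, MpGABRAP, and MpGABRB is an assumption based on the gene order requested in comment (47)(d), and the function name is hypothetical.

```python
def protein_length(cds_with_stop_nt):
    """Deduced protein length for a coding sequence that includes its stop codon."""
    if cds_with_stop_nt % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # One codon per amino acid, minus the stop codon, which encodes nothing.
    return cds_with_stop_nt // 3 - 1


# Lengths with stop codon as quoted by the reviewer; the gene labels are assumed.
cds_lengths = {"MpGABR": 2082, "MpGABRAP": 357, "MpGABRB": 753}

for gene, nt in cds_lengths.items():
    print(gene, nt, nt - 3, protein_length(nt))
```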

      (50) Table S6: (a) Use GenBank No for protein sequences. There is no Gene ID in this table. (b) Order (instead of Class). (c) See my comment on the phylogenetic analysis above.

      Thanks for your comment. Corrected.

      (51) Table S7 (a) Add unit under "Binding Energy". (b) There are two ALA226 [Alkyl] with two different distances. (c) PHE227 at the bottom should be THR228?

      Thanks for your comment.

      (a) The unit of "Binding Energy" is kcal/mol, and it has been added to the table caption.

      (b) As shown in Figure 6A, there were two Alkyl interactions between ALA226 and betulin. Therefore, ALA226 [Alkyl] appears twice, with two different distances.

      (c) Similarly, there were two Pi-Alkyl interactions between PHE227 and betulin. Thus, there were two rows of PHE227 in the table.

      (52) Table S9 (a) R117T should be R122T. (b) r square?

      Thanks for your comment. Both (a) and (b) have been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) Introduction

      (a) It lacks a deeper biological and evolutionary framing of the GABA receptor system. As GABA receptors are highly conserved across animal taxa, the observed interaction between betulin and the aphid GABA receptor could have broader implications. This possibility is not addressed in the current version, which limits the reader's appreciation of the relevance of this mode of action.

      (b) Previous reports of betulin activity in mammalian systems are not mentioned in the introduction, even though they are directly relevant to concerns about off-target toxicity. Therefore, the introduction should be revised to (i) briefly introduce the evolutionary conservation of GABA receptors, and (ii) acknowledge that betulin may affect a broader range of organisms, which sets up the need for caution in its application.

      Thanks for your important suggestions.

      (a) A brief introduction to the evolutionary conservation of GABA receptors has been added to the Introduction (Lines 90-98): Previous study has proposed that vertebrate and human GABR genes maintain a broad and conservative gene clustering pattern, while in invertebrates, this pattern is missing, indicating that these gene clusters formed early in vertebrate evolution and were established after diverging from invertebrates. Notably, invertebrates each possess a unique GABR gene pair, which are homologous with human GABR α and β subunits, suggesting that the existing GABR gene cluster evolved from an ancestral α - β subunit gene pair (Tsang et al. 2006). During the coevolution of plants and insects, the duplications and amino acid substitutions in GABR may be beneficial for the adaptation to insecticides and terpenoid compounds (Guo et al. 2023).

      (b) The possible effects of betulin on a broader range of organisms have been acknowledged in the Introduction section (Lines 68-77): An immune stimulant, Ir-Bet, was prepared using iridium complex and betulin, which evoked ferritinophagy-enhanced ferroptosis, thereby activating anti-tumor immunity (Lv 2023). The anti-inflammatory effect of betulin has been reported in macrophages at lymphoma site in mice (Szlasa et al. 2023). Betulin has been found to improve hyperlipidemia and insulin resistance and decrease atherosclerotic plaques by inhibiting the maturation of sterol regulatory element-binding protein (Tang et al. 2011). Besides, betulin and its derivatives have been found to exhibit insecticidal activity against Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024).

      (c) At the end of the Introduction, we note that betulin should be used with caution (Lines 111-112): However, given that betulin may affect a wider range of organisms, it should be used with caution.

      (2) Method

      Number of biological replicates in all assays and justification of thresholds used for significance in RNAi and survival experiments are not addressed in the manuscript.

      Thanks for your careful reading. We have checked Materials and Methods section and added corresponding number of biological replicates in all assays. Besides, the p-values for the corresponding significance analyses of RNAi and survival experiments have been added to our Manuscript.

      (3) Discussion

      (a) Consistent with the comments on the Introduction, the absence of discussion on (i) the evolutionary conservation of GABA receptor sensitivity to betulin, (ii) potential off-target effects in non-target insects and vertebrates (if so, this cannot be used as an "eco-friendly pesticide" as the authors stated in the manuscript), and (iii) ecological risks associated with the exogenous application of betulin limits both the interpretive depth and applied relevance of the study.

      (b) To strengthen the Discussion, the authors should consider addressing: (i) whether the observed sensitivity reflects a conserved pharmacological vulnerability across animal taxa or a lineage-specific adaptation; (ii) the potential ecological risks of deploying betulin as a bioinsecticide, and (iii) the need for future research into the environmental fate, degradation, and safety profile of betulin prior to any field-level application.

      Thank you for your valuable comments.

      (a) We have added the discussion of the sensitivity of GABA receptor to betulin in Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function has primarily focused on transmembrane regions. For instance, based on the mutational research and Drosophila GABR modeling approach, multiple key amino acids were identified as insecticide targets in the transmembrane domain (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in the transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study signified that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, betulin is speculated to be a specific insecticidal substance evolved by plants in response to aphid infestation. Besides, further verification is needed to determine whether betulin is toxic to other insect species.

      (b) The discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-551): The development of bioinsecticides should not only focus on the toxic effects of active substance on target organisms, but also on their influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin had specific toxicity to aphids, previous studies have reported that betulin and its derivatives had effects on Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether there are other insecticidal mechanisms or off target effects of betulin. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before applying betulin in the field, it is necessary to fully verify and consider whether betulin has any impact on farmers' health. Furthermore, will betulin cause residue or diffusion in the process of field application? Will long-term application promote the evolution of resistance to aphids or other insects? These issues also need further experimental verification. 

      (c) Additionally, at the end of the Discussion, we note that more research is needed before any field application of betulin (Lines 551-553): In summary, before any field application, further research on the environmental behavior, degradation process, and safety of betulin is needed.

      References

      Amiri S, Dastghaib S, Ahmadi M, Mehrbod P, Khadem F, Behrouj H, Aghanoori M, Machaj F, Ghamsari M, Rosik J, Hudecki A, Afkhami A, Hashemi M, Los M, Mokarram P, Madrakian T, Ghavami S. 2020. Betulin and its derivatives as novel compounds with different pharmacological effects. Biotechnology Advances 38: 107409.

      de Almeida Teles AC, dos Santos BO, Santana EC, Durço AO, Conceição LSR, Roman Campos D, de Holanda Cavalcanti SC, de Souza Araujo AA, dos Santos MRV. 2024. Larvicidal activity of terpenes and their derivatives against Aedes aegypti: a systematic review and meta-analysis. Environmental Science and Pollution Research 31: 64703-64718.

      Guo L, Qiao X, Haji D, Zhou T, Liu Z, Whiteman NK, Huang J. 2023. Convergent resistance to GABA receptor neurotoxins through plant–insect coevolution. Nature Ecology & Evolution 7: 1444-1456.

      Haddi K, Turchen LM, Viteri Jumbo LO, Guedes RN, Pereira EJ, Aguiar RW, Oliveira EE. 2020. Rethinking biorational insecticides for pest management: unintended effects and consequences. Pest Management Science 76: 2286-2293.

      Huang X, Hao N, Shu L, Wei Z, Shi J, Tian Y, Chen G, Yang X, Che Z. 2025. Preparation and insecticidal activities of betulin-cinnamic acid-related hybrid compounds and insights into the stress response of Plutella xylostella L. Pest Management Science 81: 4243-4255.

      Lee HY, Min KJ. 2024. Betulinic acid increases the lifespan of Drosophila melanogaster via Sir2 and FoxO activation. Nutrients 16: 441.

      Li Q, Wang L, Tang C, Wang X, Yu Z, Ping X, Ding M, Zheng L. 2024. Adipose tissue exosome circ_sxc mediates the modulatory of adiposomes on brain aging by inhibiting brain dme-miR-87-3p. Molecular Neurobiology 61: 224-238.

      Li Y, Wang Y, Gao L, Tan Y, Cai J, Ye Z, Chen A, Xu Y, Zhao L, Tong S, Sun Q, Liu B, Zhang S, Tian D, Deng G, Zhou J, Chen Q. 2022. Betulinic acid self-assembled nanoparticles for effective treatment of glioblastoma. Journal of Nanobiotechnology 20: 39.

      Liu S, Lamaze A, Liu Q, Tabuchi M, Yang Y, Fowler M, Bharadwaj R, Zhang J, Bedont J, Blackshaw S, Lloyd Thomas E, Montell C, Sehgal A, Koh K, Wu Mark N. 2014. WIDE AWAKE mediates the circadian timing of sleep onset. Neuron 82: 151-166.

      Lund IV, Hu Y, Raol YH, Benham RS, Faris R, Russek SJ, Brooks Kayal AR. 2008. BDNF selectively regulates GABAA receptor transcription by activation of the JAK/STAT pathway. Science Signaling 1: ra9.

      Lv M, Zheng Y, Wu J, Shen Z, Guo B, Hu G, Huang Y, Zhao J, Qian Y, Su Z, Wu C, Xue X, Liu H, Mao Z. 2023. Evoking ferroptosis by synergistic enhancement of a cyclopentadienyl iridium-betulin immune agonist. Angewandte Chemie International Edition 62: e202312897.

      Nakao T, Banba S. 2021. Important amino acids for function of the insect Rdl GABA receptor. Pest Management Science 77: 3753-3762.

      Pope SD, Medzhitov R. 2018. Emerging principles of gene expression programs and their regulation. Molecular Cell 71: 389-397.

      Szlasa W, Ślusarczyk S, Nawrot Hadzik I, Abel R, Zalesińska A, Szewczyk A, Sauer N, Preissner R, Saczko J, Drąg M, Poręba M, Daczewska M, Kulbacka J, Drąg Zalesińska M. 2023. Betulin and its derivatives reduce inflammation and COX-2 activity in macrophages. Inflammation 46: 573-583.

      Tang JJ, Li JG, Qi W, Qiu WW, Li PS, Li BL, Song BL. 2011. Inhibition of SREBP by a small molecule, betulin, improves hyperlipidemia and insulin resistance and reduces atherosclerotic plaques. Cell Metabolism 13: 44-56.

      Tsang SY, Ng SK, Xu Z, Xue H. 2006. The evolution of GABAA receptor–like genes. Molecular Biology and Evolution 24: 599-610.

    1. eLife Assessment

      This study presents a valuable finding about how receptor-ligand binding pathways with multi-site phosphorylation can show non-monotonic responses to increasing ligand affinity and to kinase activity. The authors provide compelling evidence through a simple ordinary differential equation model of such signaling networks with the key new ingredient of ligand-induced receptor degradation. The work will be of interest to physicists and biologists working on signal transduction and biological information processing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors study the steady-state solutions of ODE models for molecular signaling involving ligand binding coupled to multi-site phosphorylation at saturating ligand concentrations. Although the results are in principle general, the work highlights the receptor tyrosine kinases (RTK) as model systems. After presenting previous ODE model solutions, the authors present their own "kinetic sorting" model, which is distinguished by ligand-induced phosphorylation-dependent receptor degradation and the property that every phosphorylation state is signaling competent. The authors show that this model recovers the two types of non-monotonicity experimentally reported for RTKs: maximum activity for intermediate ligand affinity and maximum activity for intermediate kinase activity.

      The main contribution of the work is in demonstrating that their model can capture both types of non-monotonicity, whereas previous models could at most capture non-monotonicity of ligand binding.

      Strengths:

      The question of how energy dissipating, and thus non-equilibrium, molecular systems can achieve steady-state solutions not accessible to equilibrium systems is of fundamental importance in biomolecular information processing and self-organization. Although the authors do not address the energy requirements of their non-equilibrium model, their comparative analysis of different alternative non-equilibrium models provides insight into the design choices necessary to achieve non-monotonic control, a property that is inaccessible at equilibrium.

      The paper is succinctly written and easy to follow, and the authors achieve their aims by providing convincing numerical solutions demonstrating non-monotonicity over the range of parameter values encompassing the biologically relevant regime.

      Weaknesses:

      (1) A key motivating framework for this work is the argument that the ability to tune to recognize intermediate ligand affinities provides a control knob for signal selection that is available to non-equilibrium systems. As such, this seems like a compelling type of ligand selectivity, which is a question of broad interest. However, as the authors note in the results, the previously published "limited signaling model" already achieves such non-monotonicity to ligand binding affinity. The introduction and abstract do not clearly delineate the new contributions of the model.

      The novel benefit of the model introduced by the authors is that it also achieves non-monotonic response to kinase activity. Because such non-monotonicity is observed for RTK, this would make the authors' model a better fit for capturing RTK behavior. However, the broad significance of achieving non-monotonicity to kinase activity is not motivated or supported by empirical evidence in the paper. As such, the conceptual significance of the modified model presented by the authors is not clear.

      UPDATE: The authors have now clarified the significance of the model in elucidating how known motifs (multisite phosphorylation and active receptor degradation) could explain the behavior, including non-monotonicity. The authors have also provided compelling arguments for the biological significance of achieving non-monotonic kinase activity response.

      (2) Whereas previous models used in the literature are schematized in Figure 1, the model proposed by the authors is missing (See line 97 of page 3). Without the schematic, the text description of the model is incomplete.

      UPDATE: this issue has been resolved.

      (3) The authors use the activity of the first phosphorylation site as the default measure of activity. This choice needs to be justified. Why not use the sum of the activities at all sites?

      UPDATE: This was a non-issue. The potential misunderstanding has been mitigated by clarifications in the text.

      Comments on revisions:

      All issues previously identified were convincingly addressed. I have no additional suggestions.

    3. Reviewer #2 (Public review):

      Summary:

      In classical models of signaling networks, the signaling activity increases monotonically with the ligand affinity. However, certain receptors prefer ligands of intermediate affinity. In the paper, the authors present a new minimal model to derive generic conditions for ligand specificity. In brief, this requires multi-site phosphorylation and that high-affinity complexes be more prone to degradation. This particular type of kinetic discrimination allows the system to overcome equilibrium constraints.
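      The two ingredients named in this summary can be illustrated with a deliberately simplified toy chain. This is not the authors' model; it is a sketch under stated assumptions. Ligand is saturating, so every receptor is bound. Unbinding at rate `k_off` resets a receptor to the unphosphorylated bound state `B_0`. Each phosphorylation step occurs at rate `k_p`. Only the fully phosphorylated state is subject to extra, ligand-induced degradation (`d_induced`) on top of a basal rate (`d_basal`). All names and default values are hypothetical, and activity is read out as the level of the singly phosphorylated state.

```python
def first_site_activity(k_off, k_p, n_sites=5, synthesis=1.0,
                        d_basal=0.01, d_induced=1.0):
    """Steady-state level of the singly phosphorylated state B_1 in a toy chain."""
    if n_sites < 2:
        raise ValueError("the toy chain needs at least two phosphorylation sites")
    # For 1 <= i < n_sites, inflow k_p * B_(i-1) balances outflow
    # (k_p + k_off + d_basal) * B_i, so B_i = a * B_(i-1).
    a = k_p / (k_p + k_off + d_basal)
    # Levels of B_0 .. B_(n_sites-1), relative to B_0.
    rel = [a ** i for i in range(n_sites)]
    # The fully phosphorylated state cannot advance but is degraded faster.
    rel_last = k_p * rel[-1] / (k_off + d_basal + d_induced)
    # Total degradation per unit of B_0 must balance receptor synthesis.
    loss_per_b0 = d_basal * sum(rel) + (d_basal + d_induced) * rel_last
    b0 = synthesis / loss_per_b0
    return b0 * rel[1]


# Scan ligand off-rate (inverse affinity) and kinase rate over four decades.
for rate in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(rate, first_site_activity(k_off=rate, k_p=1.0),
          first_site_activity(k_off=1.0, k_p=rate))
```

      In this toy version, very weak ligands unbind before phosphorylation begins, while very strong ligands (or very active kinases) drive receptors into the rapidly degraded state and deplete the receptor pool, so the readout peaks at intermediate values of either parameter. The actual model in the manuscript differs in its details and should be consulted for any quantitative statement.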

      Strengths:

      The model is simple, and it adds only a few parameters to classical generic models. Moreover, they vary these additional parameters in ranges based on experimental observations. They explain how the introduction of these new parameters is essential to ligand specificity. Their model quantitatively reproduces the ligand specificity of a certain receptor. Finally, they provide testable predictions.

      Weaknesses:

      The naming of multiple variables as activity without precise definitions may be confusing to readers.

      Comments on revisions:

      I thank the authors for addressing my comments. One point remains regarding the naming of multiple variables as activity. Besides using other words, the authors may consider giving precise definitions of terms, e.g. by writing "We define kinase activity as the phosphorylation rate $\omega=k_p\tau$." This connection appears only at line 204 in the present manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors study the steady-state solutions of ODE models for molecular signaling involving ligand binding coupled to multi-site phosphorylation at saturating ligand concentrations. Although the results are in principle general, the work highlights the receptor tyrosine kinases (RTK) as model systems. After presenting previous ODE model solutions, the authors present their own "kinetic sorting" model, which is distinguished by ligand-induced phosphorylation-dependent receptor degradation and the property that every phosphorylation state is signaling competent. The authors show that this model recovers the two types of non-monotonicity experimentally reported for RTKs: maximum activity for intermediate ligand affinity and maximum activity for intermediate kinase activity.

      The main contribution of the work is in demonstrating that their model can capture both types of non-monotonicity, whereas previous models could at most capture non-monotonicity of ligand binding.

      Strengths:

      The question of how energy-dissipating, and thus non-equilibrium, molecular systems can achieve steady-state solutions not accessible to equilibrium systems is of fundamental importance in biomolecular information processing and self-organization. Although the authors do not address the energy requirements of their non-equilibrium model, their comparative analysis of different alternative non-equilibrium models provides insight into the design choices necessary to achieve non-monotonic control, a property that is inaccessible at equilibrium.

      The paper is succinctly written and easy to follow, and the authors achieve their aims by providing convincing numerical solutions demonstrating non-monotonicity over the range of parameter values encompassing the biologically relevant regime.

      Weaknesses:

      (1) A key motivating framework for this work is the argument that the ability to tune to recognize intermediate ligand affinities provides a control knob for signal selection that is available to nonequilibrium systems. As such, this seems like a compelling type of ligand selectivity, which is a question of broad interest. However, as the authors note in the results, the previously published "limited signaling model" already achieves such non-monotonicity in ligand binding affinity. The introduction and abstract do not clearly delineate the new contributions of the model.

      We thank the reviewer for this comment. We apologize for any unclear language on our part. The purpose of our work was not to identify the unique reaction scheme to obtain non-monotonic dependence of network activity on ligand affinity and kinase activity. Rather, we were interested in exploring how such a dependence could arise from the interplay between two ubiquitous network motifs (multisite phosphorylation and active receptor degradation). Notably, as the reviewer later points out, previous models that incorporate only multisite phosphorylation only capture the non-monotonic dependence of network activity on ligand affinity and not kinase/phosphatase activity. We have now clarified this in the abstract (lines 14-16) and the introduction (lines 55-59).

      The novel benefit of the model introduced by the authors is that it also achieves a non-monotonic response to kinase activity. Because such non-monotonicity is observed for RTK, this would make the authors' model a better fit for capturing RTK behavior. However, the broad significance of achieving non-monotonicity to kinase activity is not motivated or supported by empirical evidence in the paper. As such, the conceptual significance of the modified model presented by the authors is not clear.

      We thank the reviewer for this comment. We agree that the ability of our model to reproduce non-monotonic dependence on kinase/phosphatase activity was not sufficiently motivated in the original submission. We have now added a brief mention of the biological motivation for non-monotonic kinase activity in the discussion (lines 229-247) to describe the potential biological significance of this behavior. In particular, non-monotonic kinase/phosphatase dependence may act as a safeguard, filtering out signaling cells with abnormally elevated kinase activity or suppressed phosphatase activity. In the presence of non-monotonic dependence on network activity, downstream signaling would remain contingent on extracellular cues, and cells with extreme kinase/phosphatase imbalances would fail to signal. This could prevent persistent, cue-independent activation, an especially important protective mechanism in pathways regulating metabolically taxing functions such as growth, proliferation, or mounting immune responses. Although direct experimental evidence for the widespread use of this mechanism is currently scarce, our motivation is supported both by similar regulatory behaviors of phosphatases, which arise through distinct mechanisms (such as CD45 in T-cell receptor signaling; Weiss, 2019) but highlight the potential biological use of this strategy, and by theoretical work on phosphorylation-dephosphorylation cycles, which demonstrates a similar effect in more general settings (Swain, 2013).

      (2) Whereas previous models used in the literature are schematized in Figure 1, the model proposed by the authors is missing (see line 97 of page 3). Without the schematic, the text description of the model is incomplete.

      We thank the reviewer for identifying this oversight; it has been corrected. See Figure 3 in the new text.

      (3) The authors use the activity of the first phosphorylation site as the default measure of activity. This choice needs to be justified. Why not use the sum of the activities at all sites?

      We thank the reviewer for this comment. We in fact study all sites (Figure 5A in the resubmitted manuscript). Notably, as suggested by the reviewer, the concentration of the first site is indeed represented by the sum of concentrations of all phosphorylated species. The concentration of the second site is represented by the sum of concentrations of all species except for the first one, and so on (lines 153-155).
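
      The tail-sum bookkeeping described in this response can be illustrated with a minimal sketch. It assumes sequential phosphorylation, so that a receptor carrying n phosphates is phosphorylated at sites 1 through n; the function name and the example concentrations are hypothetical and are not taken from the manuscript.

```python
from itertools import accumulate


def site_concentrations(species):
    """Concentration of receptors phosphorylated at each site.

    species[n - 1] is the concentration of receptors carrying exactly
    n phosphates (n = 1..N). Under the sequential-phosphorylation
    assumption, site k is occupied in every species with n >= k, so its
    concentration is the sum over the tail species[k - 1:].
    """
    # Cumulative sum taken from the most phosphorylated species downward,
    # then flipped back so that entry k - 1 corresponds to site k.
    tail_sums = list(accumulate(reversed(species)))
    return tail_sums[::-1]


# Hypothetical steady-state concentrations (arbitrary units) for N = 4.
example = [4.0, 3.0, 2.0, 1.0]
print(site_concentrations(example))
```

      With these made-up values, the first site sums all four species, and the second site omits only the singly phosphorylated one.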

      Reviewer #2 (Public review):

      Summary:

      In classical models of signaling networks, the signaling activity increases monotonically with the ligand affinity. However, certain receptors prefer ligands of intermediate affinity. In the paper, the authors present a new minimal model to derive generic conditions for ligand specificity. In brief, this requires multi-site phosphorylation and that high-affinity complexes be more prone to degrade. This particular type of kinetic discrimination allows for overcoming equilibrium constraints.

      Strengths:

      The model is simple, and it adds only a few parameters to classical generic models. Moreover, the authors vary these additional parameters in ranges based on experimental observations. They explain how the introduction of these new parameters is essential to ligand specificity. Their model quantitatively reproduces the ligand specificity of a certain receptor. Finally, they provide a testable prediction.

      Weaknesses:

      The naming of certain variables may be confusing to readers.

      We apologize for the confusion due to unclear presentation. We have clarified our definitions throughout the manuscript. 

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract and introduction present the problem as if this model is solving the fundamental problem of non-monotonic dependence on ligand affinity. However, as the authors noted in their results, this problem has already been solved by a previous phosphorylation model with N-state degradation. What the authors' new model achieves is the additional experimentally observed non-monotonicity of kinase activity dependence. The abstract and introduction should be changed to reflect the actual novel contributions and also to motivate the biological significance of non-monotonic kinase activity dependence.

      We thank the reviewer for this comment. We apologize for any unclear language on our part. The purpose of our work was not to identify the unique reaction scheme to obtain non-monotonic dependence of network activity on ligand affinity and kinase activity. Rather, we were interested in exploring how such a dependence could arise from two ubiquitous network motifs (multisite phosphorylation and active receptor degradation). Notably, as the reviewer later points out, previous models that incorporate only multisite phosphorylation only capture the non-monotonic dependence of network activity on ligand affinity and not kinase/phosphatase activity. We have now clarified this in the abstract (lines 14-16) and the introduction (lines 55-59). We have also provided biological motivation behind non-monotonic kinase activity dependence (lines 229-247).

      (2) It is important to show (in the supplemental materials if needed) that the closest equilibrium analog to the model (for example, reversible rate constants from each of the activated states to an inactive state) does not achieve non-monotonicity with ligand affinity.

      We have added a model in the supplementary materials that represents a detailed balance Markov chain. In the model, we imagine that ligand bound receptors undergo a series of equilibrium transitions, all characterized by the same activation and inactivation rate. We show that at saturating ligand levels, the signaling output only depends on the ratio of the activation to the inactivation rate (i.e., the thermodynamic stability of the active site) (lines 466-488).
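
      As a rough illustration of why such a chain cannot discriminate kinetically, the sketch below builds a linear chain of states with a common forward (activation) rate a and backward (inactivation) rate b. The chain topology, the choice of state 0 as the only inactive state, and all names and numbers are assumptions made for illustration; the authors' supplementary model may differ in detail.

```python
def equilibrium_distribution(a, b, n_states):
    """Stationary distribution of the chain 0 <-> 1 <-> ... <-> n_states - 1.

    Every forward step has rate a and every backward step has rate b
    (hypothetical values). Detailed balance requires
    a * p[n] == b * p[n + 1], so p[n] is proportional to (a / b) ** n.
    """
    ratio = a / b
    weights = [ratio ** n for n in range(n_states)]
    total = sum(weights)
    return [w / total for w in weights]


def net_fluxes(p, a, b):
    """Net probability flux across each edge; all zero at detailed balance."""
    return [a * p[n] - b * p[n + 1] for n in range(len(p) - 1)]


def active_fraction(a, b, n_states):
    """Fraction of receptors outside state 0 (assumed to be the inactive state)."""
    return 1.0 - equilibrium_distribution(a, b, n_states)[0]


# Same ratio a / b, but rates ten times faster: the output is unchanged.
slow = active_fraction(a=1.0, b=2.0, n_states=4)
fast = active_fraction(a=10.0, b=20.0, n_states=4)
print(slow, fast)
```

      Because only the ratio a / b enters the stationary distribution, rescaling both rates together leaves the output untouched, which is the equilibrium constraint that the degradation step in the kinetic sorting model is meant to break.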

      (3) Schematics for earlier models are described in Figure 1. However, no schematic for the actual model proposed by the authors is shown. This should be added as a subpanel to Figure 1.

      We thank the reviewer for identifying our omission of our model schematic. We have included our model schematic as its own figure (Figure 3).

      (4) Minor: Figure 1 is referred to as "Figure ??" in line 97 of page 3.

      We thank the reviewer for identifying this error; it has been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) There is an inconsistency between Figure 2(a) and Equation (1); it suggests that $p_N = \omega^N/(\omega+\delta)^N$. This makes more sense with the model defined in the supplementary material.

      We thank the reviewer for identifying this error. Equation (1) has been updated to reflect the correct relationship.
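
      For context, the expression quoted by the reviewer has the form expected when reaching the $N$-th state requires $N$ successive steps, each of which succeeds at rate $\omega$ while competing with a loss process of rate $\delta$. This reading of $\delta$ is an assumption made here for illustration; the corrected Equation (1) is not reproduced in this response.

```latex
% Probability that a single step succeeds before the competing loss event
% (assumed interpretation: \omega = forward rate, \delta = loss rate)
\[
  q = \frac{\omega}{\omega + \delta}
\]
% N independent successive steps must all succeed
\[
  p_N = q^N = \frac{\omega^N}{(\omega + \delta)^N}
\]
```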

      (2) The figure presenting the model of the authors appears to be missing.

      We thank the reviewer for identifying this error; it has been corrected (Figure 3 in the new manuscript).

      (3) The authors describe phosphorylation as irreversible in the intro, but then consider reversible phosphorylation in their model, which may be confusing to readers.

      We thank the reviewer for identifying this source of possible confusion. We have clarified that dephosphorylation is taken to be a distinct irreversible reaction, see lines 105 - 112.

      (4) The authors reuse similar names, e.g., network activity, kinase activity, signaling activity, activity. This is confusing.

      We apologize for the confusion. We note that, within the context of our model, there are important distinctions between signaling activity (the amount of signaling competent receptors) and kinase activity (value corresponding to the phosphorylation rate). We have attempted to use these different terms correctly and are happy to make clarifying corrections if there are any places where a term is misused.  

      (5) Several parameters are defined only in the captions of the figures, such as $\beta$ and $\rho$.

      We thank the reviewer for identifying this omission; we have added the definitions of $\beta$ and $\rho$ to the main text (see line 129).

      (6) The sentence at line 137 lacks some words: "Below, we kinetic...".

      We thank the reviewer for identifying this error, we have added the missing words (“Below, we show how kinetic…”).

      (7) The sentence at line 183 lacks some words: "When kinase activity...".

      We thank the reviewer for identifying this error. We have now corrected it. 

      (8) Figure 5 is very small.

      We will work with the production team to increase the size of this figure.

    1. eLife Assessment

      This important study characterizes and validates a new activity marker - fast labelling of engram neurons (FLEN) - which is transiently active and driven by cFos, allowing the monitoring of intrinsic and synaptic properties of engram neurons shortly after the learning experience. The results convincingly demonstrate the utility of this novel viral tool for studying early changes in the properties of engram cells. FLEN will provide a beneficial tool for the neuroscience community once it is made available at a plasmid repository.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Cupollilo et al describes the development, characterization and application of a novel activity labeling system: fast labelling of engram neurons (FLEN). Several such systems already exist but this study adds additional capability by leveraging an activity marker that is destabilized (and thus temporally active) as well as being driven by the full-length promoter of cFos. The authors demonstrate the activity dependent induction and timecourse of expression, first in cultured neurons and then in vivo in hippocampal CA3 neurons after one trial contextual fear conditioning. In a series of ex vivo experiments the authors perform patch clamp analysis of labeled neurons to determine if these putative engram neurons differ from non-labelled neurons using both the FLEN system as well as the previously characterized RAM system. Interestingly the early labelled neurons at 3 h post CFC (FLEN+) demonstrated no differences in excitability whereas the RAM labeled neurons at 24h after CFC had increased excitability. Examination of synaptic properties demonstrated an increase in sEPSC and mEPSC frequencies as well as those for sIPSCs and mIPSCs which was not due to a change in the mossy fiber input to these neurons.

      Strengths:

      Overall the data is of high quality and the study introduces a new tool while also reassessing some principles of circuit plasticity in the CA3 that have been the focus of prior studies.

      Weaknesses:

      No major weaknesses were noted

    3. Reviewer #2 (Public review):

      Summary:

      Cupollilo et al. investigate the properties of hippocampal CA3 neurons that express the immediate early gene cFos in response to a single foot shock. They compare ex-vivo the electrophysiological properties of these "engram neurons" labeled with two different cFos promoter-driven green markers: Their new virally delivered tool FLEN labels neurons 2-6 h after activity, while RAM contains additional enhancers and peaks considerably later (>24 h). Since the fraction of labeled CA3 cells is comparable with both constructs, it is assumed (but not tested) that they label the same population of activated neurons at different time points. Both FLEN+ and RAM+ neurons in CA3 receive more synaptic inputs compared to non-expressing control neurons, which could be a causal factor for cFos activation, or a very early consequence thereof. Frequency facilitation and E/I ratio of mossy fiber inputs were also tested, but are not different in both cFos+ groups of neurons. One day after foot shock, RAM+ neurons are more excitable than RAM- neurons, suggesting a slow increase in excitability as a major consequence of cFos activation.

      Strengths:

      The study is conducted to high standards and contributes significantly to our understanding of memory formation and consolidation in the hippocampus. Modifications of intrinsic neuronal properties seem to be more salient than overall changes in the total number of (excitatory and inhibitory) inputs, although a switch in the source of the synaptic inputs would not have been detected by the methods employed in this study.

      Weaknesses:

      The new tool FLEN is not quantitatively compared to e.g. the TetTag reporter mouse. Nevertheless, the fluorescent images of FLEN+ neurons are quite convincing.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      Summary:

      The manuscript by Cupollilo et al describes the development, characterization, and application of a novel activity labeling system: fast labelling of engram neurons (FLEN). Several such systems already exist but this study adds additional capability by leveraging an activity marker that is destabilized (and thus temporally active) as well as being driven by the full-length promoter of cFos. The authors demonstrate the activity-dependent induction and time course of expression, first in cultured neurons and then in vivo in hippocampal CA3 neurons after one trial of contextual fear conditioning. In a series of ex vivo experiments, the authors perform patch clamp analysis of labeled neurons to determine if these putative engram neurons differ from non-labelled neurons using both the FLEN system as well as the previously characterized RAM system. Interestingly the early labelled neurons at 3 h post CFC (FLEN+) demonstrated no differences in excitability whereas the RAM-labelled neurons at 24h after CFC had increased excitability. Examination of synaptic properties demonstrated an increase in sEPSC and mEPSC frequencies as well as those for sIPSCs and mIPSCs which was not due to a change in the mossy fiber input to these neurons.

      Strengths:

      Overall the data is of high quality and the study introduces a new tool while also reassessing some principles of circuit plasticity in the CA3 that have been the focus of prior studies.

      Weaknesses:

      No major weaknesses were noted.

      Reviewer #2 (Public review): 

      Summary: 

      Cupollilo et al. investigate the properties of hippocampal CA3 neurons that express the immediate early gene cFos in response to a single foot shock. They compare ex-vivo the electrophysiological properties of these "engram neurons" labeled with two different cFos promoter-driven green markers: Their new tool FLEN labels neurons 2-6 h after activity, while RAM contains additional enhancers and peaks considerably later (>24 h). Since the fraction of labeled CA3 cells is comparable with both constructs, it is assumed (but not tested) that they label the same population of activated neurons at different time points. Both FLEN+ and RAM+ neurons in CA3 receive more synaptic inputs compared to non-expressing control neurons, which could be a causal factor for cFos activation, or a very early consequence thereof. Frequency facilitation and E/I ratio of mossy fiber inputs were also tested, but are not different in both cFos+ groups of neurons. One day after foot shock, RAM+ neurons are more excitable than RAM- neurons, suggesting a slow increase in excitability as a major consequence of cFos activation.

      Strengths: 

      The study is conducted to high standards and contributes significantly to our understanding of memory formation and consolidation in the hippocampus. Modifications of intrinsic neuronal properties seem to be more salient than overall changes in the total number of (excitatory and inhibitory) inputs, although a switch in the source of the synaptic inputs would not have been detected by the methods employed in this study.

      Weaknesses: 

      With regard to the new viral tool, a direct comparison between the new tool FLEN and existing cFos reporters is missing. 

      Reviewer #1 (Recommendations for the authors):

      I have only minor suggestions for the authors to consider. 

      (1) In the in vitro characterization, the percentage of labelled neurons seems very low after a powerful and prolonged activation. It was somewhat surprising and raised the question of how accurately the FLEN construct reflects endogenous cFOS activity. Could the authors speak to this?

      The reviewer is correct that the level of FLEN positive neurons, as compared to mCherry positive neurons, is low as compared to studies using viral infection with RAM vectors in neuronal cultures (Sorensen et al, 2016, Sun et al, 2020), which is around 70-80% following chemical stimulation. The authors do not provide evidence however for a comparison with endogenous c-Fos activity in cell cultures. The reason for a discrepancy in the effect of chemical stimulation of cultured neurons is not clear, but may depend on culture conditions which may vary between labs. 

      FLEN was constructed using a mouse c-Fos promoter (-355 to +109) (Cen et al, 2003). To answer the reviewer’s question we performed an additional experiment in cultured neurons in which we found that 77.1 % of FLEN positive neurons were also c-fos positive neurons (using immunocytochemistry).

      (2) The authors compare the two labelling strategies and interpret their data with the presumption that both label a similar set of active neurons. This is particularly relevant when they suggest there might be a progressive increase in the excitability of active neurons with time. This is certainly a possibility, but the authors should also consider other possibilities that the two markers might label different populations of neurons. For example, if they require different thresholds for activation, it is possible that one is more sensitive to activity than the other. As these are unknown variables the authors should temper the interpretation accordingly.

      Indeed, the reviewer is correct that this limitation should be discussed. We have added this as a point of discussion in the text (lines 355-358). In the article describing the RAM strategy (Sorensen et al, 2016) the authors use RAM to label DG neurons activated during an experience in a context A (Figure 4). Exploiting the fact that engram cells are re-activated when the animal is re-exposed to the same environment of training (memory recall), they performed c-Fos staining 90 minutes following either context A or context B re-exposure. The RAM-c-Fos overlap percentage was higher in A-A than in A-B (A-A was a bit more than 20%). This means that RAM has captured a group of cells during training that, at least in part, were re-activated during recall. This could in part support the assumption that RAM and c-Fos share a certain overlap. Of course, this was done in DG, while we worked in CA3. In addition, both strategies label in their great majority c-Fos+ neurons (see above answer to point #1). This cannot completely rule out the possibility that FLEN and RAM label partly distinct populations of activated cells.

      (3) An increase in the frequency of synaptic events is observed in neurons labelled with both markers. The authors propose that this may be due to an increase in synaptic contacts based on prior studies. However, as this is the first functional assessment why not consider changes in release probability as a mechanism for this finding? 

      We have added this as a possibility in the text (line 362-363).

      (4) It would be useful to include plots of the average frequency of m/sEPSCs and m/sIPSCs in Figures 4 and 5. These figures could also be combined into a single figure.

      We agree with the reviewer that figure 4 and 5 could be merged into a single figure. In the revised version, figure 5A becomes panel C in figure 4. Text and figure descriptions were adjusted accordingly.

      Reviewer #2 (Recommendations for the authors): 

      (1) Abstract, line 24: "In contrast, FLEN+ CA3 neurons show an increased number of excitatory inputs." RAM+ neurons also show an increased number of excitatory inputs, so this is not "in contrast". Also, not just excitatory, but also inhibitory synaptic inputs are more numerous in cFos+ neurons. Please improve the summary of your findings.

      “In contrast” referred to the fact that FLEN+ neurons do not show differences in excitability as compared to FLEN- neurons, as mentioned in the previous sentence. We now provide a more explicit sentence to explain this point: “On the other hand, like RAM+ neurons, FLEN+ CA3 neurons show an increased number of excitatory inputs.”

      (2) Novel tool: Destabilized cFos reporters were introduced 23 years ago and are also part of the TetTag mouse. I am not sure that changing the green fluorescent protein to a different version merits a new acronym (FLEN). To convince the readers that this is more than a branding exercise, the authors should compare the properties (brightness, folding time, stability) of FLEN to e.g. the d2EGFP reporter introduced by Bi et al. 2002 (J Biotechnol. 93(3):231) and show significant improvements.

      We thank the reviewer for this comment, which compelled us to evaluate the features of other tools used to label neurons activated following contextual fear conditioning. The key properties of FLEN as compared to other tools used to label engrams are that: (i) it is a viral tool, as opposed to transgenic mice, (ii) a c-fos promoter drives the expression of a brightly fluorescent protein allowing their identification ex vivo for functional analysis, (iii) the fluorescent protein is rapidly destabilized, providing the possibility to label neurons only a few hours after their activation by a behavioural task.

      We did not find any viral tools providing the possibility to label c-fos activated neurons for functional assessment. We have not been able to find references for the use of the d2EGFP reporter introduced by Bi et al. 2002 in a behavioural context. One major difference and improvement is certainly the brightness of ZsGreen. In cell cultures, ZsGreen1 showed an 8.6-fold increase in fluorescence intensity as compared with EGFP (Bell et al, 2007).

      Amongst tools with comparable properties, eSARE was developed based on a synthetic Arc promoter driving the expression of a destabilized GFP (dEGFP) (Kawashima et al 2013). We initially used ESARE–dGFP but unfortunately, in our experimental conditions we found that the signal to noise ratio was not satisfactory (number of cells labelled in the home cage vs. following contextual fear conditioning).

      We developed a viral tool to avoid the use of transgenic reporter lines, which require laborious breeding and are experimentally less flexible. Nevertheless, many transgenic mice based on the expression of fluorescent proteins under the control of IEG promoters have been developed and used. Some of these mice show a time course of expression of the transgene which is comparable to FLEN. For instance, in organotypic slices from Tet-Tag mice, the time course of expression of EGFP follows endogenous cFOS expression with a small delay, and starts decaying after 4 hours (Lamothe-Molina et al, 2022). However, the fluorescence was too weak to visualize neurons in the slice (Christine Gee, personal communication), and imaging is performed after immunocytochemistry against GFP.

      Therefore, we feel that the name given to the FLEN strategy is legitimate. The features of the FLEN strategy were summarized in the discussion (Lines 318-322).

      (3) Line 214: "...FLEN+ CA3 PNs do not show differences in [...] patterns of bursting activity as compared to control neurons." It looks quite different to me (Figure 3E). Just because low n precludes meaningful statistical analysis, I would not conclude there is no difference.

      We agree with the reviewer that the data in Figure 3E are not conclusive due to small sample size, which limits the reliability of statistical comparison. Additionally, the classification of bursting neurons is highly dependent on the specific criteria used, which vary considerably across the literature. To avoid overinterpretation or misleading conclusions, we decided to remove the panel E of Figure 3 showing the fraction of bursting neurons. Nevertheless, we draw the attention to the more robust and interpretable results: RAM⁺ neurons exhibit an increase in firing frequency and a distinct action potential discharge pattern, data which we believe are informative of altered excitability.

      (4) Line 304: Remove the time stamp.

      This was done.

      (5) Line 334: "...results may be explained by an overall increased activity of CA1 neurons..." I don't understand - isn't CA1 downstream of CA3? 

      The reviewer is correct that the sentence was misleading. We removed the reference to CA1, as it was more of a general principle about neuronal activity.

      (6) Line 381: "resolutive", better use "sensitive". 

      This was changed.

      (7) Figure S3: Fear-conditioned animals were 3 days off Dox, controls only 2 days. As RAM expression accumulates over time off Dox, this is not a fair comparison.

      We thank the reviewer for pointing out the incorrect reporting of the experimental design in Figure S3 panel A (bottom), which could lead to misinterpretation of results. In fact, the two groups of mice (CFC vs. HC) underwent all experimental steps in parallel. Specifically, both groups were maintained on and off Doxycycline for the same duration and received viral injection on the same day. 48 hours after Dox withdrawal, the CFC group was trained for contextual conditioning, while the HC group remained in the home cage in the holding room. All animals were thus sacrificed 72 hours after Dox removal. We have corrected the figure to accurately reflect this timeline.

      (8) Please provide sequence information for c-cFos-ZsGreen1-DR. Which regulatory elements of the cFos promoter are included, is the 5' NTR included? This information is very important.

      The information is now provided in the Methods section.

      (9) Please provide the temperature during pharmacological treatments (TTX etc.) before fixation.

      The pharmacological treatment was performed in the incubator at 37°C, this is now indicated in the methods.

    1. eLife Assessment

      This work derives a valuable general theory unifying theories of efficient information transmission in the brain with population homeostasis. The general theory provides an explanation for firing rate homeostasis at the level of neural clusters with firing rate heterogeneity within clusters. Applying this theory to the primary visual cortex, the authors present solid evidence that accounts for stimulus-specific and neuron-specific adaptation. Reviewers have provided additional suggestions for improving the readability of the manuscript, as well as discussing previous results on adapting coding as well as those aspects of experimental data that are not fully explained by the present theory.

    2. Reviewer #1 (Public review):

      This work derives a general theory of optimal gain modulation in neural populations. It demonstrates that population homeostasis is a consequence of optimal modulation for information maximization with noisy neurons. The developed theory is then applied to the distributed distributional code (DDC) model of the primary visual cortex to demonstrate that homeostatic DDCs can account for stimulus specific adaptation.

      Strengths:

      The theory of gain modulation proposed in the paper is rigorous and the analysis is thorough. It does address the issue in an interesting, general setting. The proposed approach separates the question of which bits of sensory information are transmitted (as defined by a specific computation and tuning curve shapes) and how well they are transmitted (as defined by the tuning curve gain optimized to combat noise). This separation permits the application of the developed theory to different neural systems.

      Weaknesses:

      The manuscript effectively consists of two parts: a general theory of optimal gain modulation and a DDC model of the visual cortex. From my perspective it is not entirely clear which components of the developed theory and the model it is applied to are essential to explain the experimental phenomena in the visual cortex (Fig. 12). This "separation" into two parts makes this work, in my view, somewhat diffused.

      Overall, I think this is an interesting contribution and I assess it positively. It has the potential of deepening our understanding of efficient neural representations beyond sensory periphery.

    3. Reviewer #2 (Public review):

      Summary:

      Using the theory of efficient coding, the authors study how neural gains may be adjusted to optimize information transmission by noisy neural populations while minimizing metabolic cost, under the assumption that other aspects of neural activity (i.e. tuning) are determined by the computation performed by the network.

      The manuscript first presents mathematical results for the general case where the computational goals of the neural population are not specified (the computation is implicit in the assumed tuning curves). It then develops the theory for a specific probabilistic coding scheme. The general theory provides an explanation for firing rate homeostasis at the level of neural clusters with firing rate heterogeneity within clusters. The specific application further explains stimulus-specific adaptation in visual cortex.

      The mathematical derivations, simulations and application to visual cortex data are solid as far as I can tell.

      This remains a highly technical manuscript although the authors have improved the clarity of presentation of the general theory (which is the bulk of the work presented) and better motivated/explained modeling assumptions and choices. In the second part, the manuscript focuses on a specific code (homeostatic DDC) showing that this can be implemented by divisive normalization and can explain stimulus-specific adaptation.

      Strengths:

      The problem of efficient coding is a long-standing and important one. This manuscript contributes to that field by proposing a theory of efficient coding through gain adjustments, independent of the computational goals of the system. The main assumption, and insight, is that computational goals and efficiency can be in some sense factorized: tuning curve shapes are determined by the computational goal, whereas gains can be adjusted to optimize transmission of information.

      One key result is a normative explanation for firing rate homeostasis at the level of neural clusters (groups of neurons that perform a similar computation) with firing rate heterogeneity within each cluster. Both phenomena are widely observed, and reconciling them under one theory is important.

      The mathematical derivations are thorough. Although the model of neural activity is artificial, the authors make sure to include many aspects of cortical physiology, while also keeping the models quite general.

      Section 2.5 derives the conditions in which homeostasis would be near-optimal in cortex, which appear to be consistent with many empirical observations in V1. This indicates that homeostasis in V1 might indeed be a close-to-optimal solution to code efficiently in the face of noise.

      The application to the data of Benucci et al 2013 is the first to offer a normative explanation of stimulus-specific adaptation in V1.

      The novelty and significance of the work are presented clearly in the newly extended Introduction and Discussion.

      Weaknesses:

      The manuscript remains hard to read. The general theory occupies most of the manuscript, as needed to convey it fully. But as a result the second part on homeostatic DDC and adaptation is somewhat underdeveloped and risks having less visibility than it might deserve.

      The paper Benucci et al 2013 shows that homeostasis holds for some stimulus distributions, but not others i.e. when the 'adapter' is present too often. This manuscript, like the Benucci paper, discards those datasets. But from a theoretical standpoint, it seems important to consider why that would be the case, and if it can be predicted by the theory proposed here. The authors now acknowledge this limitation in the Discussion.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major comments:

      (1) Interpretation of key results and relationship between different parts of the manuscript. The manuscript begins with an information-transmission ansatz which is described as “independent of the computational goal” (e.g. p. 17). While information theory indeed is not concerned with what quantity is being encoded (e.g. whether it is sensory periphery or hippocampus), the goal of the studied system is to *transmit* the largest amount of bits about the input in the presence of noise. In my view, this does not make the proposed framework “independent of the computational goal”. Furthermore, the derived theory is then applied to a DDC model which proposes a very specific solution to inference problems. The relationship between information transmission and inference is deep and nuanced. Because the writing is very dense, it is quite hard to understand how the information transmission framework developed in the first part applies to the inference problem. How does the neural coding diagram in Figure 3 map onto the inference diagram in Figure 10? How does the problem of information transmission under constraints from the first part of the manuscript become an inference problem with DDCs? I am certain that the authors have good answers to these questions - but they should be explained much better.

      We are very thankful to the reviewer for highlighting the potential confusion surrounding these issues, in particular the relationship between the two halves of the paper – which was previously exacerbated by the length of the paper. We have now added further explanations at different points within the manuscript to better disentangle these issues and clarify our key assumptions. We have also significantly cut the length of the paper by moving more technical discussions to the Methods or Appendices. We will summarise these changes here and also clarify the rationale for our approach and point out potential disagreements with the reviewer.

      Key to our approach is that we indeed do not assume the entire goal of the studied neural system (whether part of the sensory system or not) is to transmit the largest amount of information about the stimulus input (in the presence of noise). In fact, general computations, including the inference of latent causes of inputs, often require filtering out or ignoring some information in the sensory input. It is thus not plausible that tuning curves in general (i.e. in an arbitrary part of the nervous system) are optimised solely with regard to the criterion of information transmission. Accordingly we do not assume they are entirely optimised for that purpose. However, we do make a key assumption or hypothesis (which like any hypothesis might turn out to be partly or entirely wrong): that (1) a minimal feature of the tuning curve (its scale or gain) is entirely free to be optimised for the aim of information transmission (or more precisely the goal of combating the detrimental effect of neural noise on coding fidelity), and (2) other aspects of the population tuning curve structure (i.e. the shape of individual tuning curves and their arrangement across the population) are determined by (other) computational goals beyond efficient coding. (Conceptually, this is akin to the modularization between indispensable error correction and general computations in a digital computer, and the need for the former to be performed in a manner that is agnostic as to the computations performed.) We have added two paragraphs in the manuscript which present the above rationale and our key hypothesis or assumption. The first of these was added to the (second paragraph of the) Introduction section, and the second is a new paragraph following Eq. 1 (which is about the gain-shape decomposition of the tuning curves, and the optimisation of the former based on efficient coding) of Results.
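
      As a rough illustration of the gain-shape decomposition referred to above, the following minimal sketch splits a sampled tuning curve into a scalar gain and a normalised shape. The von Mises shape, the unit-peak normalisation, and the names `von_mises_shape`, `tuning_curve` and `split_gain_and_shape` are assumptions made for this sketch; they are not taken from the manuscript, whose Eq. 1 may use a different normalisation.

```python
import math


def von_mises_shape(theta, preferred, kappa=2.0):
    """Unit-peak tuning-curve shape (hypothetical choice of shape)."""
    return math.exp(kappa * (math.cos(theta - preferred) - 1.0))


def tuning_curve(theta, gain, preferred):
    """Full tuning curve: a scalar gain multiplying a fixed shape."""
    return gain * von_mises_shape(theta, preferred)


def split_gain_and_shape(samples):
    """Split sampled responses into (gain, shape), taking the peak as the gain.

    The unit-peak convention is an assumption of this sketch.
    """
    gain = max(samples)
    return gain, [r / gain for r in samples]


# Sample one neuron's tuning curve on a grid of orientations.
thetas = [2 * math.pi * i / 16 for i in range(16)]
responses = [tuning_curve(t, gain=7.5, preferred=0.0) for t in thetas]

gain, shape = split_gain_and_shape(responses)

# Rescaling the responses changes the recovered gain but not the shape.
gain_doubled, shape_doubled = split_gain_and_shape([2 * r for r in responses])
```

      Under this convention, only `gain` would be adjusted by the efficient-coding objective, while `shape` would be fixed by whatever computation the population performs.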

      Our paper can be divided into two parts. In the first part, we develop a general, computationally agnostic (in the above sense, just as in the digital computer example), efficient coding theory. In the second part, we apply that theory to a specific form of computation, namely the DDC framework for Bayesian inference. The latter theory now determines the tuning curve shapes. When combined with the results of the first part (which dictate the tuning curve scale or gain according to efficient coding theory), this “homeostatic DDC” model makes full predictions for the tuning curves (i.e., both scale and shape) and how they should adapt to stimulus statistics.

      So to summarise, it is not the case that the problem of information transmission (or rather mitigating the effect of noise on coding fidelity under metabolic constraints), dealt with in the first part, has become a problem of Bayesian inference. Rather, the dictates of efficient coding for optimal gains for coding fidelity (under constraints) have been applied to and combined with a computational theory of inference.

      We have added new expository text before and after Eq. 17 in Sec. 2.7 (at the beginning of the second part of the paper on homeostatic DDCs) to again make the connection with the first part and the rationale for its combination with the original DDC framework more clear.

      With the changes outlined above, we believe and hope the connection between the two parts (which we agree with the reviewer, was indeed rather obscure previously) has been adequately clarified.

      (2) Clarity of writing for an interdisciplinary audience. I do not believe that in its current form, the manuscript is accessible to a broader, interdisciplinary audience such as eLife readers. The writing is very dense and technical, which I believe unnecessarily obscures the key results of this study.

      We thank the reviewer for this comment. We have taken several steps to improve the accessibility of this work for an interdisciplinary audience. Firstly, several sections containing dense, mathematical writing have now been moved into appendices or the Methods section, out of the main text; in their place we have made efforts to convey the core of the results, and to provide intuitions, without going into unnecessary technical detail. Secondly, we have added additional figures to help illustrate key concepts or assumptions (see Fig. 1B clarifying the conceptual approach to efficient coding and homeostatic adaptation, and Fig. 8A describing the clustered population). Lastly, we have made sure to refer back to the names of symbols more often, so as to make the analysis easier to follow for a reader with an experimental background.

      (3) Positioning within the context of the field and relationship to prior work. While the proposed theory is interesting and timely, the manuscript omits multiple closely related results which in my view should be discussed in relationship to the current work. In particular, a number of recent studies propose normative criteria for gain modulation in populations:

      • Duong, L., Simoncelli, E., Chklovskii, D. and Lipshutz, D., 2024. Adaptive whitening with fast gain modulation and slow synaptic plasticity. Advances in Neural Information Processing Systems

      Tring, E., Dipoppa, M. and Ringach, D.L., 2023. A power law describes the magnitude of adaptation in neural populations of primary visual cortex. Nature Communications, 14(1), p.8366.

      Młynarski, W. and Tkačik, G., 2022. Efficient coding theory of dynamic attentional modulation. PLoS Biology

      Haimerl, C., Ruff, D.A., Cohen, M.R., Savin, C. and Simoncelli, E.P., 2023. Targeted V1 co-modulation supports task-adaptive sensory decisions. Nature Communications

      • The Ganguli and Simoncelli framework has been extended to a multivariate case and analyzed for a generalized class of error measures:

      Yerxa, T.E., Kee, E., DeWeese, M.R. and Cooper, E.A., 2020. Efficient sensory coding of multidimensional stimuli. PLoS Computational Biology

      Wang, Z., Stocker, A.A. and Lee, D.D., 2016. Efficient neural codes that minimize LP reconstruction error. Neural Computation, 28(12),

      We thank the reviewer again for bringing these works to our attention. For each, we explain whether we chose to include them in our Discussion section, and why.

      (1) Duong et al. (2024): We decided not to discuss this manuscript, as our assessment is that it is not very relevant to our work. That study starts with the assumption that the goal of the sensory system under study is to whiten the signal covariance matrix, which is not the assumption we start with. A mechanistic ingredient (but not the only one) in their approach is gain modulation. However, in their case it is the gains of computationally auxiliary inhibitory neurons that are modulated and not (as in our case) the gains of the (excitatory) coding neurons (i.e. those which encode information about the stimulus and whose response covariance is whitened). These key distinctions make the connection with our work quite loose, and we did not discuss this work.

      (2) Tring et al. (2023): We have added a discussion of the results of this paper and its relationship to the results of our work and that of Benucci et al. This appears in the 7th paragraph of the Discussion. This study is indeed highly relevant to our paper, as it essentially replicates the Benucci et al. experiment, this time in awake mice (rather than anesthetised cats). However, in contrast to the results of Benucci et al., Tring et al. do not find firing rate homeostasis in mouse V1. A second, remarkable finding of Tring et al. is that adaptation mainly changes the scale of the population response vector, and only minimally affects its direction. While Tring et al. do not portray it as such, this behaviour amounts to pure stimulus-specific adaptation without the neuron-specific factor found in the Benucci et al. results (see Eq. 24 of our manuscript). As we discuss in our manuscript, when our homeostatic DDC model is based on an ideal-observer generative model, it also displays pure stimulus-specific adaptation with no neuronal factor. Our final model for Benucci’s data did contain a neural factor, because we used a non-ideal observer DDC (in particular, we assumed a smoother prior distribution over orientations compared to the distribution used in the experiment – which has a very sharp peak – as it is more natural given the inductive biases we expect in the brain). The resultant neural factor suppresses the tuning curves tuned to the adaptor stimulus. Interestingly, when gain adaptation is incomplete, and happens to a weaker degree compared to what is necessary for firing rate homeostasis, an additional neural factor emerges that is greater than one for neurons tuned to the adaptor stimulus. These two multiplicative neural factors can approximately cancel each other; such a theory would thus predict both deviation from homeostasis and approximately pure stimulus-specific adaptation. We plan to explore this possibility in future work.
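
      The distinction between a change in scale and a change in direction of the population response vector can be made concrete with a small numerical sketch. The multiplicative form used here (adapted response = neuron factor × stimulus factor × unadapted response) and all numbers are hypothetical illustrations, not values or equations taken from the manuscript or from either experimental paper.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two population response vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical unadapted responses of four neurons to one stimulus.
unadapted = [4.0, 9.0, 2.0, 6.0]

# A stimulus-specific factor is shared by all neurons for this stimulus.
stimulus_factor = 0.6

# A neuron-specific factor differs across neurons (values are made up).
neuron_factors = [1.0, 0.5, 1.0, 0.9]

# Pure stimulus-specific adaptation: the whole vector is rescaled.
pure_ssa = [stimulus_factor * r for r in unadapted]

# Adding a neuron-specific factor also rotates the vector.
with_neuron_factor = [
    g * stimulus_factor * r for g, r in zip(neuron_factors, unadapted)
]

cos_pure = cosine(unadapted, pure_ssa)
cos_with_neuron_factor = cosine(unadapted, with_neuron_factor)
```

      In this toy case `cos_pure` equals one up to rounding (direction preserved, only the length changes), whereas `cos_with_neuron_factor` falls below one, which is the signature of a neuron-specific factor.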

      (3) Młynarski and Tkačik (2022): We are now citing and discussing this work in the Discussion (penultimate paragraph), in the context of a possible future direction, namely extending our framework to cover the dynamics of adaptation (via a dynamic efficient gain modulation and dynamic inference). We have noted there that Młynarski and Tkačik have used such a framework (which, while similar, has key technical differences with our approach) based on a task-dependent efficient coding objective to model top-down attentional modulation. By contrast, we have studied bottom-up and task-independent adaptation, and it would be interesting to extend our framework and develop a model to make predictions for the temporal dynamics of such adaptation.

      (4) Haimerl et al. (2023): We have elected not to include this work within our discussion either, as we do not believe it is sufficiently relevant to our work to warrant inclusion. Although this paper also considers gain modulation of neural activity, the setting and the aims of the theoretical work and the empirical phenomena it is applied to are very different from our case in various ways. Most importantly, this paper is not offering a normative account of gain modulation; rather, gain modulation is used as a mechanism for enabling fast adaptive readouts of task relevant information.

      (5) Yerxa et al. (2020): We have now included a discussion of this paper in our Discussion section. Note that, even though this study generalises the Ganguli and Simoncelli framework to higher dimensions, just like that paper it still places strict requirements (which are arguably even more stringent in higher dimensions) on the form of the tuning curves in the population, viz. that there exists a differentiable transform of the stimulus space which renders these unimodal curves completely homogeneous (i.e., of the same shape, and placed regularly and with uniform density).

      (6) Wang et al. (2016): We have included this paper in our discussion as well. As above, this paper does not consider general tuning curves, and places the same constraint on their shape and arrangement as in the Ganguli and Simoncelli paper.

      More detailed comments and feedback:

      (1) I believe that this work offers the possibility to address an important question about novelty responses in the cortex (e.g. Homann et al, 2021 PNAS). Are they encoding novelty per se, or are they inefficient responses of a not-yet-adapted population? Perhaps it’s worth speculating about.

      We are not sure why the relatively large responses to “novel” or odd-ball stimuli should be considered inefficient or unadapted: in the context in which those stimuli are infrequent odd-balls (and thus novel or surprising when occurring), efficient coding theory would indeed typically predict a large response compared to the (relatively suppressed) responses to frequently occurring stimuli. Of course, if the statistics change and the odd-ball stimulus now becomes frequent, adaptation should occur and would be expected to suppress responses to this stimulus. As to the question of whether (large) responses to infrequent stimuli can or should be characterised as novelty responses: this is partly an interpretational or semantic issue – unless it is grounded in knowledge of how downstream populations use this type of coding in V1, which could then provide a basis for solidly linking them to detection of novelty per se. In short, our theory could be applied to Homann et al.’s data, but we consider that beyond the scope of the current paper.

      (2) Clustering in populations - typically in efficient coding studies, tuning curve distributions are a consequence of input statistics, constraints, and optimality criteria. Here the authors introduce randomly perturbed curves for each cluster - how to interpret that in light of the efficient coding theory? This links to a more general aspect of this work - it does not specify how to find optimal tuning curves, just how to modulate them (already addressed in the discussion).

      We begin by addressing the reviewer’s more general concern regarding the fact that our theory does not address the problem of finding optimal tuning curves, only that of modulating them optimally. As we expound within the updated version of the paper (see the newly expanded 3rd paragraph in Sec. 2.1 and the expanded 2nd paragraph in Introduction), it is not plausible that the sole function of sensory systems, and neural circuits more generally, is the transmission of information. There are many other computational tasks which must be performed by the system, such as the inference of the latent causes of sensory inputs. For many such tasks, it is not even desirable to have complete transmission of information about the external stimulus, since a substantial portion of that information is not important for the task at hand, and must be discarded. For example, such discarding of information is the basis of invariant representations that occur, e.g., in higher visual areas. So we recognise that tuning curve shapes are in general dictated and shaped by computational goals beyond transmission of information or error correction. As such, we have remained agnostic as to the computational goals of neural systems and therefore the shape of the tuning curve. We have made the assumption and adopted the postulate that those computational goals determine the shape of the tuning curves, leaving the gains to be adjusted freely for the purpose of mitigating the effect of noise on coding fidelity (this is similar to how error correction is done in computers independently of the computations performed). By assuming that those computational goals are captured adequately by the shape of tuning curves, we are left free to optimise the gains of those curves for purely information-theoretic objectives. Finally, we note that the case where the tuning curve shapes are additionally optimised for information transmission is a special case of our more general approach. For further discussion, see the updated version of our introduction.

      We now turn to our choice to model clusters using random perturbations. This is, of course, a toy model for clustering tuning curves within a population. With this toy model we are attempting to capture the important aspects of tuning curve clusters within the population while not over-complicating the simulations. Within any neural population, there will be tuning curves that are similar; however, such curves will inevitably be heterogeneous, as opposed to completely identical. Thus, when we cluster together similar curves there will be an “average” cluster tuning curve (found by, e.g., normalising all individual curves and taking the average), which all other tuning curves within the cluster are deviations from. The random perturbations we apply are our attempt to capture these deviations. However, note that the perturbations are not fully random, but instead have an “effective dimensionality” which we vary over. By giving the perturbations an effective dimensionality, we aim to capture the fact that deviations from the average cluster tuning curve may not be fully random, and may display some structure.
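
      A toy version of such a cluster can be sketched as follows. This is not the authors' procedure: the von Mises average curve, the Fourier-mode perturbation directions, the multiplicative form of the perturbation, and the name `make_cluster` are all assumptions chosen to keep the example short. The only feature it is meant to convey is that every neuron's deviation from the cluster average is a combination of a small number (`effective_dim`) of shared directions.

```python
import math
import random


def make_cluster(n_neurons, n_stimuli, effective_dim, scale=0.1, seed=0):
    """Return (mean_curve, cluster) for a toy cluster of similar tuning curves.

    Each curve is the average curve times (1 + a random combination of
    `effective_dim` shared modes), so all deviations from the average lie
    in a subspace of dimension at most `effective_dim`.
    """
    if effective_dim * scale >= 1.0:
        raise ValueError("perturbations too large to keep rates positive")
    rng = random.Random(seed)
    thetas = [2 * math.pi * i / n_stimuli for i in range(n_stimuli)]

    # Hypothetical average tuning curve of the cluster (unit peak).
    mean_curve = [math.exp(2.0 * (math.cos(t) - 1.0)) for t in thetas]

    # Shared perturbation directions: low-order Fourier modes (an assumption).
    basis = [[math.cos((m + 1) * t) for t in thetas] for m in range(effective_dim)]

    cluster = []
    for _ in range(n_neurons):
        # Bounded coefficients keep every perturbed rate strictly positive.
        coeffs = [rng.uniform(-scale, scale) for _ in range(effective_dim)]
        curve = []
        for i, mu in enumerate(mean_curve):
            factor = 1.0 + sum(c * b[i] for c, b in zip(coeffs, basis))
            curve.append(mu * factor)
        cluster.append(curve)
    return mean_curve, cluster


mean_curve, cluster = make_cluster(n_neurons=5, n_stimuli=12, effective_dim=2)
```

      Setting `effective_dim=0` reproduces identical curves, and increasing it toward `n_stimuli` makes the deviations progressively less structured.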

      (3) Figure 8 - where do Hz come from as physical units? As I understand there are no physical units in simulations.

      We have clarified this within the figure caption. The within-cluster optimisation problem requires maximising a quadratic program subject to a constraint on the total mean spike count of the cluster. The objective for the quadratic program is however mathematically homogeneous. So we can scale the variables and parameters in a consistent way to be in units of Hz – i.e., turn them into mean firing rates, instead of mean spike counts, with an assumption on the length of the coding time interval. We fix this cluster firing rate to be k × 5 Hz, so that the average single-neuron firing rate is 5 Hz (based on empirical estimates – see our Sec. 2.5). This agrees with our choice of µ in our simulations (i.e., µ = 10) if we assume a coding interval of 0.1 seconds.

      (4) Inference with DDCs in changing environments. To perform efficient inference in a dynamically changing environment (as considered here), an ideal observer needs some form of posterior-prior updating. Where does that enter here?

      A shortcoming of our theory, in its current form, is that it applies only to the system in “steady-state”, without specifying the dynamics of how adaptation temporally evolves (we assume the environment has periods of relative stability that are of relatively long duration compared to the dynamical timescales of adaptation, and consider the properties of the well-adapted steady state population). Thus our efficient coding theory (which predicts homeostatic adaptation under the outlined conditions) is silent on the time-course over which homeostasis occurs. Likewise, the DDC theory (in its original formulation in Vertes & Sahani) is silent on dynamic updating of posteriors and considers only static inference with a fixed internal model. We now discuss a new future direction in the Discussion (where we cite the work of Młynarski and Tkačik) to point out that our theory can in principle be extended (based on dynamic inference and efficient coding) to account for the dynamics of attention, but this is beyond the scope of the current work.

      (5) Page 6 - ”We did this in such a way that, for all , the correlation matrices, (), were derived from covariance matrices with a 1/n power-law eigenspectrum (i.e., the ranked eigenvalues of the covariance matrix fall off inversely with their rank), in line with the findings of Stringer et al. (2019) in the primary visual cortex.” This is a very specific assumption, taken from a study of a specific brain region - how does it relate to the generality of the approach?

      Our efficient coding framework has been formulated without relying on any specific assumptions about the form of the (signal or noise) correlation matrices in cortex. The homeostatic solution to this efficient coding problem, however, emerges under certain conditions. But, as we demonstrate in our discussion of the analytic solutions to our efficient coding objective and the conditions necessary for the validity of the homeostatic solution, we expect homeostasis to arise whenever the signal geometry is sufficiently high-dimensional (among other conditions). By this we mean that the fall-off of the eigenvalues of the signal correlation matrix must be sufficiently slow. Thus, a fall-off in the eigenvalue spectrum slower than 1/n would favor homeostasis even more than our results. If the fall-off were faster, then whether or not (and to what degree) firing rate homeostasis becomes suboptimal depends on factors such as the speed of the fall-off and also the size of the population. Thus (1) rate homeostasis does not require the specific 1/n spectrum, but that spectrum is consistent with the conditions for optimality of rate homeostasis, (2) in our simulations we had to make a specific choice, and relying on empirical observations in V1 was of course a well-justified choice (moreover, as far as we are aware, there have been no other studies that have characterised the spectrum of the signal covariance matrix in response to natural stimuli, based on large population recordings).
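
      For readers who want to see what a covariance matrix with a 1/n eigenspectrum looks like in practice, here is a minimal, dependency-free sketch. It is not the construction used in the manuscript: the orthogonal basis is a single Householder reflection, chosen only so that the example needs no linear-algebra library (a random rotation obtained from a QR decomposition would be the more usual choice), and the function names are hypothetical.

```python
import math
import random


def householder(n, seed=0):
    """Orthogonal (and symmetric) matrix Q = I - 2 v v^T / (v^T v)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    vv = sum(x * x for x in v)
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
        for i in range(n)
    ]


def power_law_covariance(n, seed=0):
    """Covariance C = Q diag(1, 1/2, ..., 1/n) Q^T, plus its eigenvalues."""
    q = householder(n, seed)
    eigenvalues = [1.0 / (k + 1) for k in range(n)]  # k-th ranked eigenvalue ~ 1/k
    cov = [
        [sum(q[i][k] * eigenvalues[k] * q[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    return cov, eigenvalues


def to_correlation(cov):
    """Rescale a covariance matrix so that every diagonal entry equals one."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]


cov, eigenvalues = power_law_covariance(6)
corr = to_correlation(cov)
```

      Note that the rescaling in `to_correlation` generally alters the eigenvalues, so the 1/n law holds exactly for `cov` and only approximately for `corr`.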

      Reviewer #2 (Public Review):

      Strengths:

      The problem of efficient coding is a long-standing and important one. This manuscript contributes to that field by proposing a theory of efficient coding through gain adjustments, independent of the computational goals of the system. The main result is a normative explanation for firing rate homeostasis at the level of neural clusters (groups of neurons that perform a similar computation) with firing rate heterogeneity within each cluster. Both phenomena are widely observed, and reconciling them under one theory is important.

      The mathematical derivations are thorough as far as I can tell. Although the model of neural activity is artificial, the authors make sure to include many aspects of cortical physiology, while also keeping the models quite general.

      Section 2.5 derives the conditions in which homeostasis would be near-optimal in the cortex, which appear to be consistent with many empirical observations in V1. This indicates that homeostasis in V1 might indeed be close to the optimal solution to code efficiently in the face of noise.

      The application to the data of Benucci et al 2013 is the first to offer a normative explanation of stimulus-specific and neuron-specific adaptation in V1.

      We thank the reviewer for these assessments.

      Weaknesses:

      The novelty and significance of the work are not presented clearly. The relation to other theoretical work, particularly Ganguli and Simoncelli and other efficient coding theories, is explained in the Discussion but perhaps would be better placed in the Introduction, to motivate some of the many choices of the mathematical models used here.

      We thank the reviewer for this comment; we have updated our introduction to make clearer the relationship between this work and previous works within efficient coding theory. Please see the expanded 2nd paragraph of Introduction which gives a short account of previous efficient coding theories and now situates our work and differentiates it more clearly from past work.

      The manuscript is very hard to read as is; it almost feels like this could be two different papers. The first half seems like a standalone document, detailing the general theory with interesting results on homeostasis and optimal coding. The second half, from Section 2.7 on, presents a series of specific applications that appear somewhat disconnected, are not very clearly motivated nor pursued in-depth, and require ad-hoc assumptions.

      We thank the reviewer for this suggestion. The reviewer is right to note that our paper contains both the exposition of a general efficient coding theory framework and applications of that framework. Following this advice we have implemented the following changes. (1) We have significantly shortened some of the less central results in the second half of Results, or moved them entirely to the Methods or appendices (this includes the entire former section 2.7 and significant shortening of the section on implementation of Bayes ratio coding by divisive normalisation). (2) We have added a new figure (Fig 1B) and two long pieces of text to the (2nd paragraph of) Introduction, after Eq. (1), and in Sec. 2.7 (introducing homeostatic DDCs) to more clearly explain and clarify the assumptions underlying our efficient coding theory, and its connection with the second half of the Results (i.e. application to DDC theory of Bayesian inference), and better motivate why we consider the homeostatic DDC.

      For instance, it is unclear if the main significant finding is the role of homeostasis in the general theory or the demonstration that homeostatic DDC with Bayes Ratio coding captures V1 adaptation phenomena. It would be helpful to clarify if this is being proposed as a new/better computational model of V1 compared to other existing models.

      We see the central contribution of our work as not just that homeostasis arises as a result of an efficient coding objective, but also that this homeostasis is sufficient to explain V1 adaptation phenomena - in particular, stimulus specific adaptation (SSA) - when paired with an existing theory of neural representation, the DDC (itself applied to orientation coding in V1). Homeostatic adaptation alone does not explain SSA; nor do DDCs. However, when the two are combined they provide an explanation for SSA. This finding is significant, as it unifies two forms of adaptation (SSA and homeostatic adaptation) whose relationship was not previously appreciated. Our field does not currently have a standard model of V1, and we do not claim to have provided one either; rather, different models have captured different phenomena in V1, and we have done so for homeostatic SSA in V1.

      Early on in the manuscript (Section 2.1), the theory is presented as general in terms of the stimulus dimensionality and brain area, but then it is only demonstrated for orientation coding in V1.

      The efficient coding theory developed in Section 2 is indeed general throughout: we make no assumptions regarding the shape of the tuning curves or the dimensionality of the stimulus. Further, our demonstrations of the efficient coding theory through numerical simulations make assumptions only about the form of the signal and noise covariance matrices. When we later turn our attention away from the general case, our choice to focus on orientation coding in V1 was motivated by empirical results demonstrating a co-occurrence of neural homeostasis and stimulus specific adaptation in V1.

      The manuscript relies on a specific response noise model, with arbitrary tuning curves. Using a population model with arbitrary tuning curves and noise covariance matrix, as the basis for a study of coding optimality, is problematic because not all combinations of tuning curves and covariances are achievable by neural circuits (e.g. https://pubmed.ncbi.nlm.nih.gov/27145916/ )

      First, to clarify, our theory allows for complete generality of neural tuning curve shapes, and assumes a broad family of noise models (which, while not completely arbitrary, includes cases of biological relevance and/or models commonly used in the theoretical literature). Within this class of noise covariance models, we have shown numerical results for a range of values of the parameters of the noise covariance model, but more importantly, have analytically outlined the general properties and requirements on noise strength and structure (and its relationship to tuning curves and signal structure) under which homeostatic adaptation would be optimal. Regarding the point that not all combinations of tuning curves and noise covariances occur in biology or are achievable by neural circuits: (1) If we are guessing correctly the specific point of the reviewer’s reference to the review paper by Kohn et al. 2016, we have in fact prominently discussed the case of information-limiting noise, which corresponds to a specific relationship between signal structure (as determined by tuning curves) and noise structure (as specified by the noise covariance matrix). Our family of noise models includes that biologically relevant case and we have indeed paid it particular attention in our simulations and discussions (see discussion of Fig. 7 in Sec. 2.3, and that of aligned noise in Sec. 2.5). (2) As for the more general or abstract point that not all combinations of noise covariance and tuning curve structures are achievable by neural circuits, we can make the following comments. First, in lieu of a full theoretical or empirical understanding of the achievable combinations (which does not exist), we have outlined conditions for homeostatic adaptations under a broad class of noise models and arbitrary tuning curves. If some combinations within this class are not realised in biology, that does not invalidate the theoretical results, as the latter have been derived under more general conditions, which nevertheless include combinations that do occur in biology and are achievable by neural circuits (which, as pointed out, include the important case of aligned noise and signal structure – as reviewed in Kohn et al. – to which we have paid particular attention).

      The paper Benucci et al. 2013 shows that homeostasis holds for some stimulus distributions but not others, i.e. not when the ‘adapter’ is present too often. This manuscript, like the Benucci paper, discards those datasets. But from a theoretical standpoint, it seems important to consider why that would be the case, and if it can be predicted by the theory proposed here.

      The theory we provide predicts that, under certain (specified) conditions, we ought to see deviation from exact homeostatic results; indeed, we provide a first-order approximation to the optimal gains in this case which quantifies such deviations when they are small. Unfortunately, the form of this deviation depends on a precise choice of stimulus statistics (e.g. the signal correlation matrix, the noise correlation matrix averaged over all stimulus space, and other stimulus statistics), in contrast to the universality of the homeostatic solution when it is a valid approximation. In our model of Benucci et al.’s experiment, we restrict to a simple one-dimensional stimulus space (corresponding to orientated gratings), without specifying neural responses to all stimuli; as such, we are not immediately able to make predictions about whether the homeostatic failure can be predicted using the specific form of deviation from homeostasis. However, we acknowledge that this is a weakness of our analysis, and that a more complete investigation would address this question. For reasons of space, we elected not to pursue this further. We have added a paragraph to our Discussion (8th paragraph) explaining this.

      Reviewer #1 (Recommendations for the authors):

      (1) To make the article more accessible I would suggest the following:

      (a) Include a few more illustrations or diagrams that demonstrate key concepts: adaptation of an entire population, clustering within a population, different sources of noise, inference with homeostatic DDCs, etc.

      We thank the reviewer for this suggestion - we have added an additional panel (Figure 8, Panel A) to explain the concept of clustering within a population. We also added a new panel to Figure 1 (Figure 1B) which we hope will clarify the conceptual postulate underlying our efficient coding framework and its link to the second half of the paper.

      (b) Within the text refer to names of quantities much more often, rather than relying only on mathematical symbols (e.g. w, r, Ω, etc).

      We thank the reviewer for the suggestion; we have updated the text accordingly and believe this has improved the clarity of the exposition.

      (2) It is hard to distill which components of the considered theory are crucial to reproducing the experimental observations in Figure 12. Is it the homeostatic modulation, efficient coding, DDCs, or any combination of those or all of them necessary to reproduce the experiment? I believe this could be explained much better, also with an audience of experimentalists in mind.

      We have updated the text to provide additional clarity on this matter (see the pointers to these changes and additions in the revised manuscript, given above in response to your first comment). In particular, reproducing the experimental results requires combining DDCs with homeostatic modulation – with the latter a consequence of our efficient coding theory, and not an independent ingredient or assumption.

      (3) It would be good to comment on how sensitive the results are to the assumptions made, parameter values, etc. For example: do conclusions depend on statistics of neural responses in simulated environments? Do they generalize for different values of the constraint µ? This could be addressed in the discussion / supplementary material.

      This issue is already discussed extensively within the text - see Sec. 2.4, Analytical insight on the optimality of homeostasis, and Sec. 2.5, Conditions for the validity of the homeostatic solution to hold in cortex. In these sections, we outline that - provided a certain parameter combination is small - we expect the homeostatic result to hold. Accordingly, we anticipate that our numerical results will generalise to any settings in which that parameter combination remains small.

      (4) How many neurons/units were used for simulations?

      We apologise for omitting this detail; we used 10,000 units for our simulations. We have edited both the main text and the methods section to reflect this.

      (5) Typos etc: a) Figure 5 caption - the order of panels B and C is switched. b) Figure 6A - I suggest adding a colorbar.

      Thank you. We have relabelled the panels B and C in the appropriate figures so that the ordering in the figure caption is correct. We feel that a colourbar in Figure 6A would be unnecessary, since we are only trying to convey the concept of uniform correlations, rather than any particular value for the correlations; as such we have elected not to add a colourbar. We have, however, added a more explicit explanation of this cartoon matrix in the figure caption, by referring to the colours of diagonal vs off-diagonal elements.

      Reviewer #2 (Recommendations for the authors):

      The text on page 10, with the perturbation analysis, could be moved to a supplement, leaving here only the intuition.

      We thank the reviewer for this suggestion; we have moved much of the argument into the appendix so as to not distract the reader with unnecessary technical details.

      Text before eq. 12 “...in cluster a maximize the objective...” should be ‘minimize’?

      The cluster objective as written is indeed maximised, as stated in the text. Note that, in the revised manuscript, this argument has been moved to an appendix to reduce the density of mathematics in the main text.

      Top of page 25 “S<sub>0</sub> and S<sub>0</sub>” should be “S<sub>0</sub> and S<sub>1</sub>”?

      Thank you, we have corrected the manuscript accordingly.

    1. eLife Assessment

      This important study investigates nerve-injury-induced allodynia by studying the role of a subpopulation of excitatory dorsal horn CCK+ neurons that express the estrogen receptor GPR30 and potentially modulate nociceptive sensitivity via direct inputs from primary somatosensory cortex. In this revised version, the authors addressed many of the critiques raised through added analyses that convincingly support the notion that spinal GPR30 neurons are indeed an excitatory subpopulation of CCK+ neurons that contribute to neuropathic pain. While evidence of a direct functional corticospinal projection to CCK+/GPR30+ neurons is not fully demonstrated, this work will be of broad interest to researchers interested in the neural circuitry of pain.

    2. Reviewer #1 (Public review):

      In this manuscript, Chen et al. investigate the role of the membrane estrogen receptor GPR30 in spinal mechanisms of neuropathic pain. Using a wide variety of techniques, they first provide convincing evidence that GPR30 expression is restricted to neurons within the spinal cord, and that GPR30 neurons are well-positioned to receive descending input from the primary sensory cortex (S1). In addition, the authors put their findings in the context of the previous knowledge in the field, presenting evidence demonstrating that GPR30 is expressed in the majority of CCK-expressing spinal neurons. Overall, this manuscript furthers our understanding of neural circuitry that underlies neuropathic pain and will be of broad interest to neuroscientists, especially those interested in somatosensation. Nevertheless, the manuscript would be strengthened by additional analyses and clarification of data that is currently presented.

      Strengths:

      The authors present convincing evidence for expression of GPR30 in the spinal cord that is specific to spinal neurons. Similarly, complementary approaches including pharmacological inhibition and knockdown of GPR30 are used to demonstrate a role for the receptor in driving nerve injury-induced pain in rodent models.

      Weaknesses:

      Although steps were taken to put their data into the broader context of what is already known about the spinal circuitry of pain, more considerations and analyses would help the authors better achieve their goal. For instance, to determine whether GPR30 is expressed in excitatory or inhibitory neurons, more selective markers for these subtypes should be used over CamK2. Moreover, quantitative analysis of the extent of overlap between GPR30+ and CCK+ spinal neurons is needed to understand the potential heterogeneity of the GPR30 spinal neuron population, and to interpret experiments characterizing descending S1 inputs onto GPR30 and CCK spinal neurons. Filling these gaps in knowledge would make their findings more solid.

      Revised Manuscript Update:

      In their revised manuscript, Chen et al. have added additional data that establish GPR30 spinal neurons as a population of excitatory neurons, half of which express CCK. These data help to position GPR30 neurons in the existing framework of spinal neuron populations that contribute to neuropathic pain, strengthening the authors' findings.

    3. Reviewer #3 (Public review):

      Summary:

      The authors convincingly demonstrate that a population of CCK+ spinal neurons in the deep dorsal horn express the G protein coupled estrogen receptor GPR30 to modulate pain sensitivity in the chronic constriction injury (CCI) model of neuropathic pain in mice. Using complementary pharmacological and genetic knockdown experiments they convincingly show that GPR30 inhibition or knockdown reverses mechanical, tactile and thermal hypersensitivity, conditioned place aversion, and c-fos staining in the spinal dorsal horn after CCI. They propose that GPR30 mediates an increase in postsynaptic AMPA receptors after CCI using slice electrophysiology which may underlie the increased behavioral sensitivity. They then use anterograde tracing approaches to show that CCK and GPR30 positive neurons in the deep dorsal horn may receive direct connections from primary somatosensory cortex. Chemogenetic activation of these dorsal horn neurons proposed to be connected to S1 increased nociceptive sensitivity in a GPR30 dependent manner. Overall, the data are very convincing and the experiments are well conducted and adequately controlled. However, the proposed model of descending corticospinal facilitation of nociceptive sensitivity through GPR30 in a population of CCK+ neurons in the dorsal horn is not fully supported.

      Strengths:

      The experiments are very well executed and adequately controlled throughout the manuscript. The data are nicely presented and supportive of a role for GPR30 signaling in the spinal dorsal horn influencing nociceptive sensitivity following CCI. The authors also did an excellent job of using complementary approaches to rigorously test their hypothesis.

      Weaknesses:

      The primary weakness in this manuscript involves overextending the interpretations of the data to still propose a role for corticospinal descending facilitation. While the viral tracing demonstrates a potential connection between S1 and CCK+ or GPR30+ spinal neurons, no direct evidence is provided for S1 in facilitating any activity of these neurons in the dorsal horn.

      Comments on the latest version:

      The authors did an excellent job addressing many of the critiques raised. Despite acknowledging that a direct functional corticospinal projection to CCK/GPR30+ neurons is not supported by the data and revising the title, these claims still persist throughout the manuscript. Manipulating gene expression or the activity of postsynaptic neurons through a trans-synaptic labeling strategy does not directly support any claim that those upstream neurons are directly modulating spinal neurons through the proposed pathway. Indeed they might, but that is not demonstrated here.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this manuscript, Chen et al. investigate the role of the membrane estrogen receptor GPR30 in spinal mechanisms of neuropathic pain. Using a wide variety of techniques, they first provide convincing evidence that GPR30 expression is restricted to neurons within the spinal cord, and that GPR30 neurons are well-positioned to receive descending input from the primary sensory cortex (S1). In addition, the authors put their findings in the context of the previous knowledge in the field, presenting evidence demonstrating that GPR30 is expressed in the majority of CCK-expressing spinal neurons. Overall, this manuscript furthers our understanding of neural circuitry that underlies neuropathic pain and will be of broad interest to neuroscientists, especially those interested in somatosensation. Nevertheless, the manuscript would be strengthened by additional analyses and clarification of data that is currently presented.

      Strengths: 

      The authors present convincing evidence for the expression of GPR30 in the spinal cord that is specific to spinal neurons. Similarly, complementary approaches including pharmacological inhibition and knockdown of GPR30 are used to demonstrate the role of the receptor in driving nerve injury-induced pain in rodent models. 

      Weaknesses: 

      Although steps were taken to put their data into the broader context of what is already known about the spinal circuitry of pain, more considerations and analyses would help the authors better achieve their goal. For instance, to determine whether GPR30 is expressed in excitatory or inhibitory neurons, more selective markers for these subtypes should be used over CamK2. Moreover, quantitative analysis of the extent of overlap between GPR30+ and CCK+ spinal neurons is needed to understand the potential heterogeneity of the GPR30 spinal neuron population, and to interpret experiments characterizing descending S1 inputs onto GPR30 and CCK spinal neurons. Filling these gaps in knowledge would make their findings more solid.

      Thank you very much for your constructive feedback.

      In response to your suggestion, we have used more specific markers to distinguish excitatory (VGLUT2) and inhibitory (VGAT) neurons via in situ hybridization. These analyses revealed that GPR30 is predominantly expressed in excitatory neurons of the superficial dorsal horn (SDH), as presented in the Results section (lines 117-120) and in Figure 2A-B.

      Additionally, we performed a quantitative analysis to determine the extent of co-localization between GPR30+ and CCK+ neurons. The data were included in the Results (lines 131–132) and Figure 2G.

      Reviewer #2 (Public review):

      Using a variety of experimental manipulations, the authors show that the membrane estrogen receptor G protein-coupled estrogen receptor (GPER/GPR30) expressed in CCK+ excitatory spinal interneurons plays a major role in the pain symptoms observed in the chronic constriction injury (CCI) model of neuropathic pain. Intrathecal application of selective GPR30 agonist G-1 induced mechanical allodynia and thermal hyperalgesia in male and female mice. Downregulation of GPR30 in CCK+ interneurons prevented the development of mechanical and thermal hypersensitivity during CCI. They also show the up modulation of AMPA receptor expression by GPR30. 

      Generally, the conclusions are supported by the experimental results. I also would like to see significant improvements in the writing and the description of results. 

      Methodological details for some of the techniques are rather sparse. For example, when examining the co-localization of various markers, the authors do not indicate the number of animals/sections examined. Similarly, when examining the effect of shGper1, it is unclear how many cells/sections/animals were counted and analyzed. 

      In other sections, there is no description of the concentration of drugs used (for example, Figure 4H). In Figures 4C-E, there is no indication of the duration of the recordings, the ionic conditions, the effect of glutamate receptor blockers, etc.

      Some results appear anecdotal in the way they are described. For example, in Figure 5, it is unclear how many times this experiment was repeated. 

      We sincerely appreciate your valuable feedback and thoughtful recommendations.

      To address your concerns regarding methodological transparency, we have added the following details to the revised manuscript:

      The number of animals and sections analyzed in co-localization studies.

      The number of cells/sections/animals used in each quantification following shGper1 treatment.

      The concentrations of drugs administered (e.g., in Figure 4H).

      Detailed recording conditions, including duration, ionic composition, and pharmacological conditions (Figures 4C-E).

      In addition, we have thoroughly revised the writing throughout the manuscript to enhance clarity and precision in the description of our findings.

      Reviewer #3 (Public review): 

      Summary: 

      The authors convincingly demonstrate that a population of CCK+ spinal neurons in the deep dorsal horn express the G protein-coupled estrogen receptor GPR30 to modulate pain sensitivity in the chronic constriction injury (CCI) model of neuropathic pain in mice. Using complementary pharmacological and genetic knockdown experiments they convincingly show that GPR30 inhibition or knockdown reverses mechanical, tactile, and thermal hypersensitivity, conditioned place aversion, and c-fos staining in the spinal dorsal horn after CCI. They propose that GPR30 mediates an increase in postsynaptic AMPA receptors after CCI using slice electrophysiology which may underlie the increased behavioral sensitivity. They then use anterograde tracing approaches to show that CCK and GPR30 positive neurons in the deep dorsal horn may receive direct connections from the primary somatosensory cortex. Chemogenetic activation of these dorsal horn neurons proposed to be connected to S1 increased nociceptive sensitivity in a GPR30-dependent manner. Overall, the data are very convincing and the experiments are well conducted and adequately controlled. However, the proposed model of descending corticospinal facilitation of nociceptive sensitivity through GPR30 in a population of CCK+ neurons in the dorsal horn is not fully supported. 

      Strengths: 

      The experiments are very well executed and adequately controlled throughout the manuscript. The data are nicely presented and supportive of a role for GPR30 signaling in the spinal dorsal horn influencing nociceptive sensitivity following CCI. The authors also did an excellent job of using complementary approaches to rigorously test their hypothesis. 

      Weaknesses: 

      The primary weakness in this manuscript involves overextending the interpretations of the data to propose a direct link between corticospinal projections signaling through GPR30 on this CCK+ population of spinal dorsal horn neurons. For example, even in the cropped images presented, GPR30 is present in many other CCK-negative neurons. Only about a quarter of the cells labeled by the anterograde viral tracing experiment from S1 are CCK+. Since no direct evidence is provided for S1 signaling through GPR30, this conclusion should be revised. 

      Thank you for your encouraging comments and critical insights.

      We fully acknowledge the concern regarding the proposed direct involvement of corticospinal projections in modulating nociceptive behavior via GPR30 in CCK+ neurons. While our anterograde tracing experiments suggest anatomical overlap, we agree that definitive evidence of functional connectivity is lacking.

      Accordingly, we have revised the Abstract, Discussion, and Graphical Abstract to present our findings more cautiously. We now describe our observations as indicating that S1 projections potentially interact with GPR30<sup>+</sup> spinal neurons, rather than asserting a definitive functional link.

      To support this revised interpretation, we performed additional quantitative analyses examining the co-localization among S1 projections, CCK+, and GPR30+ neurons. Furthermore, we clarified that the chemogenetic activation studies targeted a mixed neuronal population and did not exclusively manipulate CCK+ neurons.

      These changes aim to better align our conclusions with the presented data and provide a more nuanced framework for future investigations.

      Reviewer #1 (Recommendations for the authors): 

      Major corrections 

      (1) Figure 2: The authors conclude that GPR30 is mainly expressed in excitatory spinal neurons because they are labeled by a virus with a Camk2 promoter. While there is evidence that Camk2 is specific to excitatory neurons in the brain, based on RNAseq datasets (e.g. Linnarsson Lab, http://mousebrain.org/adolescent/genesearch.html ) this is less clear cut within the spinal cord. A more direct way to assess the relative expression of GPR30 in excitatory versus inhibitory neurons would be to perform immunohistochemistry or FISH with GPR30/Vglut2/Vgat. 

      Alternatively, if this observation is not crucial for the overall arc of the story, I recommend the authors eliminate these data, as they do not support the idea that GPR30 is mainly in excitatory neurons.

      We thank the reviewer for highlighting this important limitation. To strengthen our conclusion regarding the neuronal identity of GPR30-expressing cells, we performed fluorescent in situ hybridization (FISH) using vGluT2 (marker for excitatory neurons) and VGAT (marker for inhibitory neurons). The results confirmed that GPR30 is predominantly expressed in vGluT2-positive excitatory neurons within the spinal cord. These new data are presented in the revised manuscript (lines 117-120) and shown in Figure 2A-B.

      (2) (2a) Figure 2: The authors also report that GPR30 is expressed in most CCK+ spinal neurons. A more rigorous way to present the data would be to perform quantification and report the % of CCK neurons that are GPR30. 

      (2b) More importantly, it is unclear what % of GPR30 neurons are CCK+. These types of quantifications would provide useful insights into the heterogeneity of CCK and GPR30 neuron populations, and help align the findings of the behavioral pharmacology experiments using GPR30 antagonists with those of the knockdown of Gper1 in CCK spinal neurons - for instance, does a population of GPR30+/CCK- neurons exist? If so, it would be worth discussing what role (if any) that population might play in nerve injury-induced mechanical allodynia.

      Understanding the breakdown of GPR30 populations becomes even more relevant when the authors characterize which cell types are targeted by descending projections from S1. It is clear that the vast majority of CCK+ neurons that receive descending input from S1 neurons are GPR30+, but there are many other GPR30+ neurons presented in 5M that do not receive input from S1 neurons. Is this simply because only a small fraction of CCK+/GPR30+ neurons are targeted by descending S1 projections, or could they represent a distinct population of GPR30 neurons?

      (2a) We appreciate the suggestion. Quantification showed that approximately 90% of CCK⁺ neurons express GPR30, and about 50% of GPR30⁺ neurons co-express CCK. These data are now provided in the revised Results (lines 131-132) and in Figure 2F-G.

      (2b) Indeed, our data reveal that a substantial portion of GPR30⁺ neurons do not co-express CCK. While this study focuses on GPR30 function in CCK⁺ neurons, we recognize the potential relevance of GPR30⁺/CCK⁻ populations. We have addressed this point in the Discussion (lines 303-306):

      “However, it should be noted that half of GPR30⁺ neurons are not co-localized with CCK⁺ neurons, and further studies are needed to explore the function of these GPR30⁺/CCK⁻ neurons in neuropathic pain.”

      Regarding descending input, our data in Figure 5 show that S1 projections selectively innervate a subset (~30%) of CCK⁺ neurons, most of which co-express GPR30. This suggests that S1-targeted CCK⁺/GPR30⁺ neurons may represent a functionally distinct population. We have added clarification to the revised manuscript, while acknowledging that further studies are needed to elucidate the roles of non-targeted GPR30⁺ neurons.
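      The asymmetry between the two overlap figures given in (2a), where most CCK⁺ neurons express GPR30 but only about half of GPR30⁺ neurons express CCK, follows from the different denominators used. A minimal sketch of that arithmetic is given here; the cell counts are hypothetical, chosen only to reproduce the approximate percentages, and are not taken from the manuscript's data.

```python
def overlap_percentages(n_a, n_b, n_both):
    """Return (% of population A also in B, % of population B also in A)."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("population sizes must be positive")
    if n_both < 0 or n_both > min(n_a, n_b):
        raise ValueError("double-positive count must fit inside both populations")
    return 100.0 * n_both / n_a, 100.0 * n_both / n_b


# Hypothetical counts (assumptions for illustration, not measured values).
n_cck = 100      # CCK+ neurons counted
n_gpr30 = 180    # GPR30+ neurons counted
n_double = 90    # neurons positive for both markers

pct_cck_with_gpr30, pct_gpr30_with_cck = overlap_percentages(n_cck, n_gpr30, n_double)
print(f"CCK+ neurons expressing GPR30: {pct_cck_with_gpr30:.0f}%")
print(f"GPR30+ neurons expressing CCK: {pct_gpr30_with_cck:.0f}%")
```

      With these illustrative counts, the same 90 double-positive cells are a large share of the smaller CCK⁺ population and a much smaller share of the larger GPR30⁺ population, which is why both directions of the overlap need to be reported.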

      (3) Throughout the manuscript both male and female mice were used in experiments. Rather than referring to male and female mice as different genders, it would be more appropriate to describe them as different sexes. 

      As suggested, we have replaced all instances of “gender” with “sex” throughout the revised manuscript.

      (4) Figure 5: To increase the ease of interpreting the figure, in panels 5J and 5N, it would be helpful to indicate directly on the figure panel which other marker was assessed in double-labeling analyses.

      We have revised Figures 5J and 5N to include clear labels identifying the markers used in double-labeling analyses, to improve interpretability.

      Minor corrections: 

      (1) Line 36, I believe the authors mean to say "GPER/GPR30 in spinal neurons", rather than just "spinal". 

      Corrected as suggested. The sentence now reads (line 34):

      “Here we showed that the membrane estrogen receptor G-protein coupled estrogen receptor (GPER/GPR30) in spinal neurons was significantly upregulated in chronic constriction injury (CCI) mice…”

      (2) There are minor grammatical errors throughout the manuscript that interfere with comprehension. Proofreading/editing of the English language use may be beneficial. 

      We have thoroughly revised the manuscript for clarity and corrected grammatical and syntactic errors to improve readability.

      (3) Line 169-170, reads "Known that EPSCs are mediated by glutamatergic receptors like AMPA receptors and several studies have been reported the relationship between GPR30 and AMPA receptor25,29". Rewriting the sentence such that it better describes what the known relationship is between GPR30 and AMPA would be helpful in setting up the rationale of the experiment in Figure 4. 

      We have rewritten this section to better clarify the rationale behind the electrophysiological experiments (lines 161-164):

      “Given that EPSCs are primarily mediated through glutamatergic receptors such as AMPA receptors, and emerging evidence suggesting that GPR30 enhances excitatory transmission by promoting clustering of glutamatergic receptor subunits, we examined whether GPR30 modulates EPSCs via AMPA receptor-dependent mechanisms.”

      (4) Line 198-199 "Then we explored the possible connections among GPR30, S1-SDH projections and CCK+ neuron." In the context of spinal circuitry, "connections" may raise the expectation that synaptic connectivity will be evaluated. What I think best describes what the authors investigated in Figure 5 is the "relationship" between GPR30, S1-SDH projections, and CCK+ neurons. 

      We have revised the sentence accordingly (lines 184-186):

      “Building on previous findings suggesting a functional interaction between S1-SDH projections and spinal CCK⁺ neurons, our current study aimed to further elucidate the structural relationship among GPR30, S1-SDH projections, and CCK⁺ neurons.”

      (5) Figure 5: To increase the ease of interpreting the figure, in panels 5J and 5N, it would be helpful to indicate directly on the figure panel which other marker was assessed in double-labeling analyses.

      We have added direct labels to figure panels to clarify double-labeled analyses in the revised Figure 5J and 5N.

      Reviewer #2 (Recommendations for the authors): 

      (1) Can the authors provide more detail about the distribution of CCK+ cells in the spinal cord and, in particular, the localization of double-stained (CCK/cfos) neurons? 

      We thank the reviewer for this suggestion. To better characterize the distribution of CCK⁺ neurons within the spinal dorsal horn (SDH), we performed immunostaining in CCK-tdTomato mice using lamina-specific markers: CGRP (lamina I), IB4 (lamina II), and NF200 (lamina III–V). Our results demonstrate that CCK⁺ neurons are primarily localized in the deeper laminae of the SDH. These findings are now described in the revised Results (lines 126–129) and shown in Figure 2E.

      In addition, we conducted c-Fos immunostaining in CCK-Ai14 mice and found increased activation of CCK⁺ neurons following CCI. This supports the involvement of CCK⁺ neurons in neuropathic pain. These data are included in the Results (lines 129–131) and Supplementary Figure S4.

      (2) Figure 2A. There is no formal quantification of the percentage of TdTomato+ neurons that are also CCK+. The description of these results is insufficient. 

      We appreciate this point and have revised the description of Figure 2A accordingly. To strengthen our analysis, we conducted additional FISH experiments with vGluT2 and VGAT probes. Quantification revealed that GPR30 is predominantly expressed in excitatory neurons (approximately 60%). These data are shown in the revised Results (lines 117-119) and Figures 2A-B and S3. This supports our conclusion that GPR30 is largely localized to excitatory spinal interneurons.

      (3) Figure 4H. What is the evidence that these are AMPA-mediated currents? This is not explained in the text. 

      Thank you for raising this point. We now provide detailed experimental procedures to clarify that the recorded EPSCs are AMPA receptor–mediated. Specifically, spinal slices from CCK-Cre mice were used, and excitatory postsynaptic currents were recorded in the presence of APV (100 μM, NMDA receptor blocker), bicuculline (20 μM, GABA_A receptor blocker), and strychnine (0.5 μM, glycine receptor blocker), ensuring that the observed currents were AMPA-dependent. These methodological details are now clearly described in the revised Results (lines 165–173) and supported by prior literature (Zhang et al., J Biol Chem 2012; Hughes et al., J Neurosci 2010).

      (1) Yan Zhang, Xiao Xiao, Xiao-Meng Zhang, Zhi-Qi Zhao, Yu-Qiu Zhang (2012). Estrogen facilitates spinal cord synaptic transmission via membrane-bound estrogen receptors: implications for pain hypersensitivity. J Biol Chem. Sep 28;287(40):33268-81.

      (2) Ethan G Hughes, Xiaoyu Peng, Amy J Gleichman, Meizan Lai, Lei Zhou, Ryan Tsou, Thomas D Parsons, David R Lynch, Josep Dalmau, Rita J Balice-Gordon (2010). Cellular and synaptic mechanisms of anti-NMDA receptor encephalitis. J Neurosci. 2010 Apr 28;30(17):5866-75.

      (4) What is the signaling mechanism leading to a larger amplitude of currents after G-1 infusion? 

      We thank the reviewer for this important question. G-1 is a selective agonist for GPR30. Based on previous studies by Luo et al. (2016), we speculate that activation of GPR30 may increase the clustering of glutamatergic receptor subunits at postsynaptic sites, thereby enhancing AMPA receptor-mediated currents. While our current study did not directly address the intracellular signaling cascade, we have incorporated this mechanistic speculation in the Discussion.

      Jie Luo, X.H., Yali Li, Yang Li, Xueqin Xu, Yan Gao, Ruoshi Shi, Wanjun Yao, Juying Liu, Changbin Ke (2016). GPR30 disrupts the balance of GABAergic and glutamatergic transmission in the spinal cord driving to the development of bone cancer pain. Oncotarget 7, 73462-73472. 10.18632/oncotarget.11867.

      (5) Figure 4I. Please include error bars. 

      We have revised Figure 4I to include error bars, as requested.

      (6) Line 198. What is the evidence that AAV2/1 EF1α FLP is an antegrade trans monosynaptic marker? 

      We thank you for this request. AAV2/1 has been widely used for anterograde monosynaptic tracing based on its properties (Wang et al., Nat Neurosci 2024; Wu et al., Neurosci Bull 2021): (1) it infects neurons at the injection site and undergoes active anterograde transport; (2) newly assembled viral particles are released at synapses and infect postsynaptic partners; (3) in the absence of helper viruses, the spread halts at the first synapse, ensuring monosynaptic restriction. We have elaborated on this in the revised manuscript (line 198), citing Wang et al. (Nat Neurosci 2024) and Wu et al. (Neurosci Bull 2021).

      (1) Hao Wang, Qin Wang, Liuzhe Cui, Xiaoyang Feng, Ping Dong, Liheng Tan, Lin Lin, Hong Lian, Shuxia Cao, Huiqian Huang, Peng Cao, Xiao-Ming Li (2024). A molecularly defined amygdala-independent tetra-synaptic forebrain-to-hindbrain pathway for odor-driven innate fear and anxiety. Nat Neurosci. 2024 Mar;27(3):514-526.

      (2) Zi-Han Wu, Han-Yu Shao, Yuan-Yuan Fu, Xiao-Bo Wu, De-Li Cao, Sheng-Xiang Yan, Wei-Lin Sha, Yong-Jing Gao, Zhi-Jun Zhang (2021). Descending Modulation of Spinal Itch Transmission by Primary Somatosensory Cortex. Neurosci Bull. 2021 Sep;37(9):1345-1350.

      (7) Figure 5G. I do not understand the logic of this experiment. A Cre AAV is injected in the S1 cortex. Why should this lead to the expression of tdTomato on a downstream (postsynaptic?) neuron? The authors should quote the literature that supports this anterograde transsynaptic transport.

      We appreciate this question. As described in previous studies (e.g., Wu et al., Neurosci Bull 2021), AAV2/1-Cre injected into the S1 cortex leads to Cre expression in projection targets due to transsynaptic anterograde transport. Subsequent injection of a Cre-dependent AAV (AAV2/9-DIO-mCherry) into the spinal cord enables specific labeling of postsynaptic neurons that receive input from S1. We have clarified this mechanism in line 206 and provided the appropriate citation.

      Zi-Han Wu, Han-Yu Shao, Yuan-Yuan Fu, Xiao-Bo Wu, De-Li Cao, Sheng-Xiang Yan, Wei-Lin Sha, Yong-Jing Gao, Zhi-Jun Zhang (2021). Descending Modulation of Spinal Itch Transmission by Primary Somatosensory Cortex. Neurosci Bull. 2021 Sep;37(9):1345-1350.

      (8) The same question arises when interpreting the results obtained in Figure 6.

      We thank the reviewer for the question, and we have addressed it in point (7).

      (9) Line 257. How do the authors envision that estrogen would change its modulation of GPR30 under basal and neuropathic conditions? Is there any evidence for this speculation? 

      We thank the reviewer for raising this thoughtful question. In the current study, we focused on pharmacologically manipulating GPR30 activity via its selective agonist and antagonist. We did not directly investigate how endogenous estrogen regulates GPR30 under physiological and neuropathic states. We have recognized this limitation and highlighted the need for future research to investigate this regulatory mechanism.

      (10-20) In my opinion, the entire manuscript needs a careful revision of the English language. While one can follow the text, it contains numerous grammatical and syntactic errors that make the reading far from enjoyable. I am highlighting just a few of the many errors. 

      We appreciate the reviewer’s honest assessment. The manuscript has undergone thorough language editing by a native English speaker to correct grammatical errors, improve clarity, and enhance overall readability. We also restructured several sections, particularly the Discussion, to improve logical flow.

      (21) The discussion of results is a bit disorganized, with disconnected sentences and statements, and somewhat repetitive. For example, lines 303 to 306 lack adequate flow. It is also quite long and includes general statements that add little to the discussion of the new findings (lines 326-333). 

      We agree and have revised the Discussion extensively. Disconnected or repetitive sentences (e.g., lines 303-306, 326-333) have been removed or rewritten. For instance, we added a new transitional paragraph (lines 307-311) to improve flow:

      “Abnormal activation of neurons in the SDH is a key contributor to hyperalgesia, and enhanced excitatory synaptic transmission is a major mechanism driving increased neuronal excitability. Therefore, we evaluated excitatory postsynaptic currents (EPSCs) and observed increased amplitudes in CCK⁺ neurons following CCI, suggesting elevated excitability in these neurons.”

      We also removed redundant generalizations to maintain a focused discussion of our novel findings.

      Reviewer #3 (Recommendations for the authors): 

      (1) What is the distribution of GPR30 throughout the spinal cord and DRG? The authors demonstrate that this can overlap with a CCK+ population, but there are many GPR30+ and CCK negative neurons, even in the cropped images presented. It would be helpful to quantify the colocalization with CCK. 

      We thank the reviewer for this important point. As shown in the revised manuscript, GPR30 is expressed in both the spinal cord and dorsal root ganglia (DRG). However, our updated data (Figure 1B) demonstrate that Gper1 mRNA levels in the DRG are not significantly altered after CCI, suggesting a limited involvement of DRG GPR30 in neuropathic pain. These results are described in the revised Results (line 94).

      Regarding spinal co-expression, we performed a detailed quantification. Approximately 90% of CCK⁺ neurons express GPR30, while about 50% of GPR30⁺ neurons are CCK⁺. These co-localization results are now included in the revised Results and presented in Figure 2G.
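
      As an illustration of why the two percentages differ, the sketch below computes both directional co-localization fractions from the same overlap. The function name and the cell counts are hypothetical, chosen only to reproduce the proportions reported above; they are not the counts from the study.

```python
def colocalization_fractions(n_cck, n_gpr30, n_both):
    """Return the two directional co-localization fractions.

    n_cck   -- number of CCK+ cells counted (hypothetical)
    n_gpr30 -- number of GPR30+ cells counted (hypothetical)
    n_both  -- number of cells positive for both markers (hypothetical)
    """
    if n_cck <= 0 or n_gpr30 <= 0:
        raise ValueError("cell counts must be positive")
    if n_both < 0 or n_both > min(n_cck, n_gpr30):
        raise ValueError("double-positive count must fit within both populations")
    # Same numerator, different denominators: the two fractions answer
    # different questions and need not be equal.
    frac_cck_expressing_gpr30 = n_both / n_cck
    frac_gpr30_expressing_cck = n_both / n_gpr30
    return frac_cck_expressing_gpr30, frac_gpr30_expressing_cck


# Hypothetical counts chosen to match the reported ~90% and ~50%.
of_cck, of_gpr30 = colocalization_fractions(n_cck=100, n_gpr30=180, n_both=90)
print(f"CCK+ cells expressing GPR30: {of_cck:.0%}")
print(f"GPR30+ cells that are CCK+: {of_gpr30:.0%}")
```

      Because the GPR30⁺ population is larger than the CCK⁺ population in this example, the same set of double-positive cells covers most CCK⁺ neurons but only about half of GPR30⁺ neurons.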

      (2) It is clear that CCI and GPR30 influence excitatory synaptic transmission in CCK+ neurons. However, these experiments do not fully support the authors' claims of a postsynaptic upregulation of AMPARs. Comparing amplitudes and frequencies of spontaneous EPSCs cannot necessarily distinguish a pre- vs postsynaptic change since some of these EPSCs can arise from spontaneous action potential firing. I suggest revising this conclusion. 

      We appreciate these insightful comments. We fully agree that our data from spontaneous EPSC recordings (sEPSCs) in CCK⁺ neurons are not sufficient to distinguish between pre- and postsynaptic mechanisms, as sEPSCs may include spontaneous presynaptic activity. Therefore, we have revised the text throughout the manuscript to avoid overstating conclusions related to postsynaptic AMPA receptor upregulation.

      (3) What is the rationale for the evoked EPSC experiments from electrical stimulation in "the deep laminae of SDH?" I do not think that this experiment can rule out a presynaptic contribution of GPR30 to the evoked responses, particularly if these are Gs-coupled at presynaptic terminals. Paired-pulse stimulations could help answer this question, otherwise, alternative interpretations, also related to the point above, should be provided. 

      We thank the reviewer for this thoughtful critique. Indeed, electrical stimulation of the deep SDH laminae does not exclude presynaptic involvement, especially considering that GPR30 is a G protein–coupled receptor (GPCR) and could act presynaptically. We agree that paired-pulse ratio (PPR) analysis would be more informative in distinguishing pre- from postsynaptic effects, but this was not performed due to technical limitations in our current experimental setup.

      Accordingly, we have revised our interpretations in both the Results and Discussion to acknowledge that our data do not rule out presynaptic contributions. We now state that GPR30 activation enhances EPSCs in CCK⁺ neurons, while further studies are needed to dissect the precise site of action.
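
      For readers unfamiliar with the measure, a minimal sketch of the paired-pulse ratio follows. The function name and the amplitudes are hypothetical illustrations, not data from this study, and the interpretation in the comments is the conventional reading of PPR, stated here as a simplification.

```python
def paired_pulse_ratio(first_amp_pa, second_amp_pa):
    """Return PPR = |second EPSC amplitude| / |first EPSC amplitude|.

    Amplitudes are in pA; inward currents are usually negative,
    so absolute values are used.
    """
    if first_amp_pa == 0:
        raise ValueError("first EPSC amplitude must be non-zero")
    return abs(second_amp_pa) / abs(first_amp_pa)


# Hypothetical amplitudes (pA) for two closely spaced stimuli.
baseline = paired_pulse_ratio(-100.0, -150.0)
# Both responses scale equally: PPR is unchanged, which is conventionally
# read as compatible with a postsynaptic change.
scaled = paired_pulse_ratio(-200.0, -300.0)
# First response grows more than the second: PPR falls, which is
# conventionally read as an increase in presynaptic release probability.
shifted = paired_pulse_ratio(-200.0, -220.0)

print(baseline, scaled, shifted)
```

      A ratio of this kind only suggests a site of action; it would not by itself exclude mixed pre- and postsynaptic effects.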

      (4) I appreciate the challenging nature of the trans-synaptic viral labeling approaches, but the chemogenetic and Gper knockdown experiments do not selectively target this CCK+ population of deep dorsal horn neurons. The data are clear that each of these components (descending corticospinal projections, CCK neurons, and GPR30) can modulate nociceptive hypersensitivity, but I do not agree with the overall conclusion that these components are directly linked as the authors propose. I recommend revising the overall conclusion and title to reflect the convincing data presented. 

      We thank the reviewer for this critical observation. We agree that while our data show functional roles for descending cortical input, CCK⁺ neurons, and GPR30 in modulating pain hypersensitivity, the evidence does not establish a definitive direct circuit integrating all three components.

      In response, we have revised our conclusions to reflect this limitation. Specifically, we avoided claiming a direct functional link among S1 projections, CCK⁺ neurons, and GPR30. Instead, we now propose that GPR30 modulates neuropathic pain primarily through its action in CCK⁺ spinal neurons, with potential involvement of descending facilitation from the somatosensory cortex.

      Additionally, we have revised the manuscript title to better reflect our mechanistic focus: “GPR30 in spinal CCK-positive neurons modulates neuropathic pain.”

      Minor Corrections

      (1) The authors should refer to mice by sex, not gender. 

      Corrected throughout the manuscript.

      (2) Page 9, line 195: "significantly" is used to refer to co-localization of 28.1%. What is this significant to? 

      We have revised the sentence to accurately describe the observed percentage, without implying statistical significance:

      “Our co-staining results revealed that a high proportion of CCK⁺ S1-SDH postsynaptic neurons expressed GPR30” (lines 198-199).

      (3) I recommend modifying some of the transition phrases like "by the way," "what's more," and "besides". 

      All informal expressions have been replaced with academic alternatives including “Furthermore,” “Additionally,” and “Moreover.”

      (4) Additional guides to mark specific laminae in the dorsal horn would be useful. 

      We added immunostaining with laminar markers (CGRP for lamina I and NF200 for lamina III–V), and these data are now shown in Figure 2E and described in the Results (lines 126-129).

      (5) Page 5, line 115: immunochemistry should be immunohistochemistry. 

      Corrected as suggested.

      (6) Page 6, line 136: "Confirming the structural connnections" was not demonstrated here. Perhaps co-localization between GPR30 and CCK+. 

      The text was revised to “To functionally interrogate GPR30 and CCK⁺ neurons in neuropathic pain...” (line 133).

      (7) Page 8, line 166: unsure what "took and important role" means. 

      This phrasing was corrected for clarity and replaced with an accurate scientific description.

      (8) Page 8, line 168: "IPSCs of spinal CCK+ neurons" implies that they are sending inhibitory inputs. 

      We revised the term to “EPSCs” to correctly reflect excitatory synaptic currents in CCK⁺ neurons.

      (9) Page 8, line 169: "Known that EPSCs" is missing an introductory phrase. 

      The sentence was rewritten to include an appropriate introductory clause (lines 161–164):

      “Given that EPSCs are primarily mediated through glutamatergic receptors such as AMPA receptors...”

      (10) Page 10, line 227 and 228: "adequately" and "sufficiently" should be adequate and sufficient. 

      We corrected these terms to the proper adjective forms: “adequate” and “sufficient” (lines 224-225).

    1. eLife Assessment

      This study presents a valuable finding regarding the role of oxytocin neurons in thermogenesis and behavioral thermoregulation. The use of numerous converging methods, including behavior, fiber photometry, optogenetics, thermal recordings, metabolic analyses, and more, produces a multi-dimensional dataset delivering findings that provide solid support for the conclusions. Conclusions would be strengthened with validation of the approaches, inclusion of a loss of function experiment, and further investigation of the social nature of the behavior. The maternal findings are, at present, somewhat disconnected from the conclusions. The findings are novel and open new doors for understanding the role of the PVT and oxytocin in thermoregulation; the work will be of strong interest to the thermoregulation, social behavior, and oxytocin signaling communities.

    2. Reviewer #1 (Public review):

      Summary:

      The authors identify and investigate a specific population of PVNOT neurons (oxytocin neurons of the paraventricular hypothalamus) that seem to be involved in both behavioral and autonomic thermoregulation. These cells are activated by social thermoregulatory behaviors, but can influence thermoregulation in both social and nonsocial contexts, specifically during transitions and when mice are at low core body temperature (Tb).

      Strengths:

      The manuscript has many strengths.

      This is a novel study, with a clear question that is addressed using an array of well-designed experiments employing integrative methods. Most of the figures are well-developed, and the analysis is generally rigorous and well-detailed. The authors are clearly very experienced in this field, and indeed, their scholarly introduction and discussion sections are to their credit.

      The link between thermoregulation and the oxytocin system is well established, as is the link between social behavior and the same broad system. However, the link between these three things is novel, if it can be well substantiated. I am not persuaded that was achieved here, but I do think this manuscript has many novel and useful offerings.

      The authors use a cooling floor, and only go down to 10 degrees Celsius. This is fine, but I would like to see the effects using ambient temperature also. This is not a crucial issue, as it is not necessary for the authors' interpretations, but it could improve measurement sensitivity.

      Through an elegant behavioral experiment in Figure 1, the authors identify c-Fos patterns in the PVN that are activated by active social huddling, and they show that at the RNA level these cells overlap with oxytocin, indicating that they are oxytocin-producing cells. But this is not well discussed or indeed quantified.

      The authors engage in a deep analysis of fiber photometry experiments, first by observing PVNOT neuron overall activity during a variety of different behaviors in the context of three different temperatures. Activity was associated with nesting, quiescence, and both types of huddling (when social opportunities exist). Social situations did not strongly affect this, nor did temperature conditions. These analyses indicate that the PVNOT neurons are involved in mediating specific behavioral outputs.

      With more detailed analysis, the authors investigated how PVNOT neuronal activity relates to behavioral state transition. They found that the probability of peak PVNOT neural activity strongly predicts the offset of quiescence or quiescent huddling, and therefore can be argued to signal an increase in physical activity, and as such, increased metabolism. However, the opposite pattern was observed for huddling and nesting (onset being associated with PVNOT activity), again arguing for increased thermogenesis as a function.

      What is particularly compelling is that these peaks of activity tend to occur during low Tb, again arguing for the function in increasing body warmth.

      The authors then employ an impressive setup where they image brown adipose tissue (BAT) in tandem with DeepLabCut (DLC) based animal tracking. Crucially, BAT activity and surface temperature correlated with the calcium peak of PVNOT neurons.

      Lastly, optogenetic activation of PVNOT neurons increased Tb when it was in the lower range, but not when in the higher range. It also affected BAT and rump temperature, again at low Tb. However, there is no real effect on behavior, except a trend in activity.

      The authors do some interesting tracing work at the end, though this is not functionally explored. That is not a criticism, as it does seem like this would be a whole follow-up study.

      Weaknesses:

      While novel and valuable, the manuscript feels incomplete in its current form.

      The main evidence lacking is a loss-of-function experiment. Ideally, the authors would chronically and/or acutely inhibit PVNOT neurons to establish their necessity. I know this seems obvious, but I think it is important.

      The relative lack of behavioral analysis following optogenetic activation of PVNOT neurons is puzzling. The authors must surely want to study what this intervention does to behavioral state transitions. I feel that the current level of analysis limits the overall conclusions of this study to a large extent.

      A broader criticism is that the social dimension of this manuscript seems overplayed. Naturally, oxytocin signalling can be implicated in social behavior based on a large literature. However, the focus on social thermogenesis seems like a crude integration of social behavior and thermogenesis. Given that the authors see their effects in both social and nonsocial cases of thermoregulation, I am not sure the attempts at integrating social functions and thermogenic functions of PVNOT neurons are warranted. That is, unless the authors have further experiments or analysis that can convincingly justify this link.

      In addition, the analysis of virgin females and lactating mothers seems out of place in Figure 4.

      The c-Fos/oxytocin overlap needs to be quantified.

      The methods section could be improved by explaining how the authors exclude animals that exhibit both types of huddling, if they occur within a 90-minute time window. This seems like it could cause significant confounds.

      The computer vision model is not well-explained. The authors need to be far more explicit here about how it was validated.

      The authors should cite and consider this preprint: https://www.biorxiv.org/content/10.1101/2024.09.17.613378v1

    3. Reviewer #2 (Public review):

      Summary:

      This is a very interesting study from Vandendoren and colleagues examining the role of PVN oxytocin neurons during thermoregulatory behaviors, in particular during thermoregulatory huddling. The findings are important and compelling, and have implications for the thermoregulation field as well as the social/naturalistic behavior field.

      Strengths:

      The study is very creative and tackles a challenging task to examine how natural and social behavior influences neural circuits for a homeostatic system such as thermoregulation. The authors use a combination of state-of-the-art tools (photometry, optogenetics, automated behavior tracking, thermal imaging, and core body temperature measurement), often in combination with each other, to produce a rigorous and high-dimensional dataset. Carrying out tightly temperature-controlled experiments and examining natural behavior, neural activity, and body physiology simultaneously is quite a feat. I applaud the authors for taking this on in a rigorous and detailed manner. This paper will be valuable for both the thermoregulation field as well as for researchers interested in naturalistic social behaviors. The conclusions are supported by the data.

      Weaknesses:

      I have a number of questions and suggestions for clarification that would help improve the interpretation of the findings.

      (1) Figure 1D-F: It would be helpful to include representative images of cFos expression in the PVN, LS, and DMH during both quiescent and solo huddling conditions, to better illustrate the reported differences.

      (2) Figure 1C: The data suggest a general suppression of neural activity during sleep-associated quiescent huddling, which somewhat complicates the interpretation of what specifically the active huddling cells are responding to. A more informative control might have been a comparison between huddling and a more generic form of social engagement (e.g., dyadic sniffing) to assess whether huddling-responsive neurons are broadly tuned to social stimuli. While it may not be feasible to add this experimentally at this time, a brief discussion of this limitation in the main text would be valuable.

      (3) Figure 2H-J vs. Figure 1: The fiber photometry data suggest increased PVN activity during quiescent huddling vs active huddling, which appears to contrast with the cFos results from Figure 1. It would be helpful for the authors to comment on possible reasons for this discrepancy, e.g., methodological differences, temporal resolution, or cell-type specificity.

      (4) Figure 2O: A comparable linear regression for active huddling would be informative to assess whether the observed relationships extend across behavioral states.

      (5) Temperature manipulation: The use of floor temperature changes presents a distinct physiological and sensory experience from, for example, manipulation of ambient temperature. A discussion of how this choice may affect neural circuit engagement or interpretation of thermoregulatory responses would be beneficial.

      (6) Correlations with behavior: Across the manuscript, it would be informative to see correlations between huddle duration and neural activity (e.g., cFos expression, calcium signal magnitude). Similarly, do longer huddles produce greater thermogenic effects?

      (7) Lactating vs. virgin mothers: The inclusion of maternal data is intriguing but feels somewhat disconnected from the central huddling-thermoregulation narrative. If these experiments are to remain, additional explanation of their rationale and how they fit into the broader story would help clarify their relevance.

      (8) Optogenetic manipulation: Have the authors tested the effect of PVN OT neuron stimulation or inhibition during huddling? Even a negative result would be of interest to the field. If these data exist (main or supplementary), I apologize for missing them. If not, the authors might consider including them or commenting briefly on any attempts or challenges in carrying out these experiments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to elucidate the relationship between physiological state (i.e., behavioral status and thermogenic sympathetic activity) and the activity of hypothalamic paraventricular oxytocin (PVNOT) neurons in female mice. They studied this by combining automated classification of mouse behavior via video-based analysis with calcium imaging of PVNOT neuron activity. Sympathetic thermogenesis was inferred from surface temperature changes captured by infrared thermography, and the authors provided their custom analysis scripts in the manuscript. Notably, they found that a strong, pulsatile activation of PVNOT neurons was "occasionally" observed immediately before the animals transitioned from a resting to an active state. This pulsatile activity was observed in both pair-housed and individually housed animals. While PVNOT neurons are often associated with social behaviors, this finding suggests that the oxytocinergic system is also engaged during naturalistic behaviors, even in the absence of social interactions. If experiments were more convincingly performed and presented, the results would point to a broader physiological role of central oxytocin, including in the regulation of fundamental brain states and homeostatic processes, and offer a new perspective on the functional significance of central oxytocin signaling.

      Strengths:

      The oxytocinergic neural system is believed to subserve a wide range of physiological functions, and elucidating these roles requires monitoring PVNOT neuronal activity under various behavioral contexts, as well as manipulating this activity to establish causal links. In the present study, the authors show a technically sound experimental framework that integrates behavioral tracking in both individually and group-housed mice with the observation and manipulation of PVNOT neuron activity. This experimental setup represents a valuable methodological resource for researchers investigating the physiological functions of oxytocin.

      Weaknesses:

      While this study successfully established a new experimental setup for simultaneous analyses of behavior and PVNOT neuronal activity, there are several concerns regarding the interpretation of the results and the robustness of the conclusions, which should be more thoroughly addressed.

      (1) The study relies on the assumption that calcium imaging and optogenetic manipulation were restricted only to PVNOT neurons. However, the specificity of AAV-mediated gene expression was not verified quantitatively. A fair number of cell bodies in the PVN expressed GCaMP8s, but not OT, indicating potential off-target expression (see Figure S2A, B). The lack of quantitative validation weakens confidence in the causal interpretation of the results.

      (2) The study focuses on the transition from rest to active states following pulsatile activity of PVNOT neurons. However, the physiological significance of this pulsatile activity remains unclear. According to the authors, pulsatile activity occurred with an approximately 20% probability within 100 seconds prior to the end of the resting state. This implies that, in the remaining 80% of rest-to-active transitions, pulsatile PVNOT activity did not occur, suggesting that it is not essential for initiating the transition. A comparative analysis of behavioral and thermogenic changes between transitions with and without pulsatile PVNOT activity would help to further clarify the functional relevance of this phenomenon and strengthen the authors' interpretation of the findings.

      (3) The study identifies a correlation between pulsatile activity of PVNOT neurons and rest-to-active transitions, and tests for a causal relationship using optogenetic stimulation. However, since PVNOT neurons are known to co-release other neurotransmitters such as glutamate, it remains unclear whether the observed effects are mediated specifically through oxytocin receptor signaling. To address this question, functional intervention experiments using oxytocin receptor antagonists or receptor knockout mice are necessary.

      (4) The authors attempted to detect BAT thermogenesis and skin vasomotion using infrared thermography. This technique measures only skin hair temperatures (since the skin was not shaved), but does not measure "BAT temperature" or "vasomotor tone". As seen in Figure 5E, the temperatures of the body surface areas ("BAT", "Rump", and "Dorsal surface") mostly changed in parallel, indicating that these temperatures are strongly affected by body core temperature. Therefore, the thermographic measurements in this study did not provide convincing information on BAT thermogenesis or skin vasomotion. To avoid misleading reports, the authors need to use other techniques to directly measure temperatures, such as telemetry.

      (5) Photostimulation of PVNOT neurons increased Tb after 400 sec (6.6 min) (Figure 5). This latency is too long to conclude that the neuronal stimulation elicited BAT thermogenesis. A more reasonable explanation is that the increase in Tb was caused by the induction of physical activity (Figure S4C), which slowly generates heat and contributes to the elevation of Tb. However, this view contradicts the authors' claim. To address this concern, the authors should directly measure BAT thermogenesis and compare it with the rate of Tb elevation. If BAT thermogenesis occurs, the rate at which the BAT temperature increases must exceed the rate at which Tb rises.

    5. Author response:

      (1) Maternal lactation assay and PVN oxytocin neuron identity

      Reviewers and editors noted that the maternal lactation assay felt out of place (Editors, R1, R2) and asked for clearer validation of AAV specificity in the PVN (R3). These issues are linked: the primary purpose of the lactation assay was to physiologically validate that the recorded neurons are oxytocinergic, as PVNOT neurons exhibit well-established pulsatile activity during lactation.

      In response, we will (i) explicitly frame the lactation assay as a validation experiment, (ii) streamline its presentation to sit naturally with our identity-validation rationale, and (iii) clarify our AAV targeting and expression controls; we will also address our oxytocin immunohistochemistry quantification and its limitations (we observed notable intra-individual and technical variability in oxytocin immunoreactivity), which motivated the complementary physiological approach.

      (2) Clarifications and analyses.

      The reviewers pointed to several analyses, inferences, and conclusions that should be clarified. We will clarify: (i) the oxytocin histology in Figure 1 (marker definitions and quantification), (ii) the roles of floor versus ambient temperature, (iii) some of the quantitative links among behavioral state, neural activity, and body temperature (e.g., behavior bout duration vs. neural responses and Tb), and (iv) the computer vision methodology. These additions will address the reviewers’ requests for clearer inferences and presentation.

      (3) Optogenetic inhibition. 

      We appreciate the suggestion to include an inhibition experiment (Editors, R1, R2). While interesting, this is beyond the scope of the current revision. Our stimulation experiments were designed to functionally test a specific observation from calcium imaging, namely, that PVNOT neurons show bursts of heightened activity at transitions from quiescence to arousal/thermogenesis, and to assess causal sufficiency for thermogenic/arousal-related readouts. We will make this rationale explicit, discuss the scope limits of the current dataset, and note inhibition as an important direction for future work.

    1. eLife Assessment

      This valuable study identifies a brown adipose tissue-specific heat shock factor 1-alcohol dehydrogenase 5 (ADH5) molecular cascade as a regulator of systemic aging, showing that ADH5 deficiency contributes to BAT dysfunction and health decline in aged mice. While there is evidence to support this mechanism, the conclusions remain incomplete, particularly regarding statistical rigor and clarity in data presentation.

    2. Reviewer #1 (Public review):

      Sebag et al. addressed the role of ADH5 in BAT in the development of aging and metabolic disarrangements associated with it. This is a follow-up study after the authors' demonstration of the role of BAT ADH5 in glucose homeostasis, obesity, and cold tolerance. By ablating ADH5 specifically in brown adipocytes or pharmacologically modulating ADH5 through activation of its transcription factor, the authors conclude that preservation of BAT function is crucial for healthy aging and ADH5 is causally involved in this process. The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques, is easy to follow, and addresses several physiological and molecular manifestations of aging. However, the study lacks an appropriate statistical analysis, which severely affects the conclusions of the work. Therefore, interpretation of the findings is limited and must be done with caution.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the role of the enzyme Alcohol Dehydrogenase 5 (ADH5) in brown adipose tissue (BAT) during aging. BAT is crucial for thermogenesis and energy balance, but its function and mass diminish with age, contributing to metabolic dysfunction and age-related diseases. ADH5, also known as S-nitrosoglutathione reductase, regulates nitric oxide (NO) signaling by removing S-nitrosylation modifications from proteins. The authors show that aging in mice leads to increased protein S-nitrosylation but reduced ADH5 expression in BAT, resulting in impaired metabolic and cognitive functions. Deletion of ADH5 in BAT accelerates tissue senescence and systemic metabolic decline.

      Mechanistically, aging suppresses ADH5 via downregulation of heat shock factor 1 (HSF1), a master regulator of protein homeostasis. Importantly, pharmacologically boosting HSF1 improves BAT function and mitigates both metabolic and cognitive declines in aged mice. The findings highlight a critical HSF1-ADH5 pathway in BAT that protects against aging-related dysfunction, suggesting that targeting this pathway may offer new therapeutic strategies for improving metabolic health and cognition during aging.

      Strengths:

      This research provides insight into the interplay between redox biology, proteostasis, and metabolic decline in aging. By identifying a specific enzyme that controls SNO status in BAT and further developing a therapy to target ADH5 in BAT to prevent age-related decline, the authors have identified a putative mechanism to combat age-related decline in BAT function.

      Weaknesses:

      (1) Sex needs to be considered as a biological variable, at a minimum in the reporting of the phenotypes observed in this manuscript, but also potentially by further experimentation. The only mention of sex I could find is that the authors reported the general protein SNO status in BAT is increased with age in male C57Bl/6J mice. Is this also true in female mice? For all of the ADH5 knockout mouse data, are these also male mice? Do female ADH5 knockout mice have a consistent phenotype, or are there sex differences?

      (2) It would be helpful to know the extent of ADH5 loss in the adipose tissue of knockout mice, either by mRNA or by immunoblotting for ADH5. It could also be helpful to know if ADH5 is deleted from the inguinal adipose tissue of these mice, especially since they seem to accumulate fat mass as they age (Figure 2B).

      (3) For Figure 4D, the ChIP, it would be better to show the IgG control pulldowns. Also, unexpectedly, all the values for the Adh5 flox mice are exactly the same - how is this possible? Finally, it's not clear how these BAT samples were treated with HSF1A - was this done in vivo or ex vivo?

      (4) I didn't understand what was on the y-axis in Figure 5A, nor how it was measured. I assume it's HSF1A, and maybe it's the part in the methods with the Metabolomic Analysis, but this wasn't clear. It would also help if release from the NC-Vehicle formulation could be included as a negative control.

      (5) What happens to BAT protein S-nitrosylation in HSF1A-treated mice?

      (6) Figure 1B: What is the age of the positive (ADH5BKO) and negative (Adh5 fl) mice?

      (7) Figure 1F: Can you clarify what I'm looking at in the P16ink4a panels? The red staining? Is the blue staining DAPI? This is also a problem in Figures 3C, 3D and 5G, and 5I. Figure 4B looks great - maybe this could be used as an example?

      (8) Figure 3B looks a bit odd since 7 of the 12 total mice seem to have an IL-1β level of exactly 5. I was a bit unclear about why arbitrary units were used for IL-1β levels since it says an ELISA was used to quantify IL-1β; however, in the methods the authors describe a Bio-Rad Laboratories Bio-plex Pro Mouse Cytokine 23-Plex approach, which I don't think is an ELISA. Can the approach to measuring IL-1β be clarified, and could the authors explain why they can't show units of mass for IL-1β levels?

      (9) Figure 2C and 2D: I don't really understand why the Heat or VO2 need to be expressed as fold changes. Can't these just be expressed with absolute units? It's also confusing why the heat fold change is 1.0 in the light and the dark for the floxed animal. I bet this is because the knockout is normalized to the floxed animal for light and then normalized again for the dark period, but since both are on the same graph, readers could be confused into thinking there is no difference in the heat production or VO2 between light and dark, which would be surprising. This could all just be solved if absolute units were used.
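
      The normalization concern in point (9) can be illustrated with a minimal Python sketch. All numbers and names below are hypothetical and are not taken from the manuscript; the sketch assumes, as the reviewer guesses, that each genotype is divided by the floxed-control mean within the same period. Under that assumption the control comes out at 1.0 in both light and dark, even though its absolute heat production differs between the two.

```python
from statistics import mean

# Hypothetical heat production values (kcal/h); NOT data from the manuscript.
heat = {
    ("flox", "light"): [0.40, 0.42, 0.38],
    ("flox", "dark"): [0.55, 0.57, 0.53],
    ("bko", "light"): [0.34, 0.36, 0.32],
    ("bko", "dark"): [0.44, 0.46, 0.42],
}


def fold_change_per_period(data, control="flox"):
    """Divide each group mean by the control mean of the SAME period."""
    result = {}
    for (genotype, period), values in data.items():
        control_mean = mean(data[(control, period)])
        result[(genotype, period)] = mean(values) / control_mean
    return result


fold_change = fold_change_per_period(heat)
absolute = {key: mean(values) for key, values in heat.items()}

for key in sorted(heat):
    print(key, "absolute:", round(absolute[key], 3),
          "fold change:", round(fold_change[key], 3))
```

      Plotting the `absolute` values instead of `fold_change` keeps the light-dark difference visible for both genotypes, which is the reviewer's suggestion.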

    4. Author response:

      Reviewer #1 (Public review):

      The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques, is easy to follow, and addresses several physiological and molecular manifestations of aging.  However, the study lacks an appropriate statistical analysis, which severely affects the conclusions of the work. Therefore, interpretation of the findings is limited and must be done with caution. 

      We greatly appreciate the reviewer’s encouragement. Our team is fully committed to maintaining clarity and rigor in the design, execution, and reporting of this study. We are grateful to the reviewers for bringing these issues to our attention. We also acknowledge that several statistical analyses could be redone to better emphasize our focus on the genetic effect of ADH5 deletion in mice of the same age, and we are working on this.

      Reviewer #2 (Public review):

      Strengths: 

      This research provides insight into the interplay between redox biology, proteostasis, and metabolic decline in aging. By identifying a specific enzyme that controls SNO status in BAT and further developing a therapy to target ADH5 in BAT to prevent age-related decline, the authors have identified a putative mechanism to combat age-related decline in BAT function. 

      We greatly appreciate the reviewer’s encouragement. 

      Weaknesses: 

      (1) Sex needs to be considered as a biological variable, at a minimum in the reporting of the phenotypes observed in this manuscript, but also potentially by further experimentation. 

      We thank the reviewer for the insightful remark, and we agree with the reviewer that sex needs to be considered as a biological variable. We will assess ADH5 expression in aged female mice.

      (2)  It would be helpful to know the extent of ADH5 loss in the adipose tissue of knockout mice, either by mRNA or by immunoblotting for ADH5. It could also be helpful to know if ADH5 is deleted from the inguinal adipose tissue of these mice, especially since they seem to accumulate fat mass as they age (Figure 2B). 

      We thank the reviewer for the comment/suggestion. Indeed, we have measured the ADH5 expression in both brown adipose tissue (BAT) and inguinal adipose tissue (iWAT). We regret that we did not include our results in the first submission and will provide these results in the revised manuscript.

      (3)  For Figure 4D, the ChIP, it would be better to show the IgG control pulldowns. Finally, it's not clear how these BAT samples were treated with HSF1A - was this done in vivo or ex vivo? 

      We thank the reviewer for their thoughtful comment and will provide detailed information in the revised manuscript.

      (4) I didn't understand what was on the y-axis in Figure 5A, nor how it was measured.

      We apologize for not making these critical points clearer in the first submission. In the revised manuscript we will include, in detail, the logistics of the experiments in the materials and methods section, figure annotation and figure legends.  

      (5) What happens to BAT protein S-nitrosylation in HSF1A-treated mice? 

      We thank the reviewer for the insightful remark, and we will measure general protein S-nitrosylation status in the BAT of HSF1A-treated mice.

      (6) Figure 1B: What is the age of the positive (ADH5BKO) and negative (Adh5 fl) mice? 

      We regret that we did not describe our results clearly in the first submission and will provide detailed information in the revised manuscript.

      (7) Figure 1F: Can you clarify what I'm looking at in the P16ink4a panels? The red staining? Is the blue staining DAPI? This is also a problem in Figures 3C, 3D and 5G, and 5I. Figure 4B looks great - maybe this could be used as an example?  

      We regret that we did not present results clearly in the first submission and will provide detailed information in the revised manuscript.

      (8) Figure 3B looks a bit odd. Can the approach to measuring IL-1β be clarified, and could the authors explain why they can't show units of mass for IL-1β levels? 

      We will provide detailed information in the revised manuscript.

      (9) Figure 2C and 2D: I don't really understand why the Heat or VO2 need to be expressed as fold changes. Can't these just be expressed with absolute units? 

      We thank the reviewer for the insightful comment. We will present these results as suggested in the revised manuscript.

    1. eLife Assessment

      This modelling study tests several hypotheses describing how seasonality and migration drive the epidemiology of Rift Valley Fever Virus among transhumant cattle in The Gambia. The work is methodologically solid, and findings offer valuable insights into how the movement of cattle in and out of the Gambia River and Sahel ecoregions could lead to source-sink transmission dynamics among cattle subpopulations, sustaining endemic transmission.

    2. Joint Public Review:

      Summary:

      This study uses data from a recent RVFV serosurvey among transhumant cattle in The Gambia to inform the development of an RVFV transmission model. The model incorporates several hypotheses that capture the seasonal nature of both vector-borne RVFV transmission and cattle migration. These natural phenomena are driven by contrasting wet and dry seasons in The Gambia's two main ecoregions and are purported to drive cyclical source-sink transmission dynamics. Although the Sahel is hypothesized to be unsuitable for year-long RVFV transmission, findings suggest that cattle returning from the Gambia River to the Sahel at the beginning of the wet season could drive repeated RVFV introductions and ensuing seasonal outbreaks. The model is also used to evaluate the potential impacts of cattle movement bans on transmission dynamics, although there is doubt about the certainty of these latter findings in light of various simplifying assumptions.

      Strengths:

      Like most infectious diseases in animal systems in low- and middle-income countries, the transmission dynamics of RVFV in cattle in The Gambia are poorly understood. This study harnesses important data on RVFV seroepidemiology to develop and parameterize a novel transmission model, providing plausible estimates of several epidemiological parameters and transmission dynamic patterns.

      This study is well written and easy to follow.

      The authors consider both deterministic and stochastic formulations of their model, demonstrating potential impacts of random events (e.g., extinctions) and providing confidence regarding model robustness.

      The authors use well-established Bayesian estimation techniques for model fitting and confront their transmission model with a seroepidemiological model to assess model fit.

      Elasticity analyses help to understand the relative importance of competing demographic and epidemiological drivers of transmission in this system.

      Weaknesses:

      The model predicts relatively stable annual dynamics reminiscent of a seasonal endemic pathogen, but RVF in sub-Saharan Africa is often characterized as causing periodic epizootics with sustained lulls in between outbreaks. Do the authors believe this conventional wisdom regarding RVF epidemiology is wrong, and that their results better support that transmission patterns are seasonal but truly relatively stable year-over-year, at least in the Gambia? The authors should discuss whether these predicted dynamics could be an artefact of the model's structure, and what ramifications this could have for their conclusions.

      It is unclear how the network analysis is used to inform the model. The network (Figure S2) suggests a highly fragmented population, which could better support, for example, a herd metapopulation approach. The first results section highlights that transhumant movements cover large distances (perhaps to justify the assumption of homogenous mixing within each ecoregion?), but the median (13.5km) is quite short.

      The model does not include an impact of infection on cattle birth rates, but the authors highlight the well-known impacts of RVF epizootics on cattle abortion and neonatal death.

      ODEs for M herds in the dry season are missing from the appendix. Even in the absence of transmission among this subpopulation in this season, demographic turnover should influence its SIR population dynamics. Were these not included in the model or simply omitted from the text?

      The importance of the RVFV seropositivity decay rate is highlighted, but the loss of immunity is not considered in the SIR model. The authors do discuss uncertainty regarding model structure, but could better justify their choice. Is there evidence of reduced infection risk among previously infected seronegatives, and why was an SIRS model not considered? How might findings be expected to differ under an SIRS model?

      Shouldn't disease-induced host death be included in the serocatalytic model? A high RVF mortality rate has been estimated, and FOI is relatively high, suggesting a non-negligible impact of RVF death on seroprevalence dynamics, and indeed possibly a greater impact than seroreversion.

      It is helpful that the authors have described findings from the previously conducted household survey, which is a key foundation for the model, but it needs to be made clearer what work was already conducted as part of the previous study, in particular the Methods sections RVFV seroprevalence & household survey data and Epidemiological setting & cattle population structure. Same for the sections Study Area and Data Collection in the appendix.

      The study limitations paragraph is vague. What modelling assumptions have introduced the greatest uncertainty, and what implications could this have for study conclusions?

      Two main issues with the simulations of a ban on transhumant movement:

      The introduction rightly highlights the importance of pastoral lifestyles for subsistence farmers in the Gambia. It therefore seems likely that transhumant movement bans would have great socioeconomic and ethical challenges in addition to obvious practical challenges. Is such an intervention even a remote possibility?

      The model's structure, including homogenous mixing within each ecoregion and step-change seasonality, allows for estimation of generalized transmission rates at a macro scale. However, it greatly simplifies the movement process itself and assumes that transhumant cattle movement is the only mechanism for RVF reintroduction into the Sahel region. The model is therefore likely to misrepresent the potential impacts of movement bans on transmission. As studies, for example, in healthcare settings have shown, where fine-scaled contact data are available, incorporating the specific and complex nature of inter-individual contact can change not only the magnitude but the direction of intervention impacts relative to predictions from a model with homogenous mixing assumptions. Conclusions from this work regarding the impacts of movement bans, therefore, seem poorly supported.

      This model seems perhaps better suited to exploring, for example, cattle vaccination, and potential differential efficiency when targeting T herds relative to M or L.
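
      The question above about disease-induced death versus seroreversion in the serocatalytic model can be made concrete with a rough Python sketch. This is not the authors' model: the function name, parameter names, and values are hypothetical, and the sketch assumes a constant force of infection, that a fixed fraction of infections is immediately fatal, that seroreverted animals return to the seronegative (susceptible) pool, and that background mortality affects both classes equally and so cancels out of the ratio.

```python
import math


def seroprevalence(age, foi, seroreversion=0.0, fatal_fraction=0.0, dt=0.001):
    """Seroprevalence among SURVIVORS at a given age (simple Euler integration).

    s: seronegative fraction of the birth cohort
    p: seropositive fraction of the birth cohort
    Animals that die of infection leave both classes.
    """
    s, p = 1.0, 0.0
    for _ in range(int(round(age / dt))):
        new_infections = foi * s * dt
        reverting = seroreversion * p * dt
        s += reverting - new_infections
        p += (1.0 - fatal_fraction) * new_infections - reverting
    return p / (s + p)


# Hypothetical parameter values, chosen only to compare the two mechanisms.
baseline = seroprevalence(age=5, foi=0.2)
with_reversion = seroprevalence(age=5, foi=0.2, seroreversion=0.1)
with_death = seroprevalence(age=5, foi=0.2, fatal_fraction=0.3)
print(baseline, with_reversion, with_death)
```

      Both mechanisms pull observed seroprevalence below the simple catalytic curve, so a cross-sectional age-seroprevalence profile alone may not separate them without independent estimates of one of the two rates.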

    3. Author response:

      (1) Stable annual dynamics vs. episodic outbreaks

      We agree that RVF is classically described as producing periodic epidemics interspersed with long inter-epidemic periods, often linked to extreme rainfall events. Our model predicts more regular seasonal dynamics, which reflects the endemic transmission patterns we have observed in The Gambia through serological surveys. In the revision, we will:

      Clarify that while epidemics occur in other parts of sub-Saharan Africa, our results may indicate a different epidemiological narrative in The Gambia, with sustained but low-level circulation (hyperendemicity).

      Discuss how model assumptions (e.g. seasonality, homogenous mixing) may bias results toward stable dynamics.

      Highlight the implications of this for interpretation and for public health decision-making.

      (2) Use of network analysis

      We acknowledge the reviewer’s concern. The network analysis was conducted descriptively to characterize cattle movement patterns and the structure of herd connections, but it was not formally incorporated into the model. In revisions we will:

      Clarify this distinction in the manuscript to avoid overinterpretation.

      Emphasize the need for future modelling work using finer-scale movement data, which could support more realistic herd metapopulation dynamics and better capture heterogeneity in transmission.

      (3) RVFV reproductive impacts

      While RVF outbreaks are known to cause abortions and neonatal deaths, these occur during relatively rare epidemics. In the Gambian context, where we’re not observing such large episodic outbreaks but rather low-level circulation, the annual impact of RVF infection on births is likely modest compared to baseline herd turnover. Moreover, cattle demography is partly managed, with replacement and movement buffering birth rates against short-term losses.

      Our model includes birth as a constant demographic process. Because we are not explicitly modelling outbreak-scale reproductive losses, it is reasonable to assume a stable population. This is consistent with other RVF transmission models that adopt a similar simplifying assumption. However, we will acknowledge this simplification as a limitation in the revised manuscript.

      (4) Missing ODEs for M herds in the dry season

      We thank the reviewer for identifying this omission. The ODEs for M herds in the dry season were not included in the appendix due to an oversight, though demographic turnover was incorporated in the model code. We will add the missing equations to the appendix.

      (5) Role of immunity loss and model structure (SIR vs. SIRS)

      We acknowledge that the decline of detectable antibodies over time (seropositivity decay/seroreversion) is an important consideration in RVFV serology, but whether this reflects true loss of protective immunity after natural infection remains unknown. Biologically, it is plausible that infected cattle develop long-lasting protection, as suggested by studies in humans, but there is an absence of longitudinal field data. From a modelling perspective, our aim was to predict the age-seroprevalence curve from FOI estimates and assess its ability to reproduce observed cross-sectional seroprevalence patterns. We therefore adopted a parsimonious SIR framework, treating loss of seropositivity as a potential explanation for the observed age disparity rather than modelling it as loss of immunity. In revisions we will:

      Clarify this rationale, emphasising that there is no direct evidence for waning immunity following natural RVFV infection in cattle, although evidence of seropositivity decay has been suggested in humans.

      Further discuss the seropositivity decay rates predicted in our survey and their possible relation to test sensitivity.

      Highlight that while a SIRS structure could generate different long-term dynamics, evaluating this requires stronger evidence for true immunity loss; we consider this an important future modelling direction.

      (6) RVFV induced mortality in serocatalytic model

      We thank the reviewer for this comment. Disease-induced mortality was included in the serocatalytic model through the mortality parameter (γ), but we recognise that this might not have been sufficiently clear in the text. In revisions we will clarify this in the Methods and Appendix.

      (7) Clarifying previous vs. current study components

      We will revise the Methods and Appendix to make clearer distinctions between our previous work (e.g. household survey data collection, seroprevalence estimates) and the analyses undertaken for this manuscript (e.g. model development and fitting).

      (8) Limitations paragraph

      We will expand the limitations section to specifically identify the assumptions contributing most to uncertainty. We will then outline how these may bias transmission dynamics and intervention estimates.

      (9) Movement ban simulations & suitability of model for vaccination interventions

      We appreciate the reviewer’s concerns regarding the movement ban simulation. On reassessment, we agree that our model structure might not be ideally suited to exploring them. In the revised manuscript, we will remove this analysis and emphasize how our modelling framework is more suited to exploring cattle vaccination scenarios, including targeting of specific herd types (e.g. T vs. M vs. L). We note that we are currently developing separate work focused on vaccination strategies in cattle, where this model structure might be more directly applicable, and will reserve a deeper investigation of vaccination interventions for that forthcoming publication.

    1. eLife Assessment

      This important study identifies a putative iron and zinc transporter in the plasma membrane of the obligate intracellular pathogen, Toxoplasma gondii. Using an array of different approaches, the authors convincingly demonstrate that this transporter regulates diverse cellular processes, including parasite metabolism and differentiation. This work will be of broad interest to cell biologists and biochemists studying metal ion transport mechanisms.

    2. Reviewer #1 (Public review):

      In this manuscript, Aghabi et al. present a comprehensive characterization of ZFT, a metal transporter located at the plasma membrane of the eukaryotic parasite Toxoplasma gondii. The authors provide convincing evidence that ZFT plays a crucial role in parasite fitness, as demonstrated by the generation of a conditional knockdown mutant cell line, which exhibits a marked impact on mitochondrial respiration, a process dependent on several iron-containing proteins. Consistent with previous reports, the authors also show that disruption of mitochondrial metabolism leads to conversion into the persistent bradyzoite stage. The study then employed advanced techniques, such as inductively coupled plasma-mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XFM), to demonstrate that ZFT depletion results in reduced parasite-associated metals, particularly iron and zinc. Additionally, the authors show that ZFT expression is modulated by the availability of these metals, although defects in the transporter could not be compensated for by exogenous addition of iron or zinc.

      While the manuscript does not directly investigate the transport function of ZFT through biochemical assays, the authors indirectly support the notion that ZFT can transport zinc by demonstrating its ability to compensate for a lack of zinc transport in a yeast heterologous system. Furthermore, phenotypic analyses suggest defects in iron availability, particularly with regard to Fe-S mitochondrial proteins and mitochondrial function. Overall, the manuscript provides a solid, well-rounded argument for ZFT's role in metal transport, using a combination of complementary approaches. Although direct biochemical evidence for the transporter's substrate specificity and transport activity is lacking, the converging evidence, including changes in metal concentrations upon ZFT depletion, yeast complementation data, and phenotypic changes linked to iron deficiency, presents a convincing case. Some aspects of the results may appear somewhat unbalanced, particularly since iron transport could not be confirmed through heterologous complementation, while zinc-related phenotypes in the parasites have not been thoroughly explored (which is challenging given the limited number of zinc-dependent proteins characterized in Toxoplasma). Nevertheless, given that metal acquisition remains largely uncharacterized in Toxoplasma, this manuscript provides an important first step in identifying a metal transporter in these parasites, and the data presented are generally convincing and insightful.

    3. Reviewer #2 (Public review):

      Summary:

      The intracellular pathogen Toxoplasma gondii scavenges metal ions such as iron and zinc to support its replication; however, mechanistic studies of iron and zinc uptake are limited. This study investigates the function of a putative iron and zinc transporter, ZFT. In this paper, the authors provide evidence that ZFT mediates iron and zinc uptake by examining the regulation of ZFT expression by iron and zinc levels, the impact of altered ZFT expression on iron sensitivity, and the effects of ZFT depletion on intracellular iron and zinc levels in the parasite. The effects of ZFT depletion on parasite growth are also investigated, showing the importance of ZFT function for the parasite.

      Strengths:

      A key strength of the study is the use of multiple complementary approaches to demonstrate that ZFT is involved in iron and zinc uptake. Additionally, the authors build on their finding that loss of ZFT impairs parasite growth by showing that ZFT depletion induces stage conversion and leads to defects in both the apicoplast and mitochondrion.

      Weaknesses:

      (1) Excess zinc was shown not to alter ZFT expression, but a cation chelator (TPEN) did lead to decreased expression. While TPEN is often used to reduce zinc levels, does it have any effect on iron levels? Could the reduction in ZFT after TPEN treatment be due to a reduction in the level of iron or another cation?

      (2) ZFT expression was found to be dynamic depending on the size of the vacuole, based on mean fluorescence intensity measurements. Looking at protein levels by Western blot at different times during infection would strengthen this finding.

      (3) ZFT localization remained at the parasite periphery under low iron conditions. However, in the images shown in Figure S1c, larger vacuoles (containing 4-8 parasites) are shown for the untreated conditions, and single parasite-containing vacuoles are shown for the low iron condition. As ZFT localization is predominantly at the basal end of the parasite in larger PV and at the parasite periphery for smaller vacuoles, it would be better to compare vacuoles of similar size between the untreated and low-iron conditions.

    4. Reviewer #3 (Public review):

      Summary:

      Aghabi et al. set out to characterize a T. gondii transmembrane protein with a ZIP domain, termed ZFT. The authors investigate the consequences of ZFT downregulation and overexpression for parasite fitness. Downregulation of ZFT causes defects in the parasite's endosymbiotic organelles, the apicoplast and the mitochondrion. Specifically, lack of ZFT causes a decrease in mitochondrial respiration, consistent with its role as an iron transporter. This impact on the mitochondria appears to trigger partial differentiation to bradyzoites. The authors furthermore demonstrate that expression of TgZFT can rescue a yeast mutant lacking its zinc transporter and perform an array of direct metal ion measurements, including X-ray fluorescence microscopy and inductively coupled plasma mass spectrometry (ICP-MS). These reveal reduced metal ions in parasites depleted in ZFT. Overall, the data by Aghabi et al. reveal that ZFT is a major metal ion transporter in T. gondii, importing iron and zinc for diverse essential processes.

      Strengths:

      This study's strength lies in the thorough characterization of the transporter. The authors combine a number of techniques to measure the impact of ZFT depletion, ranging from the direct measurement of metal ions to determining the consequences for the parasite's metabolism (mitochondrial respiration), as well as performing a yeast mutant complementation. This work is very thorough and clearly presented, leaving little doubt about this protein's function.

      Weaknesses:

      This study offers no major novel insights into the biology of T. gondii. The transporter was already annotated as a zinc transporter (ToxoDB), was deemed essential (PMID: 27594426), and localized to the plasma membrane (PMID: 33053376). This study mostly confirms and validates these previous datasets. The authors identify three other proteins with a ZIP domain. In particular, the role of TGME49_225530 is intriguing, as it is likely fitness-conferring (score: -2.8, PMID: 27594426) and has no subcellular localization assigned. Characterizing this protein as well, revealing its localization, and identifying if and how these transporters coordinate metal ion transport would have been worthwhile.

      Another weakness is the data related to the impact of ZFT downregulation on the apicoplast in Figure 4. The authors show that downregulation of ZFT causes an increase in elongated apicoplasts (Figure 4d). The subsequent panels seem to show that the parasites present a dramatic growth defect at that time point. This growth arrest can directly explain the elongated apicoplast, but does not allow any conclusion about an impact on the organelle. In any case, an assessment of 'delayed death' as presented in Figure 4c seems futile, since the many other processes affected by zinc and iron depletion likely cause a rapid death, masking any potential delayed death.

    1. eLife Assessment

      In this manuscript, the authors report the fundamental finding that a secreted ubiquitin ligase of Shigella, called IpaH1.4, mediates the degradation of a host defense factor, RNF213. The data are convincing and represent a major contribution to our understanding of cell-autonomous immunity and bacterial pathogenesis as they provide new mechanistic insight into how the cytosolic bacterial pathogen Shigella flexneri evades IFN-induced host immunity.

    2. Reviewer #1 (Public review):

      Shigella flexneri is a bacterial pathogen that is a globally significant cause of diarrhea. Shigella pathogenesis remains poorly understood. In their manuscript, Saavedra-Sanchez et al. report their discovery that a secreted E3 ligase effector of Shigella, called IpaH1.4, mediates the degradation of a host E3 ligase called RNF213. RNF213 was previously described to mediate ubiquitylation of intracellular bacteria, an initial step in their targeting to xenophagosomes. Thus, Shigella IpaH1.4 appears to be an important factor to permit evasion of RNF213-mediated host defense. Strengths: The work is focused, convincing, well-performed and important, and the manuscript is well-written. The revised version addressed all the concerns raised during the initial review.

    3. Reviewer #2 (Public review):

      Summary:

      The authors find that the bacterial pathogen Shigella flexneri uses the T3SS effector IpaH1.4 to induce degradation of the IFNg-induced protein RNF213. They show that in the absence of IpaH1.4, cytosolic Shigella is bound by RNF213. Furthermore, RNF213 conjugates linear and lysine-linked ubiquitin to Shigella independently of LUBAC. Intriguingly, they find that Shigella lacking ipaH1.4 or mxiE, which regulates the expression of some T3SS effectors, are not killed even when ubiquitylated by RNF213 and that these mutants are still able to replicate within the cytosol, suggesting that Shigella encodes additional effectors to escape from host defenses mediated by RNF213-driven ubiquitylation.

      Strengths:

      The authors take a variety of approaches, including host and bacterial genetics, gain-of-function and loss-of-function assays, cell biology, and biochemistry. Overall, the experiments are elegantly designed, rigorous, and convincing.

    4. Reviewer #3 (Public review):

      Summary:

      In this study the authors set out to investigate whether and how Shigella avoids cell autonomous immunity initiated through M1-linked ubiquitin and the immune sensor and E3 ligase RNF213. The first key finding is that the Shigella flexneri T3SS effector IpaH1.4 induces degradation of RNF213. Without IpaH1.4, the bacteria are marked with RNF213 and ubiquitin following stimulation with IFNg. Interestingly, this is not sufficient to initiate the destruction of the bacteria, leading the authors to conclude that Shigella deploys additional virulence factors to avoid this host immune response. The second key finding of this study is that M1 chains decorate the mxiE/ipaH Shigella mutant independent of LUBAC, which is, by and large, considered the only enzyme capable of generating M1-linked ubiquitin chains. These findings are fundamental in nature and of general interest.

      Strengths and weaknesses:

      The data is well-controlled and clearly presented with appropriate methodology. The authors provide compelling evidence that demonstrates that IpaH1.4 is the effector responsible for the degradation of RNF213 via the proteasome and their conclusions are well supported. They have clearly demonstrated how Shigella disarms RNF213-mediated immunity.

      This work builds on prior work from the same laboratory that suggests that M1 ubiquitin chains can be formed independently of LUBAC (in the prior publication this related to Chlamydia inclusions). Two key pieces of evidence support this statement - fluorescence microscopy-based images and accompanying quantification in Hoip and Hoil knockout cells for association of M1-ub, using an M1 specific antibody, and the use of an internally tagged Ub-K7R mutant. Whilst it remains possible that the M1 antibody is non-specific, as acknowledged by the authors, the data in supplementary figure 1, comparing K7R-ub and the N-terminally tagged K7R ub variant, provides evidence that during Shigella infection, LUBAC independent M1-ubiquitin chains are indeed formed. This represents an important new angle in ubiquitin biology.

      The importance of IFNgamma priming for RNF213 association with the mxiE or ipaH1.4 mutant strains remains an interesting question that awaits future studies comparing different intracellular bacteria and the role of RNF213.

      Overall, the findings are important for the host-pathogen field, cell autonomous/innate immune signaling fields and microbial pathogenesis fields and the work is a very valuable addition to the recent advances in understanding the role of RNF213 in host immune responses to bacteria.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Shigella flexneri is a bacterial pathogen that is a globally significant cause of diarrhea. Shigella pathogenesis remains poorly understood. In their manuscript, Saavedra-Sanchez et al. report their discovery that a secreted E3 ligase effector of Shigella, called IpaH1.4, mediates the degradation of a host E3 ligase called RNF213. RNF213 was previously described to mediate ubiquitylation of intracellular bacteria, an initial step in their targeting to xenophagosomes. Thus, Shigella IpaH1.4 appears to be an important factor in permitting evasion of RNF213-mediated host defense.

      Strengths:

      The work is focused, convincing, well-performed, and important. The manuscript is well-written.

      We would like to thank the reviewer for their time evaluating our manuscript and the positive assessment of the novelty and importance of our study. We provide a comprehensive response to each of the reviewer’s specific recommendations below and highlight any changes made to the manuscript in response to those recommendations.

      Reviewer #1 (Recommendations for the authors):

      (1) In the abstract (and similarly on p.10), the authors claim to have shown "IpaH1.4 protein as a direct inhibitor of mammalian RNF213". However, they do not show the interaction is direct. This, in my opinion, would require demonstrating an interaction between purified recombinant proteins. I presume that the authors are relying on their UBAIT data to support the direct interaction, but this is a fairly artificial scenario that might be prone to indirect substrates. I would therefore prefer that the 'direct' statement be modified (or better supported with additional data). Similarly, on p.7, the section heading states "S. flexneri virulence factors IpaH1.4 and IpaH2.5 are sufficient to induce RNF213 degradation". The corresponding experiment is to show sufficiency in a 293T cell, but this leaves open the participation of additional 293T-expressed factors. So I would remove "are sufficient to", or alternatively add "...in 293T cells".

      We agree with the reviewer and made the recommended changes to the text in the abstract, in the results section on page 7, and in the Discussion on page 11. During the revision of our manuscript two additional studies were published that provide convincing biochemical evidence for the direct interaction between IpaH1.4 and RNF213 (PMID: 40205224; PMID: 40164614). These studies address the reviewer’s concern extensively and are now briefly discussed and cited in our revised MS.

      (2) In the abstract the authors state "Linear (M1-) and lysine-linked ubiquitin is conjugated to bacteria by RNF213 independent of the linear ubiquitin chain assembly complex (LUBAC)." However, it is not shown that RNF213 is able to directly perform M1-ubiquitylation. It is shown that RNF213 is required for M1-linked ubiquitylation in IpaH1.4 or MxiE mutants, this is different than showing conjugation is done by RNF213 itself. This should be reworded.

      We agree and edited the text accordingly.

      (3) Introduction: one of the main points of the paper is that RNF213 conjugates linear ubiquitin to the surface of bacteria in a manner independent of the previously characterized linear ubiquitin conjugation (LUBAC) complex. This is indeed an interesting result, but the introduction does not put this discovery in much context. I would suggest adding some discussion of what was known, if anything, about the type of Ub chain formed by RNF213, and specifically whether linear Ub had previously been observed or not.

      We now provide context in the Introduction on page 3 and briefly discuss previous work that had implicated LUBAC in the ubiquitylation of cytosolic bacteria. We emphasize that LUBAC specifically generates linear (M1-linked) ubiquitin chains, while the types of ubiquitin linkages deposited on bacteria through RNF213-dependent pathways had remained unidentified.

      (4) Figure 3C: is the difference in 7KR-Ub between WT and HOIP KO cells significant? If so, the authors may wish to acknowledge the possibility that HOIP partially contributes to M1-Ub of MxiE mutant Shigella

      The frequencies at which bacteria are decorated with 7KR-Ub are not statistically different between WT and HOIP KO cells. We have included this information in the panel description of Figure 3.

      (5) On page 11, the authors state that "...we observed that LUBAC is dispensable for M1-linked ubiquitylation of cytosolic S. flexneri ∆ipaH1.4. We found that lysine-less internally tagged ubiquitin or an M1-specific antibody bound to S. flexneri ∆ipaH1.4 in cells lacking LUBAC (HOIL-1KO or HOIPKO) but failed to bind bacteria in RNF213-deficient cells". In fact, what is shown is that M1-ubiquitylation in ∆ipaH1.4 infection is RNF213-dependent (5E), but the work with lysine mutants, HOIP or HOIL-1 KOs are all with ∆mxiE, not ∆ipaH1.4 (3B) in this version of the manuscript. Ideally, the data with ∆ipaH1.4 could be added, but alternatively, the conclusion could be re-worded.

      We now include the data demonstrating that staining of ∆ipaH1.4 with an M1-specific antibody is unchanged from WT cells in HOIL-1 KO and HOIP KO cells. These data are shown in supplementary data (Fig. S3E) and referred to on page 9 of the revised manuscript.

      (6) The UBAIT experiment should be explained in a bit more detail in the text. The approach is not necessarily familiar to all readers, and the rationale for using Salmonella-infected ceca/colons is not well explained (and seems odd). Some appropriate caution about interpreting these data might also be welcome. Did HOIP or HOIL show up in the UBAIT? This perhaps also deserves some discussion.

      As expected, HOIP (listed under its official gene name Rnf31 in the table of Fig. S2B) was identified as a candidate IpaH1.4 interaction partner as the third most abundant hit from the UBAIT screen. Remarkably, Rnf213 was the hit with the highest abundance in the IpaH1.4 UBAIT screen. To address the reviewer’s comments, we now explain the UBAIT approach in more detail and provide the rationale for using intestinal protein lysates from Salmonella-infected mice. The text on page 8 reads as follows: “To investigate potential physical interactions between IpaH1.4 and IpaH2.5, we reanalyzed a previously generated dataset that employed a method known as ubiquitin-activated interaction traps (UBAITs) (32). As shown in Fig. S2A, the human ubiquitin gene was fused to the 3′ end of IpaH2.5, producing a C-terminal IpaH2.5-ubiquitin fusion protein. When incubated with ATP, ubiquitin-activating enzyme E1, and ubiquitin-conjugating enzyme E2, the IpaH2.5-ubiquitin "bait" protein is capable of binding to and ubiquitylating target substrates. This ubiquitylation creates an iso-peptide bond between the IpaH2.5 bait and its substrate, thereby enabling purification via a Strep affinity tag incorporated into the fusion construct (32). IpaH2.5-ubiquitin bait and IpaH3-ubiquitin control proteins were incubated with lysates from murine intestinal tissue. To detect interaction partners in a physiologically relevant setting, we used intestinal lysates derived from mice infected with Salmonella, which in contrast to Shigella causes pronounced inflammation in WT mice and therefore better simulates human Shigellosis in an animal model. Using UBAIT we identified HOIP (Rnf31) as a likely IpaH2.5 binding partner (Fig. S2B), thus confirming previous observations (28) and validating the effectiveness of our approach. Strikingly, we identified mouse Rnf213 as the most abundant interaction partner of the IpaH2.5-ubiquitin bait protein (Fig. S2B).
      Collectively, our data and concurrent reports showing direct interactions between IpaH1.4 and human RNF213 (36, 37) indicate that the virulence factors IpaH1.4 and IpaH2.5 directly bind and degrade mouse as well as human RNF213.”

      (7) It would be helpful if the authors discussed their results in the context of the prior work showing IpaH1.4/2.5 mediate the degradation of HOIP. Do the authors see HOIP degradation? If indeed HOIP and RNF213 are both degraded by IpaH1.4 and IpaH2.5, are there conserved domains between RNF213 and HOIP being targeted? Or is only one the direct target? A HOIP-RNF213 interaction has previously been shown (https://doi.org/10.1038/s41467-024-47289-2). Since they interact, is it possible one is degraded indirectly? To help clarify this, a simple experiment would be to test if RNF213 degraded in HOIP KO cells (or vice-versa)?

      We appreciate the reviewer’s suggestions. We conducted the proposed experiments and found that WT S. flexneri infections result in RNF213 degradation in both WT and HOIP KO cells. Similarly, we found that HOIP degradation was independent of RNF213. We have included these data in Figs. 5A and S3B of our revised submission. A study published during revisions of our paper demonstrates that the LRR of IpaH1.4 binds to the RING domains of both RNF213 and LUBAC (PMID: 40205224). We refer to this work in our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors find that the bacterial pathogen Shigella flexneri uses the T3SS effector IpaH1.4 to induce degradation of the IFNg-induced protein RNF213. They show that in the absence of IpaH1.4, cytosolic Shigella is bound by RNF213. Furthermore, RNF213 conjugates linear and lysine-linked ubiquitin to Shigella independently of LUBAC. Intriguingly, they find that Shigella lacking ipaH1.4 or mxiE, which regulates the expression of some T3SS effectors, are not killed even when ubiquitylated by RNF213 and that these mutants are still able to replicate within the cytosol, suggesting that Shigella encodes additional effectors to escape from host defenses mediated by RNF213-driven ubiquitylation.

      Strengths:

      The authors take a variety of approaches, including host and bacterial genetics, gain-of-function and loss-of-function assays, cell biology, and biochemistry. Overall, the experiments are elegantly designed, rigorous, and convincing.

      Weaknesses:

      The authors find that ipaH1.4 mutant S. flexneri no longer degrades RNF213 and recruits RNF213 to the bacterial surface. The authors should perform genetic complementation of this mutant with WT ipaH1.4 and the catalytically inactive ipaH1.4 to confirm that ipaH1.4 catalytic activity is indeed responsible for the observed phenotype.

      We would like to thank the reviewer for their time evaluating our manuscript and the positive assessment of our work, especially its scientific rigor. We conducted the experiment suggested by the reviewer and included the new data in the revised manuscript. As expected, complementation of the ∆ipaH1.4 with WT IpaH1.4 but not with the catalytically dead C338S mutant restored the ability of Shigella to efficiently escape from recognition by RNF213 (Figs. 5C-D).

      Reviewer #2 (Recommendations for the authors):

      The authors should perform genetic complementation of the ipaH1.4 mutant with WT ipaH1.4 and the catalytically inactive ipaH1.4 to confirm that ipaH1.4 catalytic activity is indeed responsible for the observed phenotype.

      We performed the suggested experiment and show in Figs. 5C-D that complementation of the ∆ipaH1.4 mutant with WT IpaH1.4 but not with the catalytically dead C338S mutant restored the ability of Shigella to efficiently escape from recognition by RNF213. These data demonstrate that the catalytic activity of IpaH1.4 is required for evasion of RNF213 binding to the bacteria.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to investigate whether and how Shigella avoids cell-autonomous immunity initiated through M1-linked ubiquitin and the immune sensor and E3 ligase RNF213. The key findings are that the Shigella flexneri T3SS effector, IpaH1.4 induces degradation of RNF213. Without IpaH1.4, the bacteria are marked with RNF213 and ubiquitin following stimulation with IFNg. Interestingly, this is not sufficient to initiate the destruction of the bacteria, leading the authors to conclude that Shigella deploys additional virulence factors to avoid this host immune response. The second key finding of this paper is the suggestion that M1 chains decorate the mxiE/ipaH Shigella mutant independent of LUBAC, which is, by and large, considered the only enzyme capable of generating M1-linked ubiquitin chains.

      Strengths:

      The data is for the most part well controlled and clearly presented with appropriate methodology. The authors convincingly demonstrate that IpaH1.4 is the effector responsible for the degradation of RNF213 via the proteasome, although the site of modification is not identified.

      Weaknesses:

      (1) The work builds on prior work from the same laboratory that suggests that M1 ubiquitin chains can be formed independently of LUBAC (in the prior publication this related to Chlamydia inclusions). In this study, two pieces of evidence support this statement: fluorescence microscopy-based images and accompanying quantification in Hoip and Hoil knockout cells for association of M1-ub, using an antibody, to Shigella mutants and the use of an internally tagged Ub-K7R mutant, which is unable to be incorporated into ubiquitin chains via its lysine residues. Given that clones of the M1-specific antibody are not always specific for M1 chains, and because it remains formally possible that the Int-K7R Ub can be added to the end of the chain as a chain terminator or as mono-ub, the authors should strengthen these findings relating to the claim that another E3 ligase can generate M1 chains de novo.

      (2) The main weakness relating to the infection work is that no bacterial protein loading control is assayed in the western blots of infected cells, leaving the reader unable to determine if changes in RNF213 protein levels are the result of the absent bacterial protein (e.g. IpaH1.4) or altered infection levels.

      (3) The importance of IFNgamma priming for RNF213 association to the mxiE or ipaH1.4 strain could have been investigated further as it is unclear if RNF213 coating is enhanced due to increased protein expression of RNF213 or another factor. This is of interest as IFNgamma priming does not seem to be needed for RNF213 to detect and coat cytosolic Salmonella.

      Overall, the findings are important for the host-pathogen field, cell-autonomous/innate immune signaling fields, and microbial pathogenesis fields. If further evidence for LUBAC independent M1 ubiquitylation is achieved this would represent a significant finding.

      We would like to thank the reviewer for their time evaluating our manuscript and the positive assessment of our work and its significance. We provide a comprehensive response to the main three critiques listed under ‘weaknesses’ and also have responded to each of the reviewer’s specific recommendations below. We highlight any changes made to the manuscript in response to those recommendations.

      (1) As the reviewer correctly pointed out, 7KR ubiquitin can be used not only for linear ubiquitylation but can also function as a donor ubiquitin and can be attached as mono-ubiquitin to a substrate or to an existing ubiquitin chain as a chain terminator. To distinguish between 7KR INT-Ub signals originating from linear versus mono-ubiquitylation, we followed the reviewer’s advice and generated an N-terminally tagged 7KR INT-Ub variant. The N-terminal tag prevents linear ubiquitylation but still allows 7KR INT-Ub to be attached as a mono-ubiquitin. We found that the addition of this N-terminal tag significantly reduced but did not completely abolish the number of ∆mxiE bacteria decorated with 7KR INT-Ub. These data are shown in a new Fig. S1 and indicate that 7KR lacking the N-terminal tag is attached to bacteria both in the form of linear (M1-linked) ubiquitin and as donor ubiquitin, possibly as a chain terminator. While we cannot rule out that the anti-M1 antibodies used here cross-react with other ubiquitin linkages, we reason that the 7KR data strongly argue that linear ubiquitin is part of the ubiquitin coat encasing IpaH1.4-deficient cytosolic Shigella. Collectively, our data show that both linear and lysine-linked (especially K27 and K63) ubiquitin chains are part of the RNF213-dependent ubiquitin coat on the surface of IpaH1.4 mutants. Furthermore, our data strongly indicate that this ubiquitylation of IpaH1.4 mutants is independent of LUBAC.

      (2) We used GFP-expressing strains of S. flexneri for our infection studies and were therefore able to use GFP expression as a loading control. We have incorporated these data into our revised figures. These new data (Figs. 4A, 5A, and S3B) show that bacterial infection levels were comparable between WT and mutant infections and that therefore the degradation of RNF213 (or HOIP – see new data in Fig. S3B) is not due to differences in infection efficiency.

      (3) We agree with the reviewer that the mechanism by which RNF213 binds to bacteria is an important unanswered question. Similarly, whether other ISGs have auxiliary functions in this process or whether binding efficiencies vary between different bacterial species are important questions in the field. However, these questions go far beyond the scope of this study and were therefore not addressed in our revisions.

      Reviewer #3 (Recommendations for the authors):

      (1) An N-terminally tagged K7R-ub should be used as a control to test whether the signal found around the mutant shigella is being added via the N terminal Met into chains. As it is known that certain batches of the M1-specific antibodies are in fact not specific and able to detect other chain types, the authors should test the specificity of the antibody used in this study (eg against different di-Ub linkage types) and include this data in the manuscript.

      We agree with the reviewer in principle. The anti-linear ubiquitin (anti-M1) monoclonal antibody, clone 1E3, prominently used in this study was tested by the manufacturer (Sigma) by Western blotting analysis and according to the manufacturer “this antibody detected ubiquitin in linear Ub, but not Ub K11, Ub K48, Ub K63.” However, this analysis did not include all possible Ub linkage types and thus the reviewer is correct that the anti-M1 antibody could theoretically also detect some other linkage types. To address this concern, we added new data during revisions demonstrating that 7KR INT-Ub targeting to S. flexneri is largely dependent on the N-terminus (M1) of ubiquitin. Our combined observations therefore overwhelmingly support the conclusion that linear (M1-linked) as well as K-linked ubiquitin is being attached to the surface of IpaH1.4-deficient S. flexneri bacteria in an RNF213-dependent and LUBAC-independent manner.

      (2) The M1 signal detected on bacteria with the antibody is still present in either Hoip or Hoil KO’s but due to the potential non-specificity of the antibody, the authors should test whether K7R ub is detected on bacteria in the Hoil ko (in addition to Hoip KO). This would strengthen the authors’ data on LUBAC-independent M1 and is important because Hoil can catalyse non-canonical ubiquitylation.

      The specific linear ubiquitin-ligating activity of LUBAC is enacted by HOIP. We show that linear ubiquitylation of susceptible S. flexneri mutants as assessed by anti-M1 ubiquitin staining or 7KR INT-Ub recruitment occurs in HOIP KO cells at WT levels (Figs. 3B, 3C, S3E [new data]). In our view, these data unequivocally show that the observed linear ubiquitylation of cytosolic S. flexneri ipaH1.4 and mxiE mutants is independent of LUBAC.

      (3) For Figure 4A, do mxiE bacteria show similar invasion - authors should include a bacterial protein control to show levels of bacteria in WT and mxiE infected conditions. A similar control should be included in Figure 5A.

      We used GFP-expressing strains of S. flexneri for our infection studies and were therefore able to use GFP expression as a loading control. We have incorporated these data into our revised figures. These new data (Figs. 4A, 5A, and S3B) show that bacterial infection levels were comparable between WT and mutant infections and that therefore the degradation of RNF213 (or HOIP – see new data in Fig. S3B) is not due to differences in infection efficiency.

      (4) Can the authors speculate why IFNg priming is needed for the coating of Shigella mxiE mutant but not in the case of Salmonella or Burkholderia? Is this just amounts of RNF213 or something else?

      In our studies we did not directly compare ubiquitylation rates of cytosolic Shigella, Burkholderia, and Salmonella bacteria with each other under the same experimental conditions. However, such a direct comparison is needed to determine whether IFNgamma priming is required for RNF213-dependent bacterial ubiquitylation of some but not other pathogens. Two papers published during the revisions of our manuscript (PMID: 40164614, PMID: 40205224) report robust RNF213 targeting to IpaH1.4 Shigella mutants in unprimed HeLa cells (whereas we used A549 and HT29 cells). Therefore, differences in reagents, cell lines, and/or other experimental conditions may determine whether IFNgamma priming is necessary to observe substantial RNF213 translocation to cytosolic bacteria.

      (5) Typos - there are several, but this is hard to annotate with line numbers so the authors should proofread again carefully.

      We proofread the manuscript and corrected the small number of typos we identified.

    1. eLife Assessment

      This study presents important methodologies for repeated brain ultrasound localization microscopy (ULM) in awake mice and a set of results indicating that wakefulness reduces vascularity and blood flow velocity. The data supporting these findings are solid. This study is relevant for scientists investigating vascular physiology in the brain.

    2. Reviewer #1 (Public review):

      Summary:

      Wang and Colleagues present a study aimed at demonstrating the feasibility of repeated ultrasound localization microscopy (ULM) recording sessions on mice chronically implanted with a cranial window transparent to US. They provided quantitative information on their protocol, such as the required number of Contrast enhancing microbubbles (MBs) to get a clear image of the vasculature of a brain coronal section. Also, they quantified the co-registration quality over time-distant sessions and the vasodilator effect of isoflurane.

      Strengths:

      The study showed a remarkable performance in recording precisely the same brain coronal section over repeated imaging sessions. In addition, it sheds light on the vasodilator effect of isoflurane (an anesthetic whose effects are not fully understood) on the different brain vasculature compartments, although, as the Authors stated, some insights in this aspect have already been published with other imaging techniques. The experimental setting and protocol are very well described.

      In this newly revised version, the Authors made evident efforts to strengthen the messages of their study. All the limitations of their research have been clearly acknowledged.

      A central issue remains. To answer my concerns about the need for multivariate analyses, the Authors stated that: "Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies." This sentence does not convince me, but if the purpose of this study was to showcase the potential of ULM for future longitudinal awake studies, why not avoid statistics altogether? The trend for decreased vein size and increased arterial blood flow during wakefulness is evident from the plot and physiologically plausible. Why impose inappropriate statistics instead of dropping them? Based on the feedback received from the Authors, I do not see the lack of statistics as detrimental to this study.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a very interesting collection of methods and results using brain ultrasound localization microscopy (ULM) in awake mice. They emphasize the effect of the level of anesthesia on the quantifiable elements assessable with this technique (i.e. vessel diameter, flow speed, in veins and arteries, area perfused, in capillaries) and demonstrate the possibility of achieving longitudinal cerebrovascular assessment in one animal during several weeks with their protocol.

      The authors did a good job rewriting the article based on the reviewers' comments. One of the messages of the first version of the manuscript was that variability in measurements (vessel diameter, flow velocity, vascularity) was much more pronounced under changes of anesthesia than when considering longitudinal imaging across several weeks. This message is now somewhat mitigated, as longitudinal imaging seems to show a certain variability close to the order of magnitude observed under anesthesia. In that sense, the review process was useful in avoiding hasty conclusions and calls for further caution in ULM awake longitudinal imaging, in particular regarding precision of positioning and cancellation of tissue motion.

      Strengths:

      Even if the methods elements considered separately are not new (brain ULM in rodents, setup for longitudinal awake imaging similar to those used in fUS imaging, quantification of vessel diameters/bubble flow/vessel area), when masterfully combined as it is done in this paper, they answer two questions that have been long-running in the community: what is the impact of anesthesia on the parameters measured by ULM (and indirectly in fUS and other techniques)? Is it possible to achieve ULM in awake rodents for longitudinal imaging? The manuscript is well constructed, well written, and graphics are appealing.

      The manuscript has been much strengthened by the round of review, with more animals for the longitudinal imaging study.

      Weaknesses:

      The manuscript has been only marginally modified since our last round of review, so there is probably not much we reviewers can additionally elaborate to improve it. Therefore my last concerns about the reliability of longitudinal quantifications and about certain discrepancies remain for this paper. As a general piece of advice, I would just say that every claim ('is higher', 'is lower', 'is stable') should be supported by evidence and statistical testing if it is not already the case.

      Response 06: the authors' response is not satisfactory. Even if the difference in terms of ROI boundaries between fig 4e and fig 4j has been underlined by the authors, they only provide a wordy comment and no additional quantitative analysis that could explain the discrepancy I pointed out. By doing so they take the risk of making misinterpretations. The reader is left with a discrepancy that could be explained by 2 mechanisms: (1) pial vessel populations behave differently from penetrating arterioles and venules, or (2) the imaging of pial vessels with ULM is not good enough to enable proper quantification because the vessels are not clearly visible (out of plane extent). In any case, Figure 4j does not provide "a more comprehensive representation of cortical vasculature" as stated. If the changes in pial vessels cannot be reliably measured, they should be excluded from the ROI.

      Line 161: be careful with the use of vessel density, as pointed by reviewer 1.

      Line 196: "the decrease in venous vessel area (averaging 55% across mice) was greater than that of arterial (averaging 35%)" no stat test has been performed.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and Colleagues present a study aimed at demonstrating the feasibility of repeated ultrasound localization microscopy (ULM) recording sessions on mice chronically implanted with a cranial window transparent to US. They provided quantitative information on their protocol, such as the required number of Contrast enhancing microbubbles (MBs) to get a clear image of the vasculature of a brain coronal section. Also, they quantified the co-registration quality over time-distant sessions and the vasodilator effect of isoflurane.

      Strengths:

      The study showed a remarkable performance in recording precisely the same brain coronal section over repeated imaging sessions. In addition, it sheds light on the vasodilator effect of isoflurane (an anesthetic whose effects are not fully understood) on the different brain vasculature compartments, although, as the Authors stated, some insights in this aspect have already been published with other imaging techniques. The experimental setting and protocol are very well described.

      Wang and co-authors submitted a revised version of their study, which shows improvements in the clarity of the data description.

      However, the flaws and limitations of this study are substantially unchanged.

      The main issues are:

      Statistics are still inadequate. The TOST test proposed in this revised version is not equivalent to an ANOVA. Indeed, multivariate analyses should be the most appropriate, given that some quantifications were probably made on multiple vessels from different mice. The 3 reviewers mentioned the flaws in statistics as the primary concern.

      Response 01: We thank the reviewer for raising this important point. We fully acknowledge the limitations of our current statistical analysis. We would like to clarify that the TOST procedure was applied exclusively to the measurements taken from the same vessel segment in the same animal across different time points, with the purpose of evaluating the consistency of vessel diameter measurements. We recognize that the statistical analysis in this study remains limited, which we have acknowledged as a key limitation in the manuscript. This constraint arises primarily from the limited number of animals, and our analysis should be interpreted as a representative case study rather than a generalized statistical conclusion. We have revised the manuscript to clarify these points and to more explicitly acknowledge the statistical limitations.

      (Line 329) “Our current study primarily focused on demonstrating the feasibility of longitudinal ULM imaging in awake animals, instead of conducting a systematic investigation of how isoflurane anesthesia alters cerebral blood flow. Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies. While the trends observed across animals were consistent, the small sample size restricts the scope of statistical inference. For future work, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      No new data has been added, such as testing other anesthetics.

      Response 02: We acknowledge that the current study does not include data involving other anesthetics, and we have also discussed this point in our initial response. In fact, we did attempt to use other anesthetics such as ketamine. However, we found it difficult to draw reliable conclusions due to experimental limitations such as variable anesthesia recovery profiles and injection timing, as elaborated in the following paragraphs. Therefore, we decided not to include these data in the current study to avoid potential misinterpretation.

      One major limitation of our experimental setup is that imaging in the awake state is necessarily conducted after a brief period of isoflurane-anesthesia. This brief anesthesia allows for the intravenous injection of microbubbles via the tail vein. Isoflurane is particularly suited for this purpose due to its rapid onset and offset. Mice can recover quickly once the gas is withdrawn, which enables relatively consistent post-anesthesia imaging in the awake state.

      In contrast, other anesthetic agents present challenges. Their recovery profiles are slower, more variable, and less controllable. Reversal drugs can be administered to awaken the animals, but they add another source of variability. These factors may lead to greater fluctuations in cerebral hemodynamics and introduce uncertainty in the timing of the bolus microbubble injection. As such, our current setup is not ideal for systematically comparing different anesthetics and could yield misleading results.

      A more appropriate strategy for comparing awake ULM imaging with different anesthetics would be to perform awake imaging first, followed by imaging under anesthesia. This would ensure that the awake condition is free from residual anesthetic effects. However, this method places higher demands on microbubble delivery, as no anesthesia can be used for the intravenous injection.

      To address this, we are actively exploring another solution using indwelling jugular vein catheterization. By surgically implanting a catheter into the jugular vein prior to imaging, we can establish a stable and reproducible route for microbubble delivery in fully awake animals without any anesthesia induction. This method has the potential to enable direct and reliable comparisons across different physiological states. However, the implementation of this technique and the associated experimental findings go beyond the scope of the current study and will be presented in a future manuscript.

      In the present work, we have emphasized the methodological limitations of our approach and clarified that our primary goal is to highlight the necessity and feasibility of awake-state ULM imaging. The focus is not to comprehensively characterize the effects of different anesthetic agents on microvascular brain flow. We appreciate your understanding and interest in this important future direction. 

      Based on the responses and the previous revision, we have further refined the discussion of the relevant limitations:

      (Line 324) “Although isoflurane is widely used in ultrasound imaging because it provides long-lasting and stable anesthetic effects, it is important to note that the vasodilation observed with isoflurane is not representative of all anesthetics. Some anesthesia protocols, such as ketamine combined with medetomidine, do not produce significant vasodilation and are therefore preferred in experiments where vascular stability is essential, such as functional ultrasound imaging. Our current study primarily focused on demonstrating the feasibility of longitudinal ULM imaging in awake animals, instead of conducting a systematic investigation of how isoflurane anesthesia alters cerebral blood flow. Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies. While the trends observed across animals were consistent, the small sample size restricts the scope of statistical inference. For future work, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      (Line 347) “Another limitation of this study is the potential residual vasodilatory effect of isoflurane anesthesia on awake imaging sessions and the short imaging window available after bolus injection. The awake imaging sessions were conducted shortly after the mice had emerged from isoflurane anesthesia, required for the MB bolus injections. The lasting vasodilatory effects of isoflurane may have influenced vascular responses, potentially contributing to an underestimation of differences in vascular dynamics between anesthetized and awake state. In addition, since microbubbles are rapidly cleared from circulation, the duration of effective imaging is limited to only a few minutes, which also overlaps with the anesthesia recovery period, constraining the usable awake-state imaging window. Future improvement on microbubble infusion using an indwelling jugular vein catheter presents a promising alternative to address these limitations. This method allows for stable microbubble infusion without the need for anesthesia induction, ensuring that the awake imaging condition is free from residual anesthetic effects. Moreover, it has the potential to extend the duration of imaging sessions, offering a longer and more stable time window for data acquisition. Furthermore, by performing ULM imaging in the awake state first, instead of starting with anesthetized imaging, researchers can achieve a more rigorous comparison of how various anesthetics influence cerebral microvascular dynamics relative to the awake baseline.”

      The Authors still insist on using the term Vascularity which they define as: 'proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.'. Why not use apparent cerebral blood volume or just CBV? Introducing an unnecessary and redundant term is not scientifically acceptable. In this revised version, vascularity is also used to indicate a higher vascular density (Line 275), which does not make sense: blood vessels do not generate from the isoflurane to the awake condition in a few minutes. Rev2 also raised this point.

      Response 03: Thank you for revisiting this important point. We acknowledge that the term vascularity is difficult to interpret for readers, and we also recognize that we did not sufficiently justify its use in the earlier version.

      Based on your suggestion, we have now replaced all instances of “vascularity” with “fractional vessel area”. While the underlying definition remains the same, fractional vessel area offers a more intuitive description. The term “fractional” denotes that the vessel area is normalized to the total area of the selected ROI. This normalization is essential for fair comparisons across ROIs of different sizes, such as those in Figures 4i–k used to evaluate various brain regions. We would also like to clarify that this was not introduced as an unnecessary or redundant term, but rather as a more suitable metric for longitudinal ULM analysis. We did consider using apparent cerebral blood volume (CBV), estimated from microbubble counts. However, we found that it was less robust and meaningful in the context of longitudinal ULM comparisons. Below we provide further justification for using the vessel area instead:

      (1) Using the vessel area is more robust:

      In longitudinal ULM comparisons, normalization across time points is essential to enable fair and meaningful comparisons. In our study, we normalized the data based on a cumulative 5 million microbubbles (e.g., Fig. 2). Other normalization strategies could also be adopted, as long as the resulting vascular maps reach a sufficiently saturated state. However, even with normalization, it remains important to use a quantitative metric that is minimally biased and invariant to experimental fluctuations across time points. Vessel area, derived from binarized vessel maps, is less sensitive to variations in acquisition time and microbubble concentration. This is because repeated microbubble trajectories through the same location are not counted multiple times. In contrast, apparent CBV, calculated from the microbubble counts, is more susceptible to different concentration conditions. Since repeated detections in the same location accumulate, the metric can be dependent on injection efficiency and imaging duration. While CBV may still be valid under well-controlled, steady-state conditions, we found the vessel area to be a more robust and reliable metric for longitudinal analysis under our current bolus-injection protocol.

      (2) Using the vessel area is more meaningful:

      Compared to CBV, the vessel area provides a more direct representation of structural characteristics such as vessel diameter. Anesthesia-induced vasodilation leads to an increase in vessel diameter. Although local diameter changes can be assessed by manually selecting vessel segments, this approach is labor-intensive and prone to selection bias. To enable a more comprehensive and objective assessment of such morphological changes, fractional vessel area provides a more informative alternative to CBV, as it captures diameter-related variations at a global or regional scale, and avoids potential biases associated with manually selecting specific vessels or regions.
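
      The robustness argument in point (1) can be illustrated with a minimal sketch. It assumes a saturated density map, a binarization rule of "any detection counts as a vessel pixel", and a simple sum of detections as the count-based proxy for apparent CBV. The function names and the toy map are hypothetical and are not taken from the authors' MATLAB tool.

```python
def fractional_area(density):
    """Percentage of pixels with at least one microbubble detection."""
    pixels = [c for row in density for c in row]
    return 100.0 * sum(1 for c in pixels if c > 0) / len(pixels)


def total_count(density):
    """Count-based proxy for apparent CBV: all detections summed."""
    return sum(c for row in density for c in row)


# Hypothetical saturated density map (illustration only, not study data).
baseline = [[0, 2, 0],
            [1, 4, 1],
            [0, 3, 0]]
# Same vessels, but twice as many detections per pixel, as might happen
# with a higher bubble concentration or a longer acquisition.
doubled = [[2 * c for c in row] for row in baseline]

area_base, area_doubled = fractional_area(baseline), fractional_area(doubled)
count_base, count_doubled = total_count(baseline), total_count(doubled)
```

      In this toy case the binarized area is identical for both maps while the count-based value doubles. The invariance holds only once every perfused pixel has been detected at least once, which is why the saturation condition matters.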

      In response to: vascularity is also used to indicate a higher vascular density (Line 275), which does not make sense: blood vessels do not generate from the isoflurane to the awake condition in a few minutes.

      We agree that blood vessels cannot be generated in a few minutes. Vascularity (now fractional vessel area) should be interpreted as apparent vessel density, which reflects a probabilistic estimate of vessel density based on the detectable microbubbles.

      Both apparent vessel density and apparent CBV are indirect, sampling-based approximations of vascular features, and both are fundamentally limited by microbubble detection sensitivity. Low microbubble concentrations lead to underestimation of both CBV and vessel area. A change from zero to non-zero in these metrics does not imply the physical appearance or disappearance of vessels, but rather reflects a change in the likelihood of detecting flow in each region.

      In summary, while neither fractional vessel area (vascularity in previous versions) nor apparent CBV is a perfect metric due to the inherent limitations of ULM, we believe the vessel area provides a more robust and meaningful parameter for our longitudinal comparisons. We have revised the main text to include this explanation and acknowledge the limitations and interpretation of fractional vessel area more explicitly.

      Revision in Results:

      (Line 181) “To validate the broader applicability of our findings, we conducted ROI-based analyses using fractional vessel area and mean velocity as primary metrics. These metrics extended the analysis of vessel diameter and flow velocity to entire brain regions or selected ROIs, which provides a more objective assessment of cerebral blood flow changes at a global scale and reduces the bias associated with manually selecting vessel segments. For vessel area measurements, the term fractional denotes that the vessel area is normalized to the total area of the selected ROI. This normalization is essential for fair comparisons across ROIs of different sizes.”

      Revision in Methods: definition of vascularity

      (Line 571) “In ROI-based analysis, we focused on two primary parameters: fractional vessel area and mean velocity. Fractional vessel area was defined as the proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal. Mean velocity was calculated by averaging all non-zero pixel of velocity estimates within the ROI. The velocity distribution within each ROI was also visualized using violin plots, as shown in Fig. 2, 4 and 6, to illustrate the range and density of flow velocity estimates across different acquisition. In this study, we focused on these two metrics because they represent the most straightforward extension of single-vessel analysis to brain-wide vascular changes.”
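
      The two ROI metrics defined above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the density and velocity maps are plain 2D lists, that the ROI is a boolean mask of the same shape, and that binarization means "count > 0". All names and the example maps are hypothetical.

```python
def fractional_vessel_area(density, roi):
    """Percentage of ROI pixels that contain any microbubble signal."""
    roi_pixels = 0
    vessel_pixels = 0
    for density_row, roi_row in zip(density, roi):
        for count, inside in zip(density_row, roi_row):
            if inside:
                roi_pixels += 1
                if count > 0:  # binarize: any MB signal marks a vessel pixel
                    vessel_pixels += 1
    if roi_pixels == 0:
        raise ValueError("ROI mask is empty")
    return 100.0 * vessel_pixels / roi_pixels


def mean_velocity(velocity, roi):
    """Average of the non-zero velocity estimates inside the ROI."""
    values = [v for v_row, r_row in zip(velocity, roi)
              for v, inside in zip(v_row, r_row) if inside and v != 0]
    if not values:
        raise ValueError("no non-zero velocity estimates in ROI")
    return sum(values) / len(values)


# Hypothetical 3x4 maps, for illustration only (not data from the study).
density = [[0, 3, 0, 0],
           [1, 5, 2, 0],
           [0, 0, 4, 0]]
velocity = [[0.0, 8.0, 0.0, 0.0],
            [4.0, 10.0, 6.0, 0.0],
            [0.0, 0.0, 12.0, 0.0]]
roi = [[True, True, True, False],
       [True, True, True, False],
       [True, True, True, False]]

area = fractional_vessel_area(density, roi)   # 5 of 9 ROI pixels are vessel
speed = mean_velocity(velocity, roi)          # mean of 8, 4, 10, 6, 12
```

      Because the area is divided by the ROI pixel count, values from ROIs of different sizes are on the same percentage scale, which is the normalization the term "fractional" refers to.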

      We put our ROI analysis code on GitHub and added a “Code availability” section. We hope it can serve as a foundation for users to explore different quantitative metrics in their own longitudinal ULM studies. We hope to provide an example to inspire further exploration.

      (Line 578) “Code availability

      To support quantitative longitudinal analysis of ULM data, we developed an open-source MATLAB application (https://github.com/ekerwang/ULMQuantitativeAnalysis). This tool is designed to facilitate ROI-based analysis of ULM images for longitudinal comparisons. It supports multiple quantification metrics, including but not limited to vessel area and mean velocity used in this study. Users can select and adapt different metrics based on their specific applications, as a wide range of ULM-based quantification metrics have been developed for different pathological and pharmacological studies.”

      The long-term recordings mentioned by the Authors refer to the 3-week time frame analyzed in this study. However, within each acquisition, the time available for imaging is only a few minutes (< 10', referring to most of the plots showing time courses) after the animals' arousal from isoflurane and before the bubbles disappear. This limitation should be acknowledged.

      Response 04: Thank you for this comment. We agree that the current imaging sessions are constrained by the short time window available after the animal’s arousal from isoflurane and before bubbles disappear. This limitation indeed restricts the duration of usable awake-state imaging in our current bolus injection protocol. As discussed earlier, we are actively exploring the use of a jugular vein catheterization approach to address this limitation. This approach has the potential to extend the imaging session duration and provide a longer, more stable time window. We have now acknowledged this limitation more explicitly in the revised Discussion section.

      (Line 347) “Another limitation of this study is the potential residual vasodilatory effect of isoflurane anesthesia on awake imaging sessions and the short imaging window available after bolus injection. The awake imaging sessions were conducted shortly after the mice had emerged from isoflurane anesthesia, required for the MB bolus injections. The lasting vasodilatory effects of isoflurane may have influenced vascular responses, potentially contributing to an underestimation of differences in vascular dynamics between anesthetized and awake state. In addition, since microbubbles are rapidly cleared from circulation, the duration of effective imaging is limited to only a few minutes, which also overlaps with the anesthesia recovery period, constraining the usable awake-state imaging window. Future improvement on microbubble infusion using an indwelling jugular vein catheter presents a promising alternative to address these limitations. This method allows for stable microbubble infusion without the need for anesthesia induction, ensuring that the awake imaging condition is free from residual anesthetic effects. Moreover, it has the potential to extend the duration of imaging sessions, offering a longer and more stable time window for data acquisition. Furthermore, by performing ULM imaging in the awake state first, instead of starting with anesthetized imaging, researchers can achieve a more rigorous comparison of how various anesthetics influence cerebral microvascular dynamics relative to the awake baseline.”

      The more precise description of the number of mice and blood vessels analyzed in Figure 6 makes apparent the limited number of independent samples used to support the findings of this work, a limitation that should be acknowledged. The newly provided information added as Supplementary Figure 1 should be moved to the main text, possibly into the figure legends. The limited data in support of the findings was also highlighted by Rev2 and, indirectly, by Rev3.

      Response 05: We acknowledge the limited number of independent samples used in this study. In the revised manuscript, we have explicitly emphasized this limitation in the Discussion section. Specifically, we added the following statement:

      (Line 329) “Our current study primarily focused on demonstrating the feasibility of longitudinal ULM imaging in awake animals, instead of conducting a systematic investigation of how isoflurane anesthesia alters cerebral blood flow. Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies. While the trends observed across animals were consistent, the small sample size restricts the scope of statistical inference. For future work, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      Following your suggestion, we have also moved the newly provided information (the table in Supplementary Figure 1) into the figure captions. In addition, we have revised the Methods section to ensure that this information is clear.

      (Line 406) “Eight healthy female C57 mice (8-12 weeks) were used for this study, numbered as Mouse 1 to Mouse 8. Three mice (Mouse 1–3) were used to compare imaging results between awake and anesthetized states (Fig. 3 and 4). Three additional mice (Mouse 4–6) underwent longitudinal imaging over a three-week period (Fig. 5 and 6). Among them, Mouse 4 was also used as an example to demonstrate the overall system schematic and saturation conditions (Fig. 1 and 2). Several mice (Mouse 2, 6, 7, and 8) exhibited suboptimal cranial window quality or image artifacts and were included to illustrate common surgical or imaging issues (Supplementary Fig. 1). The specific usage of each animal is also annotated in the corresponding figure captions.”

      Reviewer #2 (Public Review):

      The authors present a very interesting collection of methods and results using brain ultrasound localization microscopy (ULM) in awake mice. They emphasize the effect of the level of anesthesia on the quantifiable elements assessable with this technique (i.e. vessel diameter, flow speed, in veins and arteries, area perfused, in capillaries) and demonstrate the possibility of achieving longitudinal cerebrovascular assessment in one animal during several weeks with their protocol.

      The authors did a good job rewriting the article based on the reviewers' comments. One of the messages of the first version of the manuscript was that variability in measurements (vessel diameter, flow velocity, vascularity) was much more pronounced under changes of anesthesia than across several weeks of longitudinal imaging. This message is now somewhat mitigated, as longitudinal imaging seems to show variability close to the order of magnitude observed under anesthesia. In that sense, the review process was useful in avoiding hasty conclusions and calls for further caution in awake longitudinal ULM imaging, in particular regarding precision of positioning and cancellation of tissue motion.

      Strengths:

      Even if the methods elements considered separately are not new (brain ULM in rodents, setup for longitudinal awake imaging similar to those used in fUS imaging, quantification of vessel diameters/bubble flow/vessel area), when masterfully combined as is done in this paper, they answer two questions that have been long-running in the community: what is the impact of anesthesia on the parameters measured by ULM (and indirectly in fUS and other techniques)? Is it possible to achieve ULM in awake rodents for longitudinal imaging? The manuscript is well constructed, well written, and graphics are appealing.

      The manuscript has been much strengthened by the round of review, with more animals for the longitudinal imaging study.

      Weaknesses:

      Some weaknesses remain, not hindering the quality of the work, that the authors might want to answer or explain.

      When considering fig 4e and fig 4j together: it seems that in fig 4e the vascularity reduction in the cortical ROI is around 30% for downward flow, and around 55% for upward flow; but when grouping both cortical flows in fig 4j, the reduction is much smaller (~5%), even at the individual level (only mouse 1 is used in fig 4e). Can you comment on that?

      Response 06: Thank you for carefully pointing this out. This discrepancy arises primarily from differences in ROI selections.

      The vascularity metric (now renamed fractional vessel area, following Reviewer 1’s comments) is calculated as the proportion of vessel-occupied pixels relative to the total ROI area. As such, it is best suited for longitudinal comparisons within the same ROI rather than across-ROI comparisons, particularly when the size and vessel composition of the ROIs differ.

      In Fig. 4e, the cortical ROI includes mostly the penetrating vessels, which are selected due to their clear distinction between upward (venous) and downward (arterial) flow directions. Pial vessels were intentionally excluded because flow direction alone does not reliably distinguish arteries from veins in these surface vessels. Thus, the goal of this analysis was to indicate arteriovenous differences, rather than to represent the full cortical vascular changes.

      In contrast, the ROIs used in Fig. 4j aim to provide a more comprehensive view of cortical vascular responses without distinguishing flow direction. That’s why both penetrating and pial vessels are included. Since pial vessels showed relatively smaller vascularity changes within the coronal cross-sections analyzed in our study, their inclusion in the cortical ROI likely contributed to the smaller overall reduction in vascularity observed in Figure 4j.
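
      The dilution effect described in this response can be made concrete with a small arithmetic sketch. The pixel counts below are hypothetical and were chosen only to show the mechanism; they are not measurements from Fig. 4e or Fig. 4j, and the helper names are illustrative.

```python
def fractional_area(vessel_pixels, roi_pixels):
    """Fractional vessel area as a percentage of the ROI."""
    return 100.0 * vessel_pixels / roi_pixels


def relative_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


# Hypothetical pixel counts, chosen only to illustrate the dilution effect.
roi_pixels_penetrating = 1000   # ROI restricted to penetrating vessels
roi_pixels_full = 3000          # larger ROI that also contains pial vessels

penetrating_anesth, penetrating_awake = 400, 240   # vessel pixels
pial_anesth, pial_awake = 1200, 1180               # vessel pixels

narrow = relative_reduction(
    fractional_area(penetrating_anesth, roi_pixels_penetrating),
    fractional_area(penetrating_awake, roi_pixels_penetrating),
)
full = relative_reduction(
    fractional_area(penetrating_anesth + pial_anesth, roi_pixels_full),
    fractional_area(penetrating_awake + pial_awake, roi_pixels_full),
)
```

      With these made-up numbers, the penetrating-vessel ROI shows a 40% reduction, while the full cortical ROI shows about 11%, because the nearly unchanged pial pixels dominate the larger ROI.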

      To address this potential confusion, we have added further clarification in the Results section of the revised manuscript.

      (Line 209) “It is worth noting that prior analyses (Fig. 4d–h) aimed to illustrate arteriovenous differences. Since pial vessels are difficult to distinguish as arteries or veins based on flow direction in coronal plane imaging, they were excluded from the ROI selection in those analyses. In the current whole-brain comparisons (Fig. 4i-k), the cortical ROIs no longer exclude pial vessels, since distinguishing between arteries and veins is not required. This aims to provide a more comprehensive representation of cortical vasculature.”

      When considering fig 4e, fig 4j, fig 6e and fig 6i altogether, it seems that vascularity can be highly variable, whether it be under anesthesia or vascular imaging, with changes between 5 to 40%. Is this vascularity quantification worth it (namely, reliable for example to quantify changes in a pathological model requiring longitudinal imaging)?

      Response 07: Thank you for raising this important point. We found that imaging in the awake state is inherently more variable than under anesthesia. In contrast, anesthetized imaging offers a more controlled and stable physiological condition, as anesthesia suppresses many sources of variation. For pathological studies, if the vascular or hemodynamic changes induced by anesthesia do not interfere with the scientific question being addressed, imaging under anesthesia can still be a practical and effective approach, due to its experimental simplicity and better physiological consistency.

      The higher variability observed in awake imaging arises from both physiological fluctuations in animals and unavoidable experimental inconsistencies, such as small misalignment on the imaging plane across sessions. If the research question aims to avoid the confounding effects of anesthesia, then instead of suppressing variation through anesthesia, it is important to acknowledge the natural baseline variation in the awake state. However, efforts should be made to minimize technical sources of variation. We have added a brief discussion of this issue at the end of the manuscript to reflect this consideration.

      (Line 396) “However, it is also important to note that although longitudinal awake imaging presents promise to avoid the confounding effects of anesthetics, imaging under anesthesia remains more convenient and controllable in many cases. For applications where the physiological question of interest is not sensitive to anesthesia-induced vascular effects, anesthetized imaging still offers a simpler and more stable approach. Awake imaging inherently exhibits greater physiological variability. However, care must be taken at the experimental level to minimize confounding sources of variation, such as stress level of the animal or handling inconsistencies, to ensure that the measurements are physiologically meaningful.”

      Regarding whether fractional vessel area (formerly referred to as vascularity) is a worthwhile metric for longitudinal quantification: based on our experience and comparisons, we found vessel area to be relatively robust and informative (see also Response 03 to Reviewer 1 for details). However, we acknowledge that other quantitative metrics, such as microbubble count, tortuosity, or flow directionality, may be more suitable depending on the specific pathological model or research question. How these metrics perform in awake imaging and longitudinal disease models is indeed an open and important question. We hope our work can serve as a foundation to inspire further investigation in this direction. To facilitate such exploration, we have developed and open-sourced a MATLAB-based analysis tool that supports multiple quantitative ULM metrics for longitudinal comparison. We encourage users to adapt and extend this framework to evaluate different quantitative metrics.

      (Line 578) “Code availability

      To support quantitative longitudinal analysis of ULM data, we developed an open-source MATLAB application (https://github.com/ekerwang/ULMQuantitativeAnalysis). This tool is designed to facilitate ROI-based analysis of ULM images for longitudinal comparisons. It supports multiple quantification metrics, including but not limited to vessel area and mean velocity used in this study. Users can select and adapt different metrics based on their specific applications, as a wide range of ULM-based quantification metrics have been developed for different pathological and pharmacological studies.”

      Reviewer #2 (Recommendations For The Authors):

      Images in figure 4 lack color bars.

      Response 08: Thank you for pointing this out. The color bars for the images in Figure 4 are the same as those used in the corresponding images in Figure 3. We have now added the explanation of color bars to the revised version of Figure 4 caption.

      Fig 4d: upward and downward are probably swapped.

      Response 09: Thank you for pointing this out, and we apologize for the oversight. They were mistakenly swapped. We have corrected this error in the revised figure.

      No quantitative conclusions are drawn regarding the changes in vessel diameter under anesthesia? Is it not significant? If it is not then why bring changes in diameter to our attention in fig 3 (white arrows) and figure 4b?

      Response 10: Our intention in highlighting diameter changes in Figure 3 (white arrows) and Figure 4b was to provide an illustrative example of isoflurane-induced diameter changes at the single-vessel level. These examples are meant to serve as case studies, not as the basis for broad statistical conclusions.

      In the initial version of the manuscript, we attempted to draw quantitative conclusions by measuring vessel diameters from ten manually selected vessel segments at each location. However, based on feedback from other reviewers, we decided to remove this analysis in the revised version. Manual selection of vessel segments is highly subjective and prone to bias, limiting its reliability for quantitative interpretation.

      Instead, we focused on ROI-based analysis using fractional vessel area (formerly referred to as vascularity), which reflects widespread changes in vessel diameter across regions. It is a more generalizable and less biased metric for quantifying vascular diameter changes.

      We further explained this in the Results section:

      (Line 181) “To validate the broader applicability of our findings, we conducted ROI-based analyses using fractional vessel area and mean velocity as primary metrics. These metrics extended the analysis of vessel diameter and flow velocity to entire brain regions or selected ROIs, which provides a more objective assessment of cerebral blood flow changes at a global scale and reduces the bias associated with manually selecting vessel segments. For vessel area measurements, the term fractional denotes that the vessel area is normalized to the total area of the selected ROI. This normalization is essential for fair comparisons across ROIs of different sizes.”

      Line 210 "In summary, statistical analysis revealed a decrease in individual vessel diameter" this does not seem to be supported by this version of the manuscript as no analysis is done on a representative group of vessels for the diameter.

      Response 11: Thank you for pointing out this important issue. In line with our previous response (Response 10), we would like to clarify that the analysis of individual vessel diameter was intended to serve as an example study, rather than a statistically supported conclusion based on a group of vessels. To avoid confusion, we have removed the phrase “statistical analysis revealed a decrease in individual vessel diameter” from the manuscript. 

      The meaning of the *** in fig 6b and 6c should be clarified, because (i) it is not explicitly stated and (ii) the interpretation of an equivalence test is less familiar than that of other tests.

      Response 12: We thank the reviewer for pointing out this important issue. We agree that the use of asterisks (***) in Fig. 6b and 6c may have led to confusion, as such markers are typically associated with statistical significance in difference testing. In our case, the analysis was based on the two one-sided test (TOST) procedure to assess statistical equivalence, which is indeed less commonly used and could be misinterpreted.

      To address this, we have replaced the asterisks *** in the figure with the label “equiv.”, which more clearly reflects the intended interpretation. Additionally, we have revised the figure caption and the main text to explicitly state that these markers denote statistical equivalence (not difference) as determined by TOST, with the equivalence margin defined as three times the standard deviation of one week.

      (Figure 6 Caption) “Statistical analysis was performed using the two one-sided test (TOST) to evaluate consistency of measurement. The label “equiv.” indicates statistically equivalent measurements (p < 0.001), defined as interweek differences smaller than three times the standard deviation of one week.”

      (Line 240) “Statistical testing of equivalence was conducted using the two one-sided test (TOST) procedure, which evaluates whether the difference between two time points falls within a predefined equivalence margin. Specifically, equivalence is defined as the inter-week difference being smaller than three times the standard deviation of one week. A statistically significant result in TOST (p < 0.001) supports the interpretation that the measurements are statistically equivalent, which is denoted as “equiv.” in the figures.”
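      For readers less familiar with equivalence testing, the logic of TOST can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: it assumes a paired design, uses a normal approximation instead of the t distribution (to stay within the Python standard library), takes the margin as three times the standard deviation of the first week's measurements, and runs on made-up numbers.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_paired(week_a, week_b, margin):
    """Two one-sided tests for equivalence of paired measurements.

    Returns the larger of the two one-sided p-values; a small value supports
    the claim that the mean difference lies within (-margin, +margin).
    Assumption: normal approximation, which is optimistic for small samples.
    """
    diffs = [b - a for a, b in zip(week_a, week_b)]
    se = stdev(diffs) / sqrt(len(diffs))
    d = mean(diffs)
    # Test 1: H0 is d <= -margin; reject when d is well above -margin.
    p_lower = 1 - NormalDist().cdf((d + margin) / se)
    # Test 2: H0 is d >= +margin; reject when d is well below +margin.
    p_upper = NormalDist().cdf((d - margin) / se)
    return max(p_lower, p_upper)


# Hypothetical measurements from the same ROIs in two different weeks.
week_1 = [10.0, 10.4, 9.8, 10.1, 9.9, 10.3]
week_2 = [10.1, 10.3, 9.9, 10.0, 10.0, 10.2]
margin = 3 * stdev(week_1)  # assumed reading of "three times the SD of one week"

p_equiv = tost_paired(week_1, week_2, margin)

# A shifted series that should NOT be declared equivalent.
week_shifted = [x + 1.0 for x in week_2]
p_shifted = tost_paired(week_1, week_shifted, margin)
print(p_equiv, p_shifted)
```

      Unlike a difference test, a small p-value here is evidence for equivalence, which is why an explicit label such as "equiv." is less likely to be misread than asterisks.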

      Line 237 and following: please consider rephrasing into "To further generalize these findings and examine longitudinal variation in ROI-based analysis, we used Mouse 4 as an example to show the consistency of blood flow density across different flow directions in the cortex (Fig. 6d) and extended the quantitative analysis to all three mice (Fig. 6e) (individual ULM upward and downward flow images for all three mice over the three-week longitudinal study period can be found in Supplementary Fig. 4)." The paragraph will make much more sense.

      Response 13: We appreciate your helpful rephrasing. We have fully adopted your proposed revision to enhance the clarity and coherence of the text. The sentence now reads exactly as you recommended:

      (Line 250): “To further generalize these findings and examine longitudinal variation in ROI-based analysis, we used Mouse 4 as an example to show the consistency of blood flow density across different flow directions in the cortex (Fig. 6d) and extended the quantitative analysis to all three mice (Fig. 6e) (individual ULM upward and downward flow images for all three mice over the three-week longitudinal study period can be found in Supplementary Fig. 4).”

      Line 248: "While arterial and venous flow velocity distributions exhibit clear distinctions, their variations over the three weeks remained acceptable" the meaning of acceptable remains elusive.

      Response 14: Thank you for pointing out the ambiguity in the phrase “remained acceptable”. To improve clarity and precision, we have revised the sentence to provide a more informative description. The updated sentence now reads:

      (Line 261) “While arterial and venous flow velocity distributions exhibit clear distinctions, the distribution shapes remained relatively consistent across the three weeks. Specifically, variations in median velocity were within 1 mm/s. In contrast, anesthesia-induced changes can lead to velocity shifts exceeding 1 mm/s.”
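      The criterion stated in the revised sentence (week-to-week variation in median velocity within 1 mm/s) can be checked with a few lines of code. The velocity samples below are hypothetical and serve only to show the computation; they are not data from the study.

```python
from statistics import median

# Hypothetical flow velocity samples (mm/s) for one vessel class, per week.
velocities_by_week = {
    1: [4.0, 6.5, 8.0, 10.5, 12.0],
    2: [4.5, 6.0, 8.4, 11.0, 12.5],
    3: [3.8, 6.2, 7.7, 10.0, 13.0],
}

# Median velocity for each week.
weekly_medians = {week: median(v) for week, v in velocities_by_week.items()}

# Largest difference between any two weekly medians.
median_spread = max(weekly_medians.values()) - min(weekly_medians.values())

# The 1 mm/s threshold is the one quoted in the revised manuscript text.
is_stable = median_spread < 1.0
print(weekly_medians, median_spread, is_stable)
```

      Reporting the spread itself, rather than only whether it passes the threshold, gives readers a direct sense of how close the longitudinal variation is to the anesthesia-induced shifts.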

      Line 253: consider rephrasing in "Despite subcortical regions showing the largest vascularity variability consecutive to anesthesia-induced changes, vascularity in those regions was relatively stable values in the longitudinal study" as otherwise the link between the 2 parts of the sentence feels odd.

      Response 15: Thank you for your constructive suggestion regarding the logical flow of the sentence. We fully agree with your point and have revised the sentence exactly as you proposed.

      (Line 268) “Despite subcortical regions showing the largest vascularity variability consecutive to anesthesia-induced changes, vascularity in those regions was relatively stable values in the longitudinal study.”

    1. eLife Assessment

      This important study investigates why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. The authors perform deep transcriptomic and epigenetic comparisons between the mouse and 13LGS, providing convincing evidence for mechanisms that drive rod-rich vs cone-rich retina development. Overall, this key question is investigated using an impressive collection of new data, cross-species analysis, and subsequent in vivo experiments. However, the functional analysis showing the sufficiency and necessity of Zic3 and Mef2C remains incomplete, and further analyses are needed to support the claim that these enhancers are newly evolved in 13LGS.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Weir et al. investigate why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. Most mammals, including humans, have rod-dominant retinas, making the 13LGS retina both an intriguing evolutionary divergence and a valuable model for uncovering novel mechanisms of cone generation. The developmental programs underlying this adaptation were previously unknown.

      Using an integrated approach that combines single-cell RNA sequencing (scRNAseq), scATACseq, and histology, the authors generate a comprehensive atlas of retinal neurogenesis in 13LGS. Notably, comparative analyses with mouse datasets reveal that in 13LGS, cones can arise from late-stage neurogenic progenitors, a striking contrast to mouse and primate retinas, where late progenitors typically generate rods and other late-born cell types but not cones. They further identify a shift in the timing (heterochrony) of expression of several transcription factors. The authors also show that these factors act through species-specific regulatory elements, and functional experiments support a role for several of these candidates in cone production.

      Strengths:

      This study stands out for its rigorous and multi-layered methodology. The combination of transcriptomic, epigenomic, and histological data yields a detailed and coherent view of cone development in 13LGS. Cross-species comparisons are thoughtfully executed, lending strong evolutionary context to the findings. The conclusions are, in general, well supported by the evidence, and the datasets generated represent a substantial resource for the field. The work will be of high value to both evolutionary neurobiology and regenerative medicine, particularly in the design of strategies to replace lost cone photoreceptors in human disease.

      Weaknesses:

      (1) Overall, the conclusions are strongly supported by the data, but the paper would benefit from additional clarifications. In particular, some of the conclusions could be toned down slightly to reflect that the observed changes in candidate gene function, such as those for Zic3 by itself, are modest and may represent part of a more complex regulatory network.

      (2) Additional explanations about the cell composition of the 13LGS retina are needed. The ratios between cone and rod are clearly detailed, but do those lead to changes in other cell types?

      (3) Could the lack of a clear trajectory for rod differentiation be just an effect of low cell numbers for this population?

      (4) The immunohistochemistry and RNA hybridization experiments shown in Figure S2 would benefit from supporting controls to strengthen their interpretability. While it has to be recognized that performing immunostainings on non-conventional species is not a simple task, negative controls are necessary to establish the baseline background levels, especially in cases where there seems to be labeling around the cells. The text indicates that these experiments are both immunostainings and ISH, but the figure legend only says "immunohistochemistry". Clarifying these points would improve readers' confidence in the data.

      (5) Figure S3: The text claims that overexpression of Zic3 alone is sufficient to induce the cone-like photoreceptor precursor cells as well as horizontal cell-like precursors, but this is not clear in Figure S3A nor in any other figure. Similarly, the effects of Pou2f1 overexpression are different in Figure S3A and Figure S3B. In Figure S3B, the effects described (increased presence of cone-like and horizontal-like precursors) are very clear, whereas they are not in Figure S3A. How are these experiments different?

      (6) The analyses of Zic3 conditional mutants (Figure S4) reveal an increase in many cone, rod, and pan-photoreceptor genes with only a reduction in some cone genes. Thus, the overall conclusion that Zic3 is essential for cones while repressing rod genes doesn't seem to match this particular dataset.

      (7) Throughout the text, the authors used the term "evolved". To substantiate this claim, it would be important to include sequence analyses or to rephrase to a more neutral term that does not imply evolutionary inference.

    3. Reviewer #2 (Public review):

      Summary:

      This paper aims to elucidate the gene regulatory network governing the development of cone photoreceptors, the light-sensing neurons responsible for high acuity and color vision in humans. The authors provide a comprehensive analysis through stage-matched comparisons of gene expression and chromatin accessibility using scRNA-seq and scATAC-seq from the cone-dominant 13-lined ground squirrel (13LGS) retina and the rod-dominant mouse retina. The abundance of cones in the 13LGS retina arises from a dominant trajectory from late retinal progenitor cells (RPCs) to photoreceptor precursors and then to cones, whereas only a small proportion of rods are generated from these precursors.

      Strengths:

      The paper presents intriguing insights into the gene regulatory network involved in 13LGS cone development. In particular, the authors highlight the expression of cone-promoting transcription factors such as Onecut2, Pou2f1, and Zic3 in late-stage neurogenic progenitors, which may be driven by 13LGS-specific cis-regulatory elements. The authors also characterize candidate cone-promoting genes Zic3 and Mef2C, which have been previously understudied. Overall, I found that the across-species analysis presented by this study is a useful resource for the field.

      Weaknesses:

      The functional analysis on Zic3 and Mef2C in mice does not convincingly establish that these factors are sufficient or necessary to promote cone photoreceptor specification. Several analyses lack clarity or consistency, and figure labeling and interpretation need improvement.

    4. Reviewer #3 (Public review):

      Summary:

      The authors perform deep transcriptomic and epigenetic comparisons between mouse and 13-lined ground squirrel (13LGS) to identify mechanisms that drive rod vs cone-rich retina development. Through cross-species analysis, the authors find extended cone generation in 13LGS, gene expression within progenitor/photoreceptor precursor cells consistent with a lengthened cone window, and differential regulatory element usage. Two of the transcription factors, Mef2c and Zic3, were subsequently validated using OE and KO mouse lines to verify the role of these genes in regulating competence to generate cone photoreceptors.

      Strengths:

      Overall, this is an impactful manuscript with broad implications toward our understanding of retinal development, cell fate specification, and TF network dynamics across evolution and with the potential to influence our future ability to treat vision loss in human patients. The generation of this rich new dataset profiling the transcriptome and epigenome of the 13LGS is a tremendous addition to the field that assuredly will be useful for numerous other investigations and questions of a variety of interests. In this manuscript, the authors use this dataset and compare it to data they previously generated for mouse retinal development to identify 2 new regulators of cone generation and shed insights into their regulation and their integration into the network of regulatory elements within the 13LGS compared to mouse.

      Weaknesses:

      (1) The authors chose to omit several cell classes from analyses and visualizations that would have added to their interpretations. In particular, I worry that the omission of 13LGS rods, early RPCs, and early NG from Figures 2C, D, and F is notable and would have added to the understanding of gene expression dynamics. In other words, (a) are these genes of interest unique to late RPCs or maintained from early RPCs, and (b) are rod networks suppressed compared to the mouse?

      (2) The authors claim that the majority of cones are generated by late RPCs and that this is driven primarily by the enriched enhancer network around cone-promoting genes. With the temporal scRNA/ATACseq data at their disposal, the authors should compare early vs late born cones and RPCs to determine whether the same enhancers and genes are hyperactivated in early RPCs as well as in the 13LGS. This analysis will answer the important question of whether the enhancers activated/evolved to promote all cones, or are only and specifically activated within late RPCs to drive cone genesis at the expense of rods.

      (3) The authors repeatedly use the term 'evolved' to describe the increased number of local enhancer elements of genes that increase in expression in 13LGS late RPCs and cones. Evolution can act at multiple levels on the genome and its regulation. The authors should consider analysis of sequence level changes between mouse, 13LGS, and other species to test whether the enhancer sequences claimed to be novel in the 13LGS are, in fact, newly evolved sequence/binding sites or if the binding sites are present in mouse but only used in late RPCs of the 13LGS.

      (4) The authors state that 'Enhancer elements in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors than in mice'. This statement can easily be misread to suggest that all enhancers display this, when in fact, this is only the cone-promoting enhancers of late 13LGS RPCs. In a way, this is not surprising since these genes are largely less expressed in mouse vs 13LGS late RPCs, as shown in Figure 2. The manuscript is written to suggest this mechanism of enhancer number is specific to cone production in the 13LGS- it would help prove this point if the authors asked the opposite question and showed that mouse late RPCs do not have similar increased predicted binding of TFs near rod-promoting genes in C7-8.

    1. eLife Assessment

      This important study shows that calcium stores in the endoplasmic reticulum of the parasitic protozoan, Toxoplasma gondii play a major role in buffering calcium levels in the cytosol as well as other organelles such as the mitochondrion. Advanced imaging techniques, including use of genetically encoded calcium indicators provide compelling evidence for the role of the SERCA-Ca2+ ATPase pump in regulating organellar calcium levels. However, it remains unclear whether intra-organellar calcium transport occurs via ER-mitochondria membrane contact sites or other mechanisms. This work will be of interest to cell and molecular biologists interested in calcium signalling in divergent eukaryotes.

    2. Reviewer #1 (Public review):

      Li et al. investigate Ca2+ signaling in T. gondii and argue that Ca2+ tunnels through the ER to other organelles to fuel multiple aspects of T. gondii biology. They focus in particular on TgSERCA as the presumed primary mechanism for ER Ca2+ filling, although Ca2+ release in response to TG was still present when TgSERCA was knocked out. Overall, the data support a model where the Ca2+ filling state of the ER modulates Ca2+ dynamics in other organelles.

      Comments on revisions:

      I thank the authors for their careful revisions and response to my comments, which have been addressed.

      Regarding the most critical point of the paper, that is, Ca2+ transfer from the ER to other organelles, the authors argue in their rebuttal and in the revised manuscript that ER Ca2+ is critical to redistribute and replenish Ca2+ in other organelles in the cell. I agree with this conclusion and think it is best stated in the authors' response to point #7: "We propose that this leaked calcium is subsequently taken up by other intracellular compartments. This effect is observed immediately upon TG addition. However, pre-incubation with TG or knockdown of SERCA reduces calcium storage in the ER, thereby diminishing the transfer of calcium to other stores."

      In their rebuttal the authors particularly highlight experiments in Figures 1H-K, 4G-H, and 5H-K in support of this conclusion. The data in Fig 1H-K show that with TG there is increased Ca2+ release from acidic stores. In all cases TG results in a rise in cytoplasmic Ca2+ that could load the acidic stores. So under those conditions the increased acidic organelle Ca2+ is likely due to a preceding high cytosolic Ca2+ transient due to TG. The experiments in 4G-H and 5H-K are more convincing and supportive of an important role of ER Ca2+ to maintain Ca2+ levels in other organelles. Overall, and to avoid a detailed, lengthy discussion of every point, the data support a model where in the absence of SERCA activity ER Ca2+ is reduced as well as Ca2+ in other organelles. I think it would be helpful to present and discuss this finding throughout the manuscript as under physiological conditions ER Ca2+ is regularly mobilized for signaling and homeostasis and this maintains Ca2+ levels in other organelles. This is supported by the new experiment in Supp Fig. 2A.

    1. eLife Assessment

      Whole-brain imaging of neuronal activity in freely behaving animals holds great promise for neuroscience, but numerous technical challenges limit its use. In this important study, the authors describe a new set of deep learning-based tools to track and identify the activity of head neurons in freely moving nematodes (C. elegans) and jellyfish (Clytia hemisphaerica). While the tools convincingly enable high tracking speed and accuracy in the settings in which the authors have evaluated them, the claim that these tools should be easily generalizable to a wide variety of datasets is incompletely supported.

    2. Reviewer #1 (Public review):

      In this important study, the authors develop a suite of machine vision tools to identify and align fluorescent neuronal recording images in space and time according to neuron identity and position. The authors provide compelling evidence for the speed and utility of these tools. While such tools have been developed in the past (including by the authors), the key advancement here is the speed and broad utility of these new tools. While prior approaches based on steepest descent worked, they required hundreds of hours of computational time, while the new approaches outlined here are >600-fold faster. The machine vision tools here should be immediately useful to readers specifically interested in whole-brain C. elegans data, but also for more general readers who may be interested in using BrainAlignNet for tracking fluorescent neuronal recordings from other systems.

      I really enjoyed reading this paper. The authors had several ground truth examples to quantify the accuracy of their algorithms and identified several small caveats users should consider when using these tools. These tools were primarily developed for C. elegans, an animal with stereotyped development, but whose neurons can be variably located due to internal motion of the body. The authors provide several examples of how BrainAlignNet reliably tracked these neurons over space and time. Neuron identity is also important to track, and the authors showed how AutoCellLabeler can reliably identify neurons based on their fluorescence in the NeuroPAL background. A challenge with NeuroPAL, though, is the high expression of several fluorophores, which compromises behavioral fidelity. The authors provide some possible avenues where this problem can be addressed by expressing fewer fluorophores. While using all four channels provided the best performance, only using the tagRFP and CyOFP channels was sufficient for performance that was close to full performance using all 4 NeuroPAL channels. This result indicates that the development of future lines with less fluorophore expression could be sufficient for reliable neuronal identification, which would decrease the genetic load on the animal, but also open other fluorescent channels that could be used for tracking other fluorescent tools/markers. Even though these tools were developed for C. elegans specifically, they showed BrainAlignNet can be applied to other organisms as well (in their case, the cnidarian C. hemisphaerica), which broadens the utility of their tools.

      Strengths:

      (1) The authors have a wealth of ground-truth training data to compare their algorithms against, and provide a variety of metrics to assess how well their new tools perform against hand annotation and/or prior algorithms.

      (2) For BrainAlignNet, the authors show how this tool can be applied to other organisms besides C. elegans.

      (3) The tools are publicly available on GitHub, which includes useful README files and installation guidance.

      Weaknesses:

      (1) Most of the utility of these algorithms is for C. elegans specifically. Testing their algorithms (specifically BrainAlignNet) on more challenging problems, such as whole-brain zebrafish, would have been interesting. This is a very, very minor weakness, though.

      (2) The tools are benchmarked against their own prior pipeline, but not against other algorithms written for the same purpose.

      (3) Considerable pre-processing was done before implementation. Expanding upon this would improve accessibility of these tools to a wider audience.

    3. Reviewer #2 (Public review):

      Summary:

      The paper introduces a pipeline for analyzing brain imaging of freely moving animals, a task that requires registering deforming tissues and maintaining consistent cell identities over time. The pipeline consists of three neural networks built upon existing models: BrainAlignNet for non-rigid registration, AutoCellLabeler for supervised annotation of over 100 neuronal types, and CellDiscoveryNet for unsupervised discovery of cell identities. The ambition of the work is to enable high-throughput and largely automated pipelines for neuron tracking and labeling in deforming nervous systems.

      Strengths:

      (1) The paper tackles a timely and difficult problem, offering an end-to-end system rather than isolated modules.

      (2) The authors report high performance within their dataset, including single-pixel registration accuracy, nearly complete neuron linking over time, and annotation accuracy that exceeds individual human labelers.

      (3) Demonstrations across two organisms suggest the methods could be transferable, and the integration of supervised and unsupervised modules is of practical utility.

      Weaknesses:

      (1) Lack of solid evaluation. Despite strong results on their own data, the work is not benchmarked against existing methods on community datasets, making it hard to evaluate relative performance or generality.

      (2) Lack of novelty. None of the three models incorporates state-of-the-art advances from its respective field. BrainAlignNet does not draw on the latest optical flow literature, relying instead on relatively conventional architectures. AutoCellLabeler does not utilize the advanced medNeXt3D architectures for supervised semantic segmentation. CellDiscoveryNet is presented as unsupervised discovery but relies on standard clustering approaches, with limited evaluation on only a small test set.

      (3) Lack of robustness. BrainAlignNet requires dataset-specific training and pre-alignment strategies, limiting its plug-and-play use. AutoCellLabeler depends heavily on raw intensity patterns of neurons, making it brittle to pose changes. By contrast, current state-of-the-art methods incorporate spatial deformation atlases or relative spatial relationships, which provide robustness across poses and imaging conditions. More broadly, the ANTSUN 2.0 system depends on numerous manually tuned weights and thresholds, which reduces reproducibility and generalizability beyond curated conditions.

      Evaluation:

      To make the evaluation more solid, it would be great for the authors to (1) apply the new method on existing datasets and (2) apply baseline methods on their own datasets. Otherwise, without comparison, it is unclear if the proposed method is better or not. The following papers have public challenging tracking data: https://elifesciences.org/articles/66410, https://elifesciences.org/articles/59187, https://www.nature.com/articles/s41592-023-02096-3.

      Methodology:

      (1) The model innovations appear incrementally novel relative to existing work. The authors should articulate what is fundamentally different (architectural choices, training objectives, inductive biases) and why those differences matter empirically. Ablations isolating each design choice would help.

      (2) The pipeline currently depends on numerous manually set hyperparameters and dataset-specific preprocessing. Please provide principled guidelines (e.g., ranges, default settings, heuristics) and a robustness analysis (sweeps, sensitivity curves) to show how performance varies with these choices across datasets; wherever possible, learn weights from data or replace fixed thresholds with data-driven criteria.

      Appraisal:

      The authors partially achieve their aims. Within the scope of their dataset, the pipeline demonstrates impressive performance and clear practical value. However, the absence of comparisons with state-of-the-art algorithms such as ZephIR, fDNC, or WormID, combined with small-scale evaluation (e.g., ten test volumes), makes the strength of evidence incomplete. The results support the conclusion that the approach is useful for their lab's workflow, but they do not establish broader robustness or superiority over existing methods.

      Impact:

      Even though the authors have released code, the pipeline requires heavy pre- and post-processing with numerous manually tuned hyperparameters, which limits its practical applicability to new datasets. Indeed, even within the paper, BrainAlignNet had to be adapted with additional preprocessing to handle the jellyfish data. The broader impact of the work will depend on systematic benchmarking against community datasets and comparison with established methods. As such, readers should view the results as a promising proof of concept rather than a definitive standard for imaging in deformable nervous systems.

    4. Reviewer #3 (Public review):

      Context:

      Tracking cell trajectories in deformable organs, such as the head neurons of freely moving C. elegans, is a challenging task due to rapid, non-rigid cellular motion. Similarly, identifying neuron types in the worm brain is difficult because of high inter-individual variability in cell positions.

      Summary:

      In this study, the authors developed a deep learning-based approach for cell tracking and identification in deformable neuronal images. Several different CNN models were trained to: (1) register image pairs without severe deformation, and then track cells across continuous image sequences using multiple registration results combined with clustering strategies; (2) predict neuron IDs from multicolor-labeled images; and (3) perform clustering across multiple multicolor images to automatically generate neuron IDs.

      Strengths:

      Directly using raw images for registration and identification simplifies the analysis pipeline, but it is also a challenging task since CNN architectures often struggle to capture spatial relationships between distant cells. Surprisingly, the authors report very high accuracy across all tasks. For example, the tracking of head neurons in freely moving worms reportedly reached 99.6% accuracy, neuron identification achieved 98%, and automatic classification achieved 93% compared to human annotations.

      Weaknesses:

      (1) The deep networks proposed in this study for registration and neuron identification require dataset-specific training, due to variations in imaging conditions across different laboratories. This, in turn, demands a large amount of manually or semi-manually annotated training data, including cell centroid correspondences and cell identity labels, which reduces the overall practicality and scalability of the method.

      (2) The cell tracking accuracy was not rigorously validated, but rather estimated using a biased and coarse approach. Specifically, the accuracy was assessed based on the stability of GFP signals in the eat-4-labeled channel. A tracking error was assumed to occur when the GFP signal switched between eat-4-negative and eat-4-positive at a given time point. However, this estimation is imprecise and only captures a small subset of all potential errors. Although the authors introduced a correction factor to approximate the true error rate, the validity of this correction relies on the assumption that eat-4 neurons are uniformly distributed across the brain - a condition that is unlikely to hold.

      (3) Figure S1F demonstrates that the registration network, BrainAlignNet, alone is insufficient to accurately align arbitrary pairs of C. elegans head images. The high tracking accuracy reported is largely due to the use of a carefully designed registration sequence, matching only images with similar postures, and an effective clustering algorithm. Although the authors address this point in the Discussion section, the abstract may give the misleading impression that the network itself is solely responsible for the observed accuracy.

      (4) The reported accuracy for neuron identification and automatic classification may be misleading, as it was assessed only on a subset of neurons labeled as "high-confidence" by human annotators. Although the authors did not disclose the exact proportion, various descriptions (such as Figure 4f) imply that this subset comprises approximately 60% of all neurons. While excluding uncertain labels is justifiable, the authors highlight the high accuracy achieved on this subset without making clear that the reported performance pertains only to neurons that are relatively easy to identify. Furthermore, they do not report what fraction of the total neuron population can be accurately identified using their methods, an omission of critical importance for prospective users.
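      Weaknesses (2) and (4) both rest on simple arithmetic that is worth making explicit. The sketch below is illustrative only. The 98% accuracy and the roughly 60% subset size are the figures quoted in this review; the eat-4-positive fraction and the observed switch rate are hypothetical values; and the correction formula is one form such a correction could take under the uniform-distribution assumption, not necessarily the formula the authors used.

```python
def corrected_error_rate(observed_switch_rate, eat4_fraction):
    """Estimate the true tracking error rate from observed eat-4 status switches.

    Assumption: a mistracked neuron is swapped with a neuron drawn uniformly
    at random, so the swap changes eat-4 status (and is therefore detectable)
    with probability 2 * p * (1 - p), where p is the eat-4-positive fraction.
    """
    p = eat4_fraction
    detectable = 2 * p * (1 - p)
    if detectable == 0:
        raise ValueError("no mix-ups are detectable when p is 0 or 1")
    return observed_switch_rate / detectable


def identified_fraction_lower_bound(subset_accuracy, subset_fraction):
    """Fraction of ALL neurons correctly identified, counting only the
    high-confidence subset (no credit for neurons outside it)."""
    return subset_accuracy * subset_fraction


# Hypothetical inputs: 0.2% observed switches, 30% of neurons eat-4 positive.
est_error = corrected_error_rate(0.002, 0.3)

# Figures quoted in this review: 98% accuracy on ~60% of neurons.
lower_bound = identified_fraction_lower_bound(0.98, 0.60)
print(est_error, lower_bound)
```

      If mistracked neurons are preferentially confused with spatial neighbors that share eat-4 status, the detectable fraction is smaller than 2p(1 - p) and this correction underestimates the true error rate, which is the concern raised in point (2).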

    5. Author response:

      Reviewer #1 (Public review):

      In this important study, the authors develop a suite of machine vision tools to identify and align fluorescent neuronal recording images in space and time according to neuron identity and position. The authors provide compelling evidence for the speed and utility of these tools. While such tools have been developed in the past (including by the authors), the key advancement here is the speed and broad utility of these new tools. While prior approaches based on steepest descent worked, they required hundreds of hours of computational time, while the new approaches outlined here are >600-fold faster. The machine vision tools here should be immediately useful to readers specifically interested in whole-brain C. elegans data, but also for more general readers who may be interested in using BrainAlignNet for tracking fluorescent neuronal recordings from other systems.

      I really enjoyed reading this paper. The authors had several ground truth examples to quantify the accuracy of their algorithms and identified several small caveats users should consider when using these tools. These tools were primarily developed for C. elegans, an animal with stereotyped development, but whose neurons can be variably located due to internal motion of the body. The authors provide several examples of how BrainAlignNet reliably tracked these neurons over space and time. Neuron identity is also important to track, and the authors showed how AutoCellLabeler can reliably identify neurons based on their fluorescence in the NeuroPAL background. A challenge with NeuroPAL, though, is the high expression of several fluorophores, which compromises behavioral fidelity. The authors provide some possible avenues where this problem can be addressed by expressing fewer fluorophores. While using all four channels provided the best performance, only using the tagRFP and CyOFP channels was sufficient for performance that was close to full performance using all 4 NeuroPAL channels. This result indicates that the development of future lines with less fluorophore expression could be sufficient for reliable neuronal identification, which would decrease the genetic load on the animal, but also open other fluorescent channels that could be used for tracking other fluorescent tools/markers. Even though these tools were developed for C. elegans specifically, they showed BrainAlignNet can be applied to other organisms as well (in their case, the cnidarian C. hemisphaerica), which broadens the utility of their tools.

      Strengths:

      (1) The authors have a wealth of ground-truth training data to compare their algorithms against, and provide a variety of metrics to assess how well their new tools perform against hand annotation and/or prior algorithms.

      (2) For BrainAlignNet, the authors show how this tool can be applied to other organisms besides C. elegans.

      (3) The tools are publicly available on GitHub, which includes useful README files and installation guidance.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Most of the utility of these algorithms is for C. elegans specifically. Testing their algorithms (specifically BrainAlignNet) on more challenging problems, such as whole-brain zebrafish, would have been interesting. This is a very, very minor weakness, though.

      We appreciate the reviewer’s point that expanding to additional animal models would be valuable. In the study, we have so far tested our approaches on C. elegans and jellyfish. Given that this is considered a ‘very, very minor weakness’ and that it does not directly affect the results or analyses in the paper, we think this is better addressed in future work.

      (2) The tools are benchmarked against their own prior pipeline, but not against other algorithms written for the same purpose.

      We agree that it would be valuable to benchmark other labs’ software pipelines on our datasets. We note that most papers in this area, which describe those pipelines, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ on our data might not represent those pipelines in their best light when compared to our pipeline that was developed with our data in mind. Data from different microscopy platforms can be surprisingly different and we wouldn’t want to perform an analysis that had this bias. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (3) Considerable pre-processing was done before implementation. Expanding upon this would improve accessibility of these tools to a wider audience.

      Indeed, some pre-processing was performed on images before registration and neuron identification -- understanding these nuances can be important. The pre-processing steps are described in the Results section and detailed in the Methods. They are also all available in our open-source software. For BrainAlignNet, the key steps were: (1) selecting image registration problems, (2) cropping, and (3) Euler alignment. Steps (1) and (3) were critically important and are extensively discussed in the Results and Discussion sections of our study (lines 142-144, 218-234, 318-323, 704-712). Step (2) is standard in image processing. For AutoCellLabeler and CellDiscoveryNet, the pre-processing was primarily to align the 4 NeuroPAL color channels to each other (i.e. make sure the blue/red/orange/etc channels for an animal are perfectly aligned). This is also just a standard image processing step to ensure channel alignment. Thus, the more “custom” pre-processing steps were extensively discussed in the study and the more “common” steps are still described in the Methods. The implementation of all steps is available in our open-source software.

      Reviewer #2 (Public review):

      Summary:

      The paper introduces a pipeline for analyzing brain imaging of freely moving animals: registering deforming tissues and maintaining consistent cell identities over time. The pipeline consists of three neural networks that are built upon existing models: BrainAlignNet for non-rigid registration, AutoCellLabeler for supervised annotation of over 100 neuronal types, and CellDiscoveryNet for unsupervised discovery of cell identities. The ambition of the work is to enable high-throughput and largely automated pipelines for neuron tracking and labeling in deforming nervous systems.

      Strengths:

      (1) The paper tackles a timely and difficult problem, offering an end-to-end system rather than isolated modules.

      (2) The authors report high performance within their dataset, including single-pixel registration accuracy, nearly complete neuron linking over time, and annotation accuracy that exceeds individual human labelers.

      (3) Demonstrations across two organisms suggest the methods could be transferable, and the integration of supervised and unsupervised modules is of practical utility.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Lack of solid evaluation. Despite strong results on their own data, the work is not benchmarked against existing methods on community datasets, making it hard to evaluate relative performance or generality.

      We agree that it would be valuable to benchmark many labs’ software pipelines on some common datasets, ideally from several different research labs. We note that most papers in this area, which describe the other pipelines that have been developed, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ and comparing the results to our pipeline (where we have extensive expertise) might bias the performance metrics in favor of our software. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) Lack of novelty. None of the three models incorporates state-of-the-art advances from its respective field. BrainAlignNet does not learn from the latest optical flow literature, relying instead on relatively conventional architectures. AutoCellLabeler does not utilize the advanced medNeXt3D architectures for supervised semantic segmentation. CellDiscoveryNet is presented as unsupervised discovery but relies on standard clustering approaches, with limited evaluation on only a small test set.

      We appreciate that the machine learning field moves fast. Our goal was not to invent entirely novel machine learning tools, but rather to apply and optimize tools for a set of challenging, unsolved biological problems. We began with the somewhat simpler architectures described in our study and were largely satisfied with their performance. It is conceivable that newer approaches would perhaps lead to even greater accuracy, flexibility, and/or speed. But, oftentimes, simple or classical solutions can adequately resolve specific challenges in biological image processing.

      Regarding CellDiscoveryNet, our claim of unsupervised training is precise: CellDiscoveryNet is trained end-to-end only on raw images, with no human annotations, pseudo-labels, external classifiers, or metadata used for training, model selection, or early stopping. The loss is defined entirely from the input data (no label signal). By standard usage in machine learning, this constitutes unsupervised (often termed “self-supervised”) representation learning. Downstream clustering is likewise unsupervised, consuming only image pairs registered by CellDiscoveryNet and neuron segmentations produced by our previously-trained SegmentationNet (which provides no label information).

      (3) Lack of robustness. BrainAlignNet requires dataset-specific training and pre-alignment strategies, limiting its plug-and-play use. AutoCellLabeler depends heavily on raw intensity patterns of neurons, making it brittle to pose changes. By contrast, current state-of-the-art methods incorporate spatial deformation atlases or relative spatial relationships, which provide robustness across poses and imaging conditions. More broadly, the ANTSUN 2.0 system depends on numerous manually tuned weights and thresholds, which reduces reproducibility and generalizability beyond curated conditions.

      Regarding BrainAlignNet: we agree that we trained on each species’ own data (worm, jellyfish), and we would suggest that other labs working on new organisms do the same based on our current state of knowledge. It would be fantastic if there were an alignment approach that generalized to all possible cases of non-rigid registration in all animals, which is an important area for future study. We also agree that pre-alignment was critical in worms and jellyfish, which we discuss extensively in our study (lines 142-144, 318-321, 704-712).

      Regarding AutoCellLabeler: the animals were not recorded in any standardized pose and were not aligned to each other beforehand – they were basically in a haphazard mix of poses and we used image augmentation to allow the network to generalize to other poses, as described in our study. It is still possible that AutoCellLabeler is somehow brittle to pose changes (e.g. perhaps extremely curved worms) – while we did not detect this in our analyses, we did not systematically evaluate performance across all possible poses. However, we do note that this network was able to label images taken from freely-moving worms, which by definition exhibit many poses (Figure 5D, lines 500-525); aggregating the network’s performance across freely-moving data points allowed it to nearly match its performance on high-SNR immobilized data. This suggests a degree of robustness of the AutoCellLabeler network to pose changes.

      Regarding ANTSUN 2.0: we agree that there are some hyperparameters (described in our study) that affect ANTSUN performance. We agree that it would be worthwhile to fully automate setting these in future iterations of the software.

      Evaluation:

      To make the evaluation more solid, it would be great for the authors to (1) apply the new method on existing datasets and (2) apply baseline methods on their own datasets. Otherwise, without comparison, it is unclear if the proposed method is better or not. The following papers have public challenging tracking data: https://elifesciences.org/articles/66410, https://elifesciences.org/articles/59187, https://www.nature.com/articles/s41592-023-02096-3.

      Please see our response to your point (1) under Weaknesses above.

      Methodology:

      (1) The model innovations appear incrementally novel relative to existing work. The authors should articulate what is fundamentally different (architectural choices, training objectives, inductive biases) and why those differences matter empirically. Ablations isolating each design choice would help.

      There are other efforts in the literature to solve the neuron tracking and neuron identification problems in C. elegans (please see paragraphs 4 and 5 of our Introduction, which are devoted to describing these). However, they are quite different in the approaches that they use, compared to our study. For example, for neuron tracking they use t->t+1 methods, or model neurons as point clouds, etc (a variety of approaches have been tried). For neuron identification, they work on extracted features from images, or use statistical approaches rather than deep neural networks, etc (a variety of approaches have been tried). Our assessment is that each of these diverse approaches has strengths and drawbacks; we agree that a meta-analysis of the design choices used across studies could be valuable.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) The pipeline currently depends on numerous manually set hyperparameters and dataset-specific preprocessing. Please provide principled guidelines (e.g., ranges, default settings, heuristics) and a robustness analysis (sweeps, sensitivity curves) to show how performance varies with these choices across datasets; wherever possible, learn weights from data or replace fixed thresholds with data-driven criteria.

      We agree that there are some ANTSUN 2.0 hyperparameters (described in our Methods section) that could affect the quality of neuron tracking. It would be worthwhile to fully automate setting these in future iterations of the software, ensuring that the hyperparameter settings are robust to variation in data/experiments.

      Appraisal:

      The authors partially achieve their aims. Within the scope of their dataset, the pipeline demonstrates impressive performance and clear practical value. However, the absence of comparisons with state-of-the-art algorithms such as ZephIR, fDNC, or WormID, combined with small-scale evaluation (e.g., ten test volumes), makes the strength of evidence incomplete. The results support the conclusion that the approach is useful for their lab's workflow, but they do not establish broader robustness or superiority over existing methods.

      We wish to remind the reviewer that we developed BrainAlignNet for use in worms and jellyfish. These two animals have different distributions of neurons and radically different anatomy and movement patterns. Data from the two organisms was collected in different labs (Flavell lab, Weissbourd lab) on different types of microscopes (spinning disk, epifluorescence). We believe that this is a good initial demonstration that the approach has robustness across different settings.

      Regarding comparisons to other labs’ C. elegans data processing pipelines, we agree that it will be extremely valuable to compare performance on common datasets, ideally collected in multiple different research labs. But we believe this should be performed collaboratively, with input from each lab, so that each software package can be shown in its best light, as described above.

      Impact:

      Even though the authors have released code, the pipeline requires heavy pre- and post-processing with numerous manually tuned hyperparameters, which limits its practical applicability to new datasets. Indeed, even within the paper, BrainAlignNet had to be adapted with additional preprocessing to handle the jellyfish data. The broader impact of the work will depend on systematic benchmarking against community datasets and comparison with established methods. As such, readers should view the results as a promising proof of concept rather than a definitive standard for imaging in deformable nervous systems.

      Regarding worms vs jellyfish pre-processing: we actually had the exact opposite reaction to that of the reviewer. We were surprised at how similar the pre-processing was for these two very different organisms. In both cases, it was essential to (1) select appropriate registration problems to be solved; and (2) perform initialization with Euler alignment. Provided that these two challenges were solved, BrainAlignNet mostly took care of the rest. This suggests a clear path for researchers who wish to use this approach in another animal. Nevertheless, we also agree with the reviewer’s caution that a totally different use case could require some re-thinking or re-strategizing. For example, the strategy of how to select good registration problems could depend on the form of the animal’s movement.

      Reviewer #3 (Public review):

      Context:

      Tracking cell trajectories in deformable organs, such as the head neurons of freely moving C. elegans, is a challenging task due to rapid, non-rigid cellular motion. Similarly, identifying neuron types in the worm brain is difficult because of high inter-individual variability in cell positions.

      Summary:

      In this study, the authors developed a deep learning-based approach for cell tracking and identification in deformable neuronal images. Several different CNN models were trained to: (1) register image pairs without severe deformation, and then track cells across continuous image sequences using multiple registration results combined with clustering strategies; (2) predict neuron IDs from multicolor-labeled images; and (3) perform clustering across multiple multicolor images to automatically generate neuron IDs.

      Strengths:

      Directly using raw images for registration and identification simplifies the analysis pipeline, but it is also a challenging task since CNN architectures often struggle to capture spatial relationships between distant cells. Surprisingly, the authors report very high accuracy across all tasks. For example, the tracking of head neurons in freely moving worms reportedly reached 99.6% accuracy, neuron identification achieved 98%, and automatic classification achieved 93% compared to human annotations.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) The deep networks proposed in this study for registration and neuron identification require dataset-specific training, due to variations in imaging conditions across different laboratories. This, in turn, demands a large amount of manually or semi-manually annotated training data, including cell centroid correspondences and cell identity labels, which reduces the overall practicality and scalability of the method.

      We performed dataset-specific training for image registration and neuron identification, and we would encourage new users to do the same based on our current state of knowledge. This highlights how standardization of whole-brain imaging data across labs is an important issue for our field to address and that, without it, variations in imaging conditions could impact software utility. We refer the reviewer to an excellent study by Sprague et al. (2025) on this topic, which is cited in our study.

      However, at the same time, we wish to note that it was actually reasonably straightforward to take the BrainAlignNet approach that we initially developed in C. elegans and apply it to jellyfish. Some of the key lessons that we learned in C. elegans generalized: in both cases, it was critical to select the right registration problems to solve and to preprocess with Euler registration for good initialization. Provided that those problems were solved, BrainAlignNet could be applied to obtain high-quality registration and trace extraction. Thus, our study provides clear suggestions on how to use these tools across multiple contexts.

      (2) The cell tracking accuracy was not rigorously validated, but rather estimated using a biased and coarse approach. Specifically, the accuracy was assessed based on the stability of GFP signals in the eat-4-labeled channel. A tracking error was assumed to occur when the GFP signal switched between eat-4-negative and eat-4-positive at a given time point. However, this estimation is imprecise and only captures a small subset of all potential errors. Although the authors introduced a correction factor to approximate the true error rate, the validity of this correction relies on the assumption that eat-4 neurons are uniformly distributed across the brain - a condition that is unlikely to hold.

      We respectfully disagree with this critique. We considered the alternative suggested by the reviewer (in their private comments to the authors) of comparing against a manually annotated dataset. But this annotation would require manually linking ~150 neurons across ~1600 timepoints, which would require humans to manually link neurons across timepoints >200,000 times for a single dataset. These datasets consist of densely packed neurons rapidly deforming over time in all 3 dimensions. Moreover, a single error in linking would propagate across timepoints, so the error tolerance of such annotation would be extremely low. Any such manually labeled dataset would be fraught with errors and should not be trusted. Instead, our approach relies on a simple, accurate assumption: GFP expression in a neuron should be roughly constant over a 16 min recording (after bleach correction) and the levels will be different in different neurons when it is sparsely expressed. Because all image alignment is done in the red channel, the pipeline never “peeks” at the GFP until it is finished with neuron alignment and tracking. The eat-4 promoter was chosen for GFP expression because (a) the nuclei labeled by it are scattered across the neuropil in a roughly salt-and-pepper fashion, so that a mixture of eat-4-positive and eat-4-negative neurons is found throughout the head; and (b) it is in roughly 40% of the neurons, giving very good overall coverage. Our view is that this approach of labeling subsets of neurons with GFP should become the standard in the field for assessing tracking accuracy: it has a simple, accurate premise; is not susceptible to human labeling error; is straightforward to implement; and, since it does not require manual labeling, is easy to scale to multiple datasets. We do note that it could be further strengthened by using multiple strains each with different ‘salt-and-pepper’ GFP expression patterns.
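
      As a rough illustration of the logic in this response (not the code used in the study), the sketch below counts the assignments a full manual annotation would need and flags timepoints where a tracked neuron's GFP level lands on the opposite side of a threshold from its own median. The function names, the threshold value, and the correction factor 2p(1-p) are illustrative assumptions; the correction in particular assumes that a wrongly linked neuron is GFP-positive with probability p, independent of the true neuron.

```python
from statistics import median


def manual_links_needed(n_neurons, n_timepoints):
    """Number of neuron-timepoint assignments a human would have to make."""
    return n_neurons * n_timepoints


def flag_identity_switches(trace, threshold):
    """Return indices where a trace sits on the opposite side of `threshold`
    from its own median, i.e. where GFP status appears to flip."""
    is_positive = median(trace) > threshold
    return [t for t, value in enumerate(trace) if (value > threshold) != is_positive]


def corrected_error_rate(observed_rate, fraction_positive):
    """Scale the observed rate by the chance that a mis-link crosses GFP classes.

    Hypothetical assumption: the wrongly linked neuron is GFP-positive with
    probability `fraction_positive`, independent of the true neuron."""
    p = fraction_positive
    detectable = 2 * p * (1 - p)
    return observed_rate / detectable


if __name__ == "__main__":
    print(manual_links_needed(150, 1600))  # 240000
    gfp_trace = [9.8, 10.1, 10.0, 1.2, 9.9]  # toy bleach-corrected GFP values
    print(flag_identity_switches(gfp_trace, threshold=5.0))  # [3]
    print(corrected_error_rate(0.0048, 0.4))
```

      With roughly 40% of neurons labeled, this assumed correction treats about 48% of mis-links as detectable, so the observed switch rate would be roughly doubled. The reviewer's concern about non-uniform labeling amounts to questioning the independence assumption above.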

      (3) Figure S1F demonstrates that the registration network, BrainAlignNet, alone is insufficient to accurately align arbitrary pairs of C. elegans head images. The high tracking accuracy reported is largely due to the use of a carefully designed registration sequence, matching only images with similar postures, and an effective clustering algorithm. Although the authors address this point in the Discussion section, the abstract may give the misleading impression that the network itself is solely responsible for the observed accuracy.

      Our tracking accuracy requires (a) a careful selection of registration problems, (b) highly accurate registration of the selected registration problems, and (c) effective clustering. We extensively discussed the importance of choosing the registration problems in the Results section (lines 218-234 and 318-321), Discussion section (lines 704-708), and Methods section (lines 955-970 and 1246-1250) of our paper. We also discussed the clustering aspect in the Results section (lines 247-259), Discussion section (lines 708-712), and Methods section (lines 1162-1206). In addition, our abstract states that BrainAlignNet needs to be “incorporated into an image analysis pipeline,” to inform readers that other aspects of image analysis need to occur (beyond BrainAlignNet) to perform tracking.

      (4) The reported accuracy for neuron identification and automatic classification may be misleading, as it was assessed only on a subset of neurons labeled as "high-confidence" by human annotators. Although the authors did not disclose the exact proportion, various descriptions (such as Figure 4F) imply that this subset comprises approximately 60% of all neurons. While excluding uncertain labels is justifiable, the authors highlight the high accuracy achieved on this subset without clarifying that the reported performance pertains only to neurons that are relatively easy to identify. Furthermore, they do not report what fraction of the total neuron population can be accurately identified using their methods, an omission of critical importance for prospective users.

      The reviewer raises two points here: (1) whether AutoCellLabeler accuracy is impacted by ease of human labeling; and (2) what fraction of total neurons are identified. We address them one at a time.

      Regarding (1), we believe that the reviewer overlooked an important analysis in our study. Indeed, to assess its performance, one can only compare AutoCellLabeler’s output against accurate human labels – there is simply no way around it. However, we noted that AutoCellLabeler was identifying some neurons with high confidence even when humans had low confidence or had not even tried to label the neurons (Fig. 4F). To test whether these were in fact accurate labels, we asked additional human labelers to spend extra time trying to label a random subset of these neurons (they were of course blinded to the AutoCellLabeler label). We then assessed the accuracy of AutoCellLabeler against these new human labels and found that they were highly accurate (Fig. 4H). This suggests that AutoCellLabeler has strong performance even when some human labelers find it challenging to label a neuron. However, we agree that we have not yet been able to quantify AutoCellLabeler performance on the small set of neuron classes that humans are unable to identify across datasets.

      Regarding (2), we agree that knowing how many neurons are labeled by AutoCellLabeler is critical. For example, labeling only 3 neurons per animal with 100% accuracy isn’t very helpful. We wish to emphasize that we did not omit this information: we reported the number of neurons labeled for every network that we characterized in the study, alongside the accuracy of those labels (please see Figures 4I, 5A, and 6G; Figure 4I also shows the number of human labels per dataset, which the reviewer requested). We also showed curves depicting the tradeoff between accuracy and number of neurons labeled, which fully captures how we balanced accuracy and number of neurons labeled (Figures 5D and S4A). It sounds like the reviewer also wanted to know the total number of recorded neurons. The typical number of recorded neurons per dataset can also be found in the paper in Fig. 2E.
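
      The tradeoff between accuracy and number of neurons labeled can be sketched as a sweep over a confidence threshold. This is a minimal illustration with made-up data, not the analysis behind Figures 5D and S4A; the function name and the input format of (confidence, is_correct) pairs are assumptions.

```python
def accuracy_coverage_curve(predictions, thresholds):
    """For each confidence threshold, report how many neurons are labeled and
    how accurate those labels are.

    `predictions` is a list of (confidence, is_correct) pairs, where
    `is_correct` compares the network label with a trusted human label."""
    curve = []
    for threshold in thresholds:
        kept = [correct for confidence, correct in predictions if confidence >= threshold]
        n_labeled = len(kept)
        # Accuracy is undefined when no neuron passes the threshold.
        accuracy = sum(kept) / n_labeled if n_labeled else None
        curve.append((threshold, n_labeled, accuracy))
    return curve


if __name__ == "__main__":
    # Toy values only; not taken from the study.
    toy = [(0.99, True), (0.95, True), (0.80, True),
           (0.60, False), (0.40, True), (0.30, False)]
    for row in accuracy_coverage_curve(toy, thresholds=[0.0, 0.5, 0.9]):
        print(row)
```

      Raising the threshold labels fewer neurons but typically raises accuracy on those that remain, which is why both numbers need to be reported together.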

    1. eLife Assessment

      This study makes a novel and valuable contribution by adapting step selection functions, traditionally used in animal ecology, to explore human movement and environmental risk exposure in urban slums, offering a promising framework for spatial epidemiology, particularly regarding leptospirosis. The integration of GPS telemetry with environmental data and the stratification by gender and serostatus are notable strengths that enhance the study's relevance for public health applications. The strength of evidence is compelling.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study applies step selection functions to telemetry data to estimate how likely individuals are to move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, which could inform more targeted interventions.

      Strengths:

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis).

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings.

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes.

    3. Reviewer #2 (Public review):

      Summary:

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology, to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models, one based on distance and one based on buffer areas, to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status.
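
      For readers unfamiliar with step selection, the core idea can be sketched as a conditional logistic comparison between each observed step and a set of alternative steps the person could have taken. The sketch below is a one-covariate toy (distance to the nearest feature) with hypothetical coordinates and function names. It is not the models fitted in the study, which include more covariates and would normally be fitted with dedicated statistical software.

```python
import math


def distance_to_nearest(point, features):
    """Euclidean distance from `point` to the closest feature location."""
    return min(math.dist(point, f) for f in features)


def log_likelihood(beta, strata):
    """Conditional-logit log-likelihood for one covariate.

    Each stratum is (x_used, [x_available, ...]) for one observed step."""
    total = 0.0
    for x_used, x_available in strata:
        scores = [beta * x for x in [x_used] + x_available]
        top = max(scores)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
        total += beta * x_used - log_denominator
    return total


def fit_beta(strata, lo=-10.0, hi=10.0, iterations=100):
    """Maximise the concave log-likelihood in one parameter by ternary search."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, strata) < log_likelihood(m2, strata):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


if __name__ == "__main__":
    sewers = [(0.0, 0.0)]  # hypothetical open-sewer location
    # Each entry: (observed step endpoint, randomly drawn alternative endpoints)
    steps = [
        ((3.0, 0.0), [(1.0, 0.0), (2.0, 0.0)]),
        ((0.0, 2.0), [(0.0, 1.0), (0.0, 3.0)]),
        ((0.0, 3.0), [(2.0, 0.0), (1.0, 0.0)]),
        ((1.0, 0.0), [(0.0, 2.0), (3.0, 0.0)]),
    ]
    strata = [
        (distance_to_nearest(used, sewers),
         [distance_to_nearest(a, sewers) for a in available])
        for used, available in steps
    ]
    print(fit_beta(strata))
```

      In this toy, a positive coefficient means that endpoints farther from the feature were chosen more often than the alternatives (avoidance), and a negative coefficient means the opposite. If every observed step were the farthest option in its stratum, the estimate would run to the search bound, so real analyses need enough variation in the data.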

      Strengths:

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      [Editors' note: I have reviewed the authors' revised submission and confirm that they have adequately addressed the reviewers' comments for this manuscript.]

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study applies step selection functions to telemetry data to estimate how likely individuals are to move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, which could inform more targeted interventions.

      Strengths: 

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis). 

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings. 

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes. 

      Weaknesses: 

      (1) The sample size for the study was not calculated, although it was a nested cohort study. 

      We thank Reviewer #1 for highlighting this weakness. We will make sure that this is explained in the next version of the manuscript. At the time of recruiting participants, we found no literature on how to perform a sample size calculation for movement studies involving GPS loggers and associated methods of analysis. Therefore, we aimed to recruit as many individuals as possible within the resource constraints of the study.  

      “Participants who were already enrolled in the cohort study were recruited to take part in the movement analysis study. At the time of recruitment, we found no published scientific studies detailing how to perform sample size calculations for research using GPS data in humans. Therefore, we opted to use convenience sampling instead. A target of 30 people per study area, balanced by gender and blind to their serological status, was chosen for this study.” [Lines 163 - 169]

      (2) The step‐selection functions, though a novel method, may face challenges in fully capturing the complexity of human decision-making influenced by socio-cultural and economic factors that were not captured in the study. 

      We agree with Reviewer #1 that this model may fail to capture the full breadth of human decision-making when it comes to moving through local environments. We included a section discussing the aspect of violence and how this influences residents’ choices, along with some possibilities on how to record and account for this. Although it is outside of the scope of this study, we believe that coupling these quantitative methods with qualitative studies would provide a comprehensive understanding of movement in these areas.

      (3) The study's context is limited to a specific urban slum in Salvador, Brazil, which may reduce the generalizability of its findings to other geographical areas or populations that experience different environmental or socio-economic conditions. 

      We thank the reviewer for highlighting this limitation. We have made this more clear in the discussion section: 

      “As a result, the findings are biased towards the more represented individuals, limiting their generalisability. Additionally, all participants are from specific areas in Salvador, which may further limit the generalisability to similar contexts.” [Lines 561 - 564]

      (4) The reliance on self-reported or telemetry-based movement data might include some inaccuracies or biases that could affect the precision of the selection coefficients obtained, potentially limiting the study's predictive power. 

      We agree that telemetry data has inherent inaccuracies, which we have tried to account for by using only those data points within the study areas. We would like to clarify that there is no self-reported movement data used in this study. All movement data was collected using GPS loggers.  

      (5) Some participants with less than 50 relocations within the study area were excluded without clear justification, see line 149. 

      We found that the SSF models would not run properly if there weren’t enough relocations. Therefore, we decided to remove these individuals from the analysis. They are also removed from any descriptive statistics presented. We have now clarified this in the manuscript.  

      “Individuals with less than 50 relocations within the study area were excluded from the analysis to ensure good model convergence. Details of these excluded individuals can be found in Supplementary Material I.” [Lines 183 – 186]

      (6) Some figures are not clear (see Figure 4 A & B). 

      We have improved the resolution of the image and believe it is more clear now. Please let us know if the resolution still is not clear enough.  

      (7) No statement on conflict of interest was included, considering sponsorship of the study. 

      The conflict of interest forms for each author were sent to eLife separately. I believe these should be made available upon publication, but please reach out if these need to be re-sent.  

      Reviewer #2 (Public review): 

      Summary: 

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models based on distance and based on buffer areas to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status. 

      Strengths: 

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      Weaknesses: 

      Due to environmental data being limited to the study area, exposure elsewhere could not be captured, despite previous research by Owers et al. showing that the extent of movement was associated with infection risk. Limitations of step selection for use in studying human participants in an urban environment would need to be explicitly discussed. 

      The environmental factors used in the study required research teams to visit the sites and map the locations. Given that individuals travelled throughout the city of Salvador, performing this task at a large scale would be unachievable. Therefore, we limited the data to only those points within the study area boundaries to avoid any biases from interactions with unrecorded environmental factors.  

      Reviewing Editor Comments: 

      The manuscript would benefit from clearer articulation of SSF assumptions, data exclusions, and buffer choices, as well as improvements in figure clarity, to strengthen its generalizability and impact. 

      Please see replies to Reviewer #2 below regarding the assumptions (2.3), data exclusions (2.1) and buffer choices (2.2). We have improved Figure 4 clarity, please let us know if this is not sufficient.  

      Reviewer #1 (Recommendations for the authors): 

      (1) Provide comprehensive details on telemetry data collection for improved data quality and reproducibility. 

      Details for this are included under the “Methods/GPS Data” section. We have included a sentence to explain that we used the GPS device manufacturer’s software to programme them. We believe this provides enough information on how to collect the data for reproducibility, but please let us know if there is further information that we could provide.

      “Individuals who consented to take part in this study were asked to wear GPS loggers for continuous periods of up to 48 hours, which could be repeated. The GPS loggers used were i-got U GT-600, set to record their location every 35 seconds. We used the manufacturer’s software to programme the devices. Data were collected between March and November 2022.” [Lines 172 - 176]

      (2) Check all figures and improve on clarity (see Figure 4). 

      We have updated Figure 4 and believe the resolution is better now. Please let us know if this is not the case from the reader’s perspective.

      (3) Revisit sentence structures to improve readability and reduce overly complex phrasing. 

      We have reviewed the manuscript and made some changes to improve readability. 

      Reviewer #2 (Recommendations for the authors): 

      I thank Ruiz Cuenca et al. for putting together this interesting manuscript on the use of step selection functions for understanding exposure to leptospires in urban Brazil. I thoroughly enjoyed reading it and have a few suggestions that may improve the manuscript. 

      I also apologise, but I was not able to find some of the supplementary materials, for instance, Supplementary Material I. That may have been my oversight. 

      To eLife: These should have been included with the submitted manuscript file. Please let me know if it has to be resubmitted to eLife.

      (1) Descriptive statistics 

      Some more descriptive statistics would be helpful. For instance, what was the leptospirosis infection status of the six individuals who were removed due to having <50 points inside the area? As part of the analysis relies on exposure, defined as GPS locations within a 20m buffer of open sewers, community streams, and rubbish piles, it would be good to have some descriptive statistics around this. How many visits to these different sites did people make, and how did these statistics vary by study area, age, gender, and leptospirosis infection status? 

      We thank Reviewer #2 for highlighting this. Thanks to their comment, we noticed a mistake in the code which excluded more individuals from the summary statistics table than were actually excluded from the full analysis. Only 2 individuals had fewer than 50 relocations across the whole day (5 am to 9 pm) and were excluded from further analysis. The mistake has been rectified and the summary statistics updated (see Table 1).

      We have included the demographic details of excluded participants as a table in the supplementary material, which we have referenced in the manuscript. We have also explained that the exclusion is to aid model convergence, as we found that too few relocations would result in SSF models not working properly.

      “Individuals with less than 50 relocations within the study area were excluded from the analysis to ensure good model convergence. Details of these excluded individuals can be found in Supplementary Material I.” [Lines 183 – 186]

      We have also now included a table (Table 2) to show more descriptive statistics of how much time individuals spent within each of the environmental buffers.

      (2) Definitions of buffers 

      I was surprised that the authors chose a 20m buffer for each factor but 10m around the household. Could this be more clearly justified, especially given that there will be location errors in both the GPS location point and the GPS logger points? These buffers do appear quite small, particularly in an urban environment where obstruction from buildings can be expected to yield substantial GPS errors.

      The 20 meter buffer represents an intense interaction with the point of interest. This distance was decided after visiting the sites and seeing the points of interest in person. The 10 meter buffer accounts for the size of dwellings in these areas. We have included these explanations in the new manuscript:  

      “The buffer rasters, one for each factor, were created using a 20 meter buffer around each reference point. The size of this buffer was decided after visiting the study areas and represented an area within which it could be considered a strong interaction with the point of interest.” [Lines 198 – 202]

      “Buffer rasters were also created for each individual’s household location, with a 10 meter buffer around each location. This represented space within and immediately outside each house. This buffer size accounted for the size of dwellings in these study areas.” [Lines 205 - 208]
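
      As an illustration of the buffer-based exposure definition discussed above, the following is a minimal sketch and not the authors' code. It assumes GPS fixes and reference points are already projected to planar coordinates in meters; the function names, the coordinates, and the example data are all hypothetical.

```python
import math


def within_buffer(point, reference_points, radius_m=20.0):
    """Return True if `point` lies within `radius_m` of any reference point.

    Assumption: all coordinates are projected (x, y) pairs in meters.
    """
    px, py = point
    return any(
        math.hypot(px - rx, py - ry) <= radius_m for rx, ry in reference_points
    )


def share_of_fixes_in_buffer(fixes, reference_points, radius_m=20.0):
    """Fraction of GPS fixes that fall inside the buffer of any reference point."""
    if not fixes:
        return 0.0
    inside = sum(within_buffer(f, reference_points, radius_m) for f in fixes)
    return inside / len(fixes)


# Hypothetical open-sewer locations and GPS fixes (meters, planar coordinates).
sewers = [(0.0, 0.0), (100.0, 0.0)]
fixes = [(5.0, 5.0), (30.0, 0.0), (95.0, 12.0), (50.0, 50.0)]

share_20m = share_of_fixes_in_buffer(fixes, sewers, radius_m=20.0)
share_10m = share_of_fixes_in_buffer(fixes, sewers, radius_m=10.0)
```

      With these hypothetical fixes, shrinking the radius from 20 to 10 meters halves the share of fixes counted as exposed, which illustrates why the buffer size needs to be justified relative to the expected GPS error.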

      (3) Assumptions of the step selection function 

      Step selection functions (SSFs) rely on a number of assumptions. Whether these assumptions are met needs to be critically discussed within the article. (For a discussion of the assumptions, I am relying on points raised in this article: Integrated step selection analysis: bridging the gap between resource selection and animal movement (2015): Tal Avgar, Jonathan R. Potts, Mark A. Lewis, Mark S. Boyce, DOI: https://doi.org/10.1111/2041-210X.12528). 

      First, SSFs typically assume each step is independent, conditional only on the previous step (Markovian process). This is violated in circular movements, for instance. Circular movements are highly likely in human movement as people will leave and return to their homes during the day. While this is partially addressed by conducting separate analyses by time of day, circular journeys can still exist within these segments. 

      Second, SSFs do not account for goal-oriented behaviour like intentional destination-seeking. So, for instance, when someone executes a plan to visit a specific stream to fetch drinking water, such behaviour is poorly approximated using SSFs because SSFs compare observed steps to random alternatives drawn from a movement kernel, assuming movement is opportunistic rather than intentional. 

      This is true of SSFs that do not include movement attributes. However, in our SSF we have included both step lengths and turning angles, which, according to Avgar et al., should be enough to account for this goal-oriented behaviour. It may be clearer to call the model an integrated step selection function (iSSF), as they do in Avgar et al., which we can change in the next version of the manuscript.
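
      To make the "random alternatives drawn from a movement kernel" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a gamma distribution for step lengths and a von Mises distribution for turning angles, which is a common choice in step selection analyses. The function name and all parameter values are hypothetical; in practice the kernel parameters would be estimated from the observed steps.

```python
import math
import random


def sample_available_steps(x, y, heading, n_steps, shape, scale, kappa, rng):
    """Draw `n_steps` candidate end points for one observed step.

    Step lengths come from Gamma(shape, scale) and turning angles from a
    von Mises distribution centred on 0 with concentration `kappa`.
    Returns a list of (end_x, end_y, step_length, turning_angle) tuples.
    """
    steps = []
    for _ in range(n_steps):
        length = rng.gammavariate(shape, scale)
        # random.vonmisesvariate returns an angle in [0, 2*pi).
        turn = rng.vonmisesvariate(0.0, kappa)
        new_heading = heading + turn
        end_x = x + length * math.cos(new_heading)
        end_y = y + length * math.sin(new_heading)
        steps.append((end_x, end_y, length, turn))
    return steps


# Hypothetical kernel parameters; a fixed seed keeps the sketch reproducible.
rng = random.Random(42)
available = sample_available_steps(
    x=0.0, y=0.0, heading=0.0, n_steps=10,
    shape=2.0, scale=5.0, kappa=1.5, rng=rng,
)
```

      Each observed step would then be compared against its sampled alternatives in a conditional logistic regression, with environmental covariates evaluated at the end point of each step.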

      Third, turning angles in human movement are often sharp due to regular street layout, which can violate the assumptions of SSFs, which usually assume smooth, correlated movement. 

      As this paper proposes SSFs as a novel method to measure exposure to environmentally transmitted pathogens, a discussion on the extent to which assumptions of SSFs are valid for this purpose should be included in the paper. 

      We thank Reviewer #2 for highlighting these points. We have included a section discussing these assumptions in detail: 

      “Additionally, these models have some underlying assumptions that may be violated in this study. Step-selection functions assume each step is independent, conditioned on the previous step. This can be violated by circular journeys. Although we attempted to account for these by analysing specific periods of the day, a higher temporal resolution of analysis may be needed if circular journeys are still present within each period. Another assumption is that movement is smooth through the environment. In urban environments this may not hold true, as street layouts may force sharp corners in movements. The effect of violating this assumption is not immediately clear and requires further methodological research to understand its significance. Finally, we assumed that by including movement characteristics (step lengths and turning angles) into our models, we were accounting for goal-oriented behaviour. These assumptions need to be considered in future studies that attempt to use step-selection functions to analyse human mobility.” [Lines 593 - 607]

      (4) Abstract 

      While it is highlighted in the abstract that this "study introduces a novel method for analysing human telemetry data in infectious disease research, providing critical insights for targeted interventions", I did not see any discussion about how the findings can inform interventions. 

      We thank Reviewer #2 for highlighting this. We have now removed this wording from the abstract to avoid misunderstanding.  

      (5) Effect sizes 

      It would have helped me if there had been some discussion around the size of these effects. Especially for the distance-based models, the effects seem very small. Maybe this is a misinterpretation on my part, but it would help to contextualise if the observed effect were small or large. 

      We agree with Reviewer #2 on this point and have now included a paragraph explaining that these effect sizes are indeed very small. We believe that this may be linked to the spatial scale of the rasters used (1 meter), as the selection coefficients represent changes with regards to increasing distances of 1 meter. This may not be that significant for human mobility. However, given the focus on analysing fine scale movement, we decided to keep the spatial scale of the rasters as small as possible. 

      “It is important to highlight that the effect sizes of the selection coefficients for the distance based rasters are very small and could be considered negligible. This may be linked to the spatial scale used, as these values represent increases of 1 meter. A coarser scale may have produced larger effect sizes that may have been easier to conceptualise. However, given the focus on fine-scale movement, we decided to keep this spatial scale for the analysis.” [Lines 421 - 427]
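
      The dependence of the effect size on spatial scale can be illustrated with a short sketch. It assumes that distance enters the model as a linear covariate on the log scale, so that a coefficient β per meter implies a relative selection strength of exp(β·d) for a location d meters farther away. This assumption, the function name, and the coefficient value are hypothetical and are not taken from the manuscript.

```python
import math


def relative_selection_strength(beta_per_meter, distance_m):
    """Relative selection strength for a location `distance_m` meters farther
    from a feature, assuming a log-linear effect of distance."""
    return math.exp(beta_per_meter * distance_m)


# Hypothetical coefficient: a very small effect per 1 meter increase.
beta = -0.002

rss_by_distance = {
    d: relative_selection_strength(beta, d) for d in (1, 10, 50)
}
```

      Under these assumptions, a coefficient that looks negligible per meter compounds over larger distances, so the same fitted effect reads as larger when expressed per 10 or 50 meters.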

    1. eLife Assessment

      This valuable study presents a theoretical model of how punctuated mutations influence multistep adaptation, supported by empirical evidence from some TCGA cancer cohorts. This solid model is noteworthy for cancer researchers as it points to the case for possible punctuated evolution rather than gradual genomic change. However, the parametrization and systematic evaluation of the theoretical framework in the context of tumor evolution remain incomplete, and alternative explanations for the empirical observations are still plausible.

    2. Reviewer #1 (Public review):

      Summary:

      Grasper et al. present a combined analysis of the role of temporal mutagenesis in cancer, which includes both theoretical investigation and empirical analysis of point mutations in TCGA cancer patient cohorts. They find that temporally elevated mutation rates contribute to cancer fitness by allowing fast adaptation when the fitness drops (due to previous deleterious mutations). This may be relevant in the case of tumor suppressor genes (TSG), which follow the 2-hit hypothesis (i.e., two mutations, one per allele, are necessary to deactivate the TSG), and in cases where temporal mutagenesis occurs (e.g., high APOBEC, ROS). They provide evidence that this scenario is likely to occur in patients with some cancer types. This is an interesting and potentially important result that merits the attention of the target audience. Nonetheless, I have some questions (detailed below) regarding the design of the study, the tools and parametrization of the theoretical analysis, and the empirical analysis, which I think, if addressed, would make the paper more solid and the conclusion more substantiated.

      Strengths:

      Combined theoretical investigation with empirical analysis of cancer patients.

      Weaknesses:

      Parametrization and systematic investigation of theoretical tools and their relevance to tumor evolution.

    3. Reviewer #2 (Public review):

      This work presents theoretical results concerning the effect of punctuated mutation on multistep adaptation and empirical evidence for that effect in cancer. The empirical results seem to agree with the theoretical predictions. However, it is not clear how strong the effect should be on theoretical grounds, and there are other plausible explanations for the empirical observations.

      For various reasons, the effect of punctuated mutation may be weaker than suggested by the theoretical and empirical analyses:

      (1) The effect of punctuated mutation is much stronger when the first mutation of a two-step adaptation is deleterious (Figure 2). For double inactivation of a TSG, the first mutation (inactivation of one copy) would be expected to be neutral or slightly advantageous. The simulations depicted in Figure 4, which are supposed to demonstrate the expected effect for TSGs, assume that the first mutation is quite deleterious. This assumption seems inappropriate for TSGs, and perhaps the other synergistic pairs considered, and exaggerates the expected effects.

      (2) More generally, parameter values affect the magnitude of the effect. The authors note, for example, that the relative effect decreases with mutation rate. They suggest that the absolute effect, which increases, is more important, but the relative effect seems more relevant and is what is assessed empirically.

      (3) Routes to inactivation of both copies of a TSG that are not accelerated by punctuation will dilute any effects of punctuation. An example is a single somatic mutation followed by loss of heterozygosity. Such mechanisms are not included in the theoretical analysis nor assessed empirically. If, for example, 90% of double inactivations were the result of such mechanisms with a constant mutation rate, a factor of two effect of punctuated mutagenesis would increase the overall rate by only 10%. Consideration of the rate of apparent inactivation of just one TSG copy and of deletion of both copies would shed some light on the importance of this consideration.
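
      The dilution argument in point (3) can be checked with a short calculation. This sketch only reproduces the reviewer's illustrative numbers (90% of double inactivations unaffected by punctuation, a factor of two on the remainder); the function name is hypothetical.

```python
import math


def overall_fold_change(fraction_unaffected, fold_on_affected):
    """Overall fold change in the double-inactivation rate when only a
    fraction of inactivation routes is accelerated by punctuation.

    The unaffected routes keep a fold change of 1; the remaining routes
    are multiplied by `fold_on_affected`.
    """
    fraction_affected = 1.0 - fraction_unaffected
    return fraction_unaffected * 1.0 + fraction_affected * fold_on_affected


# Reviewer's example: 90% of routes unaffected, two-fold effect on the rest.
fold = overall_fold_change(fraction_unaffected=0.9, fold_on_affected=2.0)
percent_increase = (fold - 1.0) * 100.0
```

      Here the overall rate rises by a factor of about 1.1, matching the 10% increase stated in the review.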

      Several factors besides the effects of punctuated mutation might explain or contribute to the empirical observations:

      (1) High APOBEC3 activity can select for inactivation of TSGs (references in Butler and Banday 2023, PMID 36978147). This selective force is another plausible explanation for the empirical observations.

      (2) Without punctuation, the rate of multistep adaptation is expected to rise more than linearly with mutation rate. Thus, if APOBEC signatures are correlated with a high mutation rate due to the action of APOBEC, this alone could explain the correlation with TSG inactivation.

      (3) The nature of mutations caused by APOBEC might explain the results. Notably, one of the two APOBEC mutation signatures, SBS13, is particularly likely to produce nonsense mutations. The authors count both nonsense and missense mutations, but nonsense mutations are more likely to inactivate the gene, and hence to be selected.

    1. eLife Assessment

      This important work fills a gap in the characterization of motor architecture and chemical coupling of the male reproductive system, crucial to understanding male reproduction and fertility. The convincing analysis reveals two distinct types of glutamatergic neurons that co-release either serotonin or octopamine. While serotonergic neurons are required for male fertility, octopaminergic neurons are dispensable, indicating a division of labour. This work lays the foundations for future investigations into the conserved key principles by which multi-transmitter systems control coordinated motor outputs.

    2. Reviewer #1 (Public review):

      Summary:

      This very thorough anatomical study addresses the innervation of the Drosophila male reproductive tract. Two distinct glutamatergic neuron types were classified: serotonergic (SGNs) and octopaminergic (OGNs). By expansion microscopy, it was established that glutamate and serotonin/octopamine are co-released. The expression of different receptors for 5-HT and OA in muscles and epithelial cells of the innervation target organs was characterized. The pattern of neurotransmitter receptor expression in the target organs suggests that seminal fluid and sperm transport and emission are subjected to complex regulation. While silencing of abdominal SGNs leads to male infertility and prevents sperm from entering the ejaculatory duct, silencing of OGNs does not render males infertile.

      Strengths:

      The studied neurons were analysed with different transgenes and methods, as well as antibodies against neurotransmitter synthesis enzymes, building a consistent picture of their neurotransmitter identity. The careful anatomical description of innervation patterns together with receptor expression patterns of the target organs provides a solid basis for advancing the understanding of how seminal fluid and sperm transport and emission are subjected to complex regulation. The functional data showing that SGNs are required for male fertility and for the release of sperm from the seminal vesicle into the ejaculatory duct is convincing.

      Weaknesses:

      The functional analysis of the characterized neurons is not as comprehensive as the anatomical description, and phenotypic characterization was limited to simple fertility assays. It is understandable that a full functional dissection is beyond the scope of the present work. The paper contains experiments showing neuron-independent peristaltic waves in the reproductive tract muscles, which are thematically not very well integrated into the paper. Although very interesting, one wonders if these experiments would not fit better into a future work that also explores these peristaltic waves and their interrelation with neuromodulation mechanistically.

    3. Reviewer #2 (Public review):

      Summary:

      Cheverra et al. present a comprehensive anatomical and functional analysis of the motor neurons innervating the male reproductive tract in Drosophila melanogaster, addressing a gap in our understanding of the peripheral circuits underlying ejaculation and male fertility. They identify two classes of multi-transmitter motor neurons, OGNs (octopamine/glutamate) and SGNs (serotonin/glutamate), with distinct innervation patterns across reproductive organs. The authors further characterize the differential expression of glutamate, octopamine, and serotonin receptors in both epithelial and muscular tissues of these organs. Behavioral assays reveal that SGNs are essential for male fertility, whereas OGNs and glutamatergic transmission are dispensable. This work provides a high-resolution map linking neuromodulatory identity to organ-specific motor control, offering a valuable framework to explore the neural basis of male reproductive function.

      Strengths:

      Through the use of an extensive set of GAL4 drivers and antibodies, this work successfully and precisely defines the neurons that innervate the male reproductive tract, identifying the specific organs they target and the nature of the neurotransmitters they release. It also characterizes the expression patterns and localization of the corresponding neurotransmitter receptors across different tissues. The authors describe two distinct groups of dual-identity neurons innervating the male reproductive tract: OGNs, which co-express octopamine and glutamate, and SGNs, which co-express serotonin and glutamate. They further demonstrate that the various organs within the male reproductive system differentially express receptors for these neurotransmitters. Based on these findings, the authors propose that a single neuron capable of co-releasing a fast-acting neurotransmitter alongside a slower-acting one may more effectively synchronize and stagger events that require precise timing. This, together with the differential expression of ionotropic glutamate receptors and metabotropic aminergic receptors in postsynaptic muscle tissue, adds an additional layer of complexity to the coordinated regulation of fluid secretion, organ contractility, and directional sperm movement, all contributing to the optimization of male fertility.

      Weaknesses:

      The main weakness of the manuscript is the lack of detail in the presentation of the results. Specifically, all microscopy image figures are missing information about the number of samples (N), and in the case of colocalization experiments, quantitative analyses are not provided. Additionally, in the first behavioral section, it would be beneficial to complement the data table with figures similar to those presented later in the manuscript for consistency and clarity.

      Wider context:

      This study delivers the first detailed anatomical map connecting multi-transmitter motor neurons with specific male reproductive structures. It highlights a previously unrecognized functional specialization between serotonergic and octopaminergic pathways and lays the groundwork for exploring fundamental neural mechanisms that regulate ejaculation and fertility in males. The principles uncovered here may help explain how males of Drosophila and other organisms adjust reproductive behaviors in response to environmental changes. Furthermore, by shedding light on how multi-transmitter systems operate in reproductive control, this model could provide insights into therapeutic targets for conditions such as male infertility and prostate cancer, where similar neuronal populations are involved in humans. Ultimately, this genetically accessible system serves as a powerful tool for uncovering how multi-transmitter neurons orchestrate coordinated physiological actions necessary for the functioning of complex organs.

    4. Reviewer #3 (Public review):

      Summary:

      This work provides an overview of the motor neuron landscape in the male reproductive system. Some work had been done to elucidate the circuits of ejaculation in the spine, as well as the cord, but this work fills a gap in knowledge at the level of the reproductive organs. Using complementary approaches, the authors show that there are two types of motor neurons that are mutually exclusive: neurons that co-express octopamine and glutamate and neurons that co-express serotonin and glutamate. They also show evidence that both types of neurons express large dense core vesicles, indicating that neuropeptides play a role in male fertility. This paper provides a thorough characterization of the expression of the different glutamate, octopamine, and serotonin receptors in the different organs and tissues of the male reproductive system. The differential expression in different tissues and organs allows building initial theories on the control of emission and expulsion. Additionally, the authors characterize the expression of synaptic proteins and the neuromuscular junction sites. On a mechanistic level, the authors show that neither octopamine/glutamate neuron transmission nor glutamate transmission in serotonin/glutamate neurons is required for male fertility. This final result is quite surprising and opens up many questions on how ejaculation is coordinated.

      Strengths:

      This work fills an important gap in the characterization of innervation of the male reproductive system by providing an extensive characterization of the motor neurons and the potential receptors of motor neuron release. The authors show convincing evidence of glutamate/monoamine co-release and of mutual exclusivity of serotonin/glutamate and octopamine/glutamate neurons.

      Weaknesses:

      (1) Often, it is mentioned that the expression is higher or lower or regional without quantification or an indication of the number of samples analysed.

      (2) The experiment aimed at tracking sperm in the male reproductive system is difficult to interpret when it is not assessed whether ejaculation has occurred.

      (3) The experiment looking at peristaltic waves in the male organs is missing labeling of the different regions and quantification of the observed waves.

    1. eLife Assessment

      This useful study uses creative scalp EEG decoding methods to attempt to demonstrate that two forms of learned associations in a Stroop task are dissociable, despite sharing similar temporal dynamics. However, the evidence supporting the conclusions is incomplete due to concerns with the experimental design and methodology. This paper would be of interest to researchers studying cognitive control and adaptive behavior, if the concerns raised in the reviews can be addressed satisfactorily.

    2. Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. In particular, two types of learned associations are characterized. One being associations between stimulus features and control states (SC), and the other being stimulus features and responses (SR). Decoding methods are used to identify SC and SR correlates and to determine whether they have similar topographies and dynamics.

      The results suggest SC and SR associations are simultaneously coactivated and have shared topographies, with the inference being that these associations may share a common generator.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations.

      Nice idea to orthogonalize the ISPC condition (MC/MI) from stimulus features.

      Weaknesses:

      (1) I'm relatively concerned that these results may be spurious. I hope to be proven wrong, but I would suggest taking another look at a few things.

      While a nice idea in principle, the ISPC manipulation seems to be quite confounded with the trial number. E.g., color-red is MI only during Phase 2, and is MC primarily only during Phase 3 (since Phase 1 is so sparsely represented). In my experience, EEG noise is highly structured across a session and easily exploited by decoders. Plus, behavior seems quite different between Phase 2 and Phase 3. So, it seems likely that the classes you are asking the decoder to separate are highly confounded with temporally structured noise.

      I suggest thinking of how to handle this concern in a rigorous way. A compelling way to address this would be to perform "cross-phase" decoding, however I am not sure if that is possible given the design.

      The time courses also seem concerning. What are we to make of the SR and SC timecourses, which have aggregate decoding dynamics that look to be <1Hz?

      Some sanity checks would be one place to start. Time courses were baselined, but this is often not necessary with decoding; it can cause bias (10.1016/j.jneumeth.2021.109080), and can mask deeper issues. What do things look like when not baselined? Can variables be decoded when they should not be decoded? What does cross-temporal decoding look like - everything stable across all times, etc.?
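      The temporal-confound concern in point (1) can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the authors' pipeline: the "EEG" features are pure noise plus a slow linear drift shared by all channels, the decoder is a simple nearest-centroid classifier, and all function names and parameters (`simulate_trials`, `cv_accuracy`, `drift_scale`) are hypothetical. There is no class signal in the data, yet labels that are tied to session time should be decodable well above chance, whereas labels interleaved across the session should not.

```python
import random

def simulate_trials(n_trials=200, n_channels=16, drift_scale=1.0, seed=0):
    """Pure-noise features plus a slow linear drift; no class signal at all."""
    rng = random.Random(seed)
    trials = []
    for t in range(n_trials):
        # Drift runs linearly from -drift_scale to +drift_scale over the session.
        drift = drift_scale * (2.0 * t / (n_trials - 1) - 1.0)
        trials.append([drift + rng.gauss(0.0, 1.0) for _ in range(n_channels)])
    return trials

def centroid(rows):
    """Channel-wise mean of a list of trials."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cv_accuracy(trials, labels, n_folds=5):
    """Nearest-centroid decoding with interleaved folds (trial index modulo n_folds)."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(trials)) if i % n_folds != fold]
        test = [i for i in range(len(trials)) if i % n_folds == fold]
        c0 = centroid([trials[i] for i in train if labels[i] == 0])
        c1 = centroid([trials[i] for i in train if labels[i] == 1])
        for i in test:
            pred = 0 if sq_dist(trials[i], c0) <= sq_dist(trials[i], c1) else 1
            correct += int(pred == labels[i])
    return correct / len(trials)

trials = simulate_trials()
n = len(trials)
labels_by_phase = [0 if i < n // 2 else 1 for i in range(n)]  # class tied to session time
labels_interleaved = [i % 2 for i in range(n)]                # class independent of time

acc_confounded = cv_accuracy(trials, labels_by_phase)
acc_interleaved = cv_accuracy(trials, labels_interleaved)
print(f"labels tied to phase: {acc_confounded:.2f}")
print(f"labels interleaved:   {acc_interleaved:.2f}")
```

      Note that ordinary interleaved cross-validation does not protect against this confound, because training and test trials from the same phase share the same drift. This is why a cross-phase scheme, if the design permits one, would be more convincing.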

      (2) The nature of the shared features between SR and SC subspaces is unclear.

      The simulation is framed in terms of the amount of overlap, revealing the number of shared dimensions between subspaces. In reality, it seems like it's closer to 'proportion of volume shared', i.e., a small number of dominant dimensions could drive a large degree of alignment between subspaces.

      What features drive the similarity? What features drive the distinctions between SR and SC? Aside from the temporal confounds I mentioned above, is it possible that some low-dimensional feature, like EEG congruency effect (e.g., low-D ERPs associated with conflict), or RT dynamics, drives discriminability among these classes? It seems plausible to me - all one would need is non-homogeneity in the size of the congruency effect across different items (subject-level idiosyncrasies could contribute: 10.1016/j.neuroimage.2013.03.039).

      (3) The time-resolved within-trial correlation of RSA betas is a cool idea, but I am concerned it is biased. Estimating correlations among different coefficients from the same GLM design matrix is, in general, biased, i.e., when the regressors are non-orthogonal. This bias comes from the expected covariance of the betas and is discussed in detail here (10.1371/journal.pcbi.1006299). In short, correlations could be inflated due to a combination of the design matrix and the structure of the noise. The most established solution, to cross-validate across different GLM estimations, is unfortunately not available here. I would suggest that the authors think of ways to handle this issue.
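      The bias described in point (3) follows from the standard result that, under i.i.d. noise with variance σ², ordinary least squares estimates have covariance σ²(XᵀX)⁻¹, so non-orthogonal regressors yield correlated estimates even when the true coefficients are unrelated. A minimal sketch, assuming two hypothetical binary regressors that overlap on a quarter of observations and data that are pure noise (nothing here is taken from the manuscript's design matrix):

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def ols_two_regressors(x1, x2, y):
    """Closed-form least squares for y = b1*x1 + b2*x2 (no intercept)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * v for a, v in zip(x1, y))
    t2 = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

# Two hypothetical regressors that share one quarter of their non-zero entries.
n_obs = 40
x1 = [1.0 if i % 4 in (0, 1) else 0.0 for i in range(n_obs)]
x2 = [1.0 if i % 4 in (1, 2) else 0.0 for i in range(n_obs)]

# Correlation of the two estimates implied by sigma^2 * inverse(X'X).
s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
expected_corr = -s12 / math.sqrt(s11 * s22)

# Repeatedly fit the model to pure noise: the true betas are zero and unrelated.
rng = random.Random(1)
b1_samples, b2_samples = [], []
for _ in range(2000):
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    b1, b2 = ols_two_regressors(x1, x2, y)
    b1_samples.append(b1)
    b2_samples.append(b2)

empirical_corr = pearson(b1_samples, b2_samples)
print(f"expected  corr(b1, b2): {expected_corr:.3f}")
print(f"empirical corr(b1, b2): {empirical_corr:.3f}")
```

      In this toy design the induced correlation is negative; its sign and size depend on the design matrix and the noise structure, which is why a null distribution or a cross-validated estimate is needed before interpreting the observed correlation.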

      (4) Are results robust to running response-locked analyses? Especially the EEG-behavior correlation. Could this be driven by different RTs across trials & trial-types? I.e., at 400 ms post-stim onset, some trials would be near or at RT/action execution, while others may not be nearly as close, and so EEG features would differ & "predict" RT.

      (5) I suggest providing more explanation about the logic of the subspace decoding method - what trialtypes exactly constitute the different classes, why we would expect this method to capture something useful regarding ISPC, & what this something might be. I felt that the first paragraph of the results breezes by a lot of important logic.

      In general, this paper does not seem to be written for readers who are unfamiliar with this particular topic area. If authors think this is undesirable, I would suggest altering the text.

    3. Reviewer #2 (Public review):

      Summary:

      In this EEG study, Huang et al. investigated the relative contribution of two accounts to the process of conflict control, namely the stimulus-control association (SC), which refers to the phenomenon that the ratio of congruent vs. incongruent trials affects the overall control demands, and the stimulus-response association (SR), stating that the frequency of stimulus-response pairings can also impact the level of control. The authors extended the Stroop task with novel manipulation of item congruencies across blocks in order to test whether both types of information are encoded and related to behaviour. Using decoding and RSA, they showed that the SC and SR representations were concurrently present in voltage signals, and they also positively co-varied. In addition, the variability in both of their strengths was predictive of reaction time. In general, the experiment has a solid design, but there are some confounding factors in the analyses that should be addressed to provide strong support for the conclusions.

      Strengths:

      (1) The authors used an interesting task design that extended the classic Stroop paradigm and is potentially effective in teasing apart the relative contribution of the two different accounts regarding item-specific proportion congruency effect, provided that some confounds are addressed.

      (2) Linking the strength of RSA scores with behavioural measures is critical to demonstrating the functional significance of the task representations in question.

      Weakness:

      (1) While the use of RSA to model the decoding strength vector is a fitting choice, looking at the RDMs in Figure 7, it seems that SC, SR, ISPC, and Identity matrices are all somewhat correlated. I wouldn't be surprised if some correlations turned out to be quite high if they were reported. Total orthogonality is, of course, impossible depending on the hypothesis, but from experience, having highly covaried predictors in a regression can lead to unexpected results, such as artificially boosting the significance of one predictor in one direction, and the other in the opposite direction. Perhaps some efforts to address how stable the time-resolved RSA correlations for SC and SR are with and without the other highly correlated predictors would be valuable for raising confidence in the findings.

      (2) In "task overview", SR is defined as the word-response pair; however, in the Methods, lines 495-496, the definition changed to "the pairing between word and ISPC" which is in accordance with the values in the RDMs (e.g., mccbb and mcirb have similarity of 1, but they are linked to different responses, so should they not be considered different in terms of SR?). This needs clarification as they have very different implications for the task design and interpretation of results, e.g., how correlated the SC and SR manipulations were.

    1. eLife Assessment

      This important study used five metrics to compare the cost-effectiveness of intramural and extramural research funded by the National Institutes of Health in the United States between 2009 and 2019. They found that each type of research had its own set of strengths: extramural research was more cost-effective in terms of publications, whereas intramural research was more cost-effective in terms of influencing clinical work. The evidence supporting these findings is mostly solid, but there are a number of questions about the methods and data - notably about indirect cost recovery and other non-NIH sources of funding - that need to be answered.

    2. Reviewer #1 (Public review):

      Summary:

      This article carefully compares intramural vs. extramural research funded by the National Institutes of Health during 2009-2019, according to a variety of bibliometric indices. The authors find that extramural awards more cost-effectively fund outputs commonly used for academic review such as number of publications and citations per dollar, while intramural awards are more cost-effective at generating work that influences future clinical work, more closely in line with agency health goals.

      Strengths:

      Great care was taken in selecting and cleaning the data, and in making sure that intramural vs. extramural projects were compared appropriately. The data has statistical validation. The trends are clear and convincing.

      Weaknesses:

      The Discussion is too short and descriptive, and needs more perspective - why are the findings important and what do they mean? Even without recommending policy, the authors should at least discuss possible implications for policy.

      The biggest problem I have with this submission is Figure 3, which shows a big decrease in clinical-related parameters between 2014 and 2019 in both intramural and extramural research (panels C, D and E). There is no obvious explanation for this and I did not see any discussion of this trend, but it cries out for investigation. This might, for example, reflect global changes in funding policies which might also influence the observed closing gaps between intramural and extramural research.

    3. Reviewer #2 (Public review):

      Summary:

      This article reports a cost-effectiveness comparison of intramural and extramural research that the NIH funded between 2009 and 2019. Using data obtained from NIH RePORTER, the authors linked total project costs to publication output, using robust validated metrics including Relative Citation Ratio (RCR), Approximate Potential to Translate (APT), and clinical citations. They find that after adjusting for confounders in regression and propensity-score analyses, extramural projects were generally more cost-effective, though intramural projects were more cost-effective for generating clinical citations. They also describe differences in the topics of intramural- and extramural-funded publications, with intramural projects more likely to generate papers on viral infections and immunity or cancer metastases and survival, but less likely to generate papers on pregnancy and maternal health, brain connectivity and tasks, and adolescent experiences and depression. The authors aptly describe the different natures of the intramural and extramural funding models, including that extramural researchers spend much time writing grant applications and that the work described in extramural publications often receives funding from sources other than NIH grants.

      Strengths:

      The authors leveraged publicly available data (including RePORTER and the iCite repository) and used robust validated metrics (RCR, APT, clinical citations). They carefully considered a large number of confounders, including those related to the PI, and performed several well-described regression analyses.

      Weaknesses:

      Figure 3A shows intramural projects producing about 2.75 papers per year in 2009, whereas extramural projects are producing just over 1 paper per year. Extramural projects appear to catch up over the next five years. While the authors attempt to explain the difference in their figure legend, another explanation is that the intramural projects started well before 2009 but, as the authors state, intramural data only became available in 2009.

      As the authors note, funding information is often complex and difficult to characterize for an analysis like this. How did the authors handle: i) publications linked to multiple extramural grants; ii) publications linked to intramural and extramural grants; iii) publications linked to NIH grants and non-NIH grants? I would think it necessary to somehow apportion credit, as otherwise it would appear that extramural projects are more productive than they truly are.
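      One common way to apportion credit is fractional counting, in which each publication contributes a total of one unit split equally among all funding sources it acknowledges. The sketch below is purely illustrative: the grant and publication identifiers are hypothetical, equal splitting is only one of several possible conventions (weighting by award size is another), and nothing here is claimed to be the authors' method.

```python
from collections import defaultdict

def fractional_credit(pub_to_grants):
    """Split one unit of credit per publication equally across its linked grants."""
    credit = defaultdict(float)
    for grants in pub_to_grants.values():
        if not grants:
            continue  # publication with no recorded funding: no credit assigned
        share = 1.0 / len(grants)
        for grant in grants:
            credit[grant] += share
    return dict(credit)

# Hypothetical links covering the three cases raised above.
example_links = {
    "pub_A": ["extramural_1"],                               # single grant
    "pub_B": ["extramural_1", "extramural_2"],               # i) multiple extramural grants
    "pub_C": ["intramural_1", "extramural_2", "non_nih_1"],  # ii) and iii) mixed sources
}

credit = fractional_credit(example_links)
for grant, value in sorted(credit.items()):
    print(f"{grant}: {value:.3f}")
```

      Under whole counting, `extramural_1` and `extramural_2` would each be credited with two publications; under fractional counting the total credit across all sources equals the number of publications, so co-funded papers no longer inflate per-grant productivity.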

      Also, it is not clear if the authors took account of the indirect costs paid by the NIH to universities that have received extramural grants.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript "Comparing the outputs of intramural and extramural grants funded by National Institutes of Health" presents a comparative study of two funding mechanisms adopted by the National Institutes of Health (NIH). The authors adopted a quantitative approach and introduced five metrics to compare the output of intramural and extramural grants. These findings reveal the impacts of intramural and extramural grants on the scientific community, providing funders with insights to inform future decisions about funding mechanisms.

      Strengths:

      The authors clearly presented their methods for processing the NIH project data and classifying projects into either intramural or extramural categories. The limitations of the study are also well-addressed.

      Weaknesses:

      The article would benefit from a more thorough discussion of the literature, a clearer presentation of the results (especially in the figure captions), and the inclusion of evidence to support some of the claims.

    1. eLife Assessment

      This paper presents important new findings about the impact of the TAK-003 vaccine against dengue based on a convincing reanalysis of trial data. The results corroborate those of the original trial analyses, but with reduced uncertainty about the estimates of the impact of the vaccine. The findings will be of interest to clinicians, infectious disease epidemiologists, trial statisticians and policymakers seeking to understand the vaccine's efficacy profile and associated uncertainties.

    2. Reviewer #1 (Public review):

      Summary:

      The authors reduce uncertainties in TAK-003 vaccine efficacy estimates by applying a multi-level model to all published Phase III clinical trial case data and sharing parameters across strata consistent with the data generation process. In line with our current understanding of the vaccine, they show that its efficacy depends on the serostatus and infecting serotype.

      Strengths:

      The methodology is well-described and technically sound, with clear explanations of how the authors reduce uncertainty through the model structure. The comparison of model estimates with and without independence parameter assumptions is particularly valuable. The data come from the Phase III RCT conducted over 4.5 years in 8 countries, and the study is the first to model efficacy using available country-specific data. The analysis is timely and addresses important public health questions regarding TAK-003 efficacy.

      Weaknesses:

      It is unclear whether the simulation study used to validate the model sampled from the priors (as stated in the methods) or the posterior distributions. Supplementary figures 19-28 show that sampled parameters often derive from narrower distributions than the priors, with sampled areas varying by subgroup. Sampling from posterior distributions makes the validation somewhat circular. As many parameters are estimated stratified by multiple subgroups, identifiability issues may arise. Model variations with fewer parameter dependencies could impact the resulting estimates.

      Assessment of aims and conclusions:

      The authors achieve their aims of reducing uncertainty in efficacy estimates and show that efficacy varies by serostatus and serotype. The conclusions are well-justified, although they could be strengthened by clarifying the model validation, as discussed above.

      Impact and utility:

      This work contributes valuable evidence demonstrating TAK-003's serostatus and serotype-specific efficacy and highlights remaining uncertainties in the protection or risk against DENV3/4 in seronegative individuals. The methods are well-described and would be useful to other modellers, and could be applied to additional dengue vaccines like the Butantan-DV vaccine currently under development.

      Additional context:

      Several factors may influence the estimates but cannot be addressed using public data, including the role of subclinical infections, flavivirus cross-immunity, and the imperfect use of hospitalisation as a proxy for severe disease.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors used a multi-level modelling approach to reanalyse trial data from Takeda's Phase III randomised controlled trial investigating the efficacy of the TAK-003 vaccine against dengue. The aim of the paper is to refine uncertainty by incorporating all the available data into the model and pooling across stratifications that are correlated. A major challenge is constructing a likelihood that allows for data available at differing levels of aggregation by group and outcome, and at different time intervals. This is done by first supposing that the data is available without aggregation for all groups, outcomes and time points, and then marginalising over the aggregated levels. The model is validated using simulations and then applied to trial data from Takeda. Results appear to corroborate those of Takeda with reduced uncertainty in the estimates.

      Strengths:

      The main strength of the paper is the multi-level modelling approach. It is a particularly natural one for this setting. One reason for this, as discussed in the paper, is that correlations across stratifications can arise when there are similarities in their underlying causal structure. It is more realistic to model this nested data structure hierarchically. Another reason, also well discussed in the paper, is the reduction in uncertainty you get when you pool estimates across related groups. Multi-level modelling is also beneficial when group sizes are different. For example, there were too few DENV-4 cases among seronegatives that resulted in hospitalised disease for the original analysis to produce estimates, but by using multi-level modelling, this paper can produce estimates. The modelling framework developed in this paper will be simple to extend to further trial data collected in the future.
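      The reduction in uncertainty from pooling can be seen in the simplest textbook case, a normal-normal hierarchical model with known variances. The sketch below is not the authors' model (theirs is considerably richer); the function name, parameter names, and all numbers are hypothetical and on an arbitrary scale, chosen only to show that a sparsely observed stratum is pulled strongly towards the overall mean and gains precision, while a well-observed stratum is left almost unchanged.

```python
def partial_pool(group_mean, n_obs, grand_mean, sigma2, tau2):
    """Posterior mean and variance of a group effect in a normal-normal model.

    sigma2: within-group observation variance (assumed known)
    tau2:   between-group variance of the true group effects (assumed known)
    """
    precision_data = n_obs / sigma2   # information carried by the group's own data
    precision_prior = 1.0 / tau2      # information borrowed from the other groups
    weight = precision_data / (precision_data + precision_prior)
    post_mean = weight * group_mean + (1.0 - weight) * grand_mean
    post_var = 1.0 / (precision_data + precision_prior)
    return post_mean, post_var

# Toy values, not trial data.
grand_mean, sigma2, tau2 = 0.6, 1.0, 0.04

small_mean, small_var = partial_pool(group_mean=0.1, n_obs=4, grand_mean=grand_mean,
                                     sigma2=sigma2, tau2=tau2)
large_mean, large_var = partial_pool(group_mean=0.7, n_obs=400, grand_mean=grand_mean,
                                     sigma2=sigma2, tau2=tau2)

print(f"small stratum: unpooled var {sigma2 / 4:.4f}, pooled var {small_var:.4f}, "
      f"pooled mean {small_mean:.3f}")
print(f"large stratum: unpooled var {sigma2 / 400:.4f}, pooled var {large_var:.4f}, "
      f"pooled mean {large_mean:.3f}")
```

      The same mechanism also means that estimates for sparse strata depend heavily on the assumed between-group structure, which is worth keeping in mind when interpreting estimates for subgroups with very few cases.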

      Another strength is that it is reanalysing existing trial data, which is both cost-effective and beneficial for scientific reproducibility. This approach also helps to assess the robustness of conclusions about the efficacy of the TAK-003 vaccine to use of different analytical methods.

      The paper is well-written. The tables and figures presented in this paper are particularly informative. Protection conferred by the vaccine varies depending upon which variant a person is exposed to, their serostatus, and time since vaccination. The analysis presented supports the discussed conclusions. Comparisons between the results of this paper and the results of the original trial analysis are also shown and demonstrate a reduction in the uncertainty of parameter estimates, as desired.

      Weaknesses:

      The weakness of the paper is that it reports per-exposure protection instead of vaccine efficacy. This is methodologically sound, but it does limit the comparability of this study with the original trial analyses, which reported vaccine efficacy. It is therefore unclear whether the reduction in uncertainty observed is due solely to the multi-level modelling approach or whether it may be due in part to the parameters of interest being slightly different.

    4. Reviewer #3 (Public review):

      Summary:

      The authors provide estimates of the efficacy of the dengue vaccine, which is notoriously complex given the different serotypes and complex immunity. Through their method using publicly available data, the estimates have less uncertainty and are of use to the field in understanding the future possible impact of the vaccine.

      Strengths:

      This is an elegant analysis addressing an important question. The pooling of common factors for estimation is nice and adds strength to the analysis. It is an important analysis for the field and our understanding of the vaccine, and for the analysis of future multi-site trials for the dengue vaccine.

      Weaknesses:

      It would be useful to have more understanding of how the way the vaccine efficacy is defined here is related to the previous estimates and a greater understanding of how the estimated impact changes over time.

    1. eLife Assessment

      This study makes a valuable contribution by separating two timescales of adaptation: rapid, within-block reductions in learning rate, and slower, location-specific, meta-learned adjustments. Behavioural data and computational modeling converge to support both processes. The evidence is solid with neuroimaging results suggesting that meta-learned learning rates are encoded in the orbitofrontal cortex, while prediction errors are represented in a distributed network including the ventral striatum and are modulated by expected error magnitude, though the specificity of these effects requires further contextualization. The manuscript is timely and clearly written; its main limitation is the weak linkage between neural signals and behavior, leaving uncertainty over whether the reported signals play a mechanistic role in learning.

    2. Reviewer #1 (Public review):

      Summary:

      Simoens and colleagues use a continuous estimation task to disentangle learning rate adjustments on shorter and longer timescales. They show that participants rapidly decrease learning rates within a block of trials in a given "location", but that they also adjust learning rates for the very first trial based on information accrued gradually about the statistics of each location, which can be viewed as a form of metalearning. The authors show that the metalearned learning rates are represented in patterns of neural activity in the orbitofrontal cortex, and that prediction errors are represented in a constellation of brain regions, including the ventral striatum, where they are modulated by expectations about error magnitude to some degree. Overall, the work is interesting, timely, and well communicated. My primary concern with the work was that the link between the brain signals and their role in the behavior of interest was not well explored, raising some questions about the degree to which signals are really involved in the learning process, versus playing some downstream role.

      Strengths:

      The authors build on an interesting task design, allowing them to distinguish moment-to-moment adjustments in learning rate from slower adjustments in learning rate corresponding to slowly-gained knowledge about the statistics of specific "locations". Behavior and computational modeling clearly demonstrate that individuals adjust to environmental statistics in a sort of metalearning. fMRI data reveal representations of interest, including those related to adjusted learning rates and their impact on the degree of prediction error encoding in the striatum.
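      The two timescales can be made concrete with a textbook Bayesian observer that estimates a stationary mean from noisy observations (a scalar Kalman filter without process noise). This is a sketch under stated assumptions, not the authors' computational model: `prior_var` stands for the uncertainty about the mean on entering a location, `noise_var` for the location's outcome noise, and both names and values are hypothetical. The learning rate falls with every observation within a block, and its value on the very first trial depends on the noise level, which is the quantity that would have to be learned across visits.

```python
def learning_rates(prior_var, noise_var, n_trials):
    """Trial-by-trial learning rates (Kalman gains) for estimating a fixed mean."""
    rates = []
    variance = prior_var
    for _ in range(n_trials):
        gain = variance / (variance + noise_var)  # weight given to the prediction error
        rates.append(gain)
        variance = (1.0 - gain) * variance        # uncertainty shrinks after each update
    return rates

low_noise = learning_rates(prior_var=1.0, noise_var=0.25, n_trials=5)
high_noise = learning_rates(prior_var=1.0, noise_var=4.0, n_trials=5)

print("low-noise location: ", [round(r, 3) for r in low_noise])
print("high-noise location:", [round(r, 3) for r in high_noise])
```

      On each trial the estimate would be updated as `estimate += gain * (observation - estimate)`; the within-block decline requires no knowledge beyond the current block, whereas choosing the right first-trial gain requires knowing `noise_var` for that location in advance.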

      Weaknesses:

      It was nice to see that the authors could distinguish differences between the OFC signals that they observed and those in the visual regions based on changes through the session. However, the linkage between these brain activations and a functional role in generating behavior was left unexplored. Without further exploration, it is hard to tell exactly what role the signals might be playing, if any, in the behavior of interest.

    3. Reviewer #2 (Public review):

      Summary:

      Across two experiments, this work presents a novel spatial predictive inference paradigm that facilitates the investigation of meta-learning across multiple environments with distinct statistics, as well as more local learning from sequences of observations within an environment. The authors present behavioral data indicating that people can indeed learn to distinguish between noise levels and calibrate their learning rates accordingly across environments, even on initial trials when revisiting an environment. They complement their behavioral results with computational modeling, further bolstering claims of both local and global adaptation. Additional fMRI results support the role of OFC in this meta-learning process, with central OFC activity reflecting similarity between environments. This similarity emerges over time with task experience. Holistically, this paradigm and these data add to our understanding of how humans dynamically adapt their behavior on different timescales.

      Strengths:

      The novel paradigm represents a clever and creative expansion of spatial predictive inference tasks. The cover story was well chosen to facilitate an intuitive understanding of both the differences between environments and the estimation of the mean within environments.

      Additionally, the authors present complementary results from two experiments, which strengthen the behavioral findings. This is especially effective as the initial experiment's results were a bit noisy, and the modifications within the second experiment increased both power and the specificity/accuracy of participant predictions. Taken together, the behavioral results provide convincing evidence that participants did distinguish environments based on their underlying statistics and adapted their initial behavior accordingly.

      Beyond this, the combination of behavioral results, computational modeling, and neuroimaging enhances the impact of the work. It paints a fuller picture of whether and how humans meta-learn the global statistics of environments, and this is an important direction for the field of adaptive learning.

      Weaknesses:

      (1) The authors make the distinction between meta-learned "global" learning rates and within-environment learning rate adaptation in response to "local" fluctuations/observations. Though the experimental paradigm is novel, there are certainly links to prior work - for instance, though change point structures don't entail revisiting unique environments, they do require meta-learning from environmental statistics that is distinct from transient local adaptation to prediction errors. This tendency to increase one's learning rate after large prediction errors is appropriate in change point environments, though, as is true in this study, the amount of increase should be dependent on. This represents a similar kind of slower-timescale learning or reuse of more "global" parameters, and can be seen to different extents in prior work. It might benefit readers if the authors were to link the current work to previous research more explicitly to draw clearer connections between the approaches and findings.

      (2) Throughout much of the paper, the authors refer to the distinctions between environments primarily as differences in "initial learning rates" or "environment-specific learning rates." This is particularly prominent when discussing fMRI results. Though the optimal initial learning rate did differ across environments, this was the result of differences in underlying task statistics. It will be important to clarify this throughout the text: because of the confounds between task statistics and initial learning rate (and to some extent, the position on the screen), it is not possible to separate the impact of these specific variables. This is also relevant to understanding the justification for using methods like RSA to test whether brain regions represent task states similarly. If the main hypothesis is that neural activity reflects the (initial) learning rate itself, then a univariate analysis approach would seem more natural.

      (3) For the neuroimaging results in particular, the specificity of some of the results (e.g. ventral striatum showing an effect of prediction error only in the low noise condition in the second half of task experience, only on the first trial) is a bit surprising. Additional justification of or context for these results would be useful to help readers gauge how expected or surprising these findings are.

      (4) There are some methodological details that are unclear (e.g., how were the positions of the crabs selected relative to the location they emerged from? Looking at Figure 1C, it looks like the crabs spread out unevenly, and that the single position they emerge from is not necessarily at the center of the crab locations.) Additional detail and clarity would help address some unanswered questions (more details below).

    1. eLife Assessment

      The authors make an important advance in enzyme annotation by fusing biochemical knowledge with language‑model-based learning to predict catalytic residues from sequence alone. Squidly, a new ML method, outperforms existing tools on standard benchmarks and on the CataloDB dataset. The work has solid support, yet clarifications on dataset biases, ablation analyses, and uncertainty filtering would strengthen its efficiency claims.

    2. Reviewer #1 (Public review):

      In this well-written and timely manuscript, Rieger et al. introduce Squidly, a new deep learning framework for catalytic residue prediction. The novelty of the work lies in integrating per-residue embeddings from large protein language models (ESM2) with a biology-informed contrastive learning scheme that leverages enzyme class information to rationally mine hard positive/negative pairs. Importantly, the method avoids reliance on predicted 3D structures, enabling scalability, speed, and broad applicability. The authors show that Squidly outperforms existing ML-based tools and even BLAST in certain settings, while an ensemble with BLAST achieves state-of-the-art performance across multiple benchmarks. Additionally, the introduction of the CataloDB benchmark, designed to test generalization at low sequence and structural identity, represents another important contribution of this work.

      I have only some minor comments:

      (1) The manuscript acknowledges biases in EC class representation, particularly the enrichment for hydrolases. While CataloDB addresses some of these issues, the strong imbalance across enzyme classes may still limit conclusions about generalization. Could the authors provide per-class performance metrics, especially for underrepresented EC classes?

      (2) An ablation analysis would be valuable to demonstrate how specific design choices in the algorithm contribute to capturing catalytic residue patterns in enzymes.

      (3) The statement that users can optionally use uncertainty to filter predictions is promising but underdeveloped. How should predictive entropy values be interpreted in practice? Is there an empirical threshold that separates high- from low-confidence predictions? A demonstration of how uncertainty filtering shifts the trade-off between false positives and false negatives would clarify the practical utility of this feature.
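      The kind of demonstration requested in point (3) could look like the following sketch, which uses the binary entropy of a predicted per-residue probability as the uncertainty score and reports how false positives and false negatives change as the entropy threshold is tightened. Everything here is an assumption made for illustration: Squidly may compute uncertainty differently (for example across an ensemble), the function names are hypothetical, and the probabilities and labels are toy values rather than model output.

```python
import math

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def confusion_after_filter(probs, labels, max_entropy):
    """Count kept predictions, false positives and false negatives after abstaining
    on every prediction whose entropy exceeds max_entropy."""
    kept = fp = fn = 0
    for p, y in zip(probs, labels):
        if binary_entropy(p) > max_entropy:
            continue  # too uncertain: abstain instead of predicting
        kept += 1
        pred = 1 if p >= 0.5 else 0
        fp += int(pred == 1 and y == 0)
        fn += int(pred == 0 and y == 1)
    return kept, fp, fn

# Toy per-residue probabilities of being catalytic, with toy ground truth.
toy_probs = [0.98, 0.90, 0.65, 0.55, 0.45, 0.30, 0.08, 0.02]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0]

for threshold in (math.log(2), 0.5, 0.2):
    kept, fp, fn = confusion_after_filter(toy_probs, toy_labels, threshold)
    print(f"max entropy {threshold:.3f}: kept {kept}, FP {fp}, FN {fn}")
```

      A stricter threshold removes errors only by abstaining on more residues, so any reported threshold should be accompanied by the fraction of residues (and of true catalytic residues) that remain after filtering.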

      (4) The excerpt highlights computational efficiency, reporting substantial runtime improvements (e.g., 108 s vs. 5757 s). However, the comparison lacks details on dataset size, hardware/software environment, and reproducibility conditions. Without these details, the speedup claim is difficult to evaluate. Furthermore, it remains unclear whether the reported efficiency gains come at the expense of predictive performance.

      (5) Given the well-known biases in public enzyme databases, the dataset is likely enriched for model organisms (e.g., E. coli, yeast, human enzymes) and underrepresents enzymes from archaea, extremophiles, and diverse microbial taxa. Would this limit conclusions about Squidly's generalisability to less-studied lineages?

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to develop Squidly, a sequence-only catalytic residue prediction method. By combining protein language model (ESM2) embeddings with a biologically inspired contrastive learning pairing strategy, they achieve efficient and scalable predictions without relying on three-dimensional structure. Overall, the authors largely achieved their stated objectives, and the results generally support their conclusions. This research has the potential to advance the fields of enzyme functional annotation and protein design, particularly in the context of screening large-scale sequence databases and unstructured data. However, the data and methods are still limited by the biases of current public databases, so the interpretation of predictions requires specific biological context and experimental validation.

      Strengths:

      The strengths of this work include the innovative methodological incorporation of EC classification information for "reaction-informed" sample pairing, thereby enhancing the discriminative power of contrastive learning. Results demonstrate that Squidly outperforms existing machine learning methods on multiple benchmarks and is significantly faster than structure prediction tools, demonstrating its practicality.

      Weaknesses:

      Disadvantages include the lack of a systematic evaluation of the impact of each strategy on model performance. Furthermore, some analyses, such as PCA visualization, exhibit low explained variance, which undermines the strength of the conclusions.

    1. Author response:

      We thank the reviewers for their constructive feedback on the article’s strengths and weaknesses. In response, we plan to strengthen our work in a revised version by (i) providing an additional example of our method’s implementation and (ii) framing our contribution more clearly as a continuation of the line of research that characterises neuronal models in terms of their bifurcation structure.

      Experimental validation, however, is beyond the scope of this study. Constructing experimental bifurcation diagrams remains a major challenge, particularly for unstable branches. Although some techniques exist to approximate branches of unstable steady states, unstable limit cycles are far more difficult to capture. Additionally, in practice, many factors vary during recordings, and generating reliable diagrams would require a large number of tightly controlled experimental repetitions whose stability often cannot be ensured. Two-dimensional bifurcation diagrams, as needed for the analysis in our manuscript, are even more challenging to obtain because the extensive and stable recordings would have to be available from the same cell at different values of the second parameter (such as different extracellular potassium concentrations). At this stage, our method can be applied to the reduction of detailed conductance-based models, which themselves are constrained by experimental data (for example, gating functions fitted to voltage-clamp recordings). This way, simple yet dynamically faithful phenomenological models for efficient use in network analysis and simulation can be derived from more complex, biophysical models. In contrast to the traditional voltage fitting approach, these models can also capture changes in additional parameters (such as extracellular potassium concentration).

    2. Reviewer #2 (Public review):

      Summary:

      The authors derive an integrate-and-fire model to describe the dynamics of a more complex Wang-Buzsaki model and compare the two models. A detailed discussion of bifurcation schemes in both models is convincing and allows us to evaluate the simpler model.

      Strengths:

      The idea is interesting, and the mathematical approach appears to be convincing. In addition, differences between the simple and original models are also discussed.

      Weaknesses:

      A comparison to experimental data is necessary to support the theoretical work.

    3. Reviewer #1 (Public review):

      Summary:

      From a big picture viewpoint, this work aims to provide a method to fit parameters of reduced models for neural dynamics so that the resulting tuned model has a bifurcation diagram that matches that of a more complex, computationally expensive model. The matching of bifurcation diagrams ensures that the model dynamics agree on a region of parameter space, rather than just at specially tuned values, and that the models share properties such as qualitative features of their phase response curves, as the authors demonstrate. A notable point is the inclusion of extracellular potassium concentration dynamics into the reduced model - here, the quadratic integrate-and-fire model; this is straightforward but nonetheless useful for studying certain phenomena.

      Strengths:

      The paper demonstrates the method specifically on the fitting of the quadratic integrate-and-fire model, with potassium concentration dynamics included, to the Wang-Buzsaki model extended to include the potassium component. The method works very well overall in this instance. The resulting model is thoroughly compared with the original, in terms of bifurcation diagrams, production of various activity patterns, phase response curves, and associated phase-locking and synchronization properties.

      Weaknesses:

      It is important to note that the proposed method requires that a target bifurcation diagram be known. In practical terms, this means that the method may be well suited to fitting a reduced model to another, more complicated model, but is not likely to be useful for fitting the model to data. Certainly, the authors did not illustrate any such application. Secondly, the authors do not provide any sort of general algorithm but rather give a demonstration of a single example of fitting one specific reduced model to one specific conductance-based model. Finally, the main idea of the paper seems to me to be a natural descendant of the chain of reasoning, starting from Rinzel - continuing through Bertram; Golubitsky/Kaper/Josic; Izhikevich; and others - that a fundamental way to think about neuronal models, especially those involving bursting dynamics, is in terms of their bifurcation structure. According to this line of reasoning, two models are "the same" if they have the same bifurcation structure. Thus, it becomes natural to fit a reduced model to a more complicated model based on the bifurcation structure. The authors deserve credit for recognizing and implementing this step, and their work may be a useful example to the community. But the manuscript should have described and cited this chain of works to put the current study in the correct context.

    4. eLife Assessment

      This work demonstrates an objective way to select parameter values for a quadratic integrate-and-fire model so that its bifurcation diagram matches a specific target diagram, generated from the Wang-Buzsaki model. The method is useful for the field and is presented with convincing evidence. The method is currently limited in its ability to be applied to data, but improves our mathematical tools to treat a rarely studied type of bifurcation.

    1. eLife Assessment

      In this important study, the authors conducted extensive atomistic and coarse-grained simulations as well as a lattice Monte Carlo analysis to probe the driving force and functional impact of supercomplex formation in the inner mitochondrial membrane. The study highlighted the major contribution from membrane mechanics to the supercomplex formation and revealed interesting differences in structural and dynamical features of the protein components upon complex formation. Upon revision, the analysis is considered solid, although the magnitude of the estimated membrane deformation energies seems surprisingly large. Overall, the study is thorough, creative and the impact on the field of bioenergetics is expected to be significant.

    2. Reviewer #1 (Public review):

      This paper by Poverlein et al. reports substantial membrane deformation around the oxidative phosphorylation supercomplex, proposing that this deformation is a key part of supercomplex formation. I found the paper interesting and well-written.

      * Analysis of the bilayer curvature is challenging on the fine lengthscales they have used and produces unexpectedly large energies (Table 1). Additionally, the authors use the mean curvature (Eq. S5) as input to what is clearly the Helfrich Hamiltonian (Eq. S7), although it is not cited as such. If an errant factor of one half has been included with curvature, this would quarter the curvature energy compared to the real energy, due to the squared curvature. The bending modulus used (ca. 5 kcal/mol) is small on the scale of typically observed biological bending moduli. This suggests the curvature energies are indeed much higher even than the high values reported. Some of this may be due to the spontaneous curvature of the lipids and perhaps the effect of the protein modifying the nearby lipids' properties.

      * It is unclear how CDL supports SC formation: whether it strongly stabilizes the membrane deformation or acts as an electrostatic glue. While this is a weakness for a definite quantification of the effect of CDL on SC formation, the study presents an interesting observation of CDL redistribution and could be an interesting topic for future work.
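
      The factor-of-four argument in the first point above can be written out as a short worked equation. This is a sketch under stated assumptions: zero spontaneous curvature, the Gaussian-curvature term ignored, and the standard Helfrich form written in terms of the total curvature $c_1 + c_2$. The exact forms of Eq. S5 and Eq. S7 in the manuscript are not reproduced here.

```latex
% Helfrich bending energy (zero spontaneous curvature, Gaussian term omitted),
% with principal curvatures c_1, c_2 and bending modulus \kappa:
E = \frac{\kappa}{2} \int (c_1 + c_2)^2 \, \mathrm{d}A .

% The mean curvature carries a factor of one half:
H = \frac{c_1 + c_2}{2}
\quad\Longrightarrow\quad
E = \frac{\kappa}{2} \int (2H)^2 \, \mathrm{d}A = 2\kappa \int H^2 \, \mathrm{d}A .

% If H is inserted where the total curvature c_1 + c_2 is expected:
E' = \frac{\kappa}{2} \int H^2 \, \mathrm{d}A = \frac{E}{4} .
```

      Under these assumptions, an inconsistent curvature convention would underestimate the bending energy by a factor of four, which is the direction of error described above.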

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results). The energies of the membrane deformations are quite large. This might reflect the roles of specific lipids stabilizing those deformations, or the inherent difficulty in characterizing nanometer-scale curvature.

    3. Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for the SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful, and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. In the revision, the authors further clarified and quantified their analysis of membrane responses, leading to further insights into membrane contributions. They have also toned down the decomposition of membrane contributions into enthalpic and entropic contributions, which is difficult to do. Overall, the study is rather thorough, highly creative and the impact on the field is expected to be significant.

      Weaknesses:

      Upon revision, I believe the weakness identified in previous work has been largely alleviated.

    1. eLife Assessment

      This is an important study describing the morphological changes during boundary formation between sensory and non-sensory tissues of the inner ear. The authors provided solid evidence that a transcription factor, Lmx1a, and ROCK-dependent actomyosin contractility are key for border formation in the inner ear. However, future studies will be needed to investigate the direct relationships among boundary formation, Lmx1a and ROCK. This work will be of interest to developmental biologists interested in boundary formation.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigated the mechanism underlying boundary formation necessary for proper separation of vestibular sensory end organs. In both chick and mouse embryos, it was shown that a population of cells abutting the sensory (marked by high Sox2 expression) /nonsensory cell populations (marked by Lmx1a expression) undergo apical expansion, elongation, alignment and basal constriction to separate the lateral crista (LC) from the utricle. Using Lmx1a mutant mice, organ cultures, pharmacological and viral-mediated Rock inhibition, it was demonstrated that the Lmx1a transcription factor and Rock-mediated actomyosin contractility are required for boundary formation and LC-utricle separation.

      Strengths:

      Overall, the morphometric analyses were done rigorously and revealed novel boundary cell behaviors. The requirement of Lmx1a and Rock activity in boundary formation was convincingly demonstrated.

      Weaknesses:

      However, the precise roles of Lmx1a and Rock in regulating cell behaviors during boundary formation were not clearly fleshed out. For example, phenotypic analysis of Lmx1a was rather cursory; it is unclear how Lmx1a, expressed in half of the boundary domain, controls boundary cell behaviors and prevents cell mixing between Lmx1a+ and Lmx1a- compartments. Well-established mechanisms and molecules for boundary formation were not investigated (e.g. differential adhesion via cadherins, cell repulsion via ephrin-Eph signaling). Moreover, within the boundary domain, it is unclear whether apical multicellular rosettes and basal constrictions are drivers of boundary formation, as the boundary can still form when these cell behaviors are inhibited. Involvement of other cell behaviors, such as directional cell intercalation and oriented cell division, also warrants consideration. With these lingering questions, the mechanistic advance of the present study is modest.

      Revision: The clarity of the text was improved. The open questions regarding the mechanisms were not experimentally addressed but discussed.

    3. Reviewer #3 (Public review):

      Summary:

      Lmx1a is an orthologue of apterous in flies, which is important for dorsal-ventral border formation in the wing disc. Previously, this research group has described the importance of the chicken Lmx1b in establishing the boundary between sensory and non-sensory domains in the chicken inner ear. Here, the authors described a series of cellular changes during border formation in the chicken inner ear, including alignment of cells at the apical border and concomitant constriction basally. The authors extended these observations to the mouse inner ear and showed that these morphological changes occurred at the border of Lmx1a positive and negative regions, and these changes failed to develop in Lmx1a mutants. Furthermore, the authors demonstrated that the ROCK-dependent actomyosin contractility is important for this border formation and blocking ROCK function affected epithelial basal constriction and border formation in both in vitro and in vivo systems.

      Strengths:

      The morphological changes described during border formation in the developing inner ear are interesting. Linking these changes to the function of Lmx1a and to ROCK-dependent actomyosin contractility is provocative.

      Weaknesses:

      There are several outstanding issues that need to be clarified before one can pin the morphological changes observed being causal to border formation and that Lmx1a and ROCK are involved.

      Comments on the latest version:

      The revised manuscript has provided clarity of their results on some levels, but unfortunately, the basal restriction during border formation remains unclear and the study did not advance the understanding of the role of Lmx1a in boundary formation. Overall comments are indicated below:

      (1) The authors state in the rebuttal, "we do not think that ROCK activity is required for the formation or maintenance of the basal constriction at the interface of Lmx1a-expressing and non-expressing cells". If the above is the sentiment of the authors, then the manuscript is not written to support this sentiment clearly, starting with this misleading sentence in the Abstract, "The boundary domain is absent in Lmx1a-deficient mice, which exhibit defects in sensory organ segregation, and is disrupted by the inhibition of ROCK-dependent actomyosin contractility."

      (2) As acknowledged by the authors, the data as they currently stand could be explained by Lmx1a functioning in specifying the non-sensory fate and may not function directly in boundary formation. With this caveat in mind, the role of Lmx1a in boundary formation remains unclear.

      (3) I feel like the word "orchestrate" in the title is an overstatement.

    1. eLife Assessment

      This valuable study expands the inventory of polyadenylated RNAs cleaved by the double-stranded RNA endonuclease Rnt1 in budding yeast, using solid methodology based on high-throughput sequencing. Previous studies had anecdotally discovered mRNA substrates, and this global characterization is comprehensive with multiple complementary controls. This study sets the stage for deeper investigations into the biological function of Rnt1 and substrate cleavage.

    2. Reviewer #1 (Public review):

      Sarpaning et al. provide a thorough characterization of putative Rnt1 cleavage of mRNA in S. cerevisiae. Previous studies have discovered Rnt1 mRNA substrates anecdotally, and this global characterization expands the known collection of putative Rnt1 cleavage sites. The study is comprehensive, with several types of controls to show that Rnt1 is required for several of these cleavages.

      Comments on revisions:

      The authors have responded appropriately to the review.

    3. Reviewer #2 (Public review):

      This study presents a useful inventory of polyadenylated RNAs cleaved by the double-stranded RNA endonuclease Rnt1 in yeast. The data were obtained with solid methodology based on high-throughput sequencing, and the evidence that Rnt1 contributes to cellular homeostasis through controlling the turnover of selected mRNAs is convincing.

      Comments on revisions:

      I appreciate the authors' thorough and thoughtful response, and I find that the manuscript has been substantially strengthened by the additional data, analyses, and textual clarifications.

    1. eLife Assessment

      This study combines mathematical models and experimental data to analyse the emergence of heterogeneity within clonal NK cell responses during antigen-specific cell expansion. It comprises different experimental data and extensively explores various mathematical models, to study NK cell turnover during acute immune responses and homeostatic turnover within murine cytomegalovirus (MCMV) infection. The solid study presents valuable findings and provides insights on heterogeneous NK cell development.

    2. Reviewer #1 (Public review):

      Summary:

      The objective of this study was to infer the population dynamics (rates of differentiation, division and loss) and lineage relationships of NK cell subsets during an acute immune response and under homeostatic conditions.

      Strengths:

      A rich dataset and a detailed analysis of a particular class of stochastic models.

      Weaknesses: (relating to initial submission)

      The stochastic models used are quite simple; each population is considered homogeneous with first-order rates of division, death, and differentiation. In Markov process models such as these there is no dependence of cellular behavior on its history of divisions. In recent years models of clonal expansion and diversification, in the settings of T and B cells, have progressed beyond this picture. So I was a little surprised that there was no mention of the literature exploring the role of replicative history in differentiation (e.g. Bresser Nat Imm 2022), nor of the notion of family 'division destinies' (either in division number, or the time spent proliferating, as described by the Cyton and Cyton2 models developed by Hodgkin and collaborators; e.g. Heinzel Nat Imm 2017). The emerging view is that variability in clone (family) size may arise predominantly from the signals delivered at activation, which dictate each precursor's subsequent degree of expansion, rather than from the fluctuations deriving from division and death modeled as Poisson processes.

      As you pointed out, the Gerlach and Buchholz Science papers showed evidence for highly skewed distributions of family sizes, and correlations between family size and phenotypic composition. Is it possible that your observed correlations could arise if the propensity for immature CD27+ cells to differentiate into mature CD27- cells increases with division number? The relative frequency of the two populations would then also be impacted by differences in the division rates of each subset - one would need to explore this. But depending on the dependence of the differentiation rate on division number, there may be parameter regimes (and timepoints) at which the more differentiated cells can predominate within large clones even if they divide more slowly than their immature precursors. One might not then be able to rule out the two-state model. I would like to see a discussion or rebuttal of these issues.
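
      For readers less familiar with this model class, the following is a minimal sketch of a two-state Markov model with first-order (history-independent) rates, simulated with the Gillespie algorithm for a single clone. It is an illustration only: the function name, the rate keys, and the example rate values are hypothetical and are not the authors' implementation or fitted parameters.

```python
import random

# Possible events for immature (CD27+) and mature (CD27-) cells.
EVENTS = ("div_imm", "death_imm", "diff", "div_mat", "death_mat")

def simulate_clone(rates, t_end, rng):
    """Gillespie simulation of one clone founded by a single immature cell.

    rates: dict mapping each name in EVENTS to a per-cell rate (per day).
    Returns (n_immature, n_mature) at time t_end.
    """
    n_imm, n_mat = 1, 0
    t = 0.0
    while True:
        # First-order kinetics: each propensity is rate times cell number,
        # with no dependence on how many times a cell has divided.
        weights = [
            rates["div_imm"] * n_imm,
            rates["death_imm"] * n_imm,
            rates["diff"] * n_imm,
            rates["div_mat"] * n_mat,
            rates["death_mat"] * n_mat,
        ]
        total = sum(weights)
        if total <= 0.0:
            break  # clone extinct or no event possible
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        event = rng.choices(EVENTS, weights=weights)[0]
        if event == "div_imm":
            n_imm += 1
        elif event == "death_imm":
            n_imm -= 1
        elif event == "diff":
            n_imm -= 1
            n_mat += 1
        elif event == "div_mat":
            n_mat += 1
        else:  # death_mat
            n_mat -= 1
    return n_imm, n_mat

# Hypothetical rates (per day), chosen only to exercise the code.
example_rates = {"div_imm": 1.0, "death_imm": 0.1, "diff": 0.3,
                 "div_mat": 0.5, "death_mat": 0.2}
clone = simulate_clone(example_rates, 3.0, random.Random(42))
```

      Making the differentiation rate depend on division number would require tracking the generation of each cell, which is exactly what this memoryless formulation cannot represent.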

      Comments on revisions:

      The authors have put in a lot of effort to address the reviews and have explored alternative models carefully.

      In the sections relating to homeostasis and the endogenous response, as far as I can tell you are estimating net growth rates (the k parameters) throughout - this is to be expected if you're working with just cell numbers and no information relating to proliferation. In these sections there are many places where you refer to proliferation rates and death rates when I think you just mean net positive or net negative growth rates. It's important to be precise about this even if the language can get a bit repetitive. (These net rates of growth or loss relate to clonal rather than cellular dynamics, which may be worth explaining). Later, you do use data relating to dead cells, which in principle can be used to get independent measures of death rates, but these data were not used in the fitting.
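
      The identifiability point can be stated in one line. As a sketch for a single homogeneous population, with hypothetical symbols $\lambda$ (division rate) and $\delta$ (loss rate, by death or differentiation) that are not the manuscript's notation:

```latex
% Cell numbers N(t) in a homogeneous population with first-order rates:
\frac{\mathrm{d}N}{\mathrm{d}t} = (\lambda - \delta)\,N = k\,N
\quad\Longrightarrow\quad
N(t) = N_0 \, e^{k t}, \qquad k = \lambda - \delta .

% Any pair (\lambda + c, \delta + c) with c > -\min(\lambda, \delta)
% yields the same k and therefore the same N(t).
```

      Mean cell counts therefore constrain only the net rate $k$; separating $\lambda$ from $\delta$ requires additional information such as proliferation or death markers.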

      There is so much evidence that T and B cell differentiation are often contingent on division that it would be very reasonable to consider it as a possibility for NK cells too. (Differentiation could be asymmetric, as you explored, or simply symmetric with some probability per division). These processes can be cast into simple ODE models but no longer allow you to aggregate division and death rates - so for parameter estimation you need to add measures of proliferation (Ki67 or similar) or death. This may be worth some discussion?

    3. Reviewer #2 (Public review):

      Summary:

      Wethington et al. investigated the mechanistic principles underlying antigen-specific proliferation and memory formation in mouse natural killer (NK) cells following exposure to mouse cytomegalovirus (MCMV), a phenomenon predominantly associated with CD8+ T cells. Using a stochastic modeling approach, the authors aimed to develop a quantitative model of NK cell clonal dynamics during MCMV infection. Starting from a single immature Ly49+CD27+ NK cell, a two-state linear model (with a death variant) explained the negative correlation between clone size at 8 dpi and the CD27+ fraction, but failed to reproduce the first and second moments of CD27+ and CD27− NK cell populations at 8 dpi. To address this limitation, the authors added an intermediate maturation state, yielding a three-stage model (CD27+Ly6C− → CD27−Ly6C− → CD27−Ly6C+) that fits the first and second moments under two constraints: CD27+ NK cells proliferate faster than CD27− NK cells, and clone size is negatively correlated with the CD27+ fraction (upper bound of −0.2). The model predicts high proliferation in the intermediate state and high death in mature CD27−Ly6C+ cells, and it was validated using Adams et al. (2021) NK reporter mice tracking CD27+/− populations after tamoxifen, allowing discrimination between bone marrow-derived and pre-existing peripheral NK cells. To test the prediction that mature CD27− NK cells have a higher death rate, the authors measured Ly49H+ NK cell viability in the mouse spleen at different time points post-MCMV infection. Data confirmed lower viability of mature (CD27−) than immature (CD27+) cells during days 4-8 post-infection, and a model variant supported that higher CD27− death increases their proportion in the dead cell compartment. 
      Altogether, the authors propose a three-stage quantitative model of antigen-specific expansion and maturation of naïve Ly49H+ NK cells with the trajectory CD27+Ly6C− (immature) → CD27−Ly6C− (mature I) → CD27−Ly6C+ (mature II), highlighting high proliferation in the mature I state and increased death in the mature II state.

      Strengths:

      Models explaining correlations and first and second moments, supported by analytical investigations, stochastic simulations, and model selection, identify key processes in antigen-specific NK expansion and maturation. The work distinguishes expansion, contraction, and memory in NK cells from CD8+ T cells and informs NK therapy development.

      Weaknesses (relating to initial submission):

      The conclusions of this paper are largely supported by the available data. However, a comparative analysis with more recent works in the field would be desirable. Clarifications:

      (1) Initial Conditions and Grassmann Data: The Grassmann data is used solely as a constraint, while the simulated values of CD27+/CD27− cells could have been directly fitted to the Grassmann data, which assumes a 1:1 ratio of CD27+/CD27− at t = 0. This would allow an alternative initial condition rather than starting from a single CD27+ cell.

      (2) Correlation Coefficients in the Three-State Model: Although the parameter scan of the three-stage model (Figure 2) demonstrates the potential for negative correlations between colony size and the fraction of CD27+ cells, the calculated correlation coefficients using the fitted parameter values are not shown. Including these would validate that the fitted parameters lie in the negative-correlation regime.

      (3) Viability Dynamics and Adaptive Response: The authors measured the time evolution of CD27+/− dynamics and viability over 30 days post-infection (Figure 4). It would be valuable to test whether the three-state model can reproduce the adaptive response of CD27− cells to MCMV infection, particularly the observed drop in CD27− viability at 5 dpi and its rebound at 8 dpi. Demonstrating this would test whether the model can simultaneously explain viability dynamics and moment dynamics, and would enable sensitivity analysis of CD27− viability with respect to model parameters.

    1. eLife Assessment

      This study combines genetic, cell biological, and interaction data to propose a model of meiotic double-strand break regulation in C. elegans. Solid evidence supports the main conclusions, while by nature of a screening-type study, more may be needed to solidify speculations in future studies. Yet, comprehensive cataloging of the physical and genetic interactions of factors required for meiotic double-strand break is useful information for the field.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Raices et al. provides some novel insights into the role of, and interactions between, SPO-11 accessory proteins in C. elegans. The authors propose a model of meiotic DSB regulation, critical to our understanding of DSB formation and ultimately crossover regulation and accurate chromosome segregation. The work also emphasizes the commonalities and species-specific aspects of DSB regulation.

      Strengths:

      This study capitalizes on the strengths of the C. elegans system to uncover genetic interactions between SPO-11 accessory proteins. In combination with physical interactions, the authors synthesize their findings into a model, which will serve as the basis for future work to determine mechanisms of DSB regulation.

      Weaknesses:

      The methodology, although standard, still lacks some rigor, especially with the IPs.

    3. Reviewer #2 (Public review):

      Summary:

      Meiotic recombination initiates with the formation of DNA double-strand breaks (DSBs), catalyzed by the conserved topoisomerase-like enzyme Spo11. Spo11 requires accessory factors that are poorly conserved across eukaryotes. Previous genetic studies have identified several proteins required for DSB formation in C. elegans to varying degrees; however, how these proteins interact with each other to recruit the DSB-forming machinery to chromosome axes remains unclear.

      In this study, Raices et al. characterized the biochemical and genetic interactions among proteins that are known to promote DSB formation during C. elegans meiosis. The authors examined pairwise interactions using yeast two-hybrid (Y2H) and co-immunoprecipitation and revealed an interaction between a chromatin-associated protein HIM-17 and a transcription factor XND-1. They further confirmed the previously known interaction between DSB-1 and SPO-11 and showed that DSB-1 also interacts with a nematode-specific HIM-5, which is essential for DSB formation on the X chromosome. They also assessed genetic interactions among these proteins, categorizing them into four epistasis groups by comparing phenotypes in double vs. single mutants. Combining these results, the authors proposed a model of how these proteins interact with chromatin loops and are recruited to chromosome axes, offering insights into the process in C. elegans compared to other organisms.

      Weaknesses:

      This work relies heavily on Y2H, which is notorious for having high rates of false positives and false negatives. Although the interactions between HIM-17 and XND-1 and between DSB-1 and HIM-5 were validated by co-IP, the significance of these interactions was not tested in vivo. Cataloging Y2H and genetic interactions does not yield much more insight. The model proposed in Figure 4 is also highly speculative.

    4. Reviewer #3 (Public review):

      The goal of this work is to understand the regulation of double-strand break formation during meiosis in C. elegans. The authors have analyzed physical and genetic interactions among a subset of factors that have been previously implicated in DSB formation or the number or timing of DSBs: CEP-1, DSB-1, DSB-2, DSB-3, HIM-5, HIM-17, MRE-11, REC-1, PARG-1, and XND-1.

      The 10 proteins that are analyzed here include a diverse set of factors with different functions, based on prior analyses in many published studies. The term "Spo11 accessory factors" has been used in the meiosis literature to describe proteins that directly promote Spo11 cleavage activity, rather than factors that are important for the expression of meiotic proteins or that influence the genome-wide distribution or timing of DSBs. Based on this definition, the known SPO-11 accessory factors in C. elegans include DSB-1, DSB-2, DSB-3, and the MRN complex (at least MRE-11 and RAD-50). These are all homologs of proteins that have been studied biochemically and structurally in other organisms. DSB-1 & DSB-2 are homologs of Rec114, while DSB-3 is a homolog of Mei4. Biochemical and structural studies have shown that Rec114 and Mei4 directly modulate Spo11 activity by recruiting Spo11 to chromatin and promoting its dimerization, which is essential for cleavage. The other factors analyzed in this study affect the timing, distribution, or number of RAD-51 foci, but they likely do so indirectly. As elaborated below, XND-1 and HIM-17 are transcription factors that modulate the expression of other meiotic genes, and their role in DSB formation is parsimoniously explained by this regulatory activity. The roles of HIM-5 and REC-1 remain unclear; the reported localization of HIM-5 to autosomes is consistent with a role in transcription (the autosomes are transcriptionally active in the germline, while the X chromosome is largely silent), but its loss-of-function phenotypes are much more limited than those of HIM-17 and XND-1, so it may play a more direct role in DSB formation. The roles of CEP-1 (a p53 homolog) and PARG-1 are also ambiguous, but their homologs in other organisms contribute to DNA repair rather than DSB formation.

      An additional significant limitation of the study, as stated in my initial review, is that much of the analysis here relies on cytological visualization of RAD-51 foci as a proxy for DSBs. RAD-51 associates transiently with DSB sites as they undergo repair and is thus limited in its ability to reveal details about the timing or abundance of DSBs since its loading and removal involve additional steps that may be influenced by the factors being analyzed.

      The paper focuses extensively on HIM-5, which was previously shown through genetic and cytological analysis to be important for breaks on the X chromosome. The revised manuscript still claims that "HIM-5 mediates interactions with the different accessory factors sub-groups, providing insights into how components on the DNA loops may interact with the chromosome axis." The weak interactions between HIM-5 and DSB-1/2 detected in the Y2H assay do not convincingly support such a role. The idea that HIM-5 directly promotes break formation is also inconsistent with genetic data showing that him-5 mutants lack breaks on the X chromosomes, while HIM-5 has been shown to be enriched on autosomes. Additionally, as noted in my comment to the authors, the localization data for HIM-5 shown in this paper are discordant with prior studies; this discrepancy should be addressed experimentally.

      This paper describes REC-1 and HIM-5 as paralogs, based on prior analysis in a paper that included some of the same authors (Chung et al., 2015; DOI 10.1101/gad.266056.115). In my initial review I mentioned that this earlier conclusion was likely incorrect and should not be propagated uncritically here. Since the authors have rebutted this comment rather than amending it, I feel it is important to explain my concerns about the conclusions of the previous study. Chung et al. found a small region of potential homology between the C. elegans rec-1 and him-5 genes and also reported that him-5; rec-1 double mutants have more severe defects than either single mutant, indicative of a stronger reduction in DSBs. Based on these observations and an additional argument based on microsynteny, they concluded that these two genes arose through recent duplication and divergence. However, as they noted, genes resembling rec-1 are absent from all other Caenorhabditis species, even those most closely related to C. elegans. The hypothesis that two genes are paralogs that arose through duplication and divergence is thus based on their presence in a single species, in the absence of extensive homology or evidence for conserved molecular function. Further, the hypothesis that gene duplication and divergence have given rise to two paralogs that share no evident structural similarity or common interaction partners in the few million years since C. elegans diverged from its closest known relatives is implausible. In contrast, DSB-1 and DSB-2 are both homologs of Rec114 that clearly arose through duplication and divergence within the Caenorhabditis lineage, but much earlier than the proposed split between REC-1 and HIM-5. Two genes that can be unambiguously identified as dsb-1 and dsb-2 are present in genomes throughout the Elegans supergroup and absent in the Angaria supergroup, placing the duplication event at around 18-30 MYA, yet DSB-1 and DSB-2 share much greater similarity in their amino acid sequence, predicted structure, and function than HIM-5 and REC-1. Further, Raices et al. place HIM-5 and REC-1 in different functional complexes (Figure 3B).

      The authors acknowledge that HIM-17 is a transcription factor that regulates many meiotic genes. Like HIM-17, XND-1 is cytologically enriched along the autosomes in germline nuclei, suggestive of a role in transcription. The Reinke lab performed ChIP-seq in a strain expressing an XND-1::GFP fusion protein and showed that it binds to promoter regions, many of which overlap with the HIM-17-regulated promoters characterized by the Ahringer lab (doi: 10.1126/sciadv.abo4082). Work from the Yanowitz lab has shown that XND-1 influences the transcription of many other genes involved in meiosis (doi: 10.1534/g3.116.035725) and work from the Colaiacovo lab has shown that XND-1 regulates the expression of CRA-1 (doi: 10.1371/journal.pgen.1005029). Additionally, loss of HIM-17 or XND-1 causes pleiotropic phenotypes, consistent with a broad role in gene regulation. Collectively, these data indicate that XND-1 and HIM-17 are transcription factors that are important for the proper expression of many germline-expressed genes. Thus, as stated above, the roles of HIM-17 and XND-1 in DSB formation, as well as their effects on histone modification, are parsimoniously explained by their regulation of the expression of factors that contribute more directly to DSB formation and chromatin modification. I feel strongly that transcription factors should not be described as "SPO-11 accessory factors."

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Raices et al. provides novel insights into the roles of, and interactions between, SPO-11 accessory proteins in C. elegans. The authors propose a model of meiotic DSB regulation, critical to our understanding of DSB formation and ultimately crossover regulation and accurate chromosome segregation. The work also emphasizes the commonalities and species-specific aspects of DSB regulation.

      Strengths:

      This study capitalizes on the strengths of the C. elegans system to uncover genetic interactions between a large number of SPO-11 accessory proteins. In combination with physical interactions, the authors synthesize their findings into a model, which will serve as the basis for future work, to determine mechanisms of DSB regulation.

      Weaknesses:

      The methodology, although standard, lacks quantification. This includes the mass spectrometry data, along with the cytology. The work would also benefit from clarifying the role of the DSB machinery on the X chromosome versus the autosomes.

      • We have uploaded the MS data and added a summary table with the number of peptides and coverage.

      • We have added statistics to the comparisons of DAPI body counts.

      • We have provided additional images of the change in HIM-5 localization.

      • We have quantified the overlap (or lack thereof) between XND-1 and HIM-17 and the DNA axis.

      Reviewer #2 (Public Review):

      Summary:

      Meiotic recombination initiates with DNA double-strand break (DSB) formation, catalyzed by the conserved topoisomerase-like enzyme Spo11. Spo11 requires accessory factors that are poorly conserved across eukaryotes. Previous genetic studies have identified several proteins required for DSB formation in C. elegans to varying degrees; however, how these proteins interact with each other to recruit the DSB-forming machinery to chromosome axes remains unclear.

      In this study, Raices et al. characterized the biochemical and genetic interactions among proteins that are known to promote DSB formation during C. elegans meiosis. The authors examined pairwise interactions using yeast two-hybrid (Y2H) and co-immunoprecipitation and revealed an interaction between a chromatin-associated protein HIM-17 and a transcription factor XND-1. They further confirmed the previously known interaction between DSB-1 and SPO-11 and showed that DSB-1 also interacts with a nematode-specific HIM-5, which is essential for DSB formation on the X chromosome. They also assessed genetic interactions among these proteins, categorizing them into four epistasis groups by comparing phenotypes in double vs. single mutants. Combining these results, the authors proposed a model of how these proteins interact with chromatin loops and are recruited to chromosome axes, offering insights into the process in C. elegans compared to other organisms.

      Weaknesses:

      This work relies heavily on Y2H, which is notorious for having high rates of false positives and false negatives. Although the interactions between HIM-17 and XND-1 and between DSB-1 and HIM-5 were validated by co-IP, the significance of these interactions was not tested, and cataloging Y2H interactions does not yield much more insight.

      We appreciate that the reviewer recognized the value of our IP data, but we beg to differ that we rely too heavily on the Y2H. We also provide genetic analysis on bivalent formation to support the physical interaction data. We do acknowledge that there are caveats with Y2H, however, including that a subset of the interactions can only be examined with proteins in one orientation due to auto-activation. While we acknowledge that it would be nice to have IP data for all of the proteins using CRISPR-tagged, functional alleles, these strains are not all feasible (e.g. no functional rec-1 tag has been made) and are beyond the scope of the current work.

      Moreover, most experiments lack rigor, which raises serious concerns about whether the data convincingly supports the conclusions of this paper. For instance, the XND-1 antibody appears to detect a band in the control IP; however, there was no mention of the specificity of this antibody.

      We previously showed the specificity of this antibody in its original publication showing lack of staining in the xnd-1 mutant by IF (Wagner et al., 2010). To further address this, however, we have now included a new supplementary figure (Figure S1) demonstrating the specificity of the XND-1 antibody by Western blot. The antibody detects a distinct band in extracts from wild-type (N2) worms, but this band is absent in two independent xnd-1 mutant strains. This confirms that the antibody specifically recognizes XND-1, supporting the validity of the IP results shown in the main figures.

      Additionally, epistasis analysis of various genetic mutants is based on the quantification of DAPI bodies in diakinesis oocytes, but the comparisons were made without statistical analyses.

      We have added statistical analysis to all datasets where quantification was possible, strengthening the rigor and interpretation of our findings.

      For cytological data, a single representative nucleus was shown without quantification and rigorous analysis. The rationale for some experiments is also questionable (e.g. the rescue of dsb-2 mutants by him-5 transgenes in Figure 2), making the interpretation of the data unclear. Overall, while this paper claims to present "the first comprehensive model of DSB regulation in a metazoan", cataloging Y2H and genetic interactions did not yield any new insights into DSB formation without rigorous testing of their significance in vivo. The model proposed in Figure 4 is also highly speculative.

      Regarding the cytology, we provide new images and quantification of HIM-17 and XND-1 overlap with the DNA axes. We also added full germ line images showing HIM-5 localization in wild type and dsb-1 mutants, to provide a more complete and representative view of the observed phenotype. To further support our findings, we’ve also included images demonstrating that this phenotype is consistently observed both in live worms with the him-5::GFP transgene and in fixed worms with an endogenously tagged version of HIM-5.

      Reviewer #3 (Public Review):

      During meiosis in sexually reproducing organisms, double-strand breaks are induced by a topoisomerase-related enzyme, Spo11, which is essential for homologous recombination, which in turn is required for accurate chromosome segregation. Additional factors control the number and genome-wide distribution of breaks, but the mechanisms that determine both the frequency and preferred location of meiotic DSBs remain only partially understood in any organism.

      The manuscript presents a variety of different analyses that include variable subsets of putative DSB factors. It would be much easier to follow if the analyses had been more systematically applied. It is perplexing that several factors known to be essential for DSB formation (e.g., cohesins, HORMA proteins) are excluded from this analysis, while it includes several others that probably do not directly contribute to DSB formation (XND-1, HIM-17, CEP-1, and PARG-1).

      We respectfully disagree with the reviewer’s statement regarding the selection of factors included in our analysis. In this work, our focus was specifically on SPO-11 accessory factors: proteins that directly interact with or regulate SPO-11 activity during double-strand break formation. Cohesins and chromosome axis proteins (such as the HORMA domain proteins) are essential for establishing the correct chromosome architecture that supports DSB formation, but there is no evidence that they are direct accessory factors of SPO-11. Therefore, they were intentionally excluded from this study to maintain a clear and focused scope on proteins that more directly modulate SPO-11 function.

      Conversely, XND-1, HIM-17, CEP-1, and PARG-1 have all been implicated in regulating aspects of SPO-11-mediated DSB formation or its immediate environment. Although their contributions may involve broader chromatin or DNA damage response regulation, prior literature supports their inclusion as relevant modulators of SPO-11 activity, justifying their analysis within the context of this work.

      The strongest claims seem to be that "HIM-5 is the determinant of X-chromosome-specific crossovers" and "HIM-5 coordinates the actions of the different accessory factors subgroups." Prior work had already shown that mutations in him-5 preferentially reduce meiotic DSBs on the X chromosome. While it is possible that HIM-5 plays a direct role in DSB induction on the X chromosome, the evidence presented here does not strongly support this conclusion. It is also difficult to reconcile this idea with evidence from prior studies that him-5 mutations predominantly prevent DSB formation on the sex chromosomes, while the protein localizes to autosomes.

      HIM-5 is not the only protein that is autosomally enriched but preferentially affects the X chromosome: MES-4 and MRG-1 are both autosomally enriched but influence silencing of the X chromosome. While HIM-5 appears autosomally enriched, it does not appear to be exclusively autosomal. While we would ideally perform ChIP to determine its localization on chromatin, this method is likely insufficient to identify DSB sites, which differ in each nucleus and for which there are no known hotspots in the worm.

      him-5 mutants confer an ~50% reduction in the total number of breaks and a very profound change in break dynamics (seen by RAD-51 foci (Meneely et al., 2012)). Since the autosomes receive sufficient breaks in this context to attain a crossover in >98% of nuclei, this indicates that the autosomes are much less profoundly impacted by loss of DSB functions than is the X chromosome. Indeed, prior data from co-author Monica Colaiacovo showed that fewer breaks occur on the X (Gao, 2015), likely because of differences in chromatin composition between the X and the autosomes that result from X chromosome silencing.

      The conclusion that HIM-5 must be required for breaks on the X comes from the examination of DSB levels and their localization in different mutants that impair but do not completely abrogate breaks. In any situation where HIM-5 protein expression is affected (xnd-1, him-17, and him-5 null alleles), breaks on the X are reduced/eliminated. By contrast, in dsb-2 mutants, where HIM-5 expression is unaffected, both X and autosomal breaks are impacted equally. As discussed above, in the absence of HIM-5 function, there are ~15 breaks/nucleus. The Ppie-1::him-5 transgene is expressed at lower levels than Phim-5::him-5, but in the best case, the ectopic expression of this protein should give a maximum of ~15 breaks (the total # of breaks is thought to be ~30/nucleus). By these estimates, Ppie-1::him-5; him-17 and him-5 null mutants have the same number of breaks. Yet, in the former case, breaks occur on the X, whereas in the latter they do not. The best explanation for this discrepancy is that HIM-5 is sufficient to recruit the DSB machinery to the X chromosome.

      The one experiment that seems to elicit the conclusion that HIM-5 expression is sufficient for breaks on the X chromosome is flawed (see below). The conclusion that HIM-5 "coordinates the activities of the different accessory sub-groups" is not supported by data presented here or elsewhere.

      We have reorganized the discussion to more directly address the reviewers’ concerns. We raise the possibility that HIM-5 has an important role in bringing together SPO-11 and its interacting components (DSB-1/2/3) with the other DSB-inducing factors, including those factors that regulate DSB timing (XND-1), coordination with the cell cycle (REC-1), association with the chromosome axis (PARG-1, MRE-11), and coupling to downstream resection and repair (MRE-11, CEP-1).

      This raises a natural question: if HIM-5 has such a central role, why are the phenotypes of him-5 mutants so mild? We propose that while the loss of DSBs on the X appears mild, more profound effects are seen in the total number, timing, and placement of the DSBs across the genome, all of which are diminished or altered in the absence of HIM-5. The phenotypes of him-5 loss are reminiscent of those observed in Prdm9-/- mice, where breaks are relocated to transcriptional start sites and show a significant delay in formation. As with PRDM9, which is critical for proper DSB formation in most mammals, the comparatively subtle phenotypes of HIM-5 loss do not diminish its critical role in promoting proper DSB formation.

      Like most other studies that have examined DSB formation in C. elegans, this work relies on indirect assays, here limited to the cytological appearance of RAD-51 foci and bivalent chromosomes, as evidence of break formation or lack thereof. Unfortunately, neither of these assays has the power to reveal the genome-wide distribution or number of breaks. These assays have additional caveats, due to the fact that RAD-51 association with recombination intermediates and successful crossover formation both require multiple steps downstream of DSB induction, some of which are likely impaired in some of the mutants analyzed here. This severely limits the conclusions that can be drawn. Given that the goal of the work is to understand the effects of individual factors on DSB induction, direct physical assays for DSBs should be applied; many such assays have been developed and used successfully in other organisms.

      We appreciate the reviewer’s thoughtful comments. We agree that RAD-51 foci are an indirect readout of DSB formation and that their dynamics can be influenced by defects in downstream repair processes. However, in C. elegans, the available methods for directly detecting DSBs are limited. Unlike other organisms, C. elegans lacks γH2AX, eliminating the possibility of using γH2AX as a DSB marker. TUNEL assays, while conceptually appealing, have proven unreliable and poorly reproducible in the germline context. Similarly, RPA foci do not consistently correlate with the number of DSBs and are influenced by additional processing steps.

      Given these limitations, RAD-51 foci remain the most widely accepted surrogate for monitoring DSB formation in C. elegans. While we fully acknowledge the caveats associated with this approach — particularly the potential effects of downstream repair defects — RAD-51 analysis continues to provide valuable insight into DSB dynamics and regulation, especially when interpreted in combination with other phenotypic assessments.

      Throughout the manuscript, the writing conflates the roles played by different factors that affect DSB formation in very different ways. XND-1 and HIM-17 have previously been shown to be transcription factors that promote the expression of many germline genes, including genes encoding proteins that directly promote DSBs. Mutations in either xnd-1 or him-17 result in dysregulation of germline gene expression and pleiotropic defects in meiosis and fertility, including changes in chromatin structure, dysregulation of meiotic progression, and (for xnd-1) progressive loss of germline immortality. It is thus misleading to refer to HIM-17 and XND-1 as DSB "accessory factors" or to lump their activities with those of other proteins that are likely to play more direct roles in DSB induction.

      It is clear that we will not reach agreement about the direct vs indirect roles here of chromatin remodelers/transcription factors in break formation. There is precedent in yeast with SPP1 and in mouse with Prdm9, both of which could also be described as transcription factors, for roles in break formation through creating an open chromatin environment for the break machinery. We envision that these proteins function in the same fashion. The changes in histone acetylation in the xnd-1 mutants support such a claim.

      We do not know what the reviewer is referring to in the statement that “XND-1 and HIM-17 have previously been shown to be transcription factors that promote the expression of many germline genes.” While the Carelli et al. paper indeed shows a role for HIM-17 in expression of many germline genes, there is only one reference to XND-1 in this manuscript (Figure S3A), which shows that half of XND-1 binding sites overlap with the co-opted germline promoters. There is no transcriptional data at all on xnd-1 mutants, save our studies (referenced herein) that XND-1 regulates him-5 expression.

      For example, statements such as the following sentence in the Introduction should be omitted or explained more clearly: "xnd-1 is also unique among the accessory factors in influencing the timing of DSBs; in the absence of xnd-1, there is precocious and rapid accumulation of DSBs as monitored by the accumulation of the HR strand-exchange protein RAD-51."

      We are not sure what is confusing here. The distribution of RAD-51 foci is significantly altered in xnd-1 mutants and peak levels of breaks are achieved as nuclei leave the transition zone (Wagner et al., 2010; McClendon et al., 2016). There is no other mutation that causes this type of change in RAD-51 distribution.

      The evidence that HIM-17 promotes the expression of him-5 presented here corroborates data from other publications, notably the recent work of Carelli et al. (2022), but this conclusion should not be presented as novel here.

      We have clarified this in the text. We note that this paper showed alterations in him-5 levels by RNA-Seq but they did not validate these results with quantitative RT-PCR. Thus, our studies do provide an important validation of their prior results.

      The other factors also fall into several different functional classes, some of which are relatively well understood, based largely on studies in other organisms. The roles of RAD-50 and MRE-11 in DSB induction have been investigated in yeast and other organisms as well as in several prior studies in C. elegans. DSB-1, DSB-2, and DSB-3 are homologs of relatively well-studied meiotic proteins in other organisms (Rec114 and Mei4) that directly promote the activity of Spo11, although the mechanism by which they do so is still unclear.

      Whilst we agree that we understand some of the functions of the homologs, there are clearly examples in other processes of conserved proteins adopting unique regulatory functions. We should not presume evolutionary conservation until proven. Indeed, the comparison between the Mer2 proteins becomes particularly relevant here. For example, the RMM complex in plants does not contain PRD3, although this protein is thought to function in DSB formation and repair (Lambing et al., 2022; Vrielynck et al., 2021; Thangavel et al., 2023). In Sordaria, as well, the Mer2 homolog has distinct functions (Tesse et al., 2017).

      Mutations in PARG-1 (a Poly-ADP ribose glycohydrolase) likely affect the regulation of polyADP-ribose addition and removal at sites of DSBs, which in turn are thought to regulate chromatin structure and recruitment of repair factors; however, there is no convincing evidence that PARG-1 directly affects break formation.

      Our prior collaborative studies on PARG-1 showed that it has a non-catalytic function that promotes DSBs and that is independent of the accumulation of PAR (Janisiw et al., 2020; Trivedi et al., 2022).

      CEP-1 is a homolog of p53 and is involved in the DNA damage response in the germline, but again is unlikely to directly contribute to DSB induction.

      We respectfully disagree with the reviewer’s statement. While CEP-1 is indeed a homolog of p53 and plays a major role in the DNA damage response, prior work from Brent Derry’s lab and from our group (Mateo et al., 2016) demonstrated that specific cep-1 separation-of-function alleles affect DSB induction and/or repair pathway choice independently of canonical DNA damage checkpoint activation. In particular, defects in DSB formation observed in certain cep-1 mutants can be rescued by exogenous irradiation, supporting a direct or closely linked role in promoting DSB formation rather than merely responding to damage. Thus, based on these functional data, we considered CEP-1 a relevant factor to include in our analysis. We have now clarified this rationale in the revised manuscript.

      HIM-5 and REC-1 do not have apparent homologs in other organisms and play poorly understood roles in promoting DSB induction. A mechanistic understanding of their functions would be of value to the field, but the current work does not shed light on this. A previous paper (Chung et al. G&D 2015) concluded that HIM-5 and REC-1 are paralogs arising from a recent gene duplication, based on genetic evidence for a partially overlapping role in DSB induction, as well as an argument based on the genomic location of these genes in different species; however, these proteins lack any detectable sequence homology and their predicted structures are also dissimilar (both are largely unstructured but REC-1 contains a predicted helical bundle lacking in HIM-5). Moreover, the data presented here do not reveal overlapping sets of genetic or physical interactions for the two genes/proteins. Thus, this earlier conclusion was likely incorrect, and this idea should not be restated uncritically here or used as a basis to interpret phenotypes.

      Actually, there is quite good bioinformatic analysis that the rec-1 and him-5 loci evolved from a gene duplication and that each shares features of the ancestral protein (Chung et al., 2015). We are sorry if the reviewer casts aspersions on the prior literature and analyses. The homology of these genes to the ancestral protein is of nearly the same degree as that of dsb-1, dsb-2, or dsb-3 to their ancestral homologs (<17%).

      DSB-1 was previously reported to be strictly required for all DSB and CO formation in C. elegans. Here the authors test whether the expression of HIM-5 from the pie-1 promoter can rescue DSB formation in dsb-1 mutants, and claim to see some rescue, based on an increase in the number of nuclei with one apparent bivalent (Figure 2C). This result seems to be the basis for the claim that HIM-5 coordinates the activities of other DSB proteins. However, this assay is not informative, and the conclusion is almost certainly incorrect. Notably, a substantial number of nuclei in the dsb-1 mutant (without Ppie-1::him-5) are reported as displaying a single bivalent (11 DAPI staining bodies) despite prior evidence that DSBs are absent in dsb-1 mutants; this suggests that the way the assay was performed resulted in false positives (bivalents that are not actually bivalents), likely due to inclusion of nuclei in which univalents could not be unambiguously resolved in the microscope. A slightly higher level of nuclei with a single unresolved pair of chromosomes in the dsb-1; Ppie-1::him-5 strain is thus not convincing evidence for rescue of DSBs/CO formation, and no evidence is presented that these putative COs are X-specific. The authors should provide additional experimental evidence - e.g., detection of RAD-51 and/or COSA-1 foci or genetic evidence of recombination - or remove this claim. The evidence that expression of Ppie-1::him-5 may partially rescue DSB abundance in dsb-2 mutants is hard to interpret since it is currently unknown why C. elegans expresses 2 paralogs of Rec114 (DSB-1 and DSB-2), and the age-dependent reduction of DSBs in dsb-2 mutants is not understood.

      We have removed this claim in part because we have been unable to create the triple mutant strains to analyze COSA-1 foci.

      To the point about 11 vs 12 DAPI bodies: the literature is actually replete with examples of 11 DAPI bodies vs 12 in mutants with no breaks:

      Hinman et al., 2021: null allele of dsb-3 has an average of 11.6 +/- 0.6 breaks;

      Stamper et al., 2013, show just over 60% of dsb-1 nuclei with 12 DAPI bodies and 5-10% with 10 DAPI bodies (Figure 1).

      In addition, we also previously showed (Machovina et al., 2016) that a subset of meiotic nuclei have a single RAD-51 focus and can achieve a crossover. RAD-51 foci in spo-11 mutants were also reported in Colaiacovo et al., 2003.

      Several of the factors analyzed here, including XND-1, HIM-17, HIM-5, DSB-1, DSB-2, and DSB-3, have been shown to localize broadly to chromatin in meiotic cells. Coimmunoprecipitation of pairs of these factors, even following benzonase digestion, is not strong evidence to support a direct physical interaction between proteins.

      Similarly, the super-resolution analysis of XND-1 and HIM-17 (Figure 1EF) does not reveal whether these proteins physically interact with each other, and does not add to our understanding of these proteins' functions, since they are already known to bind to many of the same promoters. Promoters are also likely to be located in chromatin loops away from the chromosome axis, so in this respect, the localization data are also confirmatory rather than novel.

      While the binding to promoters would be expected to be on DNA loops, that has not been definitively shown in the worm germ line. The supplemental data of the Carelli paper suggest that there are ~250 binding sites for each protein at these co-opted promoters. This could not account for the crossover map seen in C. elegans.

      The reviewer correctly states that we do not reveal that these proteins interact, but we have shown that the two proteins co-IP and have a Y2H interaction. This interaction is supported by a recent publication (Blazickova et al., 2025) that corroborates this conclusion and identifies XND-1 in HIM-17 co-IPs also in the presence of benzonase. We do now show, however, by immuno-localization that the two proteins appear to be adjacent, but nonoverlapping. As now described in the text, AlphaFold 3 modeling and structural analysis suggest that the two proteins do interact directly and that the tagged 5’ end of HIM-17 used in our studies is likely to be at least 200 nm from the putative XND-1 binding interface, a distance that is consistent with our confocal images showing frequent juxtaposition of the two proteins.

      The phenotypic analysis of double mutant combinations does not seem informative. A major problem is that these different strains were only assayed for bivalent formation, which (as mentioned above) requires several steps downstream of DSB induction. Additionally, the basis for many of the single mutant phenotypes is not well understood, making it particularly challenging to interpret the effects of double mutants. Further, some of the interactions described as "synergistic" appear to be additive, not synergistic. While additive effects can be used as evidence that two genes work in different pathways, this can also be very misleading, especially when the function of individual proteins is unknown. I find that the classification of genes into "epistasis groups" based on this analysis does not shed light on their functions and indeed seems in some cases to contradict what is known about their functions.

      As described above, each of the proteins analyzed is thought to have a direct role in regulating meiotic DSB formation and single mutant phenotypes are consistent with this interpretation. In almost all, if not all, of these cases, IR-induced breaks suppress univalent phenotypes (or uncover a downstream repair defect, e.g. in mre-11), supporting this conclusion. We have changed the terminology from “epistasis groups” to “functional groups”, since this is not strict epistasis.

      The yeast two-hybrid (Y2H) data are only presented as a single colony. While it is understandable to use a 'representative' colony, it is ideal to include a dilution series for the various interactions, which is how Y2H data are typically shown.

      The Y2H data are presented as spots on a plate, not individual colonies, and come from three to four individual transformants per interaction tested. The experiment was repeated in triplicate from different transformations. We have now made this clearer in the materials and methods section. This approach has been successfully used to examine protein interactions in our prior manuscripts on yeast and human proteins [Gaines et al (2015) Nat. Comms 6:7834; Kondrashova et al (2017) Cancer Discovery 7:984; Garcin et al (2019) PLoS Genetics 15:e1008355; Bonilla et al (2021) eLife 1: e68080; Prakash et al (2022) PNAS 119: e2202727119, etc].

      Additional (relatively minor) concerns about these data:

      (1) Several interactions reported here seem to be detected in only one direction - e.g., MRE-11-AD/HIM-5-BD, REC-1-AD/XND-1-BD, and XND-1-AD/HIM-17-BD - while no interactions are seen with the reciprocal pairs of fusion proteins. I'm not sure if some of this is due to pasting "positive" colony images into the wrong position in the grid, but this should be addressed.

      The asymmetry in the interactions observed is due to the well-known phenomenon in yeast two-hybrid (Y2H) assays where certain plasmids exhibit self-activation when fused in one orientation, making interpretation of reciprocal interactions challenging. In our experiment, some of the plasmids indeed showed self-activation in one direction, which likely accounts for the lack of interaction seen with the reciprocal pairs of fusion proteins. We have clarified this point in the Methods.

      (2) DSB-3 was only assayed in pairwise combinations with a subset of other proteins; this should be explained; it is also unclear why the interaction grids are not symmetrical about the diagonal.

      We have now completed the analysis by adding the interactions of DSB-3 with the remaining proteins that were missing from the initial set.

      (3) I don't understand why the graphic summaries of Y2H data are split among 3 different figures (1, 2, and 3).

      We chose to split the graphic summaries of the Y2H data across Figures 1, 2, and 3 because we felt this organization better aligns with the flow of the results presented in each figure. Each set of interactions is shown in the context of the specific experiments and findings discussed in those sections, which we believe helps provide a clearer and more logical presentation of the data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 1: B) The IP is difficult to interpret - there is a band of the corresponding size to XND-1 in the control lane calling into question the specificity of the IP/Western.

      We added a supplemental figure demonstrating the specificity of the antibody and showing that there is a non-specific background band.

      C) More information about the mass spectrometry should be included. No indication of the number of times a peptide was identified, or the overall coverage of the identified proteins.

      Done

      This is important as in the results section (line 114) the authors indicate that there was "strong" interaction yet there is no way to assess this.

      D) Why wasn't hatching measured in the him-5p::him-5; him-17(ok424) strain?

      Great question. I guess we need to do this while back out for review. If anyone has suggestions of what to say here. Clearly we overlooked this point but do have the strain.

      E) Quantification of the cytology should be included.

      We have now quantified overlap between XND-1 and HIM-17

      Figure 2: C) Statistics should be included.

      Done

      E) Quantification should be included for the cytology. I recommend changing the eals15 to HIM-5.

      We included better images showing whole gonads instead of one or two nuclei. We were not sure what the reviewers want us to quantify here since the relocalization of the protein to the cytoplasm is very clear.

      I have a general issue with the use of the term epistasis - this is used to order gene function based on different mutant phenotypes, usually with null alleles. While I think the authors have valid points with how they group the different SPO-11 accessory proteins, I do not think they should use the word epistasis, but rather genetic interactions.

      We appreciate the reviewer's thoughts on this matter and have removed the term epistasis; we now use functional groups or genetic interactions throughout the text.

      Figure 4 and the nature of the X chromosome: First, I think it would help the non-C. elegans reader to include a little more information on the X chromosome with respect to its differences compared to the autosomes. I also think that, if possible, it would be beneficial to include a model of the X in Figure 4.

      We have added more about X/autosome differences in the intro and during the discussion of HIM-5 function, and have added a figure showing the difference in the behavior of the X and the autosomes during DSB/crossover formation.

      Minor points:

      Abstract: Given the findings of Silva and Smolikove on SPO-11 breaks, I recommend removing "early" from line 28 in the Abstract.

      Done

      Introduction (line 93): I think "biochemical studies" is a stretch here - I recommend "interaction studies".

      Done

      Results: (lines 160-161): mutations are not required for breaks. Line 172, there is a problem with the sentence.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) Figure 1B - The signal for XND-1 seems to appear both in the control and him-17::HA IP. Have the authors tested the specificity of the XND-1 antibody?

      We included a supplementary figure demonstrating the specificity of the XND-1 antibody by Western blot. This was also previously published (Wagner et al., 2010)

      (2) Figure 1D - can the authors provide an explanation why the him-5p::him-5 transgene that drives a higher expression than pie-1p::him-5 fails to suppress the Him phenotype seen in him-17? What are the HIM-5 levels like in these two strains compared to N2 and him-17 null mutants? Can this information provide explanation for the differential effect of the him-5 transgenes?

      We previously reported that him-5p::him-5 drives higher expression than pie-1p::him-5 (McClendon et al, 2016).

      The reason that him-5p::him-5 does not rescue, despite its higher expression in wild type, is that HIM-17 directly regulates expression of him-5. Since HIM-17 does not regulate the pie-1 promoter, the pie-1p::him-5 construct can at least partially suppress the him-17 mutation.

      We have (hopefully) explained this better in the text.  

      (3) Line 102 - the subheading "HIM-5 is the essential factor for meiotic breaks in the X chromosome" may not be appropriate for this section. This is what has previously been known. However, the results in Figure 1 demonstrate that a him-5 transgene can partially rescue the him-17 and xnd-1 phenotype, but not that it is essential for meiotic DSB formation on X chromosomes.

      We think some of the concern here is semantic and have changed the phraseology to say that HIM-5 is SUFFICIENT for DSBs on the X, which had not previously been shown.

      Vis-à-vis the X chromosome, in all genetic backgrounds examined, the absence of HIM-5 consistently results in a complete lack of DSBs on the X. For instance, in dsb-2 mutants (where HIM-5 is still expressed), DSBs are reduced genome-wide, but the X chromosome occasionally retains breaks. In contrast, even a weak allele of him-17 results specifically in the loss of X chromosome breaks, underscoring a unique requirement for HIM-5 in promoting DSBs on the X. While Figure 1 shows that a him-5 transgene can partially rescue him-17 and xnd-1 phenotypes, the consistent observation that X breaks are absent without HIM-5 supports its classification as sufficient for DSB formation on the X chromosome.

      (4) Figure 1E - please consider enlarging the images and showing multiple examples.

      Done.

      I also suggest that the authors perform a more rigorous analysis to support the conclusion that XND-1 and HIM-17 localize away from the axis by quantifying multiple images and doing line-scan analysis.

      Provided. New images are provided in both the main and supplemental figures, and quantification is included. There is no detectable overlap of the two proteins with one another or with the DNA axes (see quantification of overlap in Fig. 1).

      (5) Line 162 - This is the first mention of DSB-1, DSB-2, and DSB-3 in the paper. DSB-1 and DSB-2 are Rec114 homologs in C. elegans (Tesse et al., 2017), while DSB-3 is a homolog of Mei4 (Hinman et al., 2021). These proteins should be properly introduced in the introduction with appropriate citations.

      Done. We appreciate the reviewer pointing out that this was the first reference to these genes.

      (6) Line 169 - the rationale for this experiment is unclear. Why did the Y2H interaction between HIM-5 and DSB-1 prompt the authors to test the rescue of dsb-1 or dsb-2 phenotypes by the ectopic expression of him-5? Do the authors have evidence that HIM-5 level is reduced in dsb-1 or dsb-2 mutants?

      We have reorganized this section to better explain the motivation for looking at these interactions. We did see a difference in the localization of HIM-5 in the dsb-1 mutant animals, and we did have a sense that HIM-5 was critical for breaks on the X. We reasoned that it could have independent functions in promoting breaks that were not yet appreciated, so we wanted to do this experiment.

      (7) Line 172 - "very slightly reduced". This claim requires statistical analysis.

      We added statistical analysis, but we also removed this claim.

      (8) Figures 2C and 2D - Can the authors provide an explanation why the pie-1p::him-5 transgene fails to suppress the phenotypes in dsb-1, while the him-5p::him-5 transgene can? Again, the rationale for these experiments is unclear. Because of this, the interpretation is also unclear.

      The difference in rescue between the pie-1p::him-5 and him-5p::him-5 transgenes likely reflects differences in expression levels. As previously shown (McClendon et al., 2016), the him-5p::him-5 construct results in significantly higher expression of HIM-5 protein compared to pie-1p::him-5. This elevated expression likely explains its ability to partially rescue the dsb-1 phenotype. In contrast, the lower expression driven by the pie-1 promoter is insufficient to compensate for the absence of dsb-1 function. We have clarified the rationale and interpretation of these experiments in the revised manuscript to better reflect this point.

      (9) Lines 184-185 - the data for endogenously tagged HIM-5::3xHA are not shown anywhere in the paper. This must be shown.

      We have added this in the supplemental figures.

      (10) Figure 2D and 2E - what does the localization of pie-1p::him-5::GFP (eaIs15) and him-5p::him-5::GFP (eaIs4) look like in wild-type and dsb-1 mutants? Are the cytoplasmic aggregates caused by increased levels of HIM-5 expression? Can the differential behavior of him-5 transgenes provide explanation for differential rescues?

      We now show both live and fixed images of Phim-5::him-5::gfp transgenes, as well as the localization of the endogenously HA-tagged HIM-5 locus (Figure 2 and S3). In all cases, the protein is initially nuclear and then absent from meiotic nuclei with similar timing. The Ppie-1::him-5 transgene was very difficult to image due to low expression (even in wild type), so it is not shown here. We presume it is the slightly elevated level of expression of the Phim-5::him-5::gfp that can explain the differential rescue.

      (11) Lines 221-222, where are the results shown? Please refer to Figure S3.

      Done

      (12) Figure S3 - these need statistical analyses.

      Done

      (13) Lines 230-231 - what about the rec-1; parg-1; cep-1 triple mutant?

      This is an excellent suggestion and one we have not yet pursued. Given the lack of strong phenotypes in all combinations of double mutants, we prioritized other experiments. However, we agree that examining the rec-1; parg-1; cep-1 triple mutant would provide a valuable test of whether these factors act in the same pathway, and we appreciate the reviewer highlighting this potential future direction.

      (14) Line 298 - I suggest the authors take a look at the Alphafold prediction of DSB-1/DSB-2/DSB-3 and the comparison to human and budding yeast Rec114/Mei4 complex in Guo et al., 2022 eLife, which could provide insights into the Y2H results.

      We thank the reviewer for these comments and have indeed used these interactions and predicted homologies to zero in on a region of interaction between these proteins that resembles what is seen in humans and yeast, in which a dimer of REC114-like proteins wraps around and stabilizes a central Mei4 helix. This is now shown in Figure 3H, I. Satisfyingly, this modeling predicts that a trimer comprised of 2 DSB-1 proteins with DSB-3 is more stable than a DSB-1-DSB-2-DSB-3 trimer. This might explain why DSB-2 is not required in young adults and only becomes essential as DSB-1 levels drop in older animals (Rosu et al., 2013).

      (15) Can the authors introduce mutations within the DSB-1 interfaces that disrupt the interaction to either SPO-11 or DSB-2?

      We have begun to address this question by introducing targeted mutations within DSB-1. As shown in Figure 3E and 3F, mutations in the C-terminal region of DSB-1, which includes a core of four α-helices, disrupt its interaction with DSB-2 and DSB-3, but not with SPO-11. These findings suggest that the C-terminus mediates interactions specifically with DSB-2 and DSB-3.

      (16) Line 323 - The him-5 phenotypes are too weak to support the idea that it serves as the linchpin for the whole DSB complex. Do the authors have an explanation for why him-5 mutants exhibit X-chromosome-specific DSB defects?

      In our response to the reviewer above, and in the text, we have included a more detailed explanation of why we think HIM-5 has a key role in coordinating meiotic break formation. Although HIM-5 was identified for its role on the X, the phenotypes associated with DSB formation in the mutant are quite pleiotropic and severe.

      (17) Line 436 - C. elegans lacks DSB hotspots.

      Removed

      Minor comments:

      (1) Figure 1A - please show the raw data for the yeast two-hybrid.

      We show representative yeast colonies in Figure S3.

      (2) It looks like the labeling for Figure 1B and 1C are switched.

      Fixed.

      (3) Figure 1B - what does the red box indicate? Please explain it in the legend.

      It indicates the XND-1 band. We added that information in the legend.

      (4) Figure 1C - in the legend, it was noted that the results are from GFP pulldowns of HIM-17::GFP. However, the method for Figure 1B and the method section noted that HIM-17 was tagged with 3xHA, and the pull-down was performed using anti-HA affinity matrix. Please reconcile this discrepancy.

      That’s because they were done in two different sets of experiments. For the IPs we used a HIM-17::HA strain and for the MS, a HIM-17::GFP strain.

      (5) Also in Figure 1C - please call Table S2 in the main text when discussing the mass spec results. Also, it is not clear what HIM-17 and GFP indicate in the table. What makes CKU80 different from the other proteins listed under GFP? Please explain more clearly in the legend.

      We have moved the table to the supplemental data, where we have included all of the peptide counts and gene coverage. In the revised Methods, we have included the rationale for inclusion in this table, which explains why CKU-80 differs.

      (6) Line 527 - it is unclear what experiment was done for HIM-17. Please revise it to indicate that this is for "HIM-17 immunoprecipitation". Also please indicate the strain used for HIM-17 pull-down (AV280?).

      (7) Line 113- please be specific about how the HIM-17 IP was performed. Which epitope and strains are used for pull-downs?

      This indeed was AV280. This has been added to the text and methods.

      (8) Figure 1D- What does ND mean? In the text, it was stated that there was only a minor suppression of hatching rates. The hatching rate for him-5p::him-5; him-17 must have been measured, and the data must be presented.

      ND does mean not determined. We have removed the statement about “minor suppression”. We only tested the overall population dynamics in the Phim-5::him-5;him-17(ok424) and the DAPI body counts. The failure to suppress the latter suggests there would be no effect on hatching rates, although we did not test this directly. Since we had done this for the Ppie-1::him-5;him-17 strain, we provided this information to further support the claims of genetic rescue by ectopic expression.

      (9) Line 151 - please specify that STED was used.

      We have removed the STED images, and just show the confocal images with Lightning Processing.

      (10) Figure 1E- the authors suggested that HIM-17 and XND-1 mainly localize to autosomes but not the X chromosome. However, there is not enough evidence that the chromosome excluded from HIM-17 staining is indeed an X chromosome.

      (11) Figure 1E (Line 154) - what are the active chromatin markers examined? Where are the data?

      We have previously shown that the chromosome lacking XND-1 staining is the X (Wagner et al., 2010). The X is heterochromatic and chromatin marks associated with active transcription, including H3K4me3 and HTZ-1 (a variant H2A), preferentially localize to autosomes, effectively anti-marking the X chromosome. As shown in the new Figure 1E, a single chromosome has very little XND-1 and HIM-17 associated proteins. This is the X chromosome.

      (12) Line 172 - It should be a comma instead of the period after "In dsb-1 mutants".

      Fixed

      (13) Figure S3H-K - I suggest the authors indicate the alleles of mre-11 (null vs. iow1) on the graph, similarly to him-5(e1490) to avoid confusion.

      Done

      (14) Lines 294 and 600 - Guo et al. 2022 is now published in eLife. The authors must cite the published paper, not the preprint.

      Fixed

      (15) Line 407 - the reference Carelli et al., 2022 is missing.

      Added

      (16) Line 766 - please remove "is" before nuclear.

      Done

      Reviewer #3 (Recommendations For The Authors):

      Major issues:

      In my view, the most interesting mechanistic finding in the paper is the evidence that HIM-5 may not bind to chromatin in the absence of DSB-1. If validated, this would suggest that HIM-5 is likely to be directly involved in a process that promotes break formation, in contrast to factors such as HIM-17 and XND-1. It does not, however, support the idea that HIM-5 is at the top of a hierarchy of DSB factors, as it is interpreted here. More importantly, the data supporting this claim are unconvincing; only a single image of an unfixed gonad from an animal expressing HIM-5::GFP is shown. Immunofluorescence should be performed and the results must be quantified.

      We have provided additional images of the HIM-5 relocalization to show that we observed this in both fixed and live worms with two different tagged strains. The exclusion from the nucleus is seen in all scenarios. Whether the protein now accumulates exclusively in the cytoplasm or is destabilized is challenging to address with the fixed images due to the arbitrariness of defining “background” staining.

      More generally, this type of analysis, looking at the interdependence of different factors for their association with chromosomes, is much more informative than the genetic interaction data presented in the paper, which does not seem to provide any mechanistic insights into the functions of the factors analyzed. The paper could potentially be greatly improved through a more extensive, systematic analysis of the interdependence of DSB-promoting factors for their localization to chromosomes.

      We have at least added this for XND-1 and HIM-17 and show they are not interdependent for chromosome association. We also provide for the first time data on the localization of HIM-5 in the dsb-1 mutant. Many of the other interactions have already been shown in the literature and/or were not warranted based on the lack of genetic interaction we present here.

      Minor issues:

      The title is vague and inconclusive. A more concrete title summarizing the major findings would help readers to assess whether the work is of interest.

      We have discussed the title extensively with all authors and all would like to keep the current title.

      The authors claim that the expression of HIM-5 from a different promoter (Ppie-1::him-5) but not its endogenous promoter (Phim-5::him-5) can partially rescue the DSB defect in him-17 mutants. To support this claim, they should really quantify the germline expression of HIM-5 in wild-type, him-17, him-17; Ppie-1::him-5, and Phim-5::him-5; him-17.

      We had previously reported the expression in the N2 background of both transgenes (McClendon et al., 2016)

      Panel O appears to be missing from Figure S3.

      Fixed

      The evidence for chromosome fusions in cep-1; mre-11 mutants shown in S4D is not convincing and the claim should be removed unless stronger evidence can be obtained.

      A clearer image has been added

      The basis of the following statement is unclear: "Furthermore, rec-1;him-5 double mutants give an age-dependent severe loss of DSBs (like dsb-2 mutants) suggesting that the ancestral function of the protein may have a more profound effect on break formation." The manuscript does not seem to include data regarding age-dependent loss of DSBs and no other publication is cited to support this claim. The interpretation is also perplexing; I think that it may be predicated on the idea that REC-1 and HIM-5 are paralogs, but as stated above, this claim is not well supported and is likely specious.

      We have added the reference. This was shown in Chung et al., 2013 – the paper that presented the cloning of the rec-1 locus.

  2. Sep 2025
    1. eLife Assessment

      The study provides valuable insights into the role of thalamic nuclei in associative threat and extinction learning, supported by a large dataset and multipronged analyses. However, aspects of the evidence remain incomplete, particularly regarding the statistical methods, the claims of plasticity, and the network modeling framework. With this addressed, this manuscript will be of interest to those interested in learning and memory, fear, thalamic circuitry, and related mental health conditions.

    2. Reviewer #1 (Public review):

      Summary:

      Badarnee and colleagues analyse fMRI data collected during an associative threat-learning task. They find evidence for parallel processes mediated by the mediodorsal, LGn, and pulvinar nuclei of the thalamus. The evidence for these conclusions is promising, but limited by a lack of clarity regarding the preprocessing and statistical methods.

      Strengths:

      The approach is inventive and novel, providing information about thalamocortical interactions that are scant in the current literature.

      Weaknesses:

      (1) There are not sufficient details present to allow for the direct interrogation of the methods used in the study.

      (2) The figures do not contain sufficiently granular details, making it challenging to determine whether the observed effects were robust to individual differences.

    3. Reviewer #2 (Public review):

      Summary:

      The authors quantify human fMRI BOLD responses in pulvinar and mediodorsal thalamic nuclei during a fear conditioning and extinction task across two days, in a large sample size (hundreds of participants). They show that the BOLD responses in these areas differentiate the conditioned (CS+) and safety (CS-) stimuli. Additionally, this changes with repeated trials, which could be a neural correlate of fear learning. They show that the anterior pulvinar is most correlated with the MD, and that this is not due to anatomical proximity. They perform graph analysis on the pulvinar subnuclei, which suggests that the medial pulvinar is a hub between the sensory (lateral/inferior) and associative (anterior) pulvinar. They show different patterns of thalamic activity across conditioning, extinction, recall, and renewal.

      Strengths:

      The data has a large sample size (n=293 in some measures, n=412 in others). This is a validated human fear conditioning/extinction task that Dr Milad's group has been working with for several years. Few labs have investigated the thalamus activity during fear conditioning and extinction, particularly with a large sample size. There is an independent replication of the pulvinar network structure (Figure 3), which suggests that the processing in the more sensory-related inferior and lateral pulvinar is relayed to the anterior pulvinar (and possibly thereby to more action-related prefrontal areas) via an intermediate step in the medial pulvinar - potentially a novel discovery, but that needs more validation.

      Weaknesses:

      (1) The authors cannot make causal claims about their results based on correlational neuroimaging evidence. Causal claims should be pared back. E.g., sentence 1 in the Results section: "The anterior pulvinar and MD contribute to early associative threat learning, as evidenced by increased functional activation in response to CS+ compared to CS- at the block level (Fig. 1b-c)." needs to be reworded to something like "The anterior pulvinar and MD have increased functional activation... This suggests that these areas may contribute to early associative threat learning."

      (2) Figure 1: The fact that the difference in BOLD activity between CS+ and CS- goes away on the third trial is not addressed. This is a very large effect in the data.

      (3) Figure 3: Could the observed network structure be due to anatomical proximity? Perhaps the authors should do an analogous analysis to what they did in Figure 2 for this intra-pulvinar analysis. This analysis doesn't take into account the indirect connections through corticothalamic and thalamocortical connections with the visual cortex and the pulvinar. There is an implicit assumption that there are interconnections between the pulvinar subnuclei, but there are few strong excitatory projections between these subnuclei to my knowledge. If visual areas are included in the graph, it would make things more complex, but would probably dramatically change the story. In this way, the message is somewhat constructed or arbitrary.

      (3) In the results section describing Figures 4-7, there are no statistics supporting the claims made. There needs to be a set of graphs comparing the results across the study sessions and days, with statistical comparisons between the different experiments to confirm differences.

      (4) Figure 7 does not include the major corticothalamic and thalamocortical projections from early, mid-level, and higher visual cortex to the different pulvinar nuclei. I doubt that there are strong direct projections between the pulvinar nuclei; rather, the functional connections are probably mediated through interconnections with cortical visual areas.

      (5) Stylistic: There are a lot of hypotheses and interpretations presented in this primary literature paper, which may be better suited for a review or perspective piece.

      (6) In the discussion, there is an assumption that the fMRI BOLD responses to CS+ and CS- need to be different to indicate that an area is processing these distinctly, but the BOLD signal can only detect large-scale changes in overall activity. It's easy to imagine that an area could be involved in processing these two stimuli distinctly without showing an overall difference in the gross amount of activity.

      (7) There is strong evidence that the BOLD responses to the threat-related and safety-related stimuli are different, modest evidence for their claims of learning/plasticity in these pathways, and circumstantial evidence supporting their hypothesized graph network models. Overall, most of the claims made in the discussion are better considered possible interpretations rather than proven findings - this is not a criticism, as these experiments and subject matter are extremely complex.

      This study continues to validate the power and utility of this human fear conditioning/extinction paradigm, and extends this paradigm to investigating fear learning beyond the traditional limbic system pathways. It's possible that their models for the pulvinar nuclei interconnections could guide future neuromodulation or DBS studies that could provide more causal evidence for their hypotheses.

    4. Reviewer #3 (Public review):

      Summary:

      The present work was aimed at investigating the specific contributions of thalamic nuclei to associative threat learning and extinction. Using fMRI, the authors examined activation patterns across pulvinar divisions, the lateral geniculate nucleus (LGN), and the mediodorsal thalamus (MD) during threat acquisition, extinction, and recall. Their goal was to uncover whether distinct thalamic systems support different modes of learning (automatic survival mechanisms versus more deliberate processes) and to propose a hierarchical pulvinar model of fear conditioning. They also try to refine current neuroanatomical models of threat learning and memory, highlighting the role of thalamic nuclei in it.

      Strengths:

      (1) Valuable theoretical elaboration and modeling regarding the differential role of pulvinar subdivisions on feedforward (inferior, lateral) and higher-order integration (anterior), and their functional interplay with other relevant subcortical and cortical structures in associative threat and extinction learning.

      (2) Large sample sizes and multipronged analytical approaches were used for hypothesis testing.

      (3) Exhaustive literature review in the field of associative threat, as well as regarding the role of thalamic nuclei and other brain structures in it.

      Weaknesses:

      (1) Several weaknesses should be pointed out regarding how fMRI data were collected, as well as decisions regarding how the fMRI data were preprocessed and analyzed:

      a) fMRI data have low resolution (3 cubic mm), which certainly limits the examination of small nuclei such as the ones investigated here, and especially the examination of the LGN and inferior pulvinar.

      b) fMRI was normalized to standard space. Analyzing the data in individual-subject space would have given you the options of avoiding altering every participant's brain and of using a probabilistic thalamic atlas that better adapts to each subject's brain and thalamic nuclei (see, for instance, Iglesias et al., 2018). This would have been ideal and would have given the authors more precision, especially considering the low resolution of the fMRI data and the size of the thalamic nuclei of interest.

      c) On top of the two previous points, the authors decided to smooth the data to 6mm, which means that every single voxel within these small nuclei was blurred/mixed with the 2 immediately contiguous voxels (if they followed the standard SPM12 normalization resampling default which resamples, or upsamples the data in this case, to 2 x 2 x 2mm). Given the strong changes in structural connectivity and function that can occur, especially in the thalamus, on voxels of this size, this and the previous 2 decisions do not favor anatomical precision.

      d) Motion during scanning was poorly controlled in the preprocessing. Including the motion parameters as covariates of no interest in the GLM does not fully guarantee that motion is not influencing the results, and that motion is not differentially influencing some experimental conditions more than others.

      (2) It is not clearly indicated in the manuscript how many subjects and how many trials went into each of the analyses. It would be important to indicate this in the text and/or the figures.

      (3) It is not clear either, why, given the large sample size, some of the results were not conducted using reproducibility strategies such as dividing the sample into 2 or 3 groups or using further cross-validation strategies.

      (4) Limited testing of alternative hypotheses. The results clearly seem to be a selection of the findings supporting the hypotheses that the authors sought to confirm. (just one example: in the analysis reported in Figures 1-2; are there other correlations between the activation of the anterior pulvinar and MD with other pulvinar nuclei? only the MD-anterior Puv is reported).

      (5) The manuscript does not contain a limitations subsection. Practically every study has limitations, and this one is not an exception. Better to tell the limitations to the readers upfront so they can factor them into their evaluation of the relevance of the manuscript and reported evidence.

      (6) Data should be made available to the scientific community. Code too. Even if you just used standard fMRI toolboxes, any code used to run analyses will be helpful to the community, or if someone decides to try to replicate your findings.
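      To put weakness (1c) in numbers, the following is a minimal sketch, assuming an isotropic Gaussian kernel of 6 mm FWHM and the 2 mm resampled voxel size that the reviewer mentions as a possibility. The function names are hypothetical and the calculation is one-dimensional; it is not the authors' preprocessing code.

```python
import math

def gaussian_sigma_from_fwhm(fwhm_mm):
    # Standard relation for a Gaussian: FWHM = 2 * sqrt(2 * ln 2) * sigma
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def kernel_weights_1d(fwhm_mm, voxel_mm, radius_vox):
    # Normalised Gaussian weights sampled at voxel centres along one axis
    sigma_vox = gaussian_sigma_from_fwhm(fwhm_mm) / voxel_mm
    raw = [math.exp(-0.5 * (k / sigma_vox) ** 2)
           for k in range(-radius_vox, radius_vox + 1)]
    total = sum(raw)
    return [w / total for w in raw]

# Assumed values: 6 mm FWHM kernel, 2 mm voxels, kernel truncated at +/- 4 voxels
sigma_mm = gaussian_sigma_from_fwhm(6.0)
weights = kernel_weights_1d(6.0, 2.0, radius_vox=4)
center_share = weights[4]  # fraction of the smoothed value that comes from the voxel itself

print(f"sigma = {sigma_mm:.2f} mm, centre-voxel weight along one axis = {center_share:.2f}")
```

      Under these assumptions, the centre voxel contributes just under a third of its own smoothed value along each axis, with the remainder coming from neighbouring voxels.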

      Despite these weaknesses and what can be derived from them, this manuscript constitutes a valuable contribution to the field to start characterizing and conceptualizing the involvement of thalamic nuclei and their interactions with other brain regions in the associative threat learning circuitries. It also paves the road for further testing of the functional dynamics among these regions and circuitries, and modeling testing.

    5. Author response:

      We thank the reviewers and editors for their thoughtful and constructive feedback. We have carefully considered the comments and plan to revise the manuscript as follows:

      · Methods: We will expand the Methods section to provide additional details regarding the Pavlovian fear conditioning procedure, including instructions, experimental parameters, and the randomization process.

      · Figures and Statistical Reporting: We will break down some figures where appropriate and clearly display the distributions of key variables. We will also include additional statistical details in the main text and elaborate on the analyses where needed.

      · Language and Interpretation: We will revise the text to consistently use correlational rather than causal terminology, ensuring that our conclusions accurately reflect the findings from the fMRI data.

      · Computational Model of the Pulvinar: We will further elaborate on the assumptions and limitations of the intra-pulvinar model, discuss potential neural pathways and candidate regions (e.g., visual cortex), and highlight directions for future work, including studies in nonhuman primates to investigate anatomical connectivity.

      · Alternative Hypotheses of the mediodorsal thalamus-anterior pulvinar relationships: Other pulvinar subregions were already included as covariates in our hierarchical regression analyses, allowing us to account for anatomical proximity and shared variance. We will make this analysis more explicit and clarify the thinking process behind this analysis to allow readers to assess the specificity of the anterior pulvinar-mediodorsal thalamus relationship.

      · Limitations: We will add a dedicated subsection outlining key limitations, including considerations specific to fMRI studies.

      · Data Availability: All data and materials used in this study will be made available upon request from the corresponding author, subject to obtaining the necessary institutional authorization for the data-sharing agreement.

      We are confident that these revisions will enhance the clarity, transparency, and interpretability of the work, and we are grateful to the reviewers for their valuable suggestions. We will provide a detailed, point-by-point response along with the revised submission as soon as possible.

    1. eLife Assessment

      In this manuscript, the authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. The authors combine extracellular electrophysiology of the hawkmoth antennae with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. The work provides valuable support for the hypothesis that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. Nevertheless, the evidence reported provides only incomplete support for their conclusions, especially with regard to the biological implications of their assumption-heavy models.

    2. Joint Public Review:

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of (a single dose) of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series in addition to the administration of blockers for the other channels that are embedded in their model, but which were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of the timeless mRNA control from another study is not acceptable. Furthermore, qPCR analysis measures transcript abundance, not transcription, as the authors repeatedly state. Transcriptional studies would require nuclear run-off or, more recently, can be done with snRNAseq analysis. Taken together, these concerns undermine the authors' desire to rule out TTFL-based control that directly led them to implicate a PTTF-based model.

      (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      (6) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating, and the PTTF model proposed is somewhat disappointing. The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells. Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      (7) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not sufficiently reliable to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK_CYCLE, MetaCycle, or Echo. The use of FFT and Wavelet is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.

      (8) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how Orco changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 "low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)". The conclusion that Orco blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the "effectiveness [of OLC15] increased over time." They conclude that the drug "obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12)." The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 uM OLC15).

      (9) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      (10) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases). The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      (a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      (b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      (c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.
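      The sampling concern in comment (2) and suggestion (11b) can be illustrated with a small sketch. It assumes a single-component cosinor model with a fixed 24 h period and one value per timepoint (for example, a group mean); the sampling times and expression values are hypothetical and are not taken from the manuscript. Because the model has three free parameters (mesor and two harmonic coefficients), any three timepoints are fitted exactly and leave no residual degrees of freedom with which to judge whether a rhythm is present.

```python
import math

def cosinor_design(times_h, period_h=24.0):
    # Columns: mesor, cosine term, sine term
    w = 2.0 * math.pi / period_h
    return [[1.0, math.cos(w * t), math.sin(w * t)] for t in times_h]

def solve3(a, b):
    # Gauss-Jordan elimination with partial pivoting for a 3x3 system
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_cosinor(times_h, values):
    # Ordinary least squares via the normal equations
    X = cosinor_design(times_h)
    n = len(X)
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(3)] for i in range(3)]
    xty = [sum(X[k][i] * values[k] for k in range(n)) for i in range(3)]
    beta = solve3(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    rss = sum((y - f) ** 2 for y, f in zip(values, fitted))
    amplitude = math.hypot(beta[1], beta[2])
    return amplitude, rss, n - 3  # amplitude, residual sum of squares, residual df

# Hypothetical, arbitrary (non-rhythmic) values at three and at six evenly spaced times
amp3, rss3, df3 = fit_cosinor([1, 9, 17], [1.0, 0.4, 1.7])
amp6, rss6, df6 = fit_cosinor([1, 5, 9, 13, 17, 21], [1.0, 0.4, 1.7, 0.9, 1.3, 0.6])

print(f"3 timepoints: residual df = {df3}, RSS = {rss3:.2e}")
print(f"6 timepoints: residual df = {df6}, RSS = {rss6:.2f}")
```

      With three timepoints the fit is perfect regardless of the data, so neither the presence nor the absence of a rhythm can be established from the fit alone; six timepoints leave three residual degrees of freedom.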

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?

      (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21-27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.
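      Major comment (7) asks for a significance test of rhythmicity in each recording. As a rough illustration of what such a test provides, the sketch below runs a generic permutation test on the amplitude of the 24 h Fourier component. This is a swapped-in stand-in, not RAIN, JTK_CYCLE, MetaCycle, or ECHO, and the simulated hourly firing rates, function names, and parameters are all hypothetical. Shuffling assumes that time bins are exchangeable under the null hypothesis, which autocorrelated spike-rate series generally violate, so the dedicated methods named by the reviewers remain preferable for real data.

```python
import math
import random

def cosinor_amplitude(values, bin_h=1.0, period_h=24.0):
    # Amplitude of the 24 h component; exact for evenly sampled whole cycles
    w = 2.0 * math.pi / period_h
    n = len(values)
    mean = sum(values) / n
    c = sum((v - mean) * math.cos(w * i * bin_h) for i, v in enumerate(values))
    s = sum((v - mean) * math.sin(w * i * bin_h) for i, v in enumerate(values))
    return 2.0 * math.hypot(c, s) / n

def permutation_p_value(values, n_perm=999, seed=0):
    # Fraction of shuffled series whose 24 h amplitude reaches the observed one
    rng = random.Random(seed)
    observed = cosinor_amplitude(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cosinor_amplitude(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Simulated 2-day recordings in hourly bins (hypothetical spike rates in Hz)
rng = random.Random(1)
rhythmic = [5.0 + math.cos(2.0 * math.pi * t / 24.0) + rng.gauss(0.0, 0.2) for t in range(48)]
flat = [5.0 + rng.gauss(0.0, 0.2) for _ in range(48)]

p_rhythmic = permutation_p_value(rhythmic)
p_flat = permutation_p_value(flat)
print(f"rhythmic series: p = {p_rhythmic:.3f}; flat series: p = {p_flat:.3f}")
```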

    3. Author response:

      Joint Public Review

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.

      We thank the reviewers for their thorough and thoughtful comments and believe that the manuscript will be much stronger once we incorporate the requested changes.

      Please note that we used ORN as the acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs have to associate with the olfactory receptor co-receptor (Orco) in the cilium of the neuron to form functional OR-Orco complexes for odorant detection. Besides this chaperone function, Orco can form homomers with the potential to act as ionic pacemaker channels, a role that we explore in this study.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Please see our responses to the detailed comments.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of (a single dose) of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series in addition to the administration of blockers for the other channels that are embedded in their model, but which were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel open time probability. In M. sexta hawkmoths, we previously showed that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2016). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco Ligand Candidates (OLC) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). In that study, we also demonstrated that OLC15 antagonizes the VUAA1 activation of Orco.

      Furthermore, we tested other published Orco antagonists in in vivo assays in intact hawkmoths, focusing on amiloride-derived antagonists, because we previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific but instead affected different targets depending on time of day (Nolte et al., 2016). Based on those experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best in hawkmoth ORNs to target Orco pharmacologically. Based on comparative tests with other published Orco antagonists, we have since used a dose of 50 µM OLC15 in all further experiments.

      We will clarify the Methods section accordingly.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of the timeless mRNA control from another study is not acceptable. Furthermore, qPCR analysis measures transcript abundance, not transcription, as the authors repeatedly state. Transcriptional studies would require nuclear run-off or, more recently, can be done with snRNAseq analysis. Taken together, these concerns undermine the authors' desire to rule out TTFL-based control that directly led them to implicate a PTTF-based model.

      We agree with the referees that more time points and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We will include these additional qPCR experiments and edit the manuscript to make clear that we measure transcript abundance, but we will not perform snRNAseq analysis due to time and financial constraints. We are currently working on the transcriptional control of Orco, both during ontogeny and throughout the day, but this work in progress is beyond the scope of this manuscript.

      (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.

      Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012), but also in hawkmoth antennae (Schendzielorz et al., 2015). We will add the 2015 citation to the Modeling chapter in the Methods section to clarify this.

      We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs. Thus, as the referee suggests, we will add text regarding the presence and localization of OR-Orco heteromers. However, we have indications that Orco homomers could indeed be present in the hawkmoth ORNs. In a heterologous expression system, MsexOrco expression alone was sufficient to increase intracellular Ca2+ levels in response to VUAA1 application (Nolte et al., 2013). In differentiating primary cell cultures of hawkmoth antennae, Orco expression started during a developmental time window where ORNs did not yet express pheromone receptors, and Orco affected spontaneous activity (Nolte et al., 2016). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window where ORNs already express spontaneous activity but cannot heteromerize with pheromone receptors. However, we do not know whether and in what ratio homo- and heteromers of Orco and ORs are present in the respective sensillum compartments of adult hawkmoths (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990).

      We will clarify our manuscript accordingly.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms, corresponding with daily/circadian rhythms of sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms can be easily masked by stress, such as the disturbances during a very challenging long-term recording experiment over several days. In addition, we observed in our animal-rearing facility that in LD 17:7 light-dark cycles the originally nocturnal hawkmoths M. sexta distribute their activity patterns over the course of the day, so that we find nocturnal as well as diurnal individuals. Thus, light-dark cycles were not enough to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths rely heavily on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Here, we used isolated males that were never exposed to the female pheromones, so that their circadian activity patterns readily disperse. Therefore, it became necessary in free-running conditions to first determine the respective behavioral rhythm for each animal, and then to phase-align their activity patterns to allow for statistical analysis. Otherwise, circadian differences would average out in a free-running population. As requested by the referees in point (7), we will use additional tests for rhythmicity in each of our recordings and revise the manuscript accordingly.

      Assuming that hawkmoths need pheromone presence as additional Zeitgeber, we are currently working on a new set of experiments where we attempt to improve synchronization by exposure to LD cycles and pheromone before DD and OLC15 recordings. We will add these experiments to the manuscript.
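      Both sides of the phase-alignment discussion in point (4) can be illustrated with a toy simulation. The sketch below is a simplified stand-in, assuming hourly bins and alignment of each trace to its own peak; this is not the authors' alignment procedure, and all names and values are hypothetical. Part (a) shows that truly rhythmic traces with dispersed phases cancel in a raw average and reappear after alignment. Part (b) shows that peak-aligning pure noise also produces an apparent peak in the average, which is the kind of bias the reviewers raise.

```python
import math
import random

HOURS = list(range(24))

def rhythmic_trace(phase_h, amplitude=1.0):
    # Noise-free 24 h cosine peaking at phase_h
    return [amplitude * math.cos(2.0 * math.pi * (t - phase_h) / 24.0) for t in HOURS]

def align_to_peak(trace):
    # Circularly shift the trace so that its maximum sits at index 0
    peak = max(range(len(trace)), key=lambda i: trace[i])
    return trace[peak:] + trace[:peak]

def mean_trace(traces):
    return [sum(col) / len(col) for col in zip(*traces)]

def peak_to_trough(trace):
    return max(trace) - min(trace)

# (a) Rhythmic animals whose phases are spread evenly across the day
dispersed = [rhythmic_trace(phase_h=p) for p in range(0, 24, 3)]
raw_range = peak_to_trough(mean_trace(dispersed))
aligned_range = peak_to_trough(mean_trace([align_to_peak(tr) for tr in dispersed]))

# (b) Arrhythmic animals: independent Gaussian noise in every hourly bin
rng = random.Random(0)
noise = [[rng.gauss(0.0, 1.0) for _ in HOURS] for _ in range(10)]
noise_aligned = mean_trace([align_to_peak(tr) for tr in noise])

print(f"rhythmic, raw average range:     {raw_range:.3f}")
print(f"rhythmic, aligned average range: {aligned_range:.3f}")
print(f"noise, aligned average at the alignment point: {noise_aligned[0]:.2f}")
```

      In this toy setting, alignment is needed to recover the rhythm in (a), but in (b) the aligned average necessarily peaks at the alignment point even though no rhythm exists. Aligning to an independent marker and testing each recording for rhythmicity before alignment are ways to separate the two cases.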

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      We think the reviewers might have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end, i.e., the recording site is located at 2/3 – 3/4 of the antenna length as seen from the head of the animal. We will make this clearer in the Methods section.

      In addition, our lab showed with antibody staining against Orco that apparently all ORNs that innervate long and short trichoid sensilla along the whole flagellum express the same staining pattern (Nolte et al., 2016). Furthermore, our patch clamp recordings of primary cell cultures of whole male antennae found largely overlapping ion channel populations across ORNs. This would indicate that all ORNs, whether they express pheromone- or general odorant receptors, could potentially share the same Orco-dependent spontaneous activity rhythms. In our lab, different experimenters who recorded in different years from long trichoid sensilla on different annuli did not detect obvious differences in either the spontaneous activity or the pheromone responses (cf. Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, it is very likely that we are reporting a general encoding mechanism that is not locally restricted along the antennal flagellum.

      (5.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…

      There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after previous phosphorylation via protein kinase C (PKC; review: Wicher and Miazzi, 2021). Since Orco is highly conserved among insect species, it is likely that these PKC- and cGMP/cAMP-dependent regulations are present in other insect species. We are currently running thorough tip-recording experiments on the regulation of Orco gating, which are beyond the scope of this manuscript. However, we will add a set of experiments to this manuscript that demonstrates cAMP gating of Orco.

      (5.2)… and the PTTF model proposed is somewhat disappointing.

      For a detailed introduction of our PTFL membrane clock hypothesis please see our opinion paper (Stengl and Schneider, 2024).

      (5.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.

      Indeed, we propose a metabotropic pheromone-transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C but not on adenylyl cyclase activation. Our hypothesis is not influenced by HEK cell heterologous expression studies of Orco but is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on hawkmoth primary cell cultures of olfactory receptor neurons, which are able to respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, a multitude of publications by other laboratories with in vivo and in vitro studies using physiological, genetic, and immunocytochemical assays all support a metabotropic signal transduction cascade in insect olfaction (reviews: Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In contrast, the hypothesis suggesting a solely ionotropic pheromone- and general odor-dependent transduction cascade for all insect species is based on very sparse experimental evidence, drawn primarily from heterologous expression systems such as HEK cells that lack the insect’s WT molecular surroundings and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis is heavily based upon the argument that an inverse 7TM receptor cannot couple to G-proteins, which lacks careful backup via biochemical and structural studies. In addition, the ionotropic hypothesis lacks support via carefully performed physiological in vivo studies in different insect species that paid attention to analysis of the distinct kinetic components of ORNs' odor/pheromone responses and that employ physiological concentrations and durations of odor/pheromone stimuli (please see our most recent publication by Schneider et al. (2025)).

      (5.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      While structural studies did not find evidence for conserved, known cyclic nucleotide binding sites on Orco, this does not exclude the presence of as yet unknown binding sites, or of sites that become exposed only after a specific sequence of phosphorylations at the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after previous PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a PKC- and cAMP-dependent modulation of Orco. These studies will be published in a follow-up publication.

      (6) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not sufficiently reliable to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK_CYCLE, MetaCycle, or ECHO. The use of FFT and Wavelet is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.

      The long-term tip-recordings of intact hawkmoths are very challenging and take a very long time to accomplish. We are therefore very happy that we succeeded in obtaining so many of them (N=34). Since 5/11 LD recordings and 7/10 DD recordings revealed daily/circadian rhythmicity, and since many other physiological recordings at different ZTs by different members of our laboratory all revealed ZT-dependent pheromone transduction, we can be certain that the physiology of hawkmoth antennae is under strict circadian control. Please see also our response to (4) above, commenting on the phase dispersal of activity rhythms observed in our experiments as well as in the behavior of hawkmoth males in the mating cage.

      Nevertheless, we will follow the advice of the referees to apply additional tests for significance of rhythms in spontaneous activity, and we are thankful for the tests suggested that we were not aware of.
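
      The kind of significance test requested in (6) can be illustrated with a minimal sketch. This is not RAIN, JTK_CYCLE, MetaCycle, or ECHO, and it is not the authors' analysis; it is a simple permutation test on the amplitude of a 24 h Fourier component, assuming evenly or unevenly sampled firing rates with time given in hours. The function names `rhythm_amplitude` and `permutation_p_value`, the fixed 24 h period, and the 999 permutations are all illustrative assumptions. Shuffling also assumes exchangeable samples, which ignores autocorrelation in real spike-rate series, so the p-value should be read as a rough guide only.

```python
import math
import random


def rhythm_amplitude(times_h, values, period_h=24.0):
    """Amplitude of the Fourier component at period_h after mean-centring.

    For a pure cosine sampled over whole cycles this equals its amplitude.
    """
    mean = sum(values) / len(values)
    omega = 2.0 * math.pi / period_h
    c = sum((v - mean) * math.cos(omega * t) for t, v in zip(times_h, values))
    s = sum((v - mean) * math.sin(omega * t) for t, v in zip(times_h, values))
    return 2.0 * math.hypot(c, s) / len(values)


def permutation_p_value(times_h, values, period_h=24.0, n_perm=999, seed=0):
    """Fraction of shuffled series whose amplitude reaches the observed one.

    Assumption: samples are exchangeable under the null (no autocorrelation).
    """
    rng = random.Random(seed)
    observed = rhythm_amplitude(times_h, values, period_h)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any time structure in the series
        if rhythm_amplitude(times_h, shuffled, period_h) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic example: 3 days of hourly mean firing rates with a 24 h rhythm.
    rng = random.Random(1)
    times = list(range(72))
    rates = [10 + 5 * math.cos(2 * math.pi * t / 24) + rng.gauss(0, 1) for t in times]
    print(rhythm_amplitude(times, rates), permutation_p_value(times, rates))
```

      Applied per recording, such a test would give one p-value per neuron, which could then be reported alongside the fraction of rhythmic recordings. Dedicated tools additionally estimate period and phase and correct for multiple testing.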

      (7) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how the Orco antagonist changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that the Orco antagonist blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 µM OLC15).

      We will revise our data analysis, according to the valuable suggestions of the referees.

      However, based upon our previous studies with other Orco antagonists and different doses of OLC15 (Nolte et al., 2016) we found that 50 µM OLC15 is the best Orco antagonist dose in M. sexta to target Orco-dependent modulation of spontaneous action potential activity of hawkmoth olfactory receptor neurons. Please see also our response to (1).

      (8) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      We will revise the discussion accordingly and clarify which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).

      (9.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).

      We agree with the referees that it is very likely that there are multiple layers of interconnected feedback cycles that control Orco localization and activity. Our novel hypothesis suggests interlocked TTFL and PTFL control of physiological circadian rhythms, not strictly hierarchical TTFL control, which would require a daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, also covering all the molecules suggested by the referees. These studies are ongoing, are very time- and money-consuming, and are beyond the scope of this manuscript.

      (9.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      Our experiments that test circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and are part of another publication. We will revise our discussion accordingly.

      The experiments published for TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrate that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells such as trichogen and tormogen cells that build the transepithelial potential gradient via vATPases that generate the high K⁺ concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of afferent neurons originating from somata in the brain that maintain central control of the antennae. Thus, EAG recordings are difficult to interpret.

      (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      We will revise the discussion accordingly.

      b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      We will add those experiments to the revised version of the manuscript (see our response to (2)).

      c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.

      We will revise the manuscript accordingly.

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?

      We will revise the discussion accordingly.

      (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21-27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.

      We will clarify the Methods section.

      References

      Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. doi:10.1371/journal.pone.0036784

      Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. J Exp Biol 206:1575–1588. doi:10.1242/jeb.00302

      Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Front Cell Neurosci 12:218. doi:10.3389/fncel.2018.00218

      Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. doi:10.1371/journal.pone.0058889

      Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Curr Biol 34:1414-1425.e5. doi:10.1016/j.cub.2024.02.042

      Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. doi:10.3390/insects15121016

      Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proc Natl Acad Sci 108:8821–8825. doi:10.1073/pnas.1102425108

      Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. J Exp Biol 172:345–354. doi:10.1242/jeb.172.1.345

      Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. doi:10.1038/22566

      Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. J Biol Rhythms 22:502–514. doi:10.1177/0748730407307737

      Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. doi:10.1371/journal.pone.0062648

      Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. doi:10.1371/journal.pone.0166060

      Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. J Biol Rhythms 22:43–57. doi:10.1177/0748730406295462

      Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. J Biol Rhythms 29:318–331. doi:10.1177/0748730414546133

      Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. J Biol Rhythms 27:388–397. doi:10.1177/0748730412456265

      Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. doi:10.1371/journal.pone.0121230

      Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. doi:10.1523/ENEURO.0376-24.2024

      Stengl M. 2010. Pheromone Transduction in Moths. Front Cell Neurosci 4:133. doi:10.3389/fncel.2010.00133

      Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. J Comp Physiol A 174:187–194. doi:10.1007/BF00193785

      Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. J Comp Physiol A 199:897–909. doi:10.1007/s00359-013-0837-3

      Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. J Neurosci 10:837–847. doi:10.1523/JNEUROSCI.10-03-00837.1990

      Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Front Physiol 14:1243455. doi:10.3389/fphys.2023.1243455

      Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Curr Biol 14:638–649. doi:10.1016/j.cub.2004.04.009

      Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS, editors. Insect Biology in the Future. Academic Press. pp. 735–763. doi:10.1016/B978-0-12-454340-9.50039-2

      Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell Tissue Res 383:7–19. doi:10.1007/s00441-020-03363-x

    1. eLife Assessment

      This study presents valuable new insights into the patterns of organelle inheritance in the protozoan parasite Toxoplasma gondii. An innovative dual-labeling approach used in this study to track maternal-derived and de novo synthesized organelles provides a technical advance with potential to be more broadly applied. Solid evidence is provided that different organelles show distinct inheritance fates during cell replication; however, the data describing the residual body component in this process is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      This work asks the question of how different organelles and structures in the apicomplexan parasite Toxoplasma gondii are recycled and/or segregated to the daughter cells during cell replication. In particular, they consider an unusual cell structure called the residual body that links replicating cells during the intracellular infection stage of this parasite. The residual body has historically been considered a 'dumping ground' for unnecessary relics of the mother cell during division, but this notion is increasingly being revised. Indeed, cell replication in Toxoplasma is often misinterpreted as cell division (cytokinesis), but in fact, the cell replicates its organelles and structures to multiple 10s of copies in seemingly distinctly formed daughter cells, but cytokinesis is delayed for many such cycles and typically only occurs simultaneously with parasite egress from its host cell. The residual body is, in fact, the connection between these pre-cytokinetic replicated daughters, and effectively, this is still a single cell at this stage. The authors have previously shown that an actin network extends through the residual body between these daughter cells, and ER and mitochondria common to all cells are also linked through this structure. This study examining the fates of organelles during cell replication is timely for continuing our understanding of how this fascinating component of the cell participates in these processes. The authors use Halo-tags as their principal tool to track discrete populations of proteins, labelling their organelle locations, and this provides beautiful insight into these processes.

      Strengths:

      Using dyes conjugated to Halo tags, this work elegantly tracks the fates of proteins synthesised by an original 'mother' cell over several replication cycles of pre-cytokinetic 'daughters'. Using this tool, they show that some organelles are made intact just once and that some of these can be subsequently sorted to the daughters (micronemes and rhoptries) while others are dismantled (IMC) and the daughters must make their own. A third set of organelles (largely synthesis, sorting, and metabolic compartments) is divided and inherited, and new daughter-synthesised proteins are added to the preexisting maternal proteins in these structures. A role for actin and myosin is clearly demonstrated for micronemes and rhoptries, and this correlates with their relatively late inheritance into the developing daughters. Overall, this work gives clarity to the behaviours of several cell structures during replication and paves the way to a better understanding of the mechanisms that drive the differences between structures and the universality of these processes in other apicomplexan parasites.

      Weaknesses:

      In addressing the question of residual body participation in sorting of organelles, it would be useful to clearly define this structure and when and where it is delineated from the posterior of a mother cell during the formation of daughter structures. This might seem like a moot point, but it would give clarity to notions of recycling and 'reservoirs'. Mother cells retain their active invasion apparatus until very late in daughter formation, and the need for micronemes and rhoptries to be released from this service late in the process might explain why they are only then trafficked to the cell posterior and then into the daughters. So, is this a distinct 'residual body' function/reservoir or just a spatial constraint of this sequence of daughter formation? In subsequent cell replications (4, 8, 16... stages), is there a separation between the residual body that links them all and the posterior of each new 'mother cell', and if so, when is this distinction lost? This is important because without a definition, we might be confusing different processes. Are rhoptries/micronemes that originate in one 'mother' able to be sorted to the 'daughters' from a distinct mother in this syncytium? If so, this would make it a sorting centre, but otherwise we could be just capturing the activities at the posterior of any given cell during replication. The authors' further thoughts on this would be very interesting.

      The Group 2 structures are described as those that are divided between daughters and receive newly synthesised proteins that add to the maternal protein of these compartments. While this is a logical conclusion for several of the structures mentioned, where the maternal protein signal is seen to be depleted with replication (including for the apicoplast, ER, glideosome, and Golgi), data directly supporting the addition of new proteins to these existing structures are presented only for the Golgi.

    3. Reviewer #2 (Public review):

      Summary:

      Toxoplasma gondii is an obligate intracellular parasite and the causative agent of toxoplasmosis. Parasite invasion into host cells, intracellular replication, and then egress, which results in the destruction of the infected cell, is central to pathogenicity. This manuscript focuses on understanding how maternal resources (in this case, cellular organelles) are shared between daughter parasites during cell division. Many organelles are single copy, meaning that division and inheritance by the daughters is crucial for successful replication. The major strength of this study was the use of a Halo-based pulse chase assay to characterize patterns of organelle inheritance. The results show that both micronemes and rhoptries (secretory vesicles), previously thought to be synthesized de novo, are inherited by daughter parasites. Thus, this paper adds new insight to our understanding of cell division in this important parasite.

      Strengths:

      This study demonstrated that pulse labeling of proteins can be used to monitor protein synthesis, turnover, and movement. This approach will be of great interest to the field. Using this method, the authors demonstrate three main modes of organelle inheritance.

      (1) Organelles, where there are multiple copies (such as secretory vesicles, micronemes, and rhoptries), are divided between the daughter parasites, with additional contribution of newly formed vesicles. New and old material remain as separate entities in the cell.

      (2) Single-copy organelles, which are expanded to include newly synthesized material prior to division, such as the Golgi and apicoplast.

      (3) Cytoskeletal structures that are synthesized anew during each round of division.

      These studies provide more refined insight into patterns of organelle inheritance and demonstrate that secretory organelles are not made de novo during each round of division as previously thought. The paper has a logical flow, and overall, the data is presented in a clear and organized fashion.

      Weaknesses:

      (1) Descriptions of methodology and statistical analysis were incomplete.

      (2) There are inconsistencies between the data in Figures 1 and 5. In Figure 1, a small amount of maternal IMC is visible in stage 2 parasites. Although this is a ~90% reduction, these parasites should be quantified as parasites with maternal IMC. However, the graph in Figure 5C indicates that no parasites have maternal GAPM1a. Given that graph 5C is a binary measure (present vs. absent), one would expect a non-zero percentage of parasites to have maternal material.

      (3) The conclusion from Figure 6 was not justified based on the data. I agree with the authors' conclusion that the accumulation of micronemes and rhoptries in the residual body was time-dependent. In Figure 6A, the signal observed in the residual body at times 6:30, 13, and 14 hours is not observed in subsequent time points. However, the fate of these micronemes and rhoptries is unclear. It cannot be concluded that these vesicles are recycled back to the mother. They could also have been degraded. In fact, the graphs of microneme inheritance in Figure 2B show a decrease in maternal signal from 100% to 80% between stages 1 and 2, indicating that some microneme degradation is taking place.

      (4) To convincingly demonstrate that the redistribution of micronemes and rhoptries was due to recovery of MyoF protein levels after auxin washout, a Western blot should be performed to show MyoF protein levels over time. In addition, the decrease in mMIC2 protein levels in the residual body in Figure 8F should be measured and normalized for photobleaching. Both apical and basal signals appear to be reduced over the time course of imaging.

    4. Reviewer #3 (Public review):

      Summary:

      Knoerzer-Suckow et al. explore the mechanisms of organelle inheritance during endodyogeny in Toxoplasma gondii using an innovative dual-labeling approach to track the distribution of maternal organelles into daughter parasites. They can clearly distinguish between maternal and daughter-derived organelles using their dual-labeling Halo Tag approach. They reveal that different organelles are trafficked to daughter parasites in three broad patterns, which they have binned into groups. Their findings reveal a role for MyoF in the inheritance of micronemes and rhoptries, and notably, they observe that the inner membrane complex (IMC) is not recycled. Instead, the IMC undergoes a pronounced relocalization to the posterior of the maternal cell, where it is likely targeted for degradation.

      Strengths:

      The data surrounding their MyoF knockdown experiments, IMC degradation, and trafficking of MIC2 after auxin washout are compelling. These data add to the knowledge of how organelle inheritance occurs in T. gondii, increasing the field's understanding of endodyogeny.

      Weaknesses:

      (1) The evidence provided to support the claim that microneme and rhoptry inheritance specifically traffics through the residual body does not sufficiently substantiate the claim. The temporal resolution of the imaging is inadequate to precisely trace the path of microneme and rhoptry inheritance. From the data shown in the manuscript, it can be concluded that at least some of the micronemes and rhoptries might be recycled through the residual body, but it is unclear whether many or most of these organelles do so.

      (2) The absence of specific markers for the residual body brings into question whether microneme inheritance occurs through a discrete residual body or simply via the basal end of the maternal parasite. The authors need a robust way to visualize and define the residual body to claim that micronemes and rhoptries are specifically transported through this structure.

    1. eLife Assessment

      This is a solid paper on intermittent fasting that will be of interest to readers. The data presented are certainly valuable as a resource. The findings of both shared and tissue-specific signatures, both at the proteomic and transcriptomic levels, align well with what has been established and bring new insight into metabolic adaptation and its consequences in muscle, cortex, and liver.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors employed comprehensive proteomics and transcriptomics analysis to investigate the systemic and organ-specific adaptations to IF in males. They found that shared biological signaling processes were identified across tissues, suggesting unifying mechanisms linking metabolic changes to cellular communication, which revealed both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience.

      Strengths:

      This study examined multiple organs, including the liver, brain, and muscle, and revealed both conserved and tissue-specific responses to IF.

      Weaknesses:

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter are proven to be the largest consumers of ketones, which are also changed by the IF treatment in this study.

      (2) The proteomics and transcriptomics analyses were only performed at 4 months. However, a strong correlation between IF and the molecular adaptations should be time point-dependent.

      (3) The manuscript lacks a "discussion" section, which would detail the significance and weaknesses of the study.

      (4) There is no confirmation for the proteomic and transcriptomic profiling. For example, the important changes in proteomics could be further identified by a Western blot.

    3. Reviewer #2 (Public review):

      Summary:

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations.

      They find shared signaling pathways, certain metabolic changes, and organ-specific responses that suggest IF might affect energy utilization and metabolic flexibility while promoting resilience at the cellular level.

      Strengths:

      The fact that there are 3 organs and 2 -omics approaches is a strength of this study.

      Weaknesses:

      The analytical approach to the data generated by the present study is not well posed, because it does not help to answer key questions implicit in the experimental design. Consequently, the paper, as it stands, reads as a mere description of results and not as a response to specific questions.

      The presentation of the figures, the knowledge of the literature, and the inclusion of only one sex (male) are all weaknesses.

    4. Reviewer #3 (Public review):

      Summary:

      Fan et al utilize large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusions presented are the variability in responses across tissues with IF. Some common pathways were observed, but there were notable distinctions between tissues.

      Strengths:

      (1) The IF study was well conducted and ran out to 4 months, which was a nice long-term design.

      (2) The multiomics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues.

      (3) The authors did not overstate their conclusions or imply an overreaching mechanism.

      Weaknesses:

      The weaknesses, which are minor, include the use of only male mice and the early start (6 weeks) of the IF treatment. See specifics in the recommendations section.

    5. Author response:

      Reviewer #1 (Public review):

      Summary: 

      In this study, the authors employed comprehensive proteomics and transcriptomics analysis to investigate the systemic and organ-specific adaptations to IF in males. They found that shared biological signaling processes were identified across tissues, suggesting unifying mechanisms linking metabolic changes to cellular communication, which revealed both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience. 

      Strengths: 

      This study examined multiple organs, including the liver, brain, and muscle, and revealed both conserved and tissue-specific responses to IF.

      We appreciate the recognition of the study’s strengths and the opportunity to clarify the points raised.

      Weaknesses: 

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter are proven to be the largest consumers of ketones, which is also changed in the IF treatment of this study.

      We agree that the heart and kidney are critical organs in ketone metabolism. Our selection of the liver, brain, and muscle was guided by their distinct metabolic functions and relevance to systemic energy balance, neuroplasticity, and locomotor activity, key domains influenced by intermittent fasting (IF). These tissues also offer complementary perspectives on central and peripheral adaptations to IF. Notably, we have previously examined the effects of IF on the heart (eLife 12:RP89214), and we fully acknowledge the importance of the kidney. We intend to include it in future studies to broaden the scope and deepen our understanding of IF-induced systemic responses.

      (2) The proteomics and transcriptomics analyses were only performed at 4 months. However, a strong correlation between IF and the molecular adaptations should be time point-dependent.

      We appreciate this insightful comment. The 4-month time point was selected to capture long-term adaptations to IF, beyond acute or transitional effects. While we acknowledge that molecular responses to IF are time-dependent, our goal in this study was to establish a foundational understanding of sustained systemic and tissue-specific changes. We fully agree that a longitudinal approach would provide deeper insights into the temporal dynamics of IF-induced adaptations. To address this, we are currently undertaking a comprehensive 2-year study that is specifically designed to explore these time-dependent effects in greater detail.

      (3) The context lacks a "discussion" section, which would detail the significance and weaknesses of the study.

      We appreciate this observation. The manuscript was originally structured to emphasize results and interpretation within each section, but we recognize that a dedicated discussion section would enhance clarity and contextual depth. In the revised version, we will add a comprehensive discussion section addressing broader implications, limitations, and future directions of the study.

      (4) There is no confirmation for the proteomic and transcriptomic profiling. For example, the important changes in proteomics could be further identified by a Western blot. 

      We acknowledge the importance of orthogonal validation to support high-throughput findings. While our study primarily focused on uncovering systemic patterns through proteomic and transcriptomic profiling, we agree that targeted confirmation would strengthen the conclusions. To this end, we have included immunohistochemical validation of a key protein common to all three organs—Serpin A1C. Additionally, we are planning a dedicated follow-up study to expand functional validation of several key proteins identified in this manuscript, which will be pursued as a separate project.

      Reviewer #2 (Public review): 

      Summary: 

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations. 

      They find shared signaling pathways, certain metabolic changes, and organ-specific responses that suggest IF might affect energy utilization, metabolic flexibility, while promoting resilience at the cellular level.

      Strengths: 

      The fact that there are 3 organs and 2 -omics approaches is a strength of this study. 

      We appreciate the reviewer’s recognition of the breadth of our study design. By integrating proteomics and transcriptomics across three metabolically distinct organs, we aimed to provide a comprehensive view of systemic and tissue-specific adaptations to IF. This multi-organ, multi-omics approach was central to uncovering both conserved and divergent biological responses.

      Weaknesses: 

      (1) The analytical approach of the data generated by the present study is not well posed, because it doesn't help to answer key questions implicit in the experimental design. Consequently, the paper, as it is for now, reads as a mere description of results and not a response to specific questions.

      We thank the reviewer for this important observation. Our initial aim was to establish a foundational atlas of molecular changes induced by IF across key organs. However, we recognize that clearer framing of the biological questions would enhance interpretability. In the revised manuscript, we will restructure the introduction, results, and discussion to align more explicitly with specific hypotheses, particularly those related to energy metabolism, cellular resilience, and inter-organ signaling. We have also added targeted analyses and clarified how each dataset contributes to answering these questions.

      (2) The presentation of the figures, the knowledge of the literature, and the inclusion of only one sex (male) are all weaknesses.

      We appreciate this feedback and agree that these are important considerations. Regarding figure presentation, we will revise several figures for improved clarity, add more descriptive legends, and reorganize supplemental materials to better support the main findings. On the literature front, we will expand the discussion to include recent and relevant studies on IF, metabolic adaptation, and sex-specific responses. As for the use of only male mice, this was a deliberate choice to reduce hormonal variability and focus on establishing baseline molecular responses. We fully acknowledge the importance of sex as a biological variable and will soon be conducting studies in female mice to address this gap.

      Reviewer #3 (Public review):

      Summary: 

      Fan et al utilize large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusions presented are the variability in responses across tissues with IF. Some common pathways were observed, but there were notable distinctions between tissues.

      Strengths: 

      (1) The IF study was well conducted and ran out to 4 months, which was a nice long-term design. 

      (2) The multiomics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues. 

      (3) The authors did not overstep their conclusions and imply an overreached mechanism. 

      We sincerely thank the reviewer for acknowledging the strengths of our study design and analytical approach. We aimed to strike a careful balance between comprehensive data generation and cautious interpretation, and we appreciate the recognition that our conclusions were appropriately framed within the scope of the data.

      Weaknesses: 

      The weaknesses, which are minor, include the use of only male mice and the early start (6 weeks) of the IF treatment. See specifics in the recommendations section.

      We appreciate the reviewer’s thoughtful comments. The decision to use male mice and initiate IF at 6 weeks was based on minimizing hormonal variability and capturing early adult metabolic programming. We acknowledge that sex and developmental timing are important biological variables. To address this, we are conducting parallel studies in female mice and evaluating IF initiated at later life stages. These follow-up investigations will help determine the extent to which sex and timing influence the molecular and physiological outcomes of IF.

    1. eLife Assessment

      This important study provides evidence for our understanding of HIV transmission dynamics by age and sex in Zambia during the PopART trial; by combining phylogenetic and individual-based mathematical modelling (IBM), it adds depth to the epidemiological literature and may inform more strategic allocation of HIV prevention resources in sub-Saharan Africa. The authors employ two complementary and well-established methodologies (phylogenetics and IBM), and this dual approach is a notable strength. However, the evidence supporting key conclusions is incomplete, with several claims insufficiently substantiated by the data presented. Improvements in data presentation (e.g., quantification of qualitative statements, statistical estimates, and clearer description of results) would substantially strengthen the paper.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes the results of phylogenetic and epidemiological modeling of the PopART community cohorts in Zambia. The current manuscript draft is methodologically strong, but needs revision to strengthen the take-home messages. As written, there are many possible take-away conclusions. For example, the agreement between IBM and phylogenetic analysis is noteworthy and provides a methodological focus. The revealed age patterns of transmission could be a focus. The effects of the PopART intervention and the consequences of a 1-year disruption could be a focus. It is important, though, that any main messages summarized by the authors are substantiated by the evidence provided and do not extrapolate beyond the data that have been generated. I recommend that the authors think deeply about what the most important, well-supported messages are and reframe the discussion and abstract accordingly.

      Strengths/weaknesses by section:

      (1) ABSTRACT

      The Abstract summarizes qualitative findings nicely, but the authors should incorporate quantitative results for all of the qualitative findings statements.

      The ending claim is not substantiated by the modeling scenarios that have been run: "targeted interventions for demographic groups such as under-35 men may be the key to finally ending HIV." It is straightforward to run this specific scenario in the model to determine whether or not this is true.

      The authors should add confidence intervals to the quantitative metrics, such as the 93.8% and 62.1% incidence reduction.

      (2) RESULTS

      The authors should check the Results section for any qualitative claims not substantiated by the analyses performed, and ensure the corresponding analyses are presented to support the claims.

      The Results and Methods describe the model's implementation of the PopART intervention differently. The Methods describes it as including VMMC, TB, and STI services, while the Results only mentions intensified HIV testing and linkage.

      A limitation of the model is that HIV disease progression is based on the ATHENA cohort in the Netherlands, which is a different HIV subtype (B) than the one in the research setting (C). The model should be configured using subtype C progression data, which have been published, or at least a sensitivity analysis should be conducted with respect to disease progression assumptions.

      In Table 2, the authors should consider adding a p-value to establish whether or not IBM and phylogenetics estimates are different.

      (3) DISCUSSION

      The literature review and comparison of study results to previously published phylogenetic studies is very nice. The authors could strengthen this by providing quantitative estimates with CIs for a more scientific comparison of the study results vs. prior studies, perhaps as a table or figure.

      The authors state that due to "the narrow geographical catchment area... The results should not be automatically extrapolated to apply to other SSA settings." The authors should exercise this caution when comparing the results to studies in South Africa and elsewhere.

      There are many other limitations to the analysis, including some mentioned above, that are not acknowledged. The authors should think carefully about what the most important limitations are and acknowledge them honestly at the end of the Discussion section.

    3. Reviewer #2 (Public review):

      Summary:

      The authors analyzed PopART data to better characterize the age and sex-specific heterosexual HIV transmission dynamics in Zambia, with the goal of allocating resources.

      Strengths:

      Important analysis to hone in on the key driver of HIV transmission in Zambia, which hopefully can be used to tune prevention efforts to maximize effect while limiting required resources. Two analytic approaches were used, and while the phylogenetic data were markedly more limited, they mirrored the simulated epidemic. The authors did a nice job reviewing the limitations of the data and the analyses. The authors did a nice job of providing analyses to support their goals and hypothesis, and this work may have more impact now that resources in SSA for HIV prevention and treatment may become more scarce.

      Weaknesses:

      To increase the impact and utility of this work, it would be helpful to parse the analysis just a bit further to estimate the roles of undiagnosed vs diagnosed and untreated subpopulations on this transmission. PopART is a multifaceted intervention, but the cost, effort, and approach to reengagement in care vs testing/treatment can be quite different.

    4. Author response:

      We thank the editors and reviewers for their positive and constructive comments. The three most substantial points raised by the public review are the following:

      No explicit modelling of targeting of young men as a course to ending HIV. 

      We did not intend to imply that the epidemic could be ended by this alone, or even that targeting young men was the optimum strategy if resources were available for more general preventative interventions. The “last mile” for HIV will be a very complex scenario in which key populations will start to play an outsize role, and our modelling framework was not developed to consider it. As a result, we would not have confidence in modelling the decline of the viral population to zero. We shall be qualifying the existing language in the paper in order to make this clear.

      Subtype-specific disease progression data. 

      The criticism is that our modelling of disease progression was based on subtype B, while the HIV viral population in Zambia is overwhelmingly subtype C. Sensitivity to subtype has not been looked at in detail in this analysis as the literature suggests that the rate of CD4 decline does not differ between subtypes B and C.

      While some studies have shown differences in CD4 cell decline between subtypes, they have generally highlighted that subtype D progresses faster than other subtypes. Little evidence has been published on the differences between subtype B and C, and studies that do include both subtypes concluded that there was no significant difference in rates of CD4 decline between subtypes.

      No significant difference between rate of CD4 progression by subtype is evidenced in the following publications:

      - Klein et al. (2014) (N=9772)
      - Bouman et al. (2023) (although no subtype B)
      - Easterbrook et al. (2010) (N=861)

      While some studies have illustrated that "progression changes with HIV subtype", an interrogation of the underlying data highlights that subtype B is not included, e.g.

      - Kanki et al. (1999) looked at A versus "non-A subtype" but included no subtype B data.
      - Vasan et al. (2006) claims differences in rate of CD4 decline by subtype when compared to subtype D but includes no subtype B data.
      - Baeten et al. (2007) claims subtype D has faster progression than subtype A but includes no subtype B data.
      - Kiwanuka et al. (2008) claims differences in rate of CD4 decline but includes no subtype B data.
      - Amornkul et al. (2013) has no subtype B data.

      Furthermore, to explain why we used subtype B data to parameterise the model: usually, statistical analyses of CD4 count progression do not report parameters in a form that can be directly imported into models. Analysing summary statistics to include in models results in under-specified models of disease progression in simulations. For this reason we use the estimates from Cori et al. (2015), where the statistical analysis was specifically tailored to generate modelling parameters. The trade-off is therefore to use subtype C data with model misspecification, or subtype B data without; neither choice is perfect, and we chose the subtype B correctly specified estimates.

      The role of undiagnosed versus diagnosed and untreated subpopulations. 

      We will add an additional analysis to compare age differences in sources and recipients according to the diagnostic status of the source.

      The rest of the comments in the public review ask for improvements in data presentation (including some additional statistical analyses) and to make sure qualitative claims are fully justified. We are happy to oblige with these, and will make our thinking clear on all points in the full response.

    1. eLife Assessment

      This is a useful paper regarding the roles of brown adipose tissue and skeletal muscle in thermogenesis in mice, with potential significance for the field. The overall approach is innovative but on balance the evidence for the claim is incomplete, as cast immobilization, while innovative, is likely stressful, may impact muscle and BAT directly, and imposes an energetic cost of motion on the animal that is not accounted for. Further experiments are also needed to directly assess the role of adipose-derived BCAAs in thermogenesis. The authors have done a good job of textually editing their manuscript to clarify the findings and limitations of the study.

    2. Reviewer #1 (Public review):

      Summary:

      Heat production mechanisms are flexible, depending on a wide variety of genetic, dietary and environmental factors. The physiology associated with each mechanism is important to understand, since loss of flexibility associates with metabolic decline and disease.

      The phenomenon of compensatory heat production has been described in some detail in publications and reviews, notably by modifying BAT-dependent thermogenesis (for example by deleting UCP1 or impairing lipolysis, cited in this paper).

      These authors chose to eliminate exercise as an alternative means for maintaining body temperature. To do this, they cast either one or both mouse hindlimbs.

      This paper is set up as an evaluation of a loss of function of muscle on the functionality of BAT. However, the authors show that cast immobilization (CI) does not work as a (passive) loss of function; instead, this procedure produces a dramatic gain of function.

      It does not test the hypothesis as stated; instead, it adds an extraneous variable, which is that the animal is put under enormous stress, inducing β-adrenergic effectors, increased oxygen consumption, and IL6 expression in a variety of tissues, together with commensurate cachectic effects on muscle and fat. The BAT is stressed by this procedure, becoming super-induced but relatively poor functioning. This is an inaccurate experimental construct, and the paper is therefore full of wrong conclusions.

      Within hours and days of CI, there is massive muscle loss (leading to high circulating BCAAs), and loss of lipid reserves in adipose and liver. The lipid cycle that maintains BAT thermogenesis is depleted and the mouse is unable to maintain body temperature.

      I cannot agree with these statements in the Discussion -

      "We have here shown that cast immobilization suppressed skeletal muscle thermogenesis, resulting in failure to maintain core body temperature in a cold environment."

      • This result could also be attributed to high stress and decreased calorie reserves. Note also: CI suppresses 50% locomotor activity, but the actual work done by the mouse carrying bilateral casts is not taken into account (how heavy are they?). Presumably other muscles in the mouse body are compensating to allow the mouse to drag itself to the food source, to maintain food consumption, which, remarkably, is unchanged. Is the demand for heat even the same when the mouse is wrapped in gypsum?

      I cannot be convinced that this approach (CI) can be interpreted at all in terms of organ communication during thermogenic challenge. This paper describes instead the resilience and adaptation of mouse physiology in the face of dragging around hind limb casts.

      From Rebuttal:

      "On the other hand, the experiment shown in Fig.1C involved acute cold exposure of mice 2 h after cast immobilization. This result suggests that, even before the depletion of energy stores by immobilization of skeletal muscle, cast immobilization may cause cold intolerance in mice."

      Since the mice are in acute recovery from the anesthetic, there can be no conclusions drawn about thermogenesis. Isoflurane is a great way to depress body temperature (http://www.ncbi.nlm.nih.gov/pubmed/12552204), and the recovery time is not known.

      "In addition, as the reviewer suggests, cast immobilization may result in BAT thermogenesis and cachectic effects on muscle and fat. However, circulating corticosterone concentrations and hypothalamic CRH gene expression are not significantly altered after cast immobilization (Figure 2_figure supplement 2D-F)."

      The absence of positive results from your stress assays does not exclude stress as the primary source of the results. These mice are not proceeding as normal with their lives - they are learning whole new behaviors in order to stay fed and watered.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors identified a previously unrecognized organ interaction where limb immobilization induces thermogenesis in BAT. They showed that limb immobilization by cast fixation enhances the expression of UCP1 as well as amino acid transporters in BAT, and amino acids are supplied from skeletal muscle to BAT during this process, likely contributing to increased thermogenesis in BAT. Furthermore, the experiments with IL-6 knockout mice and IL-6 administration to these mice suggest that this cytokine is likely involved in the supply of amino acids from skeletal muscle to BAT during limb immobilization.

      Strengths:

      The function of BAT plays a crucial role in the regulation of an individual's energy and body weight. Therefore, identifying new interventions that can control BAT function is not only scientifically significant but also holds substantial promise for medical applications. The authors have thoroughly and comprehensively examined the changes in skeletal muscle and BAT under these conditions, convincingly demonstrating the significance of this organ interaction.

      Weaknesses:

      Through considerable effort, the authors have demonstrated that limb-immobilized mice exhibit changes in thermogenesis and energy metabolism dynamics at their steady state. However, the impact of immobilization on the function of skeletal muscle and BAT during cold exposure has not been thoroughly analyzed.

      Comments on revisions:

      The authors appropriately responded to the reviewers' recommendations made during the previous round of peer review.

    1. eLife Assessment

      This useful study provides new insights into the liver stage antigen LSA3, its export to erythrocytes, and its role in liver stage development. While the functional importance of LSA3 is well-demonstrated, the data underlying conclusions about antibody specificity, liver stage localization, and phenotype remain incomplete. A key gain is the use of mosquito and humanized mouse models to access life cycle stages rarely studied in most laboratories.

    2. Reviewer #1 (Public review):

      Summary:

      The extent to which P. falciparum liver stage parasites export proteins into the host cell is unclear. Most blood-stage exported proteins tested in liver stages were not exported. An exception is LISP2, which is exported in P. berghei but not P. falciparum liver stages. While the machinery for export is present in liver stages, efforts to demonstrate export have so far been mostly unsuccessful. Parasite proteins exported during the liver stage could be presented by MHC and thereby become the target of immune control, an incentive to study liver stage export and identify proteins exported during this stage. However, particularly for P. falciparum, it is very difficult to study liver stages.

      This work studies LSA3 in P. falciparum blood and liver stages. The authors show that this protein is exported into the host cell in blood stages, but in liver stages, no or only very little export was detected. A disruption of LSA3 reduced liver stage load in a humanized mouse model, indicating this protein contributes to efficient development of the parasites in the liver.

      The paper also studies the localization of LSA3 in blood stages and uses a known inhibitor to show that it is processed by plasmepsin 5, a protease important for protein trafficking. The work also shows that LSA3 is not needed for passage through the mosquito.

      Strengths:

      The main strength of this work is the use of the humanized mouse model to study liver stages of P. falciparum, which is technically challenging and requires specialized facilities. The biochemical analysis of LSA3 localization and processing by plasmepsin 5 is thorough and mostly overcame adverse issues such as a cross-reactive antibody and the negative influence of the GFP-tag on LSA3 trafficking. The mosquito stage analysis is also notable, as these kinds of studies are difficult with P. falciparum. However, there was no evidence for a function of LSA3 in mosquito stages.

      Weaknesses:

      The cross-reactivity of the antibody, together with the co-infection strategy, prevents reliable assessment of LSA3 localization in liver stages. Despite this, it seems LSA3 is not exported in liver stages, and the paper does not bring us closer to the original goal of finding an exported liver stage protein.

      While the localization analysis in blood stages is well done and thorough, the advance is somewhat limited. LSA3 may be in structures like J dots, but this hypothesis was not tested. Although parasites with a disrupted LSA3 were generated, the function of this protein was not explored. Given that a previous publication found some inhibitory effect of LSA3 antibodies on blood stage growth, a comparison of the growth of the LSA3 disruption clones with the parent would have been very welcome and easy to do. At this point, LSA3 is one more of many proteins exported in blood stages for which the function remains unclear.

      It might be possible to refine some of the conclusions. The impact on liver stage development is interesting, but which phase of the liver stage is affected and the nature of the phenotype remain largely unknown. The co-infection (WT together with LSA3 mutant) has the advantage of a direct comparison of the mutant with the control in the same liver, but complicates phenotypic analysis if the LSA3 antibody is also cross-reactive in liver stages. This issue adds a question mark to the shown localization and precludes phenotypic comparisons. The authors write that they do not know if the cross-reactive protein is expressed at that stage. But this should be immediately evident from the mixed WT/mutant infection. If all cells are positive for LSA3, there is a cross-reaction. If about half of the cells are negative, there isn't. In the latter case, the localization shown in the paper is indeed LSA3, and morphological differences between WT and LSA3 disruption could be assessed without additional experiments.

      Significance:

      The conclusion from the paper that "our study presents just the second PEXEL protein so far identified as important for normal P. falciparum liver-stage development and confirms the hypothesized potential of exported proteins as malaria vaccine candidates" is partially misleading. Neither LISP2 nor LSA3 seems to be exported in P. falciparum liver stages, and we can't confirm the potential of vaccines with proteins exported in this stage. LSA3 is still important and may still be the target of the immune response, but based on this work, probably not due to export in liver stages.

    3. Reviewer #2 (Public review):

      Summary:

      Immunogenic Plasmodium falciparum proteins that could be targeted to prevent parasite development in the liver are of significant interest for novel anti-malarial vaccine development. In this study, McConville et al evaluate the trafficking and functional importance of LSA3, a protein expressed in the blood and liver stages and previously shown to provide protection in immunized chimpanzees. LSA3 contains a PEXEL motif, but the authors have previously shown that this protein does not appear to be exported beyond the PVM in the liver stage (McConville et al, PNAS 2024). However, LSA3 trafficking and functional importance have not been comprehensively evaluated across stages. In the present study, the authors find that blood-stage LSA3 undergoes PEXEL processing, and a portion of the protein is exported into the erythrocyte, where it localizes to punctate structures distinct from Maurer's clefts. Using a knockout mutant, LSA3 is shown to be dispensable for blood and mosquito stages but important to liver-stage development. Collectively, these results validate LSA3 as a liver-stage target and place it among several other PEXEL proteins that display differential trafficking beyond the PVM in the erythrocyte but not the hepatocyte.

      Strengths:

      (1) The authors present a thorough analysis of LSA3 trafficking in the blood stage. PEXEL processing by Plasmepsin 5 is clearly demonstrated through a combination of mini LSA3-GFP reporters and Plasmepsin 5 inhibitors. Importantly, an LSA3 knockout mutant is used to show that the LSA3-C anti-sera also react with additional, unidentified parasite proteins in the blood stage. Nonetheless, comparison between the WT and KO parasites clearly indicates that a portion of LSA3 is exported into the erythrocyte, which is further supported by protease-protection assays with fractionated iRBCs. This contrasts with the liver stage, where LSA3 does not appear to traffic beyond the PVM, similar to what has been observed for other PEXEL proteins in the rodent malaria model.

      (2) This study provides the first direct analysis of LSA3 function by reverse genetics, showing this protein is important for liver stage development in chimeric human liver mice. Several PEXEL proteins in P. berghei have been shown to be exported into the host cell in the blood stage, but do not appear to cross the PVM in the liver stage. These observations reinforce that even without detectable export into the hepatocyte, PEXEL proteins play critical roles during liver stage development.

      Weaknesses:

      (1) A previous study reported that anti-LSA3 antibodies inhibit blood-stage growth, suggesting a role for LSA3 during erythrocyte infection. While the authors carefully evaluate the LSA3 mutant in mosquito and liver stages, the impact on blood stage fitness is not tested. While the knockout shows LSA3 is not essential in the blood stage, its importance during erythrocyte infection remains unclear.

      (2) The authors previously reported that anti-LSA3-C signal in the liver stage localizes within the parasite and at the parasite periphery but is not exported into the hepatocyte. In the present study, it is shown that anti-LSA3-C reacts with other parasite proteins beyond LSA3 in the blood stage, and this may also occur in the liver stage. However, since liver-stage IFAs were only performed on samples co-infected with both WT and ∆LSA3 parasites, non-specific anti-LSA3-C reactivity at this stage could not be determined, and the localization of LSA3 in the liver stage remains somewhat unclear.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript provides a comprehensive characterization of the Plasmodium falciparum protein LSA3, combining biochemical, genetic, and in vivo approaches. The authors convincingly demonstrate that LSA3 is expressed during liver stage infection and that disruption of the gene leads to a modest but reproducible reduction in liver stage parasite load in humanized mice.

      Strengths:

      Their biochemical and cell biological analysis of blood stages provides strong evidence that LSA3 is exported to the infected erythrocyte, and the detailed analysis of its PEXEL motif processing is well executed.

      Weaknesses:

      The study suggests LSA3 as one of only two known P. falciparum PEXEL proteins contributing to this stage, although there is no evidence for the export beyond the vacuolar membrane. Several key conclusions, particularly regarding antibody specificity, localization in liver stage parasites, and the interpretation of the phenotypic data, are not fully supported by the current experiments.

    5. Author response:

      We thank all three reviewers for their positive comments and valuable suggestions for improving the manuscript. A detailed blood stage analysis of LSA3-deficient parasites was conducted with, and led by, collaborators at Ehime University in a separate study that is currently in revision at another journal and will be published separately. We intend to cite the complementary publication once it is accepted and to revise the wording in the current manuscript in accordance with the suggested feedback. These changes will be reflected in the revised manuscript to be submitted as the eLife Version of Record.

    1. eLife Assessment

      This work, combining behavioural genetics and calcium imaging, provides evidence for a form of learning in Drosophila that derives solely from direct or (optogenetically induced) phantom experience of punishment or reward. Flies that experience foot-shock alone show a subsequent decrease in avoidance to all odorants, together with increased odor-evoked activation of reward-encoding dopaminergic neurons that innervate the mushroom body. Phantom reward, delivered via optogenetic activation of reward-encoding dopaminergic neurons, increases subsequent odour-avoidance. While the findings are valuable to the field, there are aspects of the work that are incomplete, and some of the conclusions and terminology are also not completely justified; three major issues include: (a) the use of the term "priming" to describe this form of learning seems inappropriate and inconsistent with the accepted definition of this term; (b) a key 1998 publication with an initial description of this behavioural phenomenon needs to be cited and presented as context; and (c) the work on reward induced increase in odor-aversion seems relatively preliminary.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present an investigation of associative learning in Drosophila in which a previous exposure to an aversive stimulus leads to an increase in approach behaviors to a novel odor relative to a previously paired odor or no odor (air). Moreover, this relative increase is larger compared to that of a control group - i.e., presented with a (different) odor only. Evidence for the opposite effect with an appetitive stimulus, delivered indirectly by optogenetically activating sugar sensory neurons, which leads to a reduction in approach behavior to a novel odor, was also presented. The olfactory memory circuits underpinning these responses, which the authors refer to as 'priming', are revealed and include a feedback loop mediated by dopaminergic neurons to the mushroom body.

      Strengths:

      (1) The study includes a solid demonstration of the effect of the valence of a previous stimulus on sensory preferences, with an increase or decrease in preference to novel over no odor following an aversive or appetitive stimulus, respectively.

      (2) The demonstration of bidirectional effects on odor preferences following aversive or rewarding stimuli is compelling.

      (3) The evidence for distinct neural circuits underpinning the odor preferences in each context appears to be robust.

      Weaknesses:

      (1) The conclusions regarding the links between neural and behavioral mechanisms are mostly well supported by the data. However, what is less convincing is the authors' argument that their study offers evidence of 'priming'. An important hallmark of priming, at least as is commonly understood by cognitive scientists, is that it is stimulus specific: i.e., a repeated stimulus facilitates response times (repetition priming), or a repeated but previously ignored stimulus increases response times (negative priming). That is, it is an effect on a subsequent repeated stimulus, not ANY subsequent stimulus. Because (prime or target) stimuli are not repeated in the current experiments, the conditions necessary for demonstrating priming effects are not present. Instead, a different phenomenon seems to be demonstrated here, and one that might be more akin to approach/avoidance behavior to a novel or salient stimulus following an appetitive/aversive stimulus, respectively.

      (2) On a similar note, the authors' claim that 'priming' per se has not been well studied in non-human animals is not quite correct and would need to be revised. Priming effects have been demonstrated in several animal types, although perhaps not always described as such. For example, the neural underpinnings of priming effects on behavior have been very well characterized in human and non-human primates, in studies more commonly described as investigations of 'response suppression'.

      (3) The outcome measure - i.e., difference scores between the two odors or odor and non-odor (i.e., the number of flies choosing to approach the novel odor versus the number approaching the non-odor (air)) - appears to be reasonable to account for a natural preference for odors in the mock-trained group. However, it does not provide sufficient clarification of the results. The findings would be more convincing if these relative scores were unpacked - that is, instead of analyzing difference scores, the results of the interaction between group and odor preference (e.g., novel or air) (or even within the pre- and post-training conditions with the same animals) would provide greater clarity. This more detailed account may also better support the argument that the results are not due to conditioning of the US with pure air.

    3. Reviewer #2 (Public review):

      The manuscript by Yang et al. investigates how a prior experience (notably by the activation of sensory/reinforcing dopaminergic neurons) alters olfactory response and memory expression in Drosophila. They refer to a priming effect with the definition: "Priming is a process by which exposure to a stimulus affects the response to a subsequent stimulus in Humans". The authors observed that exposing flies to a series of shocks (or the optogenetic activation of aversively reinforcing dopaminergic neurons) decreases ensuing odour avoidance. Conversely, optogenetic activation of sweet-sensing neurons increases following odour avoidance. They proposed that the reduced odour avoidance was due to the involvement of reward dopaminergic neurons involved during shock (or the optogenetic activation of aversively reinforcing dopaminergic neurons). They indeed show the involvement of reward dopaminergic neurons innervating the mushroom body (the fly learning and memory centre) during shock preexposure. Recording (calcium activity) from reward dopaminergic neurons before and after shock preexposure shows that only a small subset of dopaminergic neurons innervating the mushroom body γ4 compartment increases their response to odour after shock. They then showed the requirement of the γ4 reward dopaminergic neurons during shock preexposure on ensuing odour avoidance. They also tested the role of the dopamine receptor in the mushroom body. They finally recorded from different mushroom body output neurons, including the one (MBON-γ4γ5) likely affected by the increased activity of the corresponding γ4 reward dopaminergic neurons after shock preexposure. They recorded odour-evoked responses from these neurons before and after shock preexposure, but did not find any plasticity, while they found a logical effect during spaced cycles of aversive training.

      Overall, the study is very interesting with a substantial amount of behavioural analysis and in vivo 2-photon calcium imaging data, but some major (and some minor) issues have to be resolved to strengthen their conclusions.

      (1) According to neuropsychological work (Henson, Encyclopedia of Neuroscience (2009), vol. 7, pp. 1055-1063), "Priming refers to a change in behavioral response to a stimulus, following prior exposure to the same, or a related, stimulus. Examples include faster reaction times to make a decision about the stimulus, a bias to produce that stimulus when generating responses, or the more accurate identification of a degraded version of the stimulus". Or "Repetition priming refers to a change in behavioural response to a stimulus following re-exposure" (PMID: 18328508). I therefore do not think that the effects observed by the authors are really the investigation of the neural mechanisms of priming. To me, the effect they observed seems more related to sensitisation, especially for the activation of sweet-sensing neurons. For the shock effect, it could be a safety phenomenon, as in Jacob and Waddell, 2020, involving (as for sugar reward) different subsets for short-term and long-term safety.

      (2) The author missed the paper from Thomas Preat, The Journal of Neuroscience, October 15, 1998, 18(20):8534-8538 (Decreased Odor Avoidance after Electric Shock in Drosophila Mutants Biases Learning and Memory Tests). In this paper, one of the effects observed by the authors has already been described, and the molecular requirement of memory-related genes is investigated. This paper should be mentioned and discussed.

      (3) Overall, the bidirectional effect they observed is interesting; however, their results are not always clear, and the use of a delta PI is sometimes misleading. The authors have mentioned that shocks induced attraction to the novel odour, while they should stick to the increase or decrease in preference/avoidance. As not all experiments are done in parallel logic, it is not always easy to understand which protocol the authors are using. For example, only optogenetics is used in the appetitive preexposure. Does exposing flies to sugar or activating reward dopaminergic neurons also increase odour avoidance? The observed increased odour avoidance after optogenetic activation of sweet-sensing neurons involve reward (e.g., decreased response) and/or punishment (e.g., increased response) to increase odour avoidance? The author should always statistically test the fly behavioural performances against 0 to have an idea of random choice or a clear preference toward an odour. On the appetitive side, the internal hunger state would play an important role. The author should test it or at least discuss it.

      (4) The authors found a discrepancy between genetic backgrounds; sometimes the same odour can be attractive or aversive. Different effects between the T-maze and the olfactory arena are found. The authors proposed that: "Punishment priming effect was still not detected, probably due to the insensitivity of the optogenetic arena". This is unclear to me, considering all prior work using this arena. The author should discuss it more clearly. They mentioned that flies could not be conditioned with air and electric shock. However, flies could be conditioned with the context + shock, which is changing in the T-maze and not in the optogenetic arena.

    4. Author response:

      We thank both reviewers for their valuable comments. We have prepared a point-by-point response below.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The conclusions regarding the links between neural and behavioral mechanisms are mostly well supported by the data. However, what is less convincing is the authors' argument that their study offers evidence of 'priming'. An important hallmark of priming, at least as is commonly understood by cognitive scientists, is that it is stimulus specific: i.e., a repeated stimulus facilitates response times (repetition priming), or a repeated but previously ignored stimulus increases response times (negative priming). That is, it is an effect on a subsequent repeated stimulus, not ANY subsequent stimulus. Because (prime or target) stimuli are not repeated in the current experiments, the conditions necessary for demonstrating priming effects are not present. Instead, a different phenomenon seems to be demonstrated here, and one that might be more akin to approach/avoidance behavior to a novel or salient stimulus following an appetitive/aversive stimulus, respectively.

      (2) On a similar note, the authors' claim that 'priming' per se has not been well studied in non-human animals is not quite correct and would need to be revised. Priming effects have been demonstrated in several animal types, although perhaps not always described as such. For example, the neural underpinnings of priming effects on behavior have been very well characterized in human and non-human primates, in studies more commonly described as investigations of 'response suppression'.

      We thank the reviewer for these critical comments. After careful consideration of both reviews, we agree that “priming” may not be the most accurate term to describe the behavioral phenomenon. We plan to revise our terminology throughout the manuscript accordingly to better capture the generalized nature of the effect we observe.

      (3) The outcome measure - i.e., difference scores between the two odors or odor and non-odor (i.e., the number of flies choosing to approach the novel odor versus the number approaching the non-odor (air)) - appears to be reasonable to account for a natural preference for odors in the mock-trained group. However, it does not provide sufficient clarification of the results. The findings would be more convincing if these relative scores were unpacked - that is, instead of analyzing difference scores, the results of the interaction between group and odor preference (e.g., novel or air) (or even within the pre- and post-training conditions with the same animals) would provide greater clarity. This more detailed account may also better support the argument that the results are not due to conditioning of the US with pure air.

      We use the PI score as a standard metric to quantify odor preference in all behavioral assays because it allows for robust comparison across different genetic or treatment groups under the same experimental setting. In the T-maze, real-time tracking of fly trajectories is technically difficult. With olfactory arenas, we showed some examples of fly distribution in quadrants over the entire odor choice test period (Figure 2—figure supplement 2) for both pre-trained and post-trained groups and discussed the trajectories in the Discussion. We will ensure this point is clarified in the revised text.

      Reviewer #2 (Public review):

      […] They finally recorded from different mushroom body output neurons, including the one (MBON-γ4γ5) likely affected by the increased activity of the corresponding γ4 reward dopaminergic neurons after shock preexposure. They recorded odour-evoked responses from these neurons before and after shock preexposure, but did not find any plasticity, while they found a logical effect during spaced cycles of aversive training.

      We thank the reviewer for the summary. We would like to clarify that we did, in fact, observe plasticity in MBON-γ4γ5 following shock exposure, as shown in Figure 4B.

      Overall, the study is very interesting with a substantial amount of behavioural analysis and in vivo 2-photon calcium imaging data, but some major (and some minor) issues have to be resolved to strengthen their conclusions.

      (1) According to neuropsychological work (Henson, Encyclopedia of Neuroscience (2009), vol. 7, pp. 1055-1063), "Priming refers to a change in behavioral response to a stimulus, following prior exposure to the same, or a related, stimulus. Examples include faster reaction times to make a decision about the stimulus, a bias to produce that stimulus when generating responses, or the more accurate identification of a degraded version of the stimulus". Or "Repetition priming refers to a change in behavioural response to a stimulus following re-exposure" (PMID: 18328508). I therefore do not think that the effects observed by the authors are really the investigation of the neural mechanisms of priming. To me, the effect they observed seems more related to sensitisation, especially for the activation of sweet-sensing neurons. For the shock effect, it could be a safety phenomenon, as in Jacob and Waddell, 2020, involving (as for sugar reward) different subsets for short-term and long-term safety.

      As noted in our response to Reviewer #1, we plan to revise our use of the term “priming” in the manuscript to more accurately interpret the behavioral phenomenon.

      (2) The author missed the paper from Thomas Preat, The Journal of Neuroscience, October 15, 1998, 18(20):8534-8538 (Decreased Odor Avoidance after Electric Shock in Drosophila Mutants Biases Learning and Memory Tests). In this paper, one of the effects observed by the authors has already been described, and the molecular requirement of memory-related genes is investigated. This paper should be mentioned and discussed.

      We thank the reviewer for bringing this important reference to our attention. We will cite the Preat (1998) paper and discuss its relevant findings in relation to our own in the revised manuscript.

      (3) Overall, the bidirectional effect they observed is interesting; however, their results are not always clear, and the use of a delta PI is sometimes misleading. The authors have mentioned that shocks induced attraction to the novel odour, while they should stick to the increase or decrease in preference/avoidance.

      The ΔPI is calculated either as (trained PI – mock PI) for different animals or as (post PI – pre PI) for the same animals, with the specific calculation clarified in each figure legend. A positive ΔPI signifies an increase in preference for the odor, which is equivalent to a relative attraction or a decrease in avoidance.
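
      The two ΔPI conventions described above can be sketched as follows. This is an illustrative sketch only: the PI formula used here, (odor − air) / (odor + air), is the conventional choice-index form and is an assumption, since the response does not spell it out, and the fly counts are hypothetical.

```python
def preference_index(n_odor: int, n_air: int) -> float:
    """Assumed PI definition: (flies at odor - flies at air) / total flies."""
    total = n_odor + n_air
    if total == 0:
        raise ValueError("no flies made a choice")
    return (n_odor - n_air) / total


def delta_pi(treated_pi: float, reference_pi: float) -> float:
    """ΔPI = treated - reference.

    Different animals: trained PI - mock PI.
    Same animals:      post PI - pre PI.
    """
    return treated_pi - reference_pi


# Hypothetical counts: both groups avoid the odor (negative PI),
# but the trained group avoids it less than the mock group.
mock_pi = preference_index(n_odor=30, n_air=70)
trained_pi = preference_index(n_odor=45, n_air=55)
change = delta_pi(trained_pi, mock_pi)

print(f"mock PI = {mock_pi:.2f}, trained PI = {trained_pi:.2f}, ΔPI = {change:.2f}")
```

      In this toy case ΔPI is positive although both PI values remain negative, which is the "decrease in avoidance" reading of a positive ΔPI rather than outright attraction.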

      As not all experiments are done in parallel logic, it is not always easy to understand which protocol the authors are using. For example, only optogenetics is used in the appetitive preexposure. Does exposing flies to sugar or activating reward dopaminergic neurons also increase odour avoidance? The observed increased odour avoidance after optogenetic activation of sweet-sensing neurons involve reward (e.g., decreased response) and/or punishment (e.g., increased response) to increase odour avoidance?

      We used different behavioral assays (T-maze or arena), stimuli (real shock or optogenetics), and protocols (different or same animal groups) to robustly demonstrate the phenomenon across platforms. We explained each protocol in the figures or text, and we’ll make them clearer to follow in the revised version. We focused on activating a clean set of sugar-sensing neurons because this optogenetic stimulus is an effective and efficient substitute for real sugar. We agree that testing reward dopaminergic neuron activation is a logical extension and will consider adding these experiments in the revised work.

      The author should always statistically test the fly behavioural performances against 0 to have an idea of random choice or a clear preference toward an odour.

      Our primary focus is on the change in preference induced by training, rather than the innate odor preference itself, which can be highly variable due to physiological and environmental factors. Statistical testing against 0 for innate preference scores is not standard practice in this specific paradigm, as the critical question is whether a treatment alters behavior relative to a control.

      On the appetitive side, the internal hunger state would play an important role. The author should test it or at least discuss it.

      For appetitive experiments, we always starve the flies on 1% agar for two days prior to behavioral tests to standardize their hunger state. We will consider adding fed flies as control groups in the revised work.

      (4) The authors found a discrepancy between genetic backgrounds; sometimes the same odour can be attractive or aversive.

      We observed minor discrepancies in innate odor preferences across genetic backgrounds, which is a known and common occurrence. Different genotypes and temperatures can result in different baseline PI scores. However, the key finding is that the relative change in odor preference following an aversive stimulus is consistent: it increases the relative preference for an odor compared to air. This sometimes reverses valence (aversion to attraction) and other times simply reduces aversion. Our analysis focuses on this consistent, relative change.

      Different effects between the T-maze and the olfactory arena are found. The authors proposed that: "Punishment priming effect was still not detected, probably due to the insensitivity of the optogenetic arena". This is unclear to me, considering all prior work using this arena. The author should discuss it more clearly.

      The punishment effect with CS+ present was reliably detected in the T-maze (Figure 1A) but was not significant in the olfactory arena (Figure 2—figure supplement 1B-C). We hypothesize that the olfactory arena assay is less sensitive than the T-maze for detecting such subtle behavioral changes. This is evidenced by the fact that even classical odor-shock conditioning yields lower PI in the arena (typically ~0.4) than in the T-maze (~0.8), likely due to the greater distance flies must explore and travel. The higher variance in the arena may therefore mask more modest effects. Here the effect under investigation was induced by optogenetically activating only a small subset of aversive dopaminergic neurons, a stimulus that is likely weaker than full electric shock. This reduced stimulus strength may have contributed to the challenge of detecting a significant effect in the less sensitive arena paradigm.

      They mentioned that flies could not be conditioned with air and electric shock. However, flies could be conditioned with the context + shock, which is changing in the T-maze and not in the optogenetic arena.

      While flies can be conditioned to context, during the optogenetic stimulation period in the arena, the light is delivered uniformly across all four quadrants. Therefore, any potential context conditioning would be equivalent across the entire chamber and should not bias the final distribution of flies between the odor and air quadrants during the test, nor affect the calculated PI score.

    1. eLife Assessment

      Liang et al. have conducted a small pilot study investigating the feasibility and tolerability of a regimen of neoadjuvant chemo-immunotherapy for non-small cell lung cancer, with lower cumulative dose of chemotherapy and with the immunotherapy delivered on D8 of each cycle. The clinical data are interesting and novel, and overall the findings of the study are valuable. However, the translational data and analyses are incomplete and do not support key claims in the title.

    2. Reviewer #1 (Public review):

      Liang et al. have conducted a small-scale pilot study focusing on the feasibility and tolerability of low-dose chemotherapy combined with delayed immunotherapy in the neoadjuvant treatment of non-small cell lung cancer. The design of delayed immunotherapy after chemotherapy is relatively novel, while the reduced chemotherapy, although somewhat lacking in innovation, still serves as an early clue for exploring future feasible strategies. Also, the dynamic ctDNA and TCR profiles could give some important hints of intrinsic tumor reaction.

      However, as the author mentioned in the limitation part, due to the small sample size and lack of a control group, we cannot fully understand the advantages and disadvantages of this approach compared to standard treatment. Compared to standard immunotherapy, the treatment group in this study has three differences: (1) reduced chemotherapy, (2) the use of cisplatin instead of the commonly used carboplatin in neoadjuvant therapy trials, and (3) delayed immunotherapy. Generally, in the exploration of updated treatment strategies, the design should follow the principle of "controlling variables." If there are too many differences at once, it becomes difficult to determine which variable is responsible for the effects, leading to confusion in the interpretation of the results. Moreover, the therapeutic strategy may lack practical clinical operability due to the long treatment duration.

      Furthermore, in the exploration of biomarkers, the authors emphasized the procedure of whole RNA sequencing in tumor tissues in the method section, and this was also noted in the flowchart in Figure 1. However, I didn't find any mention of RNA-related analyses in the Results section, which raises some concerns about the quality of this paper for me. If the authors have inadvertently omitted some results, they should supplement the RNA-related analyses so that I can re-evaluate the paper.

      To sum up, this article exhibited a certain degree of innovation. However, due to its intrinsic design defects and data omissions, the quality of the research warranted further improvement.

    3. Reviewer #2 (Public review):

      Summary:

      In this single center, single arm, open label non-randomised study the authors tested the use of paclitaxel at 180-220 mg/m2 and cisplatin at 60 mg/m2 in patients with squamous NSCLC and pemetrexed at 500 mg/m2 and cisplatin at 60 mg/m2 in adenocarcinoma of lung origin in the neoadjuvant setting. The chemotherapy appears to have been given at a relatively standard dose; though the platin dose at 60 mg/m2 is somewhat lower than has been used in the CheckMate 816 trial (75 mg/m2/dose), this is a well-established dose for NSCLC.

      Key differences to currently approved neoadjuvant chemo-ICI treatment is that anti-PD1 antibody sintilimab (at 200 mg/dose) was given on day 5 and that only 2 cycles of chemotherapy were given pre surgery, but then repeated on two occasions post surgery. Between May/2020 and Nov/2023 50 patients were screened, 38 went on to have this schedule of tx, 31 (~82%) went on to have surgery and 27 had the adjuvant treatment. The rate of surgery is entirely consistent with the CheckMate 816 data.

      Question to the authors:

      It would be very helpful to understand why 7 (~18% of the population) patients did not make it to surgery and whether this is related to disease progression, toxicity or other reasons for withdrawal.

      The key clinical endpoints were pCR and mPR rates. 2/38 patients are reported to have achieved a radiological pCR but only 31 patients underwent surgery with histological verification. Supp table2 suggests that 10/31 patients achieved a pCR, 6/31 additional patients achieved a major pathological response and that 13/31 did not achieve a major pathological response.

      It would be really helpful for understanding the clinical outcome to present the histopathological findings in the text in a bit more detail and to refer the outcome to the radiological findings. I note that the reference for pathological responses incorrectly is 38 patients as only 31 patients underwent surgery and were evaluated histologically.

      The treatment was very well tolerated with only 1 grade 3 AE reported. The longer term outcome will need to be assessed over time as the cohort is very 'young'. It is not clear what the adjuvant chemo-ICI treatment would add and how this extra treatment would be evaluated for benefit - if all the benefit is in the neoadjuvant treatment then the extra post-operative tx would only add toxicity.

      Please consider what the two post-operative chemo-ICI cycles might add to the outcome and how the value of these cycles would be assessed. Would there be a case for a randomised assessment in the patients who have NOT achieved a mPR histologically?

      While the clinical dataset identifies that the proposed reduced chemo-ICI therapy has clinical merit and should be assessed in a randomized study, the translational work is less informative.

      The authors suggest that the treatment has a positive impact on T lymphocytes. Blood sampling was done at day 0 and day 5 of each of the four cycles of chemotherapy with an additional sample post cycle 4. The authors state that data were analysed at each stage.

      The data in Figure 3B are reported for three sets of pairs: baseline to pre day 5 in cycle 1, day 5 to day 21 in cycle 1, baseline of cycle 2 to day 5. It remains unclear whether the datasets contain the same top 20 clones and it would be very helpful to show kinetic change for the individual 'top 20 clones' throughout the events in individual patients; as it stands the 'top20 clones' may vary widely from timepoint to timepoint. Of note, the figures do not demonstrate that the top 20 TCR clones were 'continuously increased'.

      Instead, the data suggest that there are fluctuations in the relative distributions over time but that may simply be a reflection of shifts in T cell populations following chemotherapy rather than of immunological effects in the cancer tissue. Consistent with this the authors conclude (line 304/5): "No significant difference was observed in the diversity, evenness, and clonality of TCR clones across the whole treatment procedure" and this seems to be a more persuasive conclusion than the statement 'that a positive effect on T lymphocytes was observed' - where it is also not clear what 'positive' means.

      The text needs a more balanced representation of the data: only a small subset of four patients appear to have been evaluated to generate the data for figure 3B and only three patients (P5, P6, P7) can have contributed to figure 3C if the sample collection is represented accurately in Figure 3A.

      The text refers to flow cytometric results in SF3. However, no information is given on the flow cytometry in M&M, markers or gating strategy.

      Please consider changing the terminology of the 'phases' into something that is easier to understand. One option would be to use a reference to a more standard unit (cycle 1-4 of chemotherapy and then d0/d5/d21).

      Please make it explicit in the text that molecular analyses were undertaken for some patients only, and how many patients contribute to the data in figures 3B-F. Figure 3A suggests paired mRNA data were obtained in 2 patients (P2 and P5) but I cannot find the results on these analyses; four individual blood samples to assess TCR changes in PH1/PH2/PH3 and PH4 were only available in four patients (P4, P5, P7, P9). Only three patients seem to have the right samples collected to allow the analysis for 'C3' in figure 3C.

      Please display for each of the 'top 20 clones' at any one timepoint how these clones evolve throughout the study; I expect that a clone that is 'top 20' at a given timepoint may not be among the 'top twenty' at all timepoints.
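
      One way to produce the requested per-clone view is sketched below. It is a minimal illustration under stated assumptions, not the authors' pipeline: clone identifiers, frequencies, and the timepoint labels (cycle/day style) are hypothetical, and k is set to 2 instead of 20 to keep the toy data small.

```python
from __future__ import annotations


def top_k(freqs: dict[str, float], k: int) -> set[str]:
    """Clones with the k highest frequencies in one sample."""
    return set(sorted(freqs, key=freqs.get, reverse=True)[:k])


def track_clones(samples: dict[str, dict[str, float]], k: int) -> dict[str, list[float]]:
    """Follow every clone that is top-k at ANY timepoint across ALL timepoints.

    A clone absent from a sample is given frequency 0.0.
    """
    tracked: set[str] = set()
    for freqs in samples.values():
        tracked |= top_k(freqs, k)
    return {
        clone: [freqs.get(clone, 0.0) for freqs in samples.values()]
        for clone in sorted(tracked)
    }


def persistent_top_k(samples: dict[str, dict[str, float]], k: int) -> set[str]:
    """Clones that are top-k at every timepoint."""
    return set.intersection(*(top_k(freqs, k) for freqs in samples.values()))


# Hypothetical clone frequencies for one patient, keyed by timepoint.
samples = {
    "C1D0": {"A": 0.10, "B": 0.05, "C": 0.01},
    "C1D5": {"A": 0.08, "C": 0.06, "B": 0.02},
    "C1D21": {"B": 0.09, "A": 0.07, "D": 0.03},
}

trajectories = track_clones(samples, k=2)
stable = persistent_top_k(samples, k=2)

for clone, series in trajectories.items():
    print(clone, series)
print("top-k at every timepoint:", sorted(stable))
```

      In the toy data only one of the three tracked clones stays in the top set throughout, which is the situation the comment above anticipates.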

      Please also assess if the expanded clonotypes are present (and expanded) in the cancer tissue at resection, to link the effect in blood to the tumour. Given that tissue was collected for 31 patients, mRNA sequencing to generate TCR data should be possible to add to the blood analyses in the 12 patients in Figure 3A. Without this data no clear link can be made to events in the cancer.

      Please provide in M&M the missing information on the flow cytometry methodology (instrument, antibody clones, gating strategy) and what markers were used to define T cell subsets (naïve, memory, central memory, effector memory).

      The authors also describe that ctDNA reduces after chemo-ICI treatment. This is well documented in their data but ultimately irrelevant: if the cancer volume is reduced to the degree of a radiological or pathological response/complete response, then the quantity of circulating DNA from the cancer cells must reduce. More interesting would be the question of whether early changes predict clinical outcome and whether recurrent ctDNA elevations herald recurrence.

      Please probe whether the molecular data identify good radiological or pathological outcomes before cycle 2 is started and whether the ctDNA levels identify patients who will have a poor response and/or who relapse early.

    1. eLife Assessment

      Glioblastoma is among the most aggressive cancers without a cure, and its cells are characterized by high mitochondrial membrane potential. This manuscript provides solid evidence that glioblastoma tumorigenesis is closely linked to mitochondrial stress. The study makes a valuable contribution to the field by advancing our understanding of the metabolic mechanisms driving glioblastoma and highlighting potential therapeutic targets.

    2. Reviewer #1 (Public review):

      Summary:

      Cai et al have investigated the role of msiCAT-tailed mitochondrial proteins that frequently exist in glioblastoma stem cells. Overexpression of msiCAT-tailed mitochondrial ATP synthase F1 subunit alpha (ATP5) protein increases the mitochondrial membrane potential and blocks mitochondrial permeability transition pore formation/opening. These changes in mitochondrial properties provide resistance to staurosporine (STS)-induced apoptosis in GBM cells. Therefore, msiCAT-tailing can promote cell survival and migration, while genetic and pharmacological inhibition of msiCAT-tailing can prevent the overgrowth of GBM cells.

      Strengths:

      The CAT-tailing concept has not been explored in cancer settings. Therefore, the present study provides new insights for widening the therapeutic avenue.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated.

      The conclusions of this paper are mostly well supported by data, but some aspects of image acquisition and data analysis need to be clarified and extended.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cai et al have investigated the role of msiCAT-tailed mitochondrial proteins that frequently exist in glioblastoma stem cells. Overexpression of msiCAT-tailed mitochondrial ATP synthase F1 subunit alpha (ATP5) protein increases the mitochondrial membrane potential and blocks mitochondrial permeability transition pore formation/opening. These changes in mitochondrial properties provide resistance to staurosporine (STS)-induced apoptosis in GBM cells. Therefore, msiCAT-tailing can promote cell survival and migration, while genetic and pharmacological inhibition of msiCAT-tailing can prevent the overgrowth of GBM cells.

      Strengths:

      The CAT-tailing concept has not been explored in cancer settings. Therefore, the present study provides new insights for widening the therapeutic avenue.

      Your acknowledgment of our study's pioneering elements is greatly appreciated.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated. The conclusions of this paper are mostly well-supported by data, but some aspects of image acquisition and data analysis need to be clarified and extended.

      We are grateful for your acknowledgment of our study’s innovative approach and its possible influence on cancer therapy. We sincerely appreciate your valuable feedback. In response, this updated manuscript presents substantial new findings that reinforce our central argument. Moreover, we have broadened our data analysis and interpretation, as well as refined our methodological descriptions.

      Reviewer #2 (Public Review):

      This work explores the connection between glioblastoma, mito-RQC, and msiCAT-tailing. They build upon previous work concluding that ATP5alpha is CAT-tailed and explore how CAT-tailing may affect cell physiology and sensitivity to chemotherapy. The authors conclude that when ATP5alpha is CAT-tailed, it either incorporates into the proton pump or aggregates and that these events dysregulate MPTP opening and mitochondrial membrane potential and that this regulates drug sensitivity. This work includes several intriguing and novel observations connecting cell physiology, RQC, and drug sensitivity. This is also the first time this reviewer has seen an investigation of how a CAT tail may specifically affect the function of a protein. However, some of the conclusions in this work are not well supported. This significantly weakens the work but can be addressed through further experiments or by weakening the text.

      We appreciate the recognition of our study's novelty. To address your concerns about our conclusions, we have revised the manuscript. This revision includes new data and corrections of identified issues. Our detailed responses to your specific points are outlined below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure 1B, please replace the high-exposure blots of ATP5 and COX with representative results. The current results are difficult to interpret clearly. Additionally, it would be helpful if the author could explain the nature of the two different bands in NEMF and ANKZF1. Did the authors also examine other RQC factors and mitochondrial ETC proteins? I'm also curious to understand why CAT-tailing is specific to C-I30, ATP5, and COX-V, and why the authors did not show the significance of COX-V.

      We appreciate your inquiry regarding the data. Additional attempts were made using new patient-derived samples; however, these results did not improve upon the existing ATP5α, NDUFS3 (C-I30), and COX4 signals presented in the figure. This is possibly because CAT-tail-modified mitochondrial proteins represent only a small fraction of the total proteins in these cells. It is acknowledged that the small tails visible above the prominent main bands are not particularly distinct. To address this, the revised version includes updated images to better illustrate the differences. We believe the assertion that GBM/GSCs possess CAT-tailed proteins is substantiated by a combination of subsequent experimental findings. The figure (refer to new Fig. 1B) serves primarily as an introduction. It is important to note that the CAT-tailed ATP5α plays a vital role in modulating mitochondrial potential and glioma phenotypes, a function which has been demonstrated through subsequent experiments.

      It is acknowledged that the CAT-tail modification is not exclusive to the ATP5α protein. ATP5α was selected as the primary focus of this study due to its prevalence in mitochondria and its specific involvement in cancer development, as noted by Chang YW et al. Future research will explore the possibility of CAT tails on other mitochondrial ETC proteins. Currently, NDUFS3 (C-I30), ATP5α, and COX4 serve as examples confirming the existence of these modifications. It remains challenging to detect endogenous CAT-tailing, and bulk proteomics is not yet feasible for this purpose. COX4 is considered significant. We hypothesize that CAT-tailed COX4 may function similarly to the previously studied C-I30 (Wu Z, et al), potentially causing substantial mitochondrial proteostasis stress.

      Concerning RQC proteins, our blotting analysis of GBM cell lines now includes additional RQC-related factors. The primary, more prominent bands (indicated by arrowheads) are, in our assessment, the intended bands for NEMF and ANKZF1.  Subsequent blotting analyses showed only single bands for both ANKZF1 and NEMF, respectively. The additional, larger molecular weight band of NEMF, which was initially considered for property analysis (phosphorylation, ubiquitination, etc.), was not examined further as it did not appear in subsequent experiments (refer to new Fig. S1C).

      References:

      Chang YW, et al. Spatial and temporal dynamics of ATP synthase from mitochondria toward the cell surface. Communications biology. 2023;6(1).

      Wu Z, et al. MISTERMINATE Mechanistically Links Mitochondrial Dysfunction With Proteostasis Failure. Molecular cell. 2019;75(4).

      (2) In addition to Figure 1B, it would be interesting to explore CAT-tailed mETC proteins in cancer tissue samples.

      This is an excellent point, and we appreciate the question. We conducted staining for ATP5α and key RQC proteins in both tumor and normal mouse tissues. Notably, ATP5α in GBM exhibited a greater tendency to form clustered punctate patterns compared to normal brain tissue, and not all of it co-localized with the mitochondrial marker TOM20 (refer to new Fig. S3C-E). Crucially, we observed a significant increase in NEMF expression within mouse xenograft tumor tissues, alongside a decrease in ANKZF1 expression (refer to new Fig. S1A, B). These findings align with our observations in human samples.

      (3) Please knock down ATP5 in the patient's cells and check whether both the upper band and lower band of ATP5 have disappeared or not.

      This control was essential and has been executed now. To validate the antibody's specificity, siRNA knockdown was performed. The simultaneous elimination of both upper and lower bands upon siRNA treatment (refer to new Fig. S2A) confirms they represent genuine signals recognized by the antibody.

      (4) In Figure 1C and 1D, add long exposure to spot aggregation and oligomer. Figure 1D, please add the blots where control and ATP5 are also shown in NHA and SF (similar to SVG and GSC827).

      New data are included in the revised manuscript to address the queries. Specifically, the new Fig 1D now displays the full queue as requested, featuring blots for Control, ATP5α, AT3, and AT20. Our analysis reveals that AT20 aggregates exhibit higher expression and accumulation rates in GSC and SF cells.

      Fig. 1C has been updated to include experimental groups treated with cycloheximide and sgNEMF. Our results show that sgNEMF effectively inhibits CAT-tailing in GBM cell lines, whereas cycloheximide has no impact. After consulting with the Reporter's original creator and optimizing expression conditions, we observed no significant aggregates with β-globin-non-stop protein, potentially due to the length of endogenous CAT-tail formation (as noted by Inada, 2020, in Cell Reports). Our analysis focused on the ratio of CAT-tailed (red box blots) and non-CAT-tailed proteins (green box blots). Comparing these ratios revealed that both anisomycin treatment and sgNEMF effectively hinder the CAT-tailing process, while cycloheximide has no effect.

      (5) In Figure 1E, please double-check the results with the figure legend. ATP5A aggregated should be shown endogenously. The number of aggregates shown in the bar graph is not represented in micrographs. Please replace the images. For Figure 1E, to confirm the ATP5-specific aggregates, it would be better if the authors would show endogenous immunostaining of C-I30 and Cox-IV.

      Labels in Fig. 1E were corrected to reflect that the bar graph in Fig. 1F indicates the number of cells with aggregates, not the quantity of aggregates per cell. The presence of endogenous ATP5α is accurately shown. To address the specificity of ATP5α, immunostaining for endogenous NDUFS3 was conducted. This revealed NDUFS3 aggregation in GBM cells (SF and GSC) lacking TOM20, as demonstrated in the new Fig. S3A, B. These findings suggest NDUFS3 also undergoes CAT-tailing modification, similar to ATP5α.

      (6) Figure 3A. Please add representative images in the anisomycin sections. It is difficult to address the difference.

      We appreciate your feedback. Upon re-examining the Calcein fluorescence intensity data in Fig. 3A, we believe the images accurately represent the statistical variations presented in Fig. 3B. To address your concerns more effectively, please specify which signals in Fig. 3A you find potentially misleading. We are prepared to revise or substitute those images accordingly.

      (7) Figure 3D. If NEMF is overexpressed, is the CAT-tailing of ATP 5 reversed?

      Thank you. Your prediction aligns with our findings. We've added data to the revised Fig. S6A, B, which demonstrates that both NEMF overexpression and ANKZF1 knockdown lead to elevated levels of CRC. This increase, however, was not statistically significant in GSC cells. A plausible explanation for this discrepancy is that the MPTP of GSC cells is already closed, thus any additional increase in CAT-tailing activity does not result in further amplification.

      (8) Figure 3G. Why on the BN page are AT20 aggregates not the same as shown in Figure 2E?

      We appreciate your inquiry regarding the ATP5α blots, specifically those in the original Fig. 3G (left) and 2E (right). Careful observation of the ATP5α band placement in these figures reveals a high degree of similarity. Notably, there are aggregates present at the top, and the diffuse signals extend downwards. Given that this is a gradient polyacrylamide native PAGE, the concentration diminishes towards the top. Consequently, the non-rigid nature of the Blue Native PAGE gel may lead to slight variations in the aggregate signals; however, the overall patterns are very much alike. To mitigate potential misinterpretations, we have rearranged the blot order in the new Fig. 3M.

      (9) Figure 4D. The amount of aggregation mediated by AT20 is greater than that mediated by AT3. Why are there no such drastic effects observed between AT3 and AT20 in the TUNEL assay?

      The previous Figure 4D presents the quantification of cell migration from the experiment depicted in Figure 4C. But this is a good point. TUNEL staining results are directly influenced by mitochondrial membrane potential and the state of mitochondrial permeability transition pores (MPTP), not by the degree of protein aggregation. Our previous experiments showed comparable effects of AT3 and AT20 on mitochondria (Fig. 2E, 3K), which aligns with the expected similar outcomes on TUNEL staining. As for its biological nature, this could be very complicated. We hope to explore it in future studies.

      (10) Figure 5C: The role of NEMF and ANKZF1 can be further clarified by conducting Annexin-PI assays using FACS. The inclusion of these additional data points will provide more robust evidence for CAT-tailing's role in cancer cells.

      In response to your suggestion, we have incorporated additional data into the revised version.

      Using the Annexin-PI kit, we labeled apoptotic cells and detected them using flow cytometry (FACS). Our findings indicate that anisomycin pretreatment, NEMF knockdown (sgNEMF), and ANKZF1 upregulation (oeANKZF1) significantly increase the rate of STS-induced apoptosis compared to the control group (refer to new Fig. S9D-G).

      (11) Figure 5F: STS is a known apoptosis inhibitor. Why it is not showing PARP cleavage?

      Also, cell death analysis would be more pronounced, if it could be shown at a later time point. What is the STS and Anisomycin at 24h or 48h time-point? Since PARP is cleaved, it would also be better if the authors could include caspase blots.

      I guess what you meant to say here is "Staurosporine is a protein kinase inhibitor that can induce apoptosis in multiple mammalian cell lines." Our study observed PARP cleavage even in GSCs, which are typically more resistant to staurosporine-induced apoptosis (C-PARP in Fig. S9B). The ratio of C-PARP to total PARP increased. We selected a 180-minute treatment duration because longer treatments with STS + anisomycin led to a late stage of apoptosis and non-specific protein degradation (e.g., at 24 or 48 hours), making PARP comparisons less meaningful. Following your suggestion, we also examined caspase 3/7 activity in GSC cells treated with DMSO, CHX, and anisomycin. We found that anisomycin treatment also activated caspases (Fig. S9A).

      (12) In Figure 5, the addition of an explanation, how CAT-tailing can induce cell death, would add more information such as BAX-BCL2 ratio, and cytochrome-c release from the mitochondria.

      Thank you for your suggestion. In this study, we state that specific CAT-tails inhibit GSC cell death/apoptosis rather than inducing it. Therefore, we do not expect that examining BAX-BCL2 and mitochondrial cytochrome c release would offer additional insights.

      (13) To confirm the STS resistance, it would be better if the author could do the experiments in the STS-resistant cell line and then perform the Anisomycin experiments.

      Thank you. We should emphasize that our data primarily originates from GSC cells. These cells already exhibit STS-resistance when compared to the control cells (Fig. S8A-C).

      (14) It would be more advantageous if the author could show ATP5 CAT-tailed status under standard chemotherapy conditions in either cell lines or in vivo conditions.

      This is an interesting question. It's worth exploring this question; however, GSC cells exhibit strong resistance to standard chemotherapy treatments like temozolomide (TMZ).

      Additionally, we couldn't detect changes in CAT-tailed ATP5α and thus did not include that data.

      (15) In vivo (cancer mouse model or cancer fly model) data will add more weight to the story.

      We appreciate your intriguing question. An effective approach would be to test the RQC pathway's function using the Drosophila Notch overexpression-induced brain tumor model. However, Khaket et al. have conducted similar studies, stating, "The RNAi of Clbn, VCP, and Listerin (Ltn), homologs of key components of the yeast RQC machinery, all attenuated NSC over-proliferation induced by Notch OE (Figs. 5A and S5A–D, G)." This data supports our theory, and we have incorporated it into the Discussion. While the mouse model more closely resembles the clinical setting, it is not covered by our current IACUC proposal. We intend to verify this hypothesis in a future study.

      Reference:

      Khaket TP, Rimal S, Wang X, Bhurtel S, Wu YC, Lu B. Ribosome stalling during c-myc translation presents actionable cancer cell vulnerability. PNAS Nexus. 2024 Aug 13;3(8):pgae321.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1B, C: To demonstrate that Globin, ATP5alpha, and C-I30 are CAT-tailed, it is necessary to show that the high mobility band disappears after NEMF deletion or mutagenesis of the NFACT domain of NEMF. This can be done in a cell line. The anisomycin experiment is not convincing because the intensity of the bands drops and because no control is done to show that the effects are not due to translation inhibition (e.g. cycloheximide, which inhibits translation but not CAT tailing). Establishing ATP5alpha as a bona fide RQC substrate and CAT-tailed protein is critical to the relevance of the rest of the paper.

      Thank you for suggesting this crucial control experiment.

      To confirm the observed signal is indeed a bona fide CAT-tail, it's essential to demonstrate that NEMF is necessary for the CAT-tailing process. We have incorporated data from NEMF knockdown (sgNEMF) and cycloheximide treatment into the revised manuscript. Our findings show that both sgNEMF and anisomycin treatment effectively inhibit the formation of CAT-tailing signals on the reporter protein (Fig. 1C). Similarly, NEMF knockdown in a GSC cell line also effectively eliminated CAT-tails on overexpressed ATP5α (Fig. S2B).

      In general, the text should be weakened to reflect that conclusions were largely gleaned from artificial CAT tails made of AT repeats rather than endogenously CAT-tailed ATP5alpha. CAT tails could have other sequences or be made of pure alanine, as has been suggested by some studies.

      Thank you for your reminder. We have reviewed the recent studies by Khan et al. and Chang et al., and we found their analysis of CAT tail components to be highly insightful. We concur with your suggestion regarding the design of the CAT tail sequence. We aimed to design a tail that maintained stability and resisted rapid degradation, regardless of its length. In the revised version, we clarify that our conclusions are based on artificial CAT tails, specifically those composed of AT repeat sequences (p. 9). We acknowledge that the presence of other sequence components may lead to different outcomes (p. 19).

      Reference:

      Khan D, Vinayak AA, Sitron CS, Brandman O. Mechanochemical forces regulate the composition and fate of stalled nascent chains. bioRxiv [Preprint]. 2024 Oct 14:2024.08.02.606406.

      Chang WD, Yoon MJ, Yeo KH, Choe YJ. Threonine-rich carboxyl-terminal extension drives aggregation of stalled polypeptides. Mol Cell. 2024 Nov 21;84(22):4334-4349.e7.

      Throughout the work (e.g. 3B, C), anisomycin effects should be compared to those with cycloheximide to observe if the effects are specific to a CAT tail inhibitor rather than a translation inhibitor.

      We agree that including cycloheximide control experiments is crucial. The revised version now incorporates new data, as depicted in Fig. S5A, B, illustrating alterations in the on/off state of MPTP following cycloheximide treatment. Furthermore, Fig. S6A, B present changes in Calcium Retention Capacity (CRC) under cycloheximide treatment. The consistency of results across these experiments, despite cycloheximide treatment, suggests that anisomycin's role is specifically as a CAT tail inhibitor, rather than a translation inhibitor.

      Line 110, it is unclear what "short-tailed ATP5" is. Do you mean ATP5alpha-AT3? If so this needs to be introduced properly. Line 132: should say "may indicate accumulation of CAT-tailed protein" rather than "imply".

      We acknowledge your points. We have clarified that the "short-tailed ATP5α" refers to ATP5α-AT3 and incorporated the requested changes into the revised manuscript.

      Figure 1C: how big are those potential CAT-tails (need to be verified as mentioned earlier)?

      They look gigantic. Include a ladder.

      In the revised Fig. 1D, molecular weight markers have been included to denote signal sizes. The aggregates in the previous Fig. 1C, also present in the control plasmid, are likely a result of signal overexposure. The CAT-tailed protein is observed just above the intended band in these blots. These aggregates have been re-presented in the updated figures, and their signal intensities quantified.

      Line 170: "indicating that GBM cells have more capability to deal with protein aggregation".

      This logic is unclear. Please explain.

      We appreciate your question and have thoroughly re-evaluated our conclusion. We offer several potential explanations for the data presented in Fig. 1D: (1) ATP5α-AT20 may demonstrate superior stability. (2) GSC (GBM) cells might lack adequate mechanisms to monitor protein accumulation. (3) GSC (GBM) cells could possess an increased adaptive capacity to the toxicity arising from protein accumulation. This discussion has been incorporated into the revised manuscript (lines 166-169).

      Line 177: how do you know the endogenous ATP5alpha forms aggregates due to CAT-tailing? Need to measure in a NEMF hypomorph.

      We understand your concern and have addressed it. Revised Fig. 3G, H demonstrates that a reduction in NEMF levels, achieved through sgNEMF in GSC cells, significantly diminishes ATP5α aggregation. This, in conjunction with the Anisomycin treatment data presented in revised Fig. 3E, F, confirms the substantial impact of the CAT-tailing process on this aggregation.

      Line 218: really need a cycloheximide or NEMF hypomorph control to show this specific to CAT-tailing.

      We have revised the manuscript to include data from sgNEMF and cycloheximide treatments, specifically Fig. 3G, H, and Fig. S5C, D, as detailed in our response above.

      Lines 249,266, Figure 5A: The mentioned experiments would benefit from controls including an extension of ATP5alpha that was not alanine and threonine, perhaps a gly-ser linker, as well as an NEMF hypomorph.

      We sincerely appreciate your insightful comments. In response, the revised manuscript now incorporates control data for ATP5α featuring a poly-glycine-serine (GS) tail. This data is specifically presented in Figs. S2E-G, S4E, S7A, D, E, and S8F, G. Our experimental findings consistently demonstrate that the overexpression of ATP5α, when modified with GS tails, had no discernible impact on protein aggregation, mitochondrial membrane potential, GSC cell mobility, or any other indicators assessed in our study.

      Figure S5A should be part of the main figures and not in the supplement.

      This has been moved to the main figure (Fig. 5C).

    1. Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavour e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low replicate number and a lack of effective validation methods, meaning findings may not be repeated. This is a revised article but several weaknesses remain related to the analysis and interpretation of the data.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives some preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      Although some text weaknesses have been addressed since resubmission, other specific weaknesses remain: The major weakness is the n-number and analysis methods. Two biological n per group is not acceptable to base any solid conclusions. Any validatory data was too little (only cell % data) and not always supporting the findings (e.g. figure 3D does not match 3B/4A). Other examples include:

      (1) There aren't enough cells to justify analysis - only 300-1500 myeloid cells per group with not many of these being neutrophils or the apparent 'Ly6G- neutrophils'

      (2) The dynamic range of RNA measurement using scRNAseq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comments, but in general the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells. The data in the entire paper is not strong enough to base any solid conclusion - it is not just the RNA-sequencing data.

      (3) There is no data supporting the presence of Ly6G negative neutrophils. In the flow cytometry only Ly6G+ cells are shown with no evidence of Ly6G negative neutrophils (assuming equal CD11b expression). There is no new data to support this claim since resubmission and the New figures 4C and D actually show there are no Ly6G negative cells - the cells that the authors deem Ly6G negative are actually positive - but the red overlay of S100A8 is so strong it blocks out the green signal - looking to the Ly6G single stains (green only) you can see that the reported S100A8+Ly6G- cells all have Ly6G (with different staining intensities).

      (4) Eosinophils are heavily involved in lung macrophage biology, but are missing from the analysis - it is highly likely the RNA-sequencing picked out eosinophils as Ly6G- neutrophils rather than the 'digestion issues' the authors claim.

      (5) After author comments, it appears the schematic in Figure 1A is misleading and there are not n=2/group/sex but actually only n=1/group/sex (as shown in Figure 6A). Meaning the n number is even lower than the previous assumption.

    2. Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      - Single cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      - Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      - The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      - Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that data collected was relevant.

      Weaknesses:

      - The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models. Clinical relevance of this short exposure remains unclear.

      - Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      - Overall, the paper and its discussion are relatively surface-level and do not delve into the significance of the findings or how they fit into the bigger picture of the field. It is not clear whether this paper is intended to be used as a resource for other researchers or as an original research article.

      - The manuscript has some validation of findings but not very comprehensive.

      This paper provides a strong foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      Comments on revisions:

      The authors have addressed major concerns with better validation of data and improved organization of the paper. However, we still have some concerns and suggestions pertaining to the statistical analyses and justifications for experimental design.

      - We appreciate the nuance of this experimental design, and the authors have adequately commented on why they chose nose-only exposure over whole body exposure. However, the justification for the duration of the exposure, and the clinical relevance of a short exposure, have not been addressed in the revised manuscript.

      - The presentation of cell counts should be represented by a percentage/proportion rather than a raw number of cells. Without normalization to the total number of cells, comparisons cannot be made across groups/conditions. This comment applies to several figures.

      - We appreciate that the authors have taken the reviewers' advice to validate their findings. However, we have concerns regarding the immunofluorescent staining shown in Figure 4. If the red channel is showing a pan-neutrophil marker (S100A8) and the green channel is showing only a subset of neutrophils (LY6G+), then the green channel should have far less signal than the red channel. This expected pattern is not what is shown in the figure, with the Ly6G marker apparently showing more expression than S100A8. Additionally, the FACS data states that only 4-5% of cells are neutrophils, but the red channel co-localizes with far more than 4-5% of the DAPI stain, meaning this population is overrepresented, potentially due to background fluorescence (noise). In addition, some of the shapes in the staining pattern do not look like true neutrophils, although it is difficult to tell because there remains a lot of background staining. The authors need to verify that their S100A8 and Ly6G antibodies work and are specific to the populations they intend to target. It is possible that only the brightest spots are truly S100A8+ or Ly6G+.

      - Paraffin sections do not always yield the best immunostaining results and the images themselves are low magnification and low resolution.

      - Please change the scale bars to white so they are more visible in each channel.

      - We appreciate that this is a preliminary test used as a resource for the community, but there is interesting biology regarding immune cells that warrants DEG analysis by the authors. This computational analysis can be easily added with no additional experiments required.
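
      The normalization requested in the comment on cell counts can be sketched as follows. This is a minimal illustration only: the group names and counts are hypothetical and are not data from the manuscript, and `to_percentages` is a helper name introduced here.

```python
def to_percentages(counts):
    """Convert raw per-cell-type counts for one group into percentages of that group's total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("group has no cells")
    return {cell_type: 100.0 * n / total for cell_type, n in counts.items()}


# Hypothetical raw counts (NOT data from the manuscript).
raw_counts = {
    "air": {"neutrophil": 200, "t_cell": 300, "other": 3500},
    "tobacco": {"neutrophil": 300, "t_cell": 600, "other": 5100},
}

percentages = {group: to_percentages(c) for group, c in raw_counts.items()}

for group, pct in percentages.items():
    print(group, {k: round(v, 1) for k, v in pct.items()})
```

      In this made-up example the raw neutrophil count rises from 200 to 300, yet the neutrophil proportion is 5% in both groups, which is why raw counts alone cannot support a comparison across conditions.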

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors tackled the public concern about E-cigarettes among young adults by examining the lung immune environment in mice using single-cell RNA sequencing, discovering a subset of Ly6G- neutrophils with reduced IL-1 activity and increased CD8 T cells following exposure to tobacco-flavored e-cigarettes. Preliminary serum cotinine (nicotine metabolite) measurements validated the effective exposure to fruit, menthol, and tobacco-flavored e-cigarettes with air and PG:VG serving as control groups. They also highlighted the significance of metal leaching, which fluctuated over different exposure durations to flavored e-cigarettes, underscoring the inherent risks posed by these products. The scRNAseq analysis of e-cig exposure to flavors and tobacco demonstrated the most notable differences in the myeloid and lymphoid immune cell populations. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Further subclustering revealed a flavor-specific rise in Ly6G- neutrophils and heightened activation of cytotoxic T cells in response to tobacco-flavored e-cigarettes. These effects varied by sex, indicating that immune changes linked to e-cig use are dependent on gender. By analyzing the expression of various genes and employing gene ontology and gene enrichment analysis, they identified key pathways involved in this immune dysregulation resulting from flavor exposure. Overall, this study affirmed that e-cigarette exposure can suppress the neutrophil-mediated immune response, subsequently enhancing T cell toxicity in the lung tissue of mice.

      Strengths:

      This study used single-cell RNA sequencing to comprehensively analyze the impact of e-cigarettes on the lung. The study pinpointed alterations in immune cell populations and identified differentially expressed genes and pathways that are disrupted following e-cigarette exposure. The manuscript is well written, the hypothesis is clear, the experiments are logically designed with proper control groups, and the data is thoroughly analyzed and presented in an easily interpretable manner. Overall, this study suggested novel mechanisms by which e-cigs impact lung immunity and created a dataset that could benefit the lung immunity field.

      Weaknesses:

      The authors included a valuable control group - the PG:VG group, since PG:VG is the foundation of the e-liquid formulation. However, most of the comparative analyses use the air group as the control. Further analysis comparing the air group to the PG:VG group, and the PG:VG group to the individual flavored e-cig groups, will provide clearer insights into the true source of irritation. This is done for a few analyses but not consistently throughout the paper. Flavor-specific effects should be discussed in greater detail. For example, Figure 1E shows that the Fruit flavor group exhibits more severe histological pathology, but similar effects were not corroborated by the single-cell data.

      We thank the reviewer for this query. We agree that the PG:VG group is the foundation of the e-liquid formulation and hence comparisons with this group are important for understanding the effect of individual flavors on the cell population. Though we compared the flavored e-cig groups with the PG:VG group, we did not discuss these comparisons in detail within the manuscript to avoid confusion in the interpretation of this study. However, we have now included the comparisons with the PG:VG group as Supplementary Files S13-S18 in our revised manuscript to facilitate proper interpretation of our omics data by interested readers.

      While we agree that flavor-specific effects might be of interest, we did not explore them in detail as the fruit flavor e-liquids have now been regulated/banned from sale in the US. Thus, from a regulatory point of view, the effects of tobacco-flavored e-liquids hold the most interest. Since fruit flavors were on the market at the time this study was conducted, we have still included the data. However, studying them further was not the focus of this work.

      The characterization of Ly6g+ vs Ly6g- neutrophils is interesting and potentially very impactful. Key results like this from scRNAseq analyses should be validated by qPCR and flow cytometry.

      Also, a recent study by Ruscitti et al reported Ly6g+ macrophages in the lung which can potentially confound the cell type analysis. A more detailed marker gene and sub-population analysis of the myeloid clusters could rule out this potential confounding factor.

      We agree with the reviewer that the loss of Ly6G on neutrophils is a very interesting finding, and we have designed a neutrophil-specific experiment to study the impact of e-cig exposure on neutrophil maturation and function, which will be discussed in subsequent work by our group. To address the concerns raised by the reviewer, we stained lung tissue samples from air-exposed and tobacco-flavored e-cig aerosol-exposed mice with Ly6G and S100A8 (a universal marker for neutrophils) to assess the infiltration of Ly6G+ vs Ly6G- neutrophils within the lungs of exposed and unexposed mice. Results from this study showed that exposure to tobacco-flavored e-cig aerosol affects the neutrophil population within the mouse lungs. In fact, the changes were more pronounced for female mice. The data are now shown in Figure 4.

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavors of e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low number of replicates and a lack of effective validation methods, which reduce the robustness and rigor of the findings.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives good preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      The major weakness is the low number of replicates and the limited analysis methods. Two biological n per group is not acceptable to base any solid conclusions. Any validatory data was too little (only cell % data) and did not always support the findings (e.g. Figure 4D does not match 4C). Often n seems to be combined and only one data point is shown, it is not at all clear how the groups were analyzed and how many cells in each group were compared.

      We thank the reviewer for recognizing the strengths of this manuscript while pointing out the errors to allow us to improve our analyses. We understand that the low number of replicates in this work makes it difficult to draw solid conclusions, but this was a pilot study to identify the changes in the mouse lung upon acute exposures to flavored e-cig aerosols at a single cell level. So far, the e-cig field has been primarily focused on conducting toxicological studies to help regulatory bodies to set standards and enforce laws to better regulate the manufacture, sale and distribution of e-cig products. However, adolescents and young adults are still getting access to these products, and there is little to no understanding of how this may affect lung health upon acute and chronic exposures. Single cell technology is a powerful tool to analyze the gene expression changes within cell populations to study cell heterogeneity and function. Yet, it is a costly tool, owing to which conducting such analyses on large sample sizes is not ideal. This pilot study was designed to get some initial leads for our future studies involving larger sample sizes and chronic exposures. However, due to the vast information that is provided by a single cell RNA sequencing experiment, we intend to share it with a larger audience to support research and further study in this area. We understand that the validations are limited in our current work and so we have now conducted co-immunostaining to validate the Ly6G+ and Ly6G- neutrophil populations. We have now paired the single cell findings with validating experiments using classical methods of experimentation including ELISA, immunostaining or flow cytometry and revamped the whole manuscript. However, it is important to mention that such validations are sometimes challenging as many of these techniques still investigate the tissue while the changes shown in single cell analyses mainly pertain to a single cell type. This can be seen in the flow cytometry results for neutrophils, where we use Ly6G as a marker to stain for neutrophils, although it is only found in the mature neutrophil population.

      Only 71,725 cells mean only 7,172 per group, which is 3,586 per animal - how many of these were neutrophils, T-cells, and macrophages? This was not shown and could be too low.

      We do agree that the number of cells could be too low. To avoid this, we did not study gene expression variations at the finest level of cell identity. We classified the cell clusters into general annotations (myeloid, lymphoid, endothelial, stromal and epithelial) and identified the changes in gene expression. Of these, only two clusters (myeloid and lymphoid) with more than ~1000 cells per cell type per group were studied in detail. We have included the cell count information in the revised manuscript to allow better interpretation of our results. From a single cell point of view, a cell count of ~3500, each with over 20000 features (genes), has good statistical strength and merit in our opinion.
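
      The arithmetic in the reviewer's comment and the cell-count cutoff cited by the authors can be reproduced with a short sketch. The divisor of ten groups is an assumption inferred from the reviewer's figure of 7,172 cells per group; the per-cluster counts and the helper `clusters_passing` are hypothetical and serve only as an illustration.

```python
TOTAL_CELLS = 71_725       # total cells quoted in the review
N_GROUPS = 10              # assumption: divisor implied by "7,172 per group"
ANIMALS_PER_GROUP = 2      # two biological replicates per group, per the review
MIN_CELLS_FOR_DEG = 150    # cutoff the authors cite for differential gene analyses

cells_per_group = TOTAL_CELLS // N_GROUPS
cells_per_animal = cells_per_group // ANIMALS_PER_GROUP


def clusters_passing(counts, minimum=MIN_CELLS_FOR_DEG):
    """Return the names of clusters holding at least `minimum` cells (smaller clusters are dropped)."""
    return sorted(name for name, n in counts.items() if n >= minimum)


# Hypothetical per-cluster counts for one sex in one group (NOT data from the manuscript).
example_counts = {
    "myeloid": 1400,
    "lymphoid": 1250,
    "endothelial": 600,
    "stromal": 220,
    "epithelial": 90,
}

kept = clusters_passing(example_counts)
print(cells_per_group, cells_per_animal, kept)
```

      Reporting a table of this kind for every cluster, sex and group would let readers judge directly which comparisons rest on enough cells.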

      The dynamic range of RNA measurement using scRNA seq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comment, but in general, the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells.

      This is a well-taken point, and we thank the reviewer for this comment. We agree that the dynamic range of RNA measurement is limited at low cell numbers, which could lead to bias. However, none of the clusters with counts lower than 150 were included in the differential gene analyses. To avoid confusion, we now show immunofluorescence results to validate the findings. We are confident that the inclusion of these validation experiments will convince the reviewer of the loss of the Ly6G marker from neutrophils and the lack of a proper neutrophilic response in exposed mouse lungs as compared to the controls.

      There is no rigorous quantification of Ly6G+ and Ly6G- cells in the flow cytometry data.

      We understand that flow-based quantification of our scRNA seq findings would be interesting. However, flow cytometry and the preparation of single cell suspensions for sequencing were performed in parallel for this study. We used a basic flow panel using single markers to identify individual immune cell types. We did identify changes in the Ly6G population in our treated and control samples using scRNA seq and intend to exclude it as a marker for our future studies using flow cytometry. Unfortunately, the same analyses could not be performed for the current batch of samples. We have now included results from IHC staining to identify the Ly6G+ and Ly6G- populations in the lung tissues from control and treated mice in the revised manuscript to address some of the concerns raised here.

      Eosinophils are heavily involved in lung biology but are missing from the analysis.

      We used RBC lysis buffer to remove the excess RBCs during lung digestion for preparation of the single cell suspension for scRNA seq in this study. Reports suggest that RBC lysis could adversely affect eosinophil number and function. We did not identify any cell cluster expressing markers for eosinophils in our scRNA seq data, and we believe that our lung digestion protocol could be the reason for this. We have studied the eosinophil changes through flow cytometry in these samples and have found significant changes as well. However, due to our inability to find cell clusters for eosinophils in the scRNA seq data, we did not previously include these results in the manuscript. To avoid confusion and maintain transparency, we have now included the changes in eosinophils measured by flow cytometry in the revised manuscript (Figure S4).

      The figures had no titles so were difficult to navigate.

      We have now revamped the figures to make them easier for readers to navigate.

      PGVG is not defined and not introduced early enough.

      We have made the necessary changes in the revised manuscript.

      Neutrophils are not well known to proliferate, so any claims about proliferation need to be accompanied by validation such as BrdU or other proliferation assays.

      We have now removed the cell cycle scoring information from the revised manuscript. Performing a BrdU assay was not possible for these tissues due to limited samples and resources. However, we may consider performing it in our future studies.

      It was not clear how statistics were chosen and why Table S2 had a good comparison (two-way ANOVA with gender as a variable) but this was not used for other data particularly when looking at more functional RNA markers (Table S2 also lacks the interaction statistic which is most useful here).

      We have now included the two-way ANOVA statistics (Supplementary File S3) for other data included in the revised manuscript. It is important to note that since we did not identify any significant changes upon two-way ANOVA, the interaction statistics were not available for the above-mentioned statistical test. We have included the interaction information wherever available.
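
      For readers unfamiliar with the interaction statistic discussed in this exchange, the following is a minimal sketch of a balanced two-way ANOVA with an interaction term, written with the Python standard library only. It is not the authors' analysis: the function name `two_way_anova` is introduced here, the values are made up, and the sketch stops at F ratios. Converting them to p-values would require an F distribution, for example from SciPy.

```python
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    `data` maps (level_a, level_b) -> list of replicate values; every cell
    must hold the same number (>= 2) of replicates.
    Returns {term: (sum_of_squares, degrees_of_freedom, F)}; F is None for the residual.
    """
    levels_a = sorted({a for a, _ in data})
    levels_b = sorted({b for _, b in data})
    n = len(next(iter(data.values())))
    if n < 2 or any(len(data.get((a, b), [])) != n for a in levels_a for b in levels_b):
        raise ValueError("need a balanced design with >= 2 replicates per cell")

    grand = mean(v for values in data.values() for v in values)
    mean_a = {a: mean(v for b in levels_b for v in data[(a, b)]) for a in levels_a}
    mean_b = {b: mean(v for a in levels_a for v in data[(a, b)]) for b in levels_b}
    cell = {key: mean(values) for key, values in data.items()}

    na, nb = len(levels_a), len(levels_b)
    ss_a = nb * n * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = na * n * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (cell[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_e = sum((v - cell[key]) ** 2 for key, values in data.items() for v in values)

    df_a, df_b = na - 1, nb - 1
    df_ab, df_e = df_a * df_b, na * nb * (n - 1)
    ms_e = ss_e / df_e  # residual mean square, the denominator of every F ratio

    return {
        "A": (ss_a, df_a, (ss_a / df_a) / ms_e),
        "B": (ss_b, df_b, (ss_b / df_b) / ms_e),
        "A:B": (ss_ab, df_ab, (ss_ab / df_ab) / ms_e),
        "residual": (ss_e, df_e, None),
    }


# Hypothetical values (NOT data from the manuscript): factor A = exposure, factor B = sex.
example = {
    ("air", "F"): [1.0, 3.0],
    ("air", "M"): [2.0, 4.0],
    ("tobacco", "F"): [5.0, 7.0],
    ("tobacco", "M"): [2.0, 4.0],
}
result = two_way_anova(example)
for term, (ss, df, f) in result.items():
    print(term, ss, df, f)
```

      The "A:B" row is the interaction term the reviewer asks for: a large F there would indicate that the effect of exposure depends on sex. With two replicates per cell, as in this toy example, the residual has few degrees of freedom and the test has little power, which is the reviewer's underlying concern.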

      Many statistics are only vs air control, but it would be more useful as a flavor comparison to see these vs PGVG. In some cases, the carrier PGVG looks worse than some of the flavors (which have nicotine).

      While we agree with this comment of the reviewer, comparisons with PG:VG were not included due to the low cell numbers for PG:VG samples obtained following quality control and filtering of the scRNA seq analyses. However, in view of the reviewer’s question, we now include the details of comparisons with PG:VG as supplementary files S13-S18 in the revised manuscript.

      The n number is a large issue, but in Figures such as 4, 6, and 7 it could be a bigger factor. The number of significant genes identified has been determined by chance rather than any real difference, e.g. Is Il1b not identified in Fruit flavor vs air because there wasn't enough n, while in Air vs Tobacco, it randomly hit the significance mark. This is but an example of the problems with the analysis and conclusions.

      We agree in part with the concern raised here. However, in our opinion, an omics study is not necessarily aimed at finding the changes at transcript level with absolute certainty, but rather at identifying probable cell and gene targets to validate with subsequent work. We did not claim that our findings are absolute outcomes but rather note the limitation of sample number and the need for further research at every step. The strength of this work is that it is the first study of its kind looking at changes in the lung cell population at single cell level upon e-cig aerosol exposure. This study has provided us with interesting gene and cell targets that we are now validating in ongoing work. We still strongly believe that a dataset like this is a useful resource for a wider audience.

      The data in Figure 7A is confusing, if this is a comparison to air, then why does air vs air not equal 1? Even if this was the comparison to the average of air between males and females, then this doesn't explain why CCL12 is >1 in both. Is this z-score instead? Regardless the data is difficult to interpret in this format.

      We have now changed the format of data representation in the figure.

      Individual n was not shown for almost all experiments - e.g. Figure 1D - what is this representative of? Figure 2D - is this bulk-grouped data for all cells and all mice? The heatmaps are also pooled from 2n and don't show the variability.

      Wherever needed, the n number has been included in the Figure legend. Additionally, the n number is shown in Figure 1A. However, with respect to the second comment, we respectfully disagree with the reviewer. Each scRNA seq dataset had 2 samples, one for male and another for female, which has been clearly shown in the current figures. The pooling of cells as mentioned in the comment happened at the stage of preparation of the cell suspension from each sex/group at the start of the sequencing. We show the results of the pooled samples, showing the variability amongst pooled samples, which we acknowledge is a shortcoming of our work. In terms of representation of the heat maps and data analyses, we have included all the needed information to uphold transparency of our study design and data visualization for each figure and would like to retain the current representations. However, the validation cohort does not involve any pooling of samples and still agrees with most of the deductions made from this study. So we are confident that no overstatements have been made in this work and we still provide a useful dataset to inform future research in this area.

      Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up-and-downregulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      (1) Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      (2) Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      (3) The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      (4) Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that the data collected was relevant.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models.

      This study was not designed to study the effects of chronic exposures on lung tissues. We were interested in delineating the effect of acute exposures for which the proposed study design was chosen. Previous work by our group has performed similar exposures and has been well received by the community. We understand that chronic exposures will be interesting to look at, but that was beyond the scope of this pilot study. Longer / chronic exposures will be conducted considering disease modifying effects of e-cigarettes.

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We thank the reviewer for this observation, and we have now included the necessary validations and details of the sex-based statistical analyses in the revised version of this manuscript. 

      Statistical analyses lack rigor and are not always displayed with the most appropriate graphical representation.

      We thank the reviewer and have included the necessary statistical details in the revised manuscript.

      Overall, the paper and its discussion are relatively limited and do not delve into the significance of the findings or how they fit into the bigger picture of the field.

      As pointed out by the reviewers themselves, the strength of this work is in the first ever scRNA seq analyses of mice exposed to differently flavored e-cig aerosols in vivo. We also show cell-specific differential gene expression and address some of the major queries made around e-cig research, including release of metals on a day-to-day basis from the same coil. The limited sample number makes it difficult to draw solid conclusions from this work, which has been discussed as a shortcoming. Nevertheless, the major strength of this work is not in identifying specific trends, but rather in determining the possible cell and gene targets to expand the study for longer (chronic) exposures with a larger sample group. We have mentioned the significance of the study with respect to vaping effects on cellular heterogeneity leading to deleterious effects.

      The manuscript lacks validation of findings in tissue by other methods such as staining.

      We have now included some validation experiments and revamped the revised manuscript to support scRNA seq findings.

      This paper provides a foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for this observation. The cell numbers for some cell clusters (especially epithelial cells) were too low. So, though we have performed the differential gene expression analyses on all the cell clusters, we refrained from discussing it in the manuscript to avoid over interpretation of our results. Only clusters with high enough (> 150) cells per sex per group were used to plot the heatmaps. We have now included the cell numbers for each cell type in the revisions to allow better interpretation of our data. Furthermore, the raw data from this study will be freely available to the public upon publication of this manuscript. This would enable the interested readers to access the raw data and study the cell types of interest in detail based on their study requirements. This data will be a useful resource for all in this community to inform and design future studies. 

      Recommendation For The Author:

      Major comments

      Mouse experiments are extremely variable and an n of 2 is not enough. Because of the complexity of separating male and female mice, the analyses are not adequately powered to support conclusions. The two-way ANOVA style approach to consider sex as a separate variable was a great idea in Table S2 - but this was not used elsewhere, and there is a need to show the interaction statistic (which would say if there is a flavor effect dependent on sex).

      We thank the reviewers for this recommendation. We agree that the experiments are highly variable. However, this is not merely an outcome of a small sample size (which we address as one of the limitations). It is important to mention that validating results from single cell technologies using regular molecular biology techniques is challenging and the results may not completely align. This is because we are comparing a single cell population in the former and a heterogeneous cell population in the latter. However, considering this comment, we have now toned down our conclusions and performed some extra experiments to validate the single cell findings. We also provide the results from two-way ANOVA statistics for all the figures/experiments performed in this work.

      More validatory data with PCR, immunostaining, and flow cytometry would be very helpful. This includes validating the neutrophil functional and phenotype data and the T-cell data by flow cytometry.

      To validate the presence of the Ly6G+ and Ly6G- neutrophil populations, we performed co-immunostaining experiments and proved that exposure to tobacco-flavored e-cig aerosols results in an increase in the cell percentages of the two neutrophil populations in female mice. We also re-analyzed our flow cytometry data to align with the scRNA seq results. A multiplex protein assay was another technique used to show altered innate/adaptive immune responses upon exposure to differently flavored e-cig aerosols. Of note, considering the short duration of exposure, we did not identify significant changes in cell numbers or inflammatory responses. But we have now validated our scRNA seq results using various techniques to draw meaningful conclusions.

      The in vivo experimental design seems to model very short-term exposure. In the literature, including the papers cited in the references, much longer time points are used, extending from several weeks to months of exposure. There seem to be few examples of papers using 5-day exposure and those that do are inspired by traditional cigarette smoke rather than e-cig aerosols or model acute exposure by making the daily duration longer. It is important to consider the possibility that the greatest number of up- or down-regulated genes are found in immune cell populations solely because they are the first to be affected by e-cig exposure and the other cell types just do not have time to become dysregulated in 5 days.

      We thank the reviewers for this comment. We do not refute the fact that our observations of major changes in the immune cell population are due to the short duration of exposure. This was one of the first studies using single cell technologies to look at cell-specific changes in the mouse lungs exposed to e-cig aerosols. The future experiments being conducted in our lab use a more controlled approach to mimic chronic exposures to e-cig aerosols, to identify changes in other cell types and long-term effects of e-cig exposures in vivo. However, since this was not the focus of this work, we have not discussed it in detail.

      The validity of the claims pertaining to septal thickening and mean linear intercept (MLI) are questionable due to the poor lung inflation of the treatment group, which the authors acknowledge. Thus, MLI cannot be accurately used. It is contradictory to state that the fruit-flavored treatment group presented challenges with inflation but then concluded that there is a phenotype. In addition, inflation with low-melting agarose is not an ideal method because it does not use a liquid column to maintain constant pressure. For these metrics to be used and evaluated, it is imperative that all lobes are properly inflated. Therefore, these data should either be repeated or removed.

      We agree with this critique and have removed the MLI quantification from the revised manuscript. We also do not make claims regarding major histological changes upon exposure. We suggest further work in the future to better understand the effect of differently flavored e-cig aerosol exposure on mouse lungs.

      What is the purpose of analyzing cell cycle scores? Why is it relevant that neutrophils are in G2M-phase? Figure 3B shows that neutrophils are clearly in both G1- and G2M-phase and this cluster includes both Ly6G+ and Ly6G- subsets, so it does not seem accurate to claim that they are in the G2M-phase of the cell cycle, nor does it reveal anything novel about Ly6G- neutrophils. Is it possible that the cell cycle score is noting a point in differentiation when neutrophils acquire/begin expressing Ly6G? Ly6G expression in neutrophils has been found to be associated with differentiation and maturation. To rule out the possibility that this is a cell state being identified, differential gene expression between the 2 neutrophil subsets should be shown in a volcano plot. It would also be useful to stain for Ly6G+/- neutrophils using either IF or RNAscope to prove they are present. If the claim is that Ly6G- neutrophils are a "unique" population, it must be established to what extent they are unique. Immune cells cluster together on UMAPs, so what if these are a different cell type entirely, like another immature myeloid lineage, and this is an artifact of clustering? This could be clarified with a trajectory analysis and further subsetting of the immune population.

      We thank the reviewers for this comment. We now realize that analyzing the cell cycle scores was not serving the intended purpose in this work. Moreover, due to the use of pooled samples for scRNA seq analyses, it may not be best to perform such downstream analyses in our datasets. We have thus removed these graphs from the revised version and have tried to simplify the conclusions of our study to the readers. 

      Our main take-home message from this study is the increase in the number of mature (Ly6G+) and immature (Ly6G-) neutrophils in tobacco-flavored e-cig aerosol exposed mouse lungs as compared to air control. This result was validated using co-immunofluorescence in the revised manuscript (Figure 4).

      In vivo validation of findings should be included, especially for the claimed changes. As of now, this paper serves more as a dataset that could be further explored by other groups, which in itself is valuable, but it is just one single cell sequencing experiment without validation.

      We thank the reviewers for this comment. We have used multiple techniques (flow cytometry, multiplex protein assay, co-immunofluorescence) in the revised manuscript to validate the scRNA seq findings. However, this was a preliminary study which was designed to generate a small dataset for future experiments, and we do not have the resources to add more validatory experiments for this study. We are currently designing chronic e-cig exposure studies to elaborate in the future upon certain hypotheses generated through this study.

      Minor Comments

      There are several examples of typos or small errors in the text that would benefit from proofreading. Examples: line 51 "in the many countries including (the) United States (US), (the) United Kingdom..."; on line 54, the reference cited states that 9.4% of middle schoolers are daily users, not 9.2%; on line 55 the reference cited states that these are the most commonly used flavors, not the most preferred, which explains why the percentages do not add up to 100; line 120 "the lungs were in a collapsed state than the other groups"; line 127 "to confirm out speculations"; line 136 "PGVG" instead of the previously used "PG:VG"; line 140 "(single cell capture))"; line 999 "result in" rather than "results in" for Figure 4 title, etc.

      We thank the reviewer for this comment. The manuscript has been thoroughly proofread and edited to avoid typos and grammatical errors.

      If this is a "pilot study" (as it is stated in the introduction) it is meant to assess the validity of experimental design on a small scale to later test a hypothesis. The authors should change the phrasing.

      We have now changed the phrasing as suggested.

      The introduction lacked the necessary context and background. Some information described in the results section could be addressed in the intro. For example: What is the significance of neutrophils having a Ly6G deficiency? Why was the exposure duration of 1 hour a day for 5 days chosen? Why use nose-only exposure when many models use whole-body exposure? Why look at cell-type-specific changes?

      We have made the necessary amendments in the introduction.

      Some figure titles only address certain panels rather than summarizing the figure as a whole. For example, the title of Figure 1 only refers to panel D and is unrelated to serum cotinine levels, septa thickening, or mean linear intercept. The text discussed conclusions about septa thickening and Lm values for the fruit-flavored treatment group, so they are equally relevant to the figure compared to the metal levels.

      We have now changed the Figures and Figure legends to summarize the figure.

      Significance level is not defined in the Figure 1 legend although it is used in Figure 1C.

      The Figure legend has now been updated.

      Figure 1E does not include a scale bar.

      We have now included the scale bar in updated figures.

      The multiplex ELISA shown in the experimental design schematic is not further discussed in the paper. Flow cytometry plots should be displayed in addition to the data they generated.

      The flow cytometry plots have now been included (Figures 3&5) and the results for Multiplex ELISA are shown as Figure S3D and lines 327-342 of the revised manuscript.

      In Figure 1F, a multivariate ANOVA should be used so that multiple groups can be compared across sex, rather than plotting in a sex-specific manner and claiming there exists a sex bias. The small sample size also introduces an issue because a p-value cannot be generated with so few samples.

      Per the suggestions made previously, Figure 1F has now been removed from the revised manuscript.

      The protocol for achieving a single-cell suspension should be detailed in the methods section. As is, it only describes the sample collection and preparation. This could help elucidate to the reader why the UMAP shows such a large abundance of immune cells.

      We have now included the protocol in the revised manuscript.

      Clarify whether PG:VG was used as a control in the scRNA sequencing in addition to air to generate the UMAP in Figure 2A.

      Yes, PG:VG was used as one of the controls, which is now illustrated as a groupwise comparison in Figure 2D. We have also included comparisons of the various treatment groups versus PG:VG to identify DEGs in the myeloid and lymphoid clusters (Supplementary Files S13-S18).

      A UMAP should be shown for each treatment group/flavor. The overall UMAP in Figure 1A is good, but there could be another panel with separate projections for each condition.

      A groupwise UMAP has now been included in Figure 2D.

      In Figure 2C, relative cell percentage is not a reliable method to quantify cell type and the histogram is not a great way to visualize the data or its statistical significance. These claims should also be validated in tissue.

      We thank the reviewers for this comment and have tried to validate the findings using flow cytometry. However, we would add that changes observed with single-cell technologies cannot be validated using simple molecular biology techniques, because the markers used to define cell clusters in scRNA seq are highly specific, which was not the case for the flow panel designed in this work. Our main purpose in using cell percentages was to show the flavor-specific changes in generalized cell populations in mouse lungs, so we have retained these graphs in the revised manuscript.

      Figure 2D could be better illustrated with a volcano plot to show which genes are being dysregulated rather than just how many. Knowing which genes are affected is more valuable than knowing just the number of genes.

      Figure 2D is no longer part of the revised manuscript. For the other comparisons we have still used heatmaps, as they also depict sex-specific changes in gene expression, which would have been difficult to show using volcano plots.

      Assuming Figure 3C is representative of all conditions, then Figures 3C and D demonstrate that Ly6G- neutrophils are present in all conditions including controls. To see whether they are truly present in different abundances between treatment and control groups, separate UMAPs of the neutrophil subsets should be made per condition or use a dot plot for Figure 3A. This also applies to Figure 3B.

      We thank the reviewers for pointing this out. We have now revamped the whole manuscript and used additional validation experiments to show the presence of Ly6G- and Ly6G+ neutrophil populations upon exposure to tobacco-flavored e-cig aerosols.

      Figure 3E shows that there is no statistically significant change in % of Ly6G+ neutrophils across treatment groups, but the text claims that there is "an increase in the levels of Ly6G+ neutrophils in lung digests from mouse lungs exposed to tobacco-flavored e-cig aerosols" (lines 207-209). The text also claims that "The observed increase was more pronounced in males as compared to females" (lines 209-210), but there was no statistical analysis across sexes to support this statement. It is clear that the change in % of Ly6G+ neutrophils is more pronounced in males than females, but it is still not statistically significant. This figure should also be repeated for analysis of Ly6G- neutrophils. Lines 272-274 mention that the % increase is higher for Ly6G- neutrophils than for Ly6G+ neutrophils, but there is not an analogous histogram to demonstrate this. The claims made in lines 275-280 are not clearly shown in any figure.

      We thank the reviewer for this query. This was an error on our part. We have now added sex-specific analyses using scRNA seq, flow cytometry and co-immunofluorescence-based experiments to show that more pronounced changes in the Ly6G+ and Ly6G- neutrophil populations occur in female mice and not in males.

      Figures 4 and 6 have an overwhelming amount of heatmaps. Volcano plots with downstream analyses could be used to make some of this data more legible. The main findings should be validated in vivo/in tissue.

      We have now revamped the figures and data distribution to make the data legible and to remove the overwhelming amount of data from the slides.

      For Figure 5, show cell type by condition and do differential gene expression analysis displayed in a volcano plot. Then, stain tissue to validate the findings. Compare across sex during statistical analysis.

      The necessary changes have been made.

      Figure 6 error: panels E and F should be labeled as "tobacco" rather than "fruit".

      Error has now been fixed.

      Figure 7C can be placed in the supplemental materials.

      It has now been included in supplemental materials.

      The Figure 6E title should have been tobacco instead of fruit.

      This error has now been fixed.

      Line 381 mentioned the wrong subfigure. (Figure 7B instead of 7E).

      We have now made the necessary edits.

    4. eLife Assessment

      This manuscript by Kaur et al. identifies differential gene expression observed in distinct cell populations, namely myeloid and lymphoid cells, upon short-term exposure to e-cig aerosols with various flavors. Their findings are useful because they provide a single cell sequencing data resource for assessing which genes and cellular pathways are most affected by e-cig aerosols and their components. However, the evidence is incomplete due to limited analyses and replicates per condition, as well as the lack of in vivo validation.

    5. Reviewer #1 (Public review):

      Summary:

      The authors assess the impact of E-cigarette smoke exposure on mouse lungs using single cell RNA sequencing. Air was used as control and several flavors (fruit, menthol, tobacco) were tested. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Changes in gene expression in either myeloid or lymphoid cells were identified for each flavor and the results varied by sex. The scRNAseq dataset will be of interest to the lung immunity and e-cig research communities and some of the observed effects could be important. Unfortunately, the revision did not address the reviewers' main concerns about low replicate numbers and lack of validations. The study remains preliminary and no solid conclusions could be drawn about the effects of E-cig exposure as a whole or any flavor-specific phenotypes.

      Strengths:

      The study is the first to use scRNAseq to systematically analyze the impact of e-cigarettes on the lung. The dataset will be of broad interest.

      Weaknesses:

      scRNAseq studies may have low replicate numbers due to the high cost of studies, but at least 2 or 3 biological replicates for each experimental group are required to ensure rigor of the interpretation. This study had only N=1 per sex per group and some sex-dependent effects were observed. This could have been remedied by validating key observations from the study using traditional methods such as flow cytometry and qPCR, but the limited number of validation experiments did not support the conclusions of the scRNAseq analysis. An important control group (PG:VG) had extremely low cell numbers and was basically not useful. Statistical analysis is lacking in almost all figures. Overall, this is a preliminary study with some potentially interesting observations but no solid conclusions can be made from the data presented.

      (1) The only new validation experiment is the immunofluorescent staining of neutrophils in Figure 4. The images are very low resolution and low quality and it is not clear which cells are neutrophils. S100A8 (calprotectin) is highly abundant in neutrophils but not strictly neutrophil-specific. It's hard to distinguish positive cells from autofluorescence in both Ly6g and S100a8 channels. No statistical analysis in the quantification.

      (2) It is unclear what the meaning of Fig. 3A and B is, since these numbers only reflect the number of cells captured in the scRNAseq experiment and are not biologically meaningful. Flow cytometry quantification is presented as cell counts, but the percentage of cells from the CD45+ gate should be shown. No statistical analysis is shown, and flow cytometry results do not support the conclusions of scRNAseq data.

    1. eLife Assessment

      This fundamental study uncovers the unique molecular features of Arabidopsis phloem companion cells that highly express FLOWERING LOCUS T (FT). These FT-expressing cells constitute a distinct subpopulation marked by elevated ATP biosynthesis and co-expression of small mobile proteins such as FLP1 and BFT, highlighting a fine balance between florigen and anti-florigen signals. Motif analyses and transgenic studies further identify NIGT1 transcription factors as direct, nitrogen-inducible repressors of FT, providing a mechanism for delayed flowering under nitrogen-rich conditions. Together, the compelling findings show that florigen-producing companion cells integrate energy metabolism, systemic protein signals, and nutrient-responsive repression to fine-tune the seasonal and nutritional regulation of flowering.

    2. Reviewer #1 (Public review):

      Summary:

      The authors revealed the cellular heterogeneity of companion cells (CCs) and demonstrated that the florigen gene FT is highly expressed in a specific subpopulation of these CCs in Arabidopsis. Through a thorough characterization of this subpopulation, they further identified NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. Overall, these findings are intriguing and valuable, contributing significantly to our understanding of florigen and the photoperiodic flowering pathway. However, there is still room for improvement in the quality of the data and the depth of the analysis. I have several comments that may be beneficial for the authors.

      Strengths:

      The usage of snRNA-seq to characterize the FT-expressing companion cells (CCs) is very interesting and important. Two findings are novel: 1) Expression of FT in CCs is not uniform. Only a subcluster of CCs exhibits high expression level of FT. 2) Based on consensus binding motifs enriched in this subcluster, they further identify NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT.

      Weaknesses:

      (1) Title: "A florigen-expressing subpopulation of companion cells". It is a bit misleading. The conclusion here is that only a subset of companion cells exhibit high expression of FT, but this does not imply that other companion cells do not express it at all.

      (2) Data quality: Authors opted for fluorescence-activated nuclei sorting (FANS) instead of traditional cell sorting method. What is the rationale behind this decision? Readers may wonder, especially given that RNA abundance in single nuclei is generally lower than that in single cells. This concern also applies to snRNA-seq data. Specifically, the number of genes captured was quite low, with a median of only 149 genes per nucleus. Additionally, the total number of nuclei analyzed was limited (1,173 for the pFT:NTF and 3,650 for the pSUC2:NTF). These factors suggest that the quality of the snRNA-seq data presented in this study is quite low. In this context, it becomes challenging for the reviewer to accurately assess whether this will impact the subsequent conclusions of the paper. Would it be possible to repeat this experiment and get more nuclei?

      (3) Another disappointment is that the authors did not utilize reporter genes to identify the specific locations of the FT-high expressing cells (cluster 7 cells) within the CC population in vivo. Are there any discernible patterns that can be observed?

      (4) The final disappointment is that the authors only compared FT expression between the nigtQ mutants and the wild type. Does this imply that the mutant does not have a flowering time defect particularly under high nitrogen conditions?

      Comments on revisions:

      I think the authors took my comments seriously and addressed most of my concerns. Overall, I find this to be a very interesting paper.

    3. Reviewer #2 (Public review):

      This manuscript submitted by Takagi et al. details the molecular characterization of the FT-expressing cell at a single-cell level. The authors examined what genes are expressed specifically in FT-expressing cells and other phloem companion cells by exploiting bulk nuclei and single-nuclei RNA-seq and transgenic analysis. The authors found the unique expression profile of FT-expressing cells at a single-cell level and identified new transcriptional repressors of FT such as NIGT1.2 and NIGT1.4.

      Although previous researchers have known that FT is expressed in phloem companion cells, they have tended to neglect the molecular characterization of the FT-expressing phloem companion cells. To understand how FT, which is expressed in tiny amounts in phloem companion cells that make up a very small portion of the leaf, can be a key molecule in the regulation of the critical developmental step of floral transition, it is important to understand the molecular features of FT-expressing cells in detail. In this regard, this manuscript provides insight into the understanding of detailed molecular characteristics of the FT-expressing cell. This endeavor will contribute to the research field of flowering time.

      During the initial review process, I proposed the following two points for improving this manuscript:

      (1) The most novel finding of this manuscript is the identification of NIGT1.2 as the upstream regulator of FT-expressing cluster 7 gene expression. The flowering phenotypes of the nigtQ mutant and the transgenic plants in which NIGT1.2 was expressed under the SUC2 gene promoter support that NIGT1.2 functions as a floral repressor upstream of the FT gene. Nevertheless, the expression patterns of NIGT1.2 genes do not appear to have much overlap with those of NIGT1.2-downstream genes in the cluster 7 (Figs S14 and F3). An explanation for this should be provided in the discussion section.

      (2) To investigate gene expression in the nuclei of specific cell populations, the authors generated transgenic plants expressing a fusion gene encoding a Nuclear Targeting Fusion protein (NTF) under the control of various cell type-specific promoters. Since the public audience would not know about NTF without reading reference 16, some explanation of NTF is necessary in the manuscript. Please provide a schematic of the constructs the authors used to make the transformants.

      The revised manuscript has addressed my comments well. I am deeply grateful for the authors' efforts to address concerns raised by me and other reviewers. I have no doubt that the manuscript in its current form is worthy of publication in this journal and will provide valuable insights into flowering time for many readers.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors revealed the cellular heterogeneity of companion cells (CCs) and demonstrated that the florigen gene FT is highly expressed in a specific subpopulation of these CCs in Arabidopsis. Through a thorough characterization of this subpopulation, they further identified NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. Overall, these findings are intriguing and valuable, contributing significantly to our understanding of florigen and the photoperiodic flowering pathway. However, there is still room for improvement in the quality of the data and the depth of the analysis. I have several comments that may be beneficial for the authors. 

      Strengths: 

      The usage of snRNA-seq to characterize the FT-expressing companion cells (CCs) is very interesting and important. Two findings are novel: 1) Expression of FT in CCs is not uniform. Only a subcluster of CCs exhibits high expression level of FT. 2) Based on consensus binding motifs enriched in this subcluster, they further identify NITRATE-INDUCIBLE GARP-TYPE TRANSCRIPTIONAL REPRESSOR 1 (NIGT1)-like transcription factors as potential new regulators of FT. 

      We are pleased to hear that reviewer 1 noted the novelty and importance of our work. As reviewer 1 mentioned, we are also excited about the identification of a subcluster of companion cells with very high FT expression. We believe that this work is an initial step to describe the molecular characteristics of these FT-expressing cells. We are also excited to share our new findings on NIGT1s as potential FT regulators. We believe this finding will attract a broader audience, as the molecular factor coordinating plant nutritional status with flowering time remains largely unknown, even though the phenomenon itself is well known.

      Weaknesses: 

      (1) Title: "A florigen-expressing subpopulation of companion cells". It is a bit misleading. The conclusion here is that only a subset of companion cells exhibit high expression of FT, but this does not imply that other companion cells do not express it at all. 

      We agree with this comment, as it was not our intention to imply that FT is not produced in companion cells other than the subpopulation we identified. We revised the title to reflect this point more accurately. The new title is “Companion cells with high florigen production express other small proteins and reveal a nitrogen-sensitive FT repressor.”

      (2) Data quality: Authors opted for fluorescence-activated nuclei sorting (FANS) instead of traditional cell sorting method. What is the rationale behind this decision? Readers may wonder, especially given that RNA abundance in single nuclei is generally lower than that in single cells. This concern also applies to snRNA-seq data. Specifically, the number of genes captured was quite low, with a median of only 149 genes per nucleus. Additionally, the total number of nuclei analyzed was limited (1,173 for the pFT:NTF and 3,650 for the pSUC2:NTF). These factors suggest that the quality of the snRNA-seq data presented in this study is quite low. In this context, it becomes challenging for the reviewer to accurately assess whether this will impact the subsequent conclusions of the paper. Would it be possible to repeat this experiment and get more nuclei?

      We appreciate this comment; we noticed that we did not clearly explain the rationale for using single-nucleus RNA sequencing (snRNA-seq) instead of single-cell RNA-seq (scRNA-seq). As reviewer 1 mentioned, RNA abundance in scRNA-seq is higher than in snRNA-seq. To conduct scRNA-seq using plant cells, protoplasting is a necessary step. However, in our study, protoplasting has many drawbacks for isolating our target cells from the phloem. First, it is technically challenging to efficiently isolate protoplasts from phloem companion cells, which are deeply embedded in plant tissues. Typically, at least several hours of enzymatic incubation are required to obtain protoplasts from companion cells (often using semi-isolated vasculatures), and the efficiency of protoplasting vasculature cells remains low. Second, for our analysis, preserving the time-of-day information is also crucial. Therefore, we employed a more rapid isolation method. In the revised manuscript, we added four new sentences to the Introduction section to explain our rationale for choosing snRNA-seq given these technical limitations.

      Reviewer 1 also raised a concern about the quality of our snRNA-seq data, referring to the relatively low read counts per nucleus. We believe that shallow reads do not necessarily indicate low quality, and we are confident in the accuracy of our snRNA-seq data, as supported by the detailed follow-up experiments (e.g., imaging analysis in Fig. 4B). Nevertheless, we agree that it is important to address this point in the revision and alleviate readers’ concerns regarding the data quality.

      We believe the primary reason for the low read counts per cell is the small amount of RNA present in each Arabidopsis vascular cell nucleus that we isolated. For bulk nuclei RNA-seq, we collected 15,000 nuclei. However, the total RNA amount was approximately 3 ng. This indicates that each isolated nucleus contains a very limited amount of RNA (by a simple calculation, 3,000 pg / 15,000 nuclei = 0.2 pg/nucleus). It appears that the cells and nuclei were still small in 2-week-old seedlings; thus, each nucleus may contain lower levels of RNA. During the optimization process, we also tried fixing the tissues in the hope of retaining nuclear RNA, but unfortunately, in our hands, we encountered nuclei aggregation that hindered the sorting process, making this approach unsuitable for single-nucleus RNA-seq.

      Reviewer 1 suggested that we repeat the same snRNA-seq experiment. We agree that having more cells increases the reliability of data. However, to our knowledge, higher cell numbers enhance the confidence of clustering, but not read counts per cell. In our snRNA-seq data, our target, FT-expressing cells, were observed in cluster 7, which projected at an obvious distance from other cell clusters. Therefore, we think that having more nuclei would not significantly help in separating the high FT-expressing cluster 7 cells from other cell types, although we might obtain more DEGs from the cluster 7 cells. Considering the costs and time required for additional snRNA-seq experiments, we think that adding more follow-up molecular biology data would be more practical. We clearly stated the limitations of our approach in the Discussion section: “A drawback of our snRNA-seq analysis was shallow reads per nucleus. It appears mainly due to the low abundance of mRNA in nuclei from 2-week-old leaves. Based on our calculation, the average mRNA level per nucleus is approximately 0.2 pg (3,000 pg mRNA from 15,000 sorted nuclei). Future technological advance is needed to improve the data quality.”

      In this revised version of the manuscript, we silenced FT gene expression using an amiRNA against FT driven by tissue-specific promoters [pROXY10, cluster 7; pSUC2, companion cells; pPIP2.6, cluster 4 (for the spatial expression pattern of PIP2.6, please see the new data shown in Fig. S8F); pGC1, guard cells]. Given that both FT and ROXY10 were highly expressed in cluster 7 of our snRNA-seq dataset, we anticipated a late-flowering phenotype for pROXY10:amiRNA-ft. As expected, pROXY10:amiR-ft but not pPIP2.6:amiR-ft lines showed delayed flowering phenotypes (Fig. S14A), supporting the validity of our snRNA-seq approach. We are also now more confident in the resolution of our snRNA-seq analysis, since cluster 4-specific PIP2.6 did not cause late flowering despite its higher basal expression than ROXY10 (Fig. S14B).

      (3) Another disappointment is that the authors did not utilize reporter genes to identify the specific locations of the FT-high expressing cells (cluster 7 cells) within the CC population in vivo. Are there any discernible patterns that can be observed? 

      In the original manuscript, as we showed only limited spatial images of overlap between FT and other cluster 7 genes in Fig. 4B, this comment is totally understandable. To respond to it, we added whole leaf images showing the spatial expression of FT and other cluster 7 genes (Fig. S12). These data indicate that cluster 7 genes including FT are expressed highly in minor veins in the distal part of the leaf but weakly in the main vein. We also added enlarged images of spatial expression of FT and cluster 7 genes (FLP1 and ROXY10) to note that those genes do not overlap completely (Fig. S13).

      In contrast to cluster 7 genes, genes highly expressed in cluster 4, such as LTP1 and MLP28, are reportedly highly expressed in the main leaf vein. To confirm this further, we established a transgenic line that expresses a GFP-fusion protein controlled by the promoter of a cluster 4-specific gene, PIP2.6 (Fig. S8F). It also showed strong GFP signals in the main vein, consistent with previous observations of LTP1 and MLP28. In summary, FT-expressing cells (cluster 7 cells) are enriched in companion cells in the minor vein, and their expression patterns show a clear distinction from genes expressed in the main vein (e.g., cluster 4-specific genes).

      (4) The final disappointment is that the authors only compared FT expression between the nigtQ mutants and the wild type. Does this imply that the mutant does not have a flowering time defect particularly under high nitrogen conditions? 

      We agree with reviewer 1 that more experiments are required to conclude the role of NIGT1 on FT regulation, in addition to our Y1H data, flowering time data of NIGT1 overexpressors, and FT expression in NIGT1 overexpressors and nigtQ mutant.

      First, to test the direct regulation of NIGT1s on FT transcription, we conducted a transient luciferase (LUC) assay in tobacco leaves using effectors (p35S:NIGT1.2, p35S:NIGT1.4, and p35S:GFP) and reporters [pFT:LUC (FT promoter fused with LUC) and pFTm:LUC (the same FT promoter with mutations in NIGT1-binding sites fused with LUC)]. Our result showed that NIGT1.2 and NIGT1.4, but not GFP, decreased the activity of pFT:LUC but not pFTm:LUC (Fig. 5C). This indicates that NIGT1s directly repress the FT gene.

      Second, to address reviewer 1’s suggestion about the effect of the nigtQ mutation on flowering time, we grew WT and nigtQ plants on 20 mM and 2 mM NH₄NO₃. Under 20 mM NH₄NO₃, the nigtQ line bolted earlier than WT; under 2 mM NH₄NO₃, nigtQ and WT bolted at almost the same time (Fig. S17D and E). This result suggests that the nigtQ mutation affects flowering timing depending on nitrogen nutrient status. However, leaf numbers of bolted plants were not different between WT and nigtQ lines (Fig. S17E). Therefore, it appears that the nigtQ mutation accelerated overall plant growth rather than specifically promoting flowering. We also measured flowering time by counting leaf numbers of the nigtQ and WT plants at bolting on nitrogen-rich soil. The mutant generated slightly more leaves than WT when they flowered (Fig. S17G). These results suggest that the NIGT1-mediated fine-tuning of FT regulation is conditional on higher nitrogen conditions.

      Minor: 

      (1) Abstract: "Our bulk nuclei RNA-seq demonstrated that FT-expressing cells in cotyledons and in true leaves differed transcriptionally.". This sentence is not informative. What exactly is the difference in FT-expressing cells between cotyledons and true leaves? 

      We modified the sentence to clarify the differences between cotyledons and true leaves. “Our bulk nuclei RNA-seq demonstrated that FT-expressing cells in cotyledons and true leaves showed differences especially in FT repressor genes.”

      (2) As a standard practice, to support the direct regulation of FT by NIGT1, the authors should provide EMSA and ChIP-seq data. Ideally, they should also generate promoter constructs with deletions or mutations in the NIGT1 binding sites. 

      To test the direct interaction of NIGT1 with the FT promoter sequences, we performed a transient reporter assay using an FT promoter-driven luciferase reporter (Fig. 5C). NIGT1.2 and NIGT1.4 repressed the FT promoter activity; however, with NIGT1 binding site mutations, this repression was not observed, indicating that NIGT1 binds to the cis-elements in the FT promoter to repress its transcription.

      (3) Sorting: Did the authors fix the samples before preparing the nuclei suspension? If not, could this be the reason the authors observed the JA-responsive clusters (Fig. 2J)? Please provide more details related to nuclei sorting in the Methods section. 

      We added a new subsection to the Materials and Methods section to describe the nuclei sorting procedure in detail. We did not include a sample fixation step. We tried formaldehyde fixation; however, it caused the nuclei to clump, which made the samples unsuitable for snRNA-seq. Moreover, fixation steps generally reduce read counts in single-cell RNA-seq according to the 10X Genomics guidelines.

      We agree that JA responses were triggered during the FANS nuclei isolation. Therefore, we added the following sentence. “Since our FANS protocol did not include a sample fixation step to avoid clumping, these cells likely triggered wounding responses during the chopping and sorting process (Fig. S1B).  

      Reviewer #2 (Public review): 

      This manuscript submitted by Takagi et al. details the molecular characterization of the FT-expressing cell at a single-cell level. The authors examined what genes are expressed specifically in FT-expressing cells and other phloem companion cells by exploiting bulk nuclei and single-nuclei RNA-seq and transgenic analysis. The authors found the unique expression profile of FT-expressing cells at a single-cell level and identified new transcriptional repressors of FT such as NIGT1.2 and NIGT1.4.

      Although previous researchers have known that FT is expressed in phloem companion cells, they have tended to neglect the molecular characterization of the FT-expressing phloem companion cells. To understand how FT, which is expressed in tiny amounts in phloem companion cells that make up a very small portion of the leaf, can be a key molecule in the regulation of the critical developmental step of floral transition, it is important to understand the molecular features of FT-expressing cells in detail. In this regard, this manuscript provides insight into the understanding of detailed molecular characteristics of the FT-expressing cell. This endeavor will contribute to the research field of flowering time. 

      We are grateful that reviewer 2 recognizes the importance of transcriptome profiling of FT-expressing cells at the single-cell level.

      Here are my comments on how to improve this manuscript. 

      (1) The most novel finding of this manuscript is the identification of NIGT1.2 as the upstream regulator of FT-expressing cluster 7 gene expression. The flowering phenotypes of the nigtQ mutant and the transgenic plants in which NIGT1.2 was expressed under the SUC2 gene promoter support that NIGT1.2 functions as a floral repressor upstream of the FT gene. Nevertheless, the expression patterns of NIGT1.2 genes do not appear to have much overlap with those of NIGT1.2-downstream genes in cluster 7 (Figs S14 and F3). An explanation for this should be provided in the discussion section.

      We agree with reviewer 2 that the spatial expression patterns of NIGT1.2 and cluster 7 genes do not overlap much, and that some discussion should be provided in the manuscript. Although we do not have a concrete answer for this phenomenon, we obtained new data showing that NIGT1.2 and NIGT1.4 directly repress the FT gene in planta (Fig. 5C). As NIGT1.2/1.4 are negative regulators of FT, it is plausible that NIGT1.2/1.4 suppress FT gene expression in non-cluster 7 cells to prevent the misexpression of FT. We added this point to the Results section.

      (2) To investigate gene expression in the nuclei of specific cell populations, the authors generated transgenic plants expressing a fusion gene encoding a Nuclear Targeting Fusion protein (NTF) under the control of various cell type-specific promoters. Since the public audience would not know about NTF without reading reference 16, some explanation of NTF is necessary in the manuscript. Please provide a schematic of constructs the authors used to make the transformants.

      As reviewer 2 pointed out, we lacked a clear explanation of why we used NTF in this study. NTF is the fusion protein that consists of a nuclear envelope targeting WPP domain, GFP, and a biotin acceptor peptide. It was initially designed for the INTACT (isolation of nuclei tagged in specific cell types) method, which enables us to isolate bulk nuclei from specific tissues. Although our original intention was to profile the bulk transcriptome of mRNAs that exist in nuclei of the FT-expressing cells using INTACT, we utilized our NTF transgenic lines for snRNA-seq analysis. To explain what NTF is to readers, we included a schematic diagram of NTF (Fig. S1A) and more explanation about NTF in the Results section.

      Again, we appreciate all reviewers’ careful and constructive comments. With these changes, we hope our revised manuscript is now satisfactory.

    1. eLife Assessment

      This manuscript presents an important finding that D1- and D2-striatal neurons receive distinct cortical inputs, offering key insights into corticostriatal function. For instance, in the context of striatal-dependent learning, this distinction is highly informative for interpreting synaptic physiology data, particularly when inputs to one neuron subtype may change independently of the other. The strength of the evidence is solid, with anatomical and electrophysiological findings aligning well with results from optogenetic and behavioral studies. The study would be of interest to neuroscientists studying basal ganglia circuits in health and disease.

    2. Joint Public Review:

      Summary:

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1-or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs).

      Strengths:

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum. This study adds to our understanding of the logic of corticostriatal connections, suggesting a previously unappreciated structure.

      Editors' note:

      The concerns raised by Reviewers #1 and #2 were addressed during the first round of revision. The specific concern raised by Reviewer #3 is about the rabies virus-based circuit tracing itself. This version of the work has been assessed by the editors without going back to the reviewers.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      The study by Klug et al. investigated the pathway specificity of corticostriatal projections, focusing on two cortical regions. Using a G-deleted rabies system in D1-Cre and A2a-Cre mice to retrogradely deliver channelrhodopsin to cortical inputs, the authors found that M1 and MCC inputs to direct and indirect pathway spiny projection neurons (SPNs) are both partially segregated and asymmetrically overlapping. In general, corticostriatal inputs that target indirect pathway SPNs are likely to also target direct pathway SPNs, while inputs targeting direct pathway SPNs are less likely to also target indirect pathway SPNs. Such asymmetric overlap of corticostriatal inputs has important implications for how the cortex itself may determine striatal output. Indeed, the authors provide behavioral evidence that optogenetic activation of M1 or MCC cortical neurons that send axons to either direct or indirect pathway SPNs can have opposite effects on locomotion and different effects on action sequence execution. The conclusions of this study add to our understanding of how cortical activity may influence striatal output and offer important new clues about basal ganglia function. 

      The conceptual conclusions of the manuscript are supported by the data, but the details of the magnitude of afferent overlap and causal role of asymmetric corticostriatal inputs on some behavioral outcomes may be a bit overstated given technical limitations of the experiments. 

      For example, after virally labeling either direct pathway (D1) or indirect pathway (D2) SPNs to optogenetically tag pathway-specific cortical inputs, the authors report that a much larger number of "non-starter" D2-SPNs from D2-SPN labeled mice responded to optogenetic stimulation in slices than "non-starter" D1 SPNs from D1-SPN labeled mice did. Without knowing the relative number of D1 or D2 SPN starters used to label cortical inputs, it is difficult to interpret the exact meaning of the lower number of responsive D2-SPNs in D1 labeled mice (where only ~63% of D1-SPNs themselves respond) compared to the relatively higher number of responsive D1-SPNs (and D2-SPNs) in D2 labeled mice. While relative differences in connectivity certainly suggest that some amount of asymmetric overlap of inputs exists, differences in infection efficiency and ensuing differences in detection sensitivity in slice experiments make determining the degree of asymmetry problematic. 

      It is also unclear if retrograde labeling of D1-SPN- vs D2-SPN- targeting afferents labels the same densities of cortical neurons. This gets to the point of specificity in some of the behavioral experiments. If the target-based labeling strategies used to introduce channelrhodopsin into specific SPN afferents label significantly different numbers of cortical neurons, might the difference in the relative numbers of optogenetically activated cortical neurons itself lead to behavioral differences? 

      We thank the reviewer for the comments and for raising additional interpretations of our results. We agree that determining the relative number of D1- versus D2-SPN starter cells would allow a more accurate estimate of connectivity. However, due to current technical limitations, achieving this level of precision remains challenging. As the reviewer also noted, differences in the number of cortical neurons targeting D1- versus D2-SPNs could introduce additional complexity to the functional effects observed in the behavioral experiments. Moreover, functional heterogeneity is likely to exist not only among cortical neurons projecting to striatal D1- or D2-SPNs, but also within the striatal D1- and D2-SPN populations themselves. Addressing these questions at the single-neuron level will require more refined viral tools in combination with improved recording and manipulation techniques. Despite these limitations, our results suggest that a subpopulation of cortical neurons selectively targets striatal D1-SPNs, supporting a functional dichotomy of pathway-specific corticostriatal subcircuits in the control of behavior.   

      Reviewer #2 (Public review): 

      Summary: 

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1-or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs). 

      Strengths: 

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum. This study adds to our understanding of the logic of corticostriatal connections, suggesting a previously unappreciated structure. 

      Weaknesses: 

      One limitation is that all inputs to SPNs are expressing ChR2, so they cannot distinguish between different cortical subregions during patching experiments. Their results could arise because the same innervation patterns are repeated in many cortical subregions or because some subregions have preferential D1-SPN input while others do not. 

      Thank you for raising this thoughtful concern. It is indeed not feasible to restrict ChR2 expression to a specific cortical region using the first-generation rabies-ChR2 system alone. A more refined approach would involve injecting Cre-dependent TVA and RG into the striatum of D1- or A2A-Cre mice, followed by rabies-Flp infection. Subsequently, a Flp-dependent ChR2 virus could be injected into the MCC or M1 to selectively label D1- or D2-projecting cortical neurons. This strategy would allow for more precise targeting and address many of the current limitations.

      However, a significant challenge lies in the cytotoxicity associated with rabies virus infection. Neuronal health begins to deteriorate substantially around 10 days post-infection, which provides an insufficient window for robust Flp-dependent ChR2 expression. We have tested several new rabies virus variants with extended survival times (Chatterjee et al., 2018; Jin et al., 2024), but unfortunately, they did not perform effectively or suitably in the corticostriatal systems we examined.

      In our experimental design, the aim is to delineate the connectivity probabilities from cortical neurons to D1- or D2-SPNs. The hypotheses we considered include the possibility that similar innervation patterns occur across multiple cortical subregions, that some subregions show preferential input to D1-SPNs while others do not, or a combination of both scenarios. This led us to perform a series of behavioral tests using optogenetic activation of the D1- or D2-projecting cortical populations to see which could be the case.

      In the cortical areas we examined, MCC and M1, the behavioral results are consistent with our electrophysiological results. Specifically, when we stimulated the D1-projecting cortical neurons in either MCC or M1, mice exhibited facilitated locomotion in the open field test, similar to the effect of activating D1-SPNs in the striatum alone (MCC: Fig 3C & D vs. I; M1: Fig 3F & G vs. L). Conversely, stimulation of D2-projecting MCC or M1 cortical neurons resulted in behavioral effects that appeared to combine characteristics of both D1- and D2-SPN activation in the striatum (MCC: Fig 3C & D vs. J; M1: Fig 3F & G vs. M). Similar results were observed in the ICSS test. Our interpretation of these results is that activation of D1-projecting neurons in the cortex induces behavioral changes akin to D1 neuron activation, while activation of D2-projecting neurons in the cortex leads to a combined effect of both D1 and D2 neuron activation. This suggests that at least some cortical regions, namely the ones we tested, follow the hypothesis we proposed.

      There are also some caveats with respect to the efficacy of rabies tracing. Although they only patch non-starter cells in the striatum, only 63% of D1-SPNs receive input from D1-SPN-projecting cortical neurons. It's hard to say whether this is "high" or "low," but one question is how far from the starter cell region they are patching. Without this spatial indication of where the cells that are being patched are relative to the starter population, it is difficult to interpret if the cells being patched are receiving cortical inputs from the same neurons that are projecting to the starter population. The authors indicate they are patching from mCherry-negative neurons within the region of the mCherry-positive neurons, but since the mCherry population will include both true starter cells and monosynaptically connected cells, this is not perfectly precise. Convergence of cortical inputs onto SPNs may vary with distance from the starter cell region quite dramatically, as other mapping studies of corticostriatal inputs have shown specialized local input regions can be defined based on cortical input patterns (Hintiryan et al., Nat Neurosci, 2016, Hunnicutt et al., eLife 2016, Peters et al., Nature, 2021). 

      This is a valid concern regarding anatomical studies. Investigating cortico-striatal connectivity at the single-cell level remains technically challenging due to current methodological limitations. At present, we rely on rabies virus-mediated trans-synaptic retrograde tracing to identify D1- or D2-projecting cortical populations. This anatomical approach is coupled with ex vivo slice electrophysiology to assess the functional connectivity between these projection-defined cortical neurons and striatal SPNs. This enables us to quantify connection ratios, for example, the proportion of D1-projecting cortical neurons that functionally synapse onto non-starter D1-SPNs.

      To ensure the robustness of our conclusions, it is essential that both the starter cells and the recorded non-starter SPNs receive comparable topographical input from the cortex and other brain regions. Therefore, we carefully designed our experiments so that all recorded cells were located within the injection site, were mCherry-negative (i.e., non-starter cells), and were surrounded by ChR2-mCherry-positive neurons. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.

      These methodological details are also described in the section on ex vivo brain slice electrophysiology, specifically in the Methods section, lines 453–459:

      “D1-SPNs (eGFP-positive in D1-eGFP mice, or eGFP-negative in D2-eGFP mice) or D2-SPNs (eGFP-positive in D2-eGFP mice, or eGFP-negative in D1-eGFP mice) that were ChR2-mCherry-negative, but in the injection site and surrounded by cells expressing ChR2-mCherry were targeted for recording. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.”

      This experimental strategy was implemented to control for potential spatial biases and to enhance the interpretability of our connectivity measurements.

      A caveat for the optogenetic behavioral experiments is that these optogenetic experiments did not include fluorophore-only controls, although a different control (with light delivered in M1) is provided in Supplementary Figure 3. Another point of confusion is that other studies (Cui et al, J Neurosci, 2021) have reported that stimulation of D1-SPNs in DLS inhibits rather than promotes movement. This study may have given different results due to subtly different experimental parameters, including fiber optic placement and NA.

      We appreciate the reviewer’s thoughtful evaluation and comments. We have added a short discussion of Cui et al.’s study on optogenetic stimulation of D1-SPNs in the DLS (lines 341-343), which reports findings that contrast with ours and those of other studies.

      Reviewer #3 (Public review): 

      Review of resubmission: The authors provided a response to the reviews from myself and other reviewers. While some points were made satisfactorily, particularly in clarification of the innervation of cortex to striatum and the effects of input stimulation, many of my points remain unaddressed. In several cases, the authors chose to explain their rationale rather than address the issues at hand. A number of these issues (in fact, the majority) could be addressed simply by toning down the confidence in conclusions, so it was disappointing to see that the authors by and large did not do this. I repeat my concerns below and note whether I find them to have been satisfactorily addressed or not.

      In the manuscript by Klug and colleagues, the investigators use a rabies virus-based methodology to explore potential differences in connectivity from cortical inputs to the dorsal striatum. They report that the connectivity from cortical inputs onto D1 and D2 MSNs differs in terms of their projections onto the opposing cell type, and use these data to infer that there are differences in cross-talk between cortical cells that project to D1 vs. D2 MSNs. Overall, this manuscript adds to the overall body of work indicating that there are differential functions of different striatal pathways which likely arise at least in part by differences in connectivity that have been difficult to resolve due to difficulty in isolating pathways within striatal connectivity, and several interesting and provocative observations were reported. Several different methodologies are used, with partially convergent results, to support their main points. 

      However, I have significant technical concerns about the manuscript as presented that make it difficult for me to interpret the results of the experiments. My comments are below. 

      Major: 

      There is generally a large caveat to the rabies studies performed here, which is that both TVA and the ChR2-expressing rabies virus have the same fluorophore. It is thus essentially impossible to determine how many starter cells there are, what the efficiency of tracing is, and which part of the striatum is being sampled in any given experiment. This is a major caveat given the spatial topography of the cortico-striatal projections. Furthermore, the authors make a point in the introduction about previous studies not having explored absolute numbers of inputs, yet this is not at all controlled in this study. It could be that their rabies virus simply replicates better in D1-MSNs than D2-MSNs. No quantifications are done, and these possibilities do not appear to have been considered. Without a greater standardization of the rabies experiments across conditions, it is difficult to interpret the results. 

      This is still an issue. The authors point out why they chose various vectors. I can understand why the authors chose the fluorophores etc. that they did, yet the issues I raised previously are still valid. The discussion should mention that this is a potential issue. It does not necessarily invalidate results, but it is an issue. Furthermore, it is possible (in all systems) that rabies replicates better/more efficiently in some cells than others. This is one possible interpretation that has not really been explored in any study. I don't suggest the authors attempt to do that, but it should be raised as a potential interpretation. If the rabies results could mean several different things, the authors owe it to the readership to state all possible interpretations of data.

      We thank the reviewer for the comments and suggestions. Because the same fluorophore (mCherry) was used in both TVA- and ChR2-expressing viruses, it was not possible to distinguish true starter SPNs from TVA-only SPNs or monosynaptically labeled SPNs. This limitation makes it difficult to precisely assess the efficiency of rabies labeling and retrograde tracing in our experimental setup. Moreover, differences in rabies replication efficiency between D1- and D2-SPNs could potentially lead to an apparent lower connection probability from D1-projecting cortical neurons to D2-SPNs than from D2-projecting cortical neurons to D1-SPNs. We have added this clarification to the Discussion (lines 280-297).

      The authors claim using a few current clamp optical stimulation experiments that the cortical cells are healthy, but this result was far from comprehensive. For example, membrane resistance, capacitance, general excitability curves, etc are not reported. In Figure S2, some of the conditions look quite different (e.g., S2B, input D2-record D2, the method used yields quite different results that the authors write off as not different). Furthermore, these experiments do not consider the likely sickness and death that occurs in starter cells, as has been reported elsewhere. Health of cells in the circuit is overall a substantial concern that alone could invalidate a large portion, if not all, of the behavioral results. This is a major confound given those neurons are thought to play critical roles in the behaviors being studied. This is a major reason why first-generation rabies viruses have not been used in combination with behavior, but this significant caveat does not appear to have been considered, and controls e.g., uninfected animals, infected with AAV helpers, etc, were not included. 

      This issue remains unaddressed. I did not request clarity about experimental design, but rather, raised issues about the potential effects of toxicity. I believe this to be a valid concern that needs to be discussed in the manuscript, especially given what look visually like potential differences in S2. 

      We understand and appreciate the reviewer’s concern regarding the potential cytotoxicity of rabies virus infection. Although we performed the in vivo optogenetic behavioral experiments during a period when rabies-infected cells are generally considered relatively healthy, some deficits in starter cells may still occur and could contribute to the observed effects of optogenetic cortical stimulation. We have added this clarification to the Discussion (lines 298-306).

      The overall purity (e.g., EnvA pseudotyping efficiency) of the RABV prep is not shown. If there was a virus that was not well EnvA-pseudotyped and thus could directly infect cortical (or other) inputs, it would degrade specificity. This issue has not been addressed. Viral strain is irrelevant. The quality of the specific preparations used is what matters.

      While most of the study focuses on the cortical inputs, in slice recordings, inputs from the thalamus are not considered, yet likely contribute to the observed results. Related to this, in in vivo optogenetic experiments, technically, if the thalamic or other inputs to the dorsal striatum project to the cortex, their method will not only target cortical neurons but also terminals of other excitatory inputs. If this cannot be ruled out, stating that the authors are able to selectively activate the cortical inputs to one or the other population should be toned down.

      The authors added text to the discussion to address this point. While it largely does what is intended, based on the one study cited, I disagree with the authors' conclusions that it is "clear" that potential contamination from other sites does not play a role. The simplest interpretation is the one the authors state, and there is some supporting evidence to back up that assertion, but to me that falls short of making the point "clear" that there are no other interpretations. 

      The statements about specificity of connectivity are not well founded. It may be that in the specific case where they are assessing outside of the area of injections, their conclusions may hold (e.g., excitatory inputs onto D2s have more inputs onto D1s than vice versa). However, how this relates to the actual site of injection is not clear. At face value, if such a connectivity exists, it would suggest that D1-MSNs receive substantially more overall excitatory inputs than D2s. It is thus possible that this observation would not hold over other spatial intervals. This was not explored and thus the conclusions are over-generalized. e.g., the distance from the area of red cells in the striatum to recordings was not quantified, what constituted a high level of cortical labeling was not quantified, etc. Without more rigorous quantification of what was being done, it is difficult to interpret the results. 

      Again, the goal here would be to make a statement about this in the discussion to clarify limitations of the study. I don't expect the authors to re-do all of these experiments, but since they are discussing the corticostriatal circuits, which have multiple subdomains, this remains a relevant point. It has not been addressed. 

      The results in Figure 3 are not well controlled. The authors show contrasting effects of optogenetic stimulation of D1-MSNs and D2-MSNs in the DMS and DLS, results which are largely consistent with the canon of basal ganglia function. However, when stimulating cortical inputs, stimulating the inputs from D1-MSNs gives the expected results (increased locomotion) while stimulating putative inputs to D2-MSNs had no effect. This is not the same as showing a decrease in locomotion - showing no effect here is not possible to interpret. 

      I think that the caveat of showing no clear effects of inputs to D2 stimulation should be pointed out. Yes, I understand that the viruses appeared to express etc., but again it remains possible that the results are driven by a lack of e.g., sufficient ChR2 expression. Aside from a full quantification of the number of cells expressing ChR2, overlap in fiber placement and ChR2 expression (which I don't suggest), this remains a possibility and should be pointed out.

      In the light of their circuit model, the result showing that inputs to D2-MSNs drive ICSS is confusing. How can the authors account for the fact that these cells are not locomotor-activating, stimulation of their putative downstream cells (D2-MSNs) does not drive ICSS, yet the cortical inputs drive ICSS? Is the idea that these inputs somehow also drive D1s? If this is the case, how do D2s get activated, if all of the cortical inputs tested net activate D1s and not D2s? Same with the results in Figure 4 - the inputs and putative downstream cells do not have the same effects. Given potential caveats of differences in viral efficiency, spatial location of injections, and cellular toxicity, I cannot interpret these experiments. 

      The explanation the authors provide in their rebuttal makes sense, however this should be included in the discussion of the manuscript, as it is interesting and relevant. 

      We thank the reviewer for the valuable comments and suggestions. In line with the reviewer’s recommendation, we have incorporated these explanations into the Discussion (lines 242–279) to help interpret the complex behavioral outcomes of optogenetic stimulation of cortical neurons projecting to D1- or D2-SPNs.

      Reviewer #2 (Recommendations for the authors): 

      I appreciate the authors' responses, which helped clarify some experimental choices. I appreciate that the experiment in Fig S3 serves as a reasonable light control for optogenetics experiments. The careful comparison with methods in Cui et al (2021) is useful, although not added to the main manuscript. Some of the other citations here don't really address the controversy, e.g. Kravitz et al. is in DMS, but perhaps fully addressing this issue is outside the scope of the current manuscript and awaits further experiments. I also appreciate the clarification for recording locations that "This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry." However, the statement in the reviewer response does not seem to be added to the manuscript's methods, which I think would be helpful. The criteria for choosing recorded cells are still a bit fuzzy without a map of recording locations and histology. There is also a problem that mCherry-positive cells could be starter cells or could be monosynaptically traced cells, so it is hard to know the area of the starter cell population in these experiments for sure. My evaluation of the manuscript remains largely the same as the original. However, I have adjusted my public review a bit to incorporate the authors' responses. I still think this paper has valuable information, suggesting an interesting and previously unappreciated structure of corticostriatal inputs that I hope this group and others will continue to investigate and incorporate into models of basal ganglia function.

      We thank the reviewer for the valuable suggestions. We have now included a comparison with Cui et al. in the Discussion. In addition, we have added the criteria for selecting recorded cells to the Methods section: ‘This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.’

    1. eLife Assessment

      This work introduces a new Python package, Avian Vocalization Network (AVN), that provides several key analysis pipelines for birdsong research. This tool is likely to prove useful to researchers in neuroscience and beyond, as demonstrated by convincing experiments using a wide range of publicly available birdsong data.

    2. Reviewer #2 (Public review):

      Summary:

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN), aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists with limited coding experience working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      As with any software package, this one necessarily makes a number of design choices, which may or may not fit the needs of all users. Those who prefer a more automated pipeline with fewer knobs to turn may appreciate AVN in cases where the existing recipes fit their needs, while those who require more customization and flexibility may require a more bespoke (and thus code-intensive) approach.

      Strengths:

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses:

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: It's important to note that the package is trying to do many things, of which it is likely to do several well and a few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows.

      In the revised version of the paper, the authors have expanded their case for the design choices made in AVN and remain committed to maintaining the tool. Given the low cost for users in trying new methods and the work the authors have put into further reducing this overhead via documentation, those curious about the package are likely best served by simply downloading it and giving it a try on their own data.

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While, to my knowledge, this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-and-maximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.

      Update: The authors now provide an extensive comparison with the Goffinet et al. paper and also consider differences between MMD and EMD. This comparison both adds value to the original paper and provides useful benchmarking for others looking to develop latent space comparison methods.

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

    3. Reviewer #3 (Public review):

      This paper introduces the Avian Vocalization Network (AVN), a novel birdsong analysis pipeline using deep learning. By automating vocal annotation tasks, the AVN generates interpretable song features and song similarity scores on novel datasets without retraining. The performance of the network is solid and is comparable to that of human annotators.

      The authors have improved the manuscript in several aspects, such as the comparison with the Goffinet work. Overall, the AVN feature set could become a useful tool for evaluating birdsongs. But the authors also chose not to address a certain number of criticisms, and some issues remain poorly addressed, and the work is not reproducible at this stage. With a little effort, these issues could get resolved in my view. I will just pick on four issues that I think can be easily addressed:

      (1) Limitation of feature set: They claim that AVN satisfies the criteria (line 60) of "creating a common feature space for the comparison of behavioural phenotypes ..." (line 51), but then, for the LDA analysis explained on line 910, they say "excluding amplitude and amplitude modulation features as they were found to vary". Since their feature set is not stable and not truly 'common' to all tasks, this limitation needs addressing in the discussion (that some features seem to vary undesirably, and they need exclusion based on some criteria to be defined).

      (2) Missing information on classification training loss: The authors insist that their triplet loss is not related to classification, and they brush off my request for more information. In their rebuttal, they write: 'The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different.' Perplexingly, however, in the revised paper, the authors themselves speak of 'classes', in line 1004: 'this allows the model to begin learning an easier task, of separating syllables of different classes by a smaller margin.' So it seems the authors actually agree with me that there is an underlying classification task. I am therefore going to make it a bit more explicit here what I'm asking for, hoping this will better resonate with them.

      In line 984 they define their loss function and in lines 994-996 they define 'hard' and 'semi-hard' triplets. The authors then train a system to minimize the loss with a ratio of 75 percent semi-hard triplets and 25 percent hard triplets and a final weighting parameter value alpha=0.7. What I'm asking for is this 'classification' loss their trained model achieves, or in other words, the fraction of triplets that end up producing a loss, either of the 'hard' or 'semi-hard' type. For example, if their model manages to separate all 'possible triplets' by a margin of at least alpha, then the loss would be zero. If the model manages to separate all triplets except one, then the loss would correspond to the amount by which the difference in separation between the anchor and the positive vs. negative samples exceeds alpha. So, an important number to provide in the paper is the fraction of triplets that incur a nonzero loss, i.e., the fraction of semi-hard triplets. And another important quantity is the fraction of hard triplets, i.e. the fraction of triplets that would incur a loss if alpha were set to zero, or, in other words, the triplets for which the negative sample is closer to the anchor than the positive sample. By the way, I assume this latter fraction of hard cases will be zero - that their model does not confuse any positive and negative training samples...

      Note: the quantification chosen by the authors, termed 'contrast index', is interesting, but it is a derived quantity; it is not the quantity the authors chose to optimize during training. If the authors were to report both the training loss achieved and the 'contrast index', follow-up work could be benchmarked against both these quantities. If, for example, a follow-up model achieves smaller loss but worse contrast, then the loss is not a good placeholder measure for optimizing contrast. Alternatively, follow-up work could focus on the contrast index as training objective, obviating the need for the triplet loss as an intermediate step (I don't buy the authors' argument that such an optimization would be infeasible).
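      The quantities requested in point (2) can be computed directly from a set of embedded triplets. The following is a minimal sketch, assuming Euclidean distances and the standard triplet loss max(0, d(a,p) - d(a,n) + alpha); the function name `triplet_stats` and the array layout are hypothetical and are not taken from the AVN code base, and the paper's own definitions (lines 984-996) may differ in detail.

```python
import numpy as np

def triplet_stats(anchor, positive, negative, alpha=0.7):
    """Summarize triplet loss over embedded triplets (hypothetical helper).

    Each argument is an (n_triplets, embedding_dim) array; row i of the
    three arrays forms one (anchor, positive, negative) triplet.
    """
    d_ap = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distance
    loss = np.maximum(0.0, d_ap - d_an + alpha)       # per-triplet loss
    hard = d_an < d_ap                                # negative closer than positive
    semi_hard = (d_an >= d_ap) & (d_an < d_ap + alpha)  # correct order, inside margin
    return {
        "mean_loss": float(loss.mean()),
        "frac_hard": float(hard.mean()),
        "frac_semi_hard": float(semi_hard.mean()),
        "frac_nonzero_loss": float((loss > 0).mean()),
    }
```

      Under these definitions, every hard or semi-hard triplet incurs a nonzero loss, so reporting the two fractions separately alongside the mean loss would cover both numbers asked for above.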

      (3) Reproducibility: they explain the way they train the CNN with triplet loss to produce the embeddings, but we're missing both actual scripts on GitHub to train and run inference from scratch, and model weights, or even the hyperparameters they used. The authors only provide the architecture, and I don't think that's enough to be considered replicable by today's standards. I would suggest they release complete model checkpoint weights for the result they report, the exact data splits, the hyperparameters they used, and training and testing code, so that one can very easily verify their claims and apply their methods to other datasets. Note: for example, the code to extract the embeddings is incomplete (the function definition of single_bird_extract_embeddings cannot be found on GitHub) and the model weights they used are missing.

      (4) With regards to the age prediction model, the authors should specify that this model is mainly useful for comparisons across studies but less so for precise evaluation of the effects of a treatment within a study. Namely, the effect on song of a treatment is best assessed by comparison to within-subject past song, and by comparison to age-matched control birds (ideally siblings) raised in identical conditions, rather than to invoke a generic model trained on other birds and from different colonies and breeding conditions as authors propose to do. In other words, to introduce a generic model for evaluation of song maturity introduces measurement noise in terms of the additional birds and their variable conditions, which can hinder precise assessment of treatment effects. Note that to state that in past work such maturity models were used is not a good justification, scientifically speaking.

      Finally, the authors write that methods for syllable segmentation have not been systematically compared, but the WhisperSeg work they use did such a comparison. So the authors should revise their novelty claim of being the first to compare syllable segmentation methods.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation. 

      Strengths: 

      It is useful to see existing methods for syllable segmentation compared to new datasets.

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure.

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs. 

      Weaknesses: 

      For the first part, the implementation seems to be a wrapper on existing techniques. For instance, the first section talks about syllable segmentation; they made a comparison between WhisperSeg (Gu et al., 2024), TweetyNet (Cohen et al., 2022), and amplitude thresholding. They found that WhisperSeg performed the best, and they included it in the pipeline. They then used WhisperSeg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g. Aronov et al., 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBSCAN on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claimed they developed a 'new' visualization - syntax raster (line 180). That was done by Sainburg et al. 2020 in Figure 12E and also in Cohen et al., 2020 - so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature). 

      First, we would like to thank this reviewer for their kind comments and feedback on this manuscript. It is true that many of the components of this song analysis pipeline are not entirely novel in isolation. Our real contribution here is bringing them together in a way that allows other researchers to seamlessly apply automated syllable segmentation, clustering, and downstream analyses to their data. That said, our approach to training TweetyNet for syllable segmentation is novel. We trained TweetyNet to recognize vocalizations vs. silence across multiple birds, such that it can generalize to new individual birds, whereas TweetyNet had previously only been used to annotate song syllables from birds included in its training set. Our validation of TweetyNet and WhisperSeg in combination with UMAP and HDBSCAN clustering is also novel, providing valuable information about how these systems interact, and how reliable the completely automatically generated labels are for downstream analysis. We have added a couple of sentences to the introduction to emphasize the novelty of this approach and validation.

      Our syntax raster visualization does resemble Figure 12E in Sainburg et al. 2020; however, it differs in a few important ways, which we believe warrant its consideration as a novel visualization method. First, Sainburg et al. represent the labels across bouts in real time; their position along the x axis reflects the time at which each syllable is produced relative to the start of the bout. By contrast, our visualization considers only the index of syllables within a bout (i.e., first syllable vs. second syllable, etc.) without consideration of the true durations of each syllable or the silent gaps between them. This makes it much easier to detect syntax patterns across bouts, as the added variability of syllable timing is removed. Considering only the sequence of syllables rather than their timing also allows us to more easily align bouts according to the first syllable of a motif, further emphasizing the presence or absence of repeating syllable sequences without interference from the more variable introductory notes at the start of a motif. Finally, instead of plotting all bouts in the order in which they were produced, our visualization orders bouts such that bouts with the same sequence of syllables will be plotted together, which again serves to emphasize the most common syllable sequences that the bird produces. These additional processing steps mean that our syntax raster plot has much starker contrast between birds with stereotyped syntax and birds with more variable syntax, as compared to the more minimally processed visualization in Sainburg et al. 2020. There do not appear to be any similar visualizations in Cohen et al. 2020. 
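      The three processing steps described above (indexing syllables by position, aligning bouts on the first motif syllable, and grouping identical sequences) can be sketched as follows. This is an illustrative sketch only: the function name `build_syntax_raster`, the use of `None` as padding, and the lexicographic sort used to group identical sequences are assumptions, not the AVN implementation.

```python
def build_syntax_raster(bouts, align_label):
    """Arrange bouts of syllable labels into aligned, sorted raster rows.

    bouts: list of label sequences, one per bout, in order of production.
    align_label: label of the first motif syllable used for alignment.
    Returns a list of equal-length rows padded with None.
    """
    # Position of the first motif syllable in each bout (0 if it never occurs).
    offsets = [list(b).index(align_label) if align_label in b else 0 for b in bouts]
    max_off = max(offsets)
    # Left-pad so the alignment syllable falls in the same column for every bout.
    aligned = [[None] * (max_off - off) + list(b) for b, off in zip(bouts, offsets)]
    # Right-pad to a common width so the rows form a rectangular raster.
    width = max(len(row) for row in aligned)
    rows = [row + [None] * (width - len(row)) for row in aligned]
    # Group bouts sharing the same sequence from the alignment column onward.
    rows.sort(key=lambda r: ["" if s is None else str(s) for s in r[max_off:]])
    return rows
```

      Each row can then be drawn as one line of colored cells, one color per label, so that timing information is discarded and only syllable order remains.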

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as an input (each triplet is essentially 3 images stacked together such that you have (anchor, positive, negative)). The anchor is a reference spectrogram from, say, finch A; positive means a different spectrogram with the same label as the anchor from finch A; and negative means a spectrogram not related to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring the embedding distance between anchor and positive is less than that between anchor and negative by a certain margin. Based on the embedding, they then made use of the earth mover's distance to quantify the similarity in the syllable distribution among finches. They then compared the performance of their approach with that of Sound Analysis Pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach by Goffinet et al. In this paper (https://doi.org/10.7554/eLife.67855, Fig 7), they also attempted to perform an analysis on the tutor pupil song.  

      We thank the reviewer for this suggestion. We have included a comparison of our triplet loss embedding model to the VAE model proposed in Goffinet et al. 2021. We also included comparisons of similarity scoring using each of these embedding models combined with either earth mover’s distance (EMD) or maximum mean discrepancy (MMD) to calculate the similarity of the embeddings, as was done in Goffinet et al. 2021. As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the triplet loss model with MMD performs best for evaluating song learning on new birds not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach.
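      For readers unfamiliar with the two dissimilarity measures compared here, the following is a minimal sketch of how EMD and MMD can be computed between two sets of syllable embeddings. It is not the authors' or Goffinet et al.'s implementation: the function names are hypothetical, the EMD version assumes both birds contribute the same number of equally weighted syllables (so the optimal transport plan reduces to a one-to-one assignment), and the RBF kernel with bandwidth `sigma` is an assumed choice for MMD.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def emd_equal_size(x, y):
    """Earth mover's distance between two equal-size, uniformly weighted samples."""
    cost = cdist(x, y)                        # pairwise Euclidean transport costs
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return float(cost[rows, cols].mean())

def mmd_rbf(x, y, sigma=1.0):
    """Biased estimate of maximum mean discrepancy with an RBF kernel."""
    def kernel(a, b):
        return np.exp(-cdist(a, b, "sqeuclidean") / (2 * sigma ** 2))
    mmd2 = kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
    return float(np.sqrt(max(mmd2, 0.0)))     # clip tiny negative rounding error
```

      Here `x` and `y` would hold one row per syllable for the tutor and the pupil, respectively; both functions return 0 for identical samples and grow as the two syllable distributions diverge.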

      Reviewer #2 (Public Review):

      Summary: 

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN), aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field.

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding.

      Strengths: 

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver.

      Weaknesses: 

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods.

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. 

      We appreciate this reviewer’s comments and concerns about the structure of the AVN package and its long-term maintenance. We have considered incorporating AVN into the VocalPy ecosystem but have chosen not to for a few key reasons. (1) AVN was designed with ease of use for experimenters with limited coding experience top of mind. VocalPy provides excellent resources for researchers with some familiarity with object-oriented programming to manage and analyze their datasets; however, we believe it may be challenging for users without such experience to adopt VocalPy quickly. AVN’s ‘recipe’ approach, as you put it, is very easily accessible to new users, and allows users with intermediate coding experience to easily navigate the source code to gain a deeper understanding of the methodology. AVN also consistently outputs processed data in familiar formats (tables in .csv files which can be opened in Excel), in an effort to make it more accessible to new users, something which would be challenging to reconcile with VocalPy’s emphasis on its `dataset` classes. (2) AVN and VocalPy differ in their underlying goals and philosophies when it comes to flexibility vs. standardization of analysis pipelines. VocalPy is designed to facilitate mixing-and-matching of different spectrogram generation, segmentation, annotation etc. approaches, so that researchers can design and implement their own custom analysis pipelines. This flexibility is useful in many cases. For instance, it could allow researchers who have very different noise filtering and annotation needs, like those working with field recordings versus acoustic chamber recordings, to analyze their data using this platform. However, when it comes to comparisons across zebra finch research labs, this flexibility comes at the expense of direct comparison and integration of song features across research groups. This is the context in which AVN is most useful. It presents a single approach to song segmentation, labeling, and featurization that has been shown to generalize well across research groups, and which allows direct comparisons of the resulting features. AVN’s single, extensively validated, standard pipeline approach is fundamentally incompatible with VocalPy’s emphasis on flexibility. We are excited to see how VocalPy continues to evolve in the future, and recognize the value that both AVN and VocalPy bring to the songbird research community, each with their own distinct strengths, weaknesses, and ideal use cases. 

      While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption. 

      We thank the reviewer for their kind words about AVN’s documentation. We recognize that the GUI’s exclusive availability on Windows is a limitation, and we would be happy to collaborate with other researchers and developers in the future to build a Mac-compatible version, should the demand present itself. That said, the Python package works on all operating systems, so non-Windows users still have the ability to use AVN that way.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows. 

      Second, two notes about new analysis approaches:

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions. (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-and-maximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods.  

      We recognize the similarities between these approaches and have included comparisons of the VAE and MMD as in the Goffinet paper to our triplet loss model and EMD. As discussed in the updated results section of the paper and shown in the new Figure 6–figure supplement 1, the triplet loss model with MMD performs best for evaluating song learning on new birds not included in model training. We’ve updated the main text of the paper to reflect this switch from EMD to MMD for the primary similarity scoring approach. 

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability.  

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term.

      Reviewer #3 (Public Review):

      Summary: 

      The authors invent song and syllable discrimination tasks they use to train deep networks. These networks they then use as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and from another colony the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%. 

      Strengths: 

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field.

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies.

      (3) Automation and Ease of Use: By being fully automated, the method is straightforward to apply and should pose barely any barrier to adoption by other labs.

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations released as public datasets are potentially very valuable. 

      Weaknesses: 

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior.  

      We appreciate this reviewer’s concerns and apologize for not providing sufficiently clear rationale for the inclusion of our phenotype classifier and age regression models in the original manuscript. These tasks are not intended to be taken as a final, ultimate culmination of the AVN pipeline. Rather, we consider the carefully engineered set of 55 interpretable features to be AVN’s final output, and these analyses serve merely as examples of how that feature set can be applied. That said, each of these models does have valid experimental use cases that we believe are important and would like to bring to the attention of the reviewer.

      For one, we showed how the LDA model that can discriminate between typical, deaf, and isolate birds’ songs not only allows us to evaluate which features are most important for discriminating between these groups, but also allows comparison of the FoxP1 knock-down (FP1 KD) birds to each of these phenotypes. Based on previous work (Garcia-Oscos et al. 2021), we hypothesized that FP1 KD in these birds specifically impaired tutor song memory formation while sparing a bird’s ability to refine their own vocalizations through auditory feedback. Thus, we would expect their songs to resemble those of isolate birds, who lack a tutor song memory, but not to resemble deaf birds who lack a tutor song memory and auditory feedback of their own vocalizations to guide learning. The LDA model allowed us to make this comparison quantitatively for the first time and confirm our hypothesis that FP1 KD birds’ songs are indeed most like isolates’. In the future, as more research groups publish their birds’ AVN feature sets, we hope to be able to make even more fine-grained comparisons between different groups of birds, either using LDA or other similar interpretable classifiers. 

      The age prediction model also has valid real-world use cases. For instance, one might imagine an experimental manipulation that is hypothesized to accelerate or slow song maturation in juvenile birds. This age prediction model could be applied to the AVN feature sets of birds having undergone such a manipulation to determine whether their predicted ages systematically lead or lag their true biological ages, and which song features are most responsible for this difference. We didn’t have access to data for any such birds for inclusion in this paper, but we hope that others in the future will be able to take inspiration from our methodology and use this or a similar age regression model with AVN features in their research. We have added a couple lines to the ‘Comparing Song Disruptions with AVN Features’ and ‘Tracking Song Development with AVN Features’ sections of the results to make this more clear. 

      Along similar lines, authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here? 

      This reviewer appears to have misunderstood our similarity scoring embedding model and our rationale for using it. We will explain it in more depth here and have added a paragraph to the ‘Measuring Song Imitation’ section of the results explaining this rationale more briefly.

      First, nowhere are we training a model to discriminate between same and different syllable pairs. The triplet loss network is trained to embed syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different. This approach was chosen because it has repeatedly been shown to be a useful data compression step (Schroff et al. 2015, Thakur et al. 2019) before further downstream tasks are applied to its output, particularly in contexts where there is little data per class (syllable label). For example, Schroff et al. 2015 trained a deep convolutional neural network with triplet loss to embed images of human faces from the same individual closer together than images of different individuals in a 128-dimensional space. They then used this model to compute 128-dimensional representations of additional face images, not included in training, which were used for individual facial recognition (this is a same vs. different category classifier), and facial clustering, achieving better performance than the previous state of the art. The triplet loss function results in a model that can generate useful embeddings of previously unseen categories, like new individuals’ faces, or new zebra finches’ syllables, which can then be used in downstream analyses. This meaningful, lower-dimensional space allows comparisons of distributions of syllables across birds, as in Brainard and Mets 2008, and Goffinet et al. 2021. 
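
      To make the distinction between an embedding objective and a classification objective concrete, the following is a minimal sketch of a generic triplet loss on a single triplet. It is not AVN's training code: the margin value, the use of unsquared Euclidean distance, and the 2-dimensional toy embeddings are all assumptions chosen for readability (the response above describes an 8-dimensional space).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is. `margin` is an assumed value.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings, for illustration only.
anchor = [0.0, 0.0]         # a syllable labeled 'a'
positive = [0.0, 1.0]       # another syllable labeled 'a'
near_negative = [0.0, 1.5]  # a 'b' syllable embedded too close to the anchor
far_negative = [3.0, 4.0]   # a 'b' syllable embedded far from the anchor

loss_violating = triplet_loss(anchor, positive, near_negative)
loss_satisfied = triplet_loss(anchor, positive, far_negative)
```

      The loss depends only on relative distances within a triplet, so no "same" or "different" decision is ever produced; a triplet contributes nothing once its negative is far enough away.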

      Next word and masked word prediction are indeed common self-supervised learning tasks for models working with text data, or other data with meaningful sequential organization. That is not the case for our zebra finch syllables, where every bird’s syllable sequence depends only on its tutor’s sequence, and there is no evidence for strong universal syllable sequencing rules (James et al. 2020). Rather, our embedding model is an example of a computer vision task, as it deals with sets of two-dimensional images (spectrograms), not sequences of categorical variables (like text). It is also not, strictly speaking, a self-supervised learning task, as it does require syllable labels to generate the triplets. A common self-supervised approach for dimensionality reduction in a computer vision task such as this one would be to train an autoencoder to compress images to a lower-dimensional space, then faithfully reconstruct them from the compressed representation. This has been done using a variational autoencoder trained on zebra finch syllables in Goffinet et al. 2021. In keeping with the suggestions from reviewers #1 and #2, we have included a comparison of our triplet loss model with the Goffinet et al. VAE approach in the revised manuscript. 

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training?

      Again, this reviewer seems not to understand our similarity scoring methodology. Our similarity scoring model is not trained on a classification task, but rather on an embedding task. It learns to embed spectrograms of syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. We could report the loss values for this embedding task on our training and validation datasets, but these wouldn’t have any clear relevance to the downstream task of syllable distribution comparison where we are using the model’s embeddings. We report the contrast index as this has direct relevance to the actual application of the model and allows comparisons to other similarity scoring methods, something that the triplet loss values wouldn’t allow. 

      The triplet loss method was chosen because it has been shown to yield useful low-dimensional representations of data, even in cases where there is limited labeled training data (Thakur et al. 2019). While we have one of the largest manually annotated datasets of zebra finch songs, it is still quite small by industry deep learning standards, which is why we chose a method that would perform well given the size of our dataset. Training a model on a contrast index directly would be extremely computationally intensive and require many more pairs of birds with known relationships than we currently have access to. It could be an interesting approach to take in the future, but one that would be unlikely to perform well with a dataset size typical to songbird research. 

      Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN).  

      We did compare multiple methods for syllable segmentation (WhisperSeg, TweetyNet, and amplitude thresholding) as this hadn’t been done previously. We chose not to perform an extensive comparison of different clustering methods as Sainburg et al. 2020 already did so and we felt no need to duplicate this effort. We encourage this reviewer to refer to Sainburg et al.’s excellent work for comparisons of multiple clustering methods applied to zebra finch song syllables.

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird.  

      Firstly, the syllable error rate scores reported in Cohen et al. 2022 are calculated very differently than the F1 scores we report here and are based on a model trained with data from the same bird as was used in testing, unlike our more general segmentation approach where the model was tested on different birds than were used in training. Thus, the scores reported in Cohen et al. and the F1 scores that we report cannot be compared. 

      The discrepancy between the F1seg scores reported in Gu et al. 2023 and the segmentation F1 scores that we report is likely due to differences in the underlying datasets. Our UTSW recordings tend to have higher levels of both stationary and non-stationary background noise, which make segmentation more challenging. The recordings from Rockefeller were less contaminated by background noise, and they resulted in slightly higher F1 scores. That said, we believe that the primary factor accounting for this difference in scores with Gu et al. 2023 is the granularity of our ‘ground truth’ syllable segments. In our case, if there was ever any ambiguity as to whether vocal elements should be segmented into two short syllables with a very short gap between them or merged into a single longer syllable, we chose to split them. WhisperSeg had a strong tendency to merge the vocal elements in ambiguous cases such as these. This results in a higher rate of false negative syllable onset detections, reflected in the low recall scores achieved by WhisperSeg (see Figure 2–figure supplement 1b), but still very high precision scores (Figure 2–figure supplement 1a). While WhisperSeg did frequently merge these syllables in a way that differed from our ground truth segmentation, it did so consistently, meaning it had little impact on downstream measures of syntax entropy (Figure 3c) or syllable duration entropy (Figure 3–figure supplement 2a). It is for that reason that, despite a lower F1 score, we still consider AVN’s automatically generated annotations to be sufficiently accurate for downstream analyses. 
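
      The effect of merging on onset-based scores can be illustrated with a small sketch. This is not the scoring code used in the manuscript: the greedy matching rule, the 10 ms tolerance, and the onset times are assumptions made up for the example.

```python
def onset_scores(true_onsets, pred_onsets, tolerance=0.01):
    """Precision, recall and F1 for onset detection.

    A predicted onset counts as a true positive if it lies within
    `tolerance` seconds of a ground-truth onset that has not already
    been matched. Both the rule and the tolerance are assumptions.
    """
    unmatched = sorted(true_onsets)
    true_pos = 0
    for p in sorted(pred_onsets):
        match = next((t for t in unmatched if abs(t - p) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            true_pos += 1
    precision = true_pos / len(pred_onsets) if pred_onsets else 0.0
    recall = true_pos / len(true_onsets) if true_onsets else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


# Hypothetical ground truth: two short vocal elements are split at 0.30 s and 0.36 s.
true_onsets = [0.10, 0.30, 0.36, 0.60]
# Hypothetical prediction: the two elements are merged, so the 0.36 s onset is absent.
pred_onsets = [0.10, 0.30, 0.60]

precision, recall, f1 = onset_scores(true_onsets, pred_onsets)
```

      In this toy case every predicted onset is correct, so precision is unaffected, while the single merged boundary lowers recall and therefore F1. This is the pattern described above for consistent merging.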

      Should researchers require a higher degree of accuracy and precision with their annotations (for example, to detect very subtle changes in song before and after an acute manipulation) we suggest they turn toward one of the existing tools for supervised song annotation, such as TweetyNet.

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well.  

      We appreciate the reviewer’s concern about a bias toward birds from the UTSW colony. However, this paper shows that despite training (for the similarity scoring) and hyperparameter fitting (for the HDBSCAN clustering) on the UTSW birds, AVN performs as well if not better on birds from Rockefeller than from UTSW. To our knowledge, there are no publicly available datasets of annotated zebra finch songs from labs in Europe or in Asia, but we would be happy to validate AVN on such datasets, should they become available. Furthermore, there is no evidence to suggest that there is dramatic drift in zebra finch vocal repertoire between continents which would necessitate such additional validation. While we didn’t have manual annotations for this dataset (which would allow validation of our segmentation and labeling methods), we did apply AVN to recordings shared with us by the Wada lab in Japan, where visual inspection of the resulting annotations suggested comparable accuracy to the UTSW and Rockefeller datasets. 

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g. How many hours does it take to annotate a hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.  

      With standardization and ease of use in mind, we designed AVN specifically to perform fully automated syllable annotation and downstream feature calculations. We believe that we have demonstrated in this manuscript that our fully automated approach is sufficiently reliable for downstream analyses across multiple zebra finch colonies. That said, if researchers require an even higher degree of annotation precision and accuracy, they can turn toward one of the existing methods for supervised song annotation, such as TweetyNet. Incorporating human annotations for each bird processed by AVN is likely to improve its performance, but this would require significant changes to AVN’s methodology, and is outside the scope of our current efforts.

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method.  

      Other methods exist for supervised or human-in-the-loop annotation of zebra finch songs, such as TweetyNet and DAN (Alam et al. 2023). We invite researchers who require a higher degree of accuracy than AVN can provide to explore these alternative approaches for song annotation. Incorporating human feedback into AVN was never the goal of our pipeline, would require significant changes to AVN’s design and is outside the scope of this manuscript.

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, whether they are treated as two-syllable types or one.  

      It is true that we don’t currently have any dedicated features to describe calls. This could be a useful addition to AVN in the future. 

      What a human expert inspecting a spectrogram would typically call ‘repeated syllables’ in a bout are almost always assigned the same syllable label by the UMAP+HDBSCAN clustering. The syntax analysis module includes features examining the rate of syllable repetitions across syllable types, as mentioned in lines 222-226 of the revised manuscript. See https://avn.readthedocs.io/en/latest/syntax_analysis_demo.html#Syllable-Repetitions for further details.

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy.  

      All human annotations used in this manuscript have indeed been released as part of the accompanying dataset. Syllable annotations are not provided for all pupils and tutors used to validate the similarity scoring, as annotations are not necessary for similarity comparisons. We have expanded our description of our annotation guidelines in the methods section of the revised manuscript. All the annotations were generated by one of two annotators. The second annotator always consulted with the first annotator in cases of ambiguous syllable segmentation or labeling, to ensure that they had consistent annotation styles. Unfortunately, we haven’t retained records about which birds were annotated by which of the two annotators, so we cannot share this information along with the dataset. The data is currently available in a format that should allow other research groups to use our annotations either to train their own annotation systems or check the performance of their existing systems on our annotations.  

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method. 

      As we discussed in our response to this reviewer’s point (3), WhisperSeg has a tendency to merge syllables when the gap between them is very short, which explains its lower recall score compared to its precision on our dataset (Figure 2–figure supplement 1). In rare cases, WhisperSeg also fails to recognize syllables entirely, again impacting its recall score. TweetyNet hardly ever completely ignores syllables, but it does tend to occasionally merge syllables together or over-segment them. Whereas WhisperSeg does this very consistently for the same syllable types within the same bird, TweetyNet merges or splits syllables more inconsistently. This inconsistent merging and splitting has a larger effect on syllable labeling, as manifested in the lower clustering v-measure scores we obtain with TweetyNet compared to WhisperSeg segmentations. TweetyNet also has much lower precision than WhisperSeg, largely because TweetyNet often recognizes background noises (like wing flaps or hopping) as syllables whereas WhisperSeg hardly ever segments non-vocal sounds. 

      Many errors in syllable labeling stem from differences in syllable segmentation. For example, if two syllables with labels ‘a’ and ‘b’ in the manual annotation are sometimes segmented as two syllables, but sometimes merged into a single syllable, the clustering is likely to find 3 different syllable types; one corresponding to ‘a’, one corresponding to ‘b’ and one corresponding to ‘ab’ merged. Because of how we align syllables across segmentation schemes for the v-measure calculation, this will look like syllable ‘b’ always has a consistent cluster label (or is missing a label entirely), but syllable ‘a’ can carry two different cluster labels, depending on the segmentation. In certain cases, even in the absence of segmentation errors, a group of syllables bearing the same manual annotation label may be split into 2 or 3 clusters (it is extremely rare for a single manual annotation group to be split into more than 3 clusters). In these cases, it is difficult to conclusively say whether the clustering represents an error, or if it actually captured some meaningful systematic difference between syllables that was missed by the annotator. Finally, sometimes rare syllable types with their own distinct labels in the manual annotation are merged into a single cluster. Most labeling errors can be explained by this kind of merging or splitting of groups relative to the manual annotation, not to occasional mis-classifications of one manual label type as another.
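
      The way splitting a manual label across clusters shows up in the v-measure can be sketched as follows. This is a generic, standard-library implementation of homogeneity, completeness, and their harmonic mean, not the code used in the manuscript; the labels below are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """H(targets | given) in nats, computed from two parallel label lists."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum(
        (count / n) * math.log(count / given_counts[g])
        for (g, _), count in joint.items()
    )


def v_measure(manual, clusters):
    """Return (homogeneity, completeness, v_measure) for two labelings."""
    h_manual, h_clusters = entropy(manual), entropy(clusters)
    homogeneity = 1.0 if h_manual == 0 else 1 - conditional_entropy(manual, clusters) / h_manual
    completeness = 1.0 if h_clusters == 0 else 1 - conditional_entropy(clusters, manual) / h_clusters
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Hypothetical bird: manual label 'a' is split into clusters 0 and 1, 'b' maps to cluster 2.
manual = ["a", "a", "a", "a", "b", "b", "b", "b"]
clusters = [0, 0, 1, 1, 2, 2, 2, 2]

homogeneity, completeness, v = v_measure(manual, clusters)
```

      Splitting one manual label into two clusters leaves homogeneity at 1 (each cluster contains a single manual label) but lowers completeness. Merging two manual labels into one cluster has the opposite effect.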

      For examples of these types of errors, we encourage this reviewer and readers to refer to the example confusion matrices in figure 2f and Figure 2–figure supplement 3b&e. We also added two paragraphs to the end of the ‘Accurate, fully unsupervised syllable labeling’ section of the Results in the revised manuscript. 

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy.  

      We apologize for not making this distinction sufficiently clear in the manuscript and have added a paragraph to the ‘Measuring Song Imitation’ section of the Results explaining the rationale for using an embedding model for similarity scoring. 

      We chose to use UMAP for syllable labeling because it is a common embedding methodology to precede hierarchical clustering and has been shown to result in reliable syllable labels for birdsong in the past (Sainburg et al. 2020). However, it is not appropriate for similarity scoring, because comparing EMD or MMD scores between birds requires that all the birds’ syllable distributions exist within the same shared embedding space. This can be achieved by using the same triplet loss-trained neural network model to embed syllables from all birds. This cannot be achieved with UMAP because all birds whose scores are being compared would need to be embedded in the same UMAP space, as distances between points cannot be compared across UMAPs. In practice, this would mean that every time a new tutor-pupil pair needs to be scored, their syllables would need to be added to a matrix with all previously compared birds’ syllables, a new UMAP would need to be computed, and new EMD or MMD scores between all bird pairs would need to be calculated using their new UMAP embeddings. This is very computationally expensive and quickly becomes infeasible without dedicated high-performance computing infrastructure. It also means that similarity scores couldn’t be compared across papers without recomputing everything each time, whereas EMD and MMD scores obtained with triplet loss embeddings can be compared, provided they use the same trained model (which we provide as part of AVN) to embed their syllables in a common latent space. 
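
      As an illustration of why a shared embedding space matters, here is a minimal sketch of a squared maximum mean discrepancy (MMD) between two sets of embedded syllables. It is a generic biased estimator with a Gaussian kernel, not AVN's implementation; the kernel bandwidth and the 2-dimensional toy embeddings are assumptions.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points; `bandwidth` is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2 * mean_kernel(xs, ys, bandwidth)
    )


# Hypothetical embeddings, all produced by the same (fixed) embedding model.
tutor = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
good_pupil = [[0.1, 0.0], [1.0, 0.1], [0.0, 0.9]]
poor_pupil = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

score_self = mmd_squared(tutor, tutor)
score_good = mmd_squared(tutor, good_pupil)
score_poor = mmd_squared(tutor, poor_pupil)
```

      Because every point is placed by one fixed model, the scores for different pupils are directly comparable, and a new bird can be scored without re-embedding the others. The same would not hold if each comparison used a freshly fitted embedding.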

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate.

      There is indeed a stochastic element to UMAP embeddings which will result in different embeddings and therefore different syllable labels across repeated runs with the same input. We observed that v-measure scores were quite consistent within birds across repeated runs of the UMAP, and have added an additional supplementary figure to the revised manuscript showing this (Figure 2–figure supplement 4).

      Reviewer #1 (Recommendations For The Authors):

      (1) Benchmark their similarity score to the method used by Goffinet et al, 2021 from the Pearson group. Such a comparison would be really interesting and useful.  

      This has been added to the paper. 

      (2) Please clarify exactly what is new and what is applied from existing methods to help the reader see the novelty of the paper.  

      We have added more emphasis on the novel aspects of our pipeline to the paper’s introduction. 

      Minor:

      It's unclear if AVN is appropriate as the paper deals only with zebra finch song - the scope is more limited than advertised.

      We assume this is in reference to ‘Birdsong’ in the paper’s title and ‘Avian’ in Avian Vocalization Network. There is a brief discussion of how these methods are likely to perform on other commonly studied songbird species at the end of the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      A few points for the authors to consider that might strengthen or inform the paper:

      (1) In the public review, I detailed some ways in which the SSL+EMD approach is unlikely to be appreciably distinct from the VAE+MMD approach -- in fact, one could mix and match here. It would strengthen the authors' claim if they showed via experiments that their method outperforms VAE+MMD, but in the absence of that, a discussion of the relation between the two is probably warranted.  

      This comparison has been added to the paper.

      (2) ll. 305-310: This loss of accuracy near the edge is expected on general Bayesian grounds. Any regression approach should learn to estimate the conditional mean of the age distribution given the data, so ages estimated from data will be pulled inward toward the location of most training data. This bias is somewhat mitigated in the Brudner paper by a more flexible model, but it's a general (and expected) feature of the approach.
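
      The inward pull described in this comment can be reproduced with a small simulation. This is a sketch under stated assumptions, not the age model from the manuscript: the age range, the single noisy feature, the noise level, and the use of ordinary least squares are all invented for illustration.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical data: ages in days and one noisy song feature that tracks age.
ages = [random.uniform(40, 100) for _ in range(5000)]
features = [a + random.gauss(0, 15) for a in ages]

# Ordinary least squares fit of age on the feature.
mean_x, mean_y = mean(features), mean(ages)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(features, ages))
    / sum((x - mean_x) ** 2 for x in features)
)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in features]

# Average signed error (predicted minus true) at the two edges of the age range.
young_bias = mean(p - a for p, a in zip(predicted, ages) if a < 50)
old_bias = mean(p - a for p, a in zip(predicted, ages) if a > 90)
```

      With a noisy predictor the fitted slope is below 1, so the youngest birds are on average predicted as older than they are and the oldest birds as younger, which is the edge effect the reviewer describes.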

      (3) While the online AVN documentation looks good, it might benefit from a page on design philosophy that lays out how the various modules fit together - something between the tutorials and the nitty-gritty API. That way, users would be able to get a sense of where they should look if they want to harness pieces of functionality beyond the tutorials.

      Thank you for this suggestion. We will add a page on AVN’s design philosophy to the online documentation. 

      (4) While the manuscript does compare AVN to packages like TweetyNet and AVA that share some functionality, it doesn't really mention what's been going on with the vocalpy ecosystem, where the maintainers have been doing a lot to standardize data processing, integrate tools, etc. I would suggest a few words about how AVN might integrate with these efforts.

      We thank the reviewer for this suggestion.

      (5) ll. 333-336: It would be helpful to provide a citation to some of the self-supervised learning literature this procedure is based on. Some citations are provided in methods, but the general approach is worth citing, in my opinion. 

      We have added a paragraph to the results section with more background on self-supervised learning for dimensionality reduction, particularly in the context of similarity scoring.

      (6) One software concern for medium-term maintenance: AVN docs say to use Python 3.8, and GitHub says the package is 3.9 compatible. I also saw in the toml file that 3.10 and above are not supported. It's worth noting that Python 3.9 reaches its end of life in October 2025, so some dependencies may have to be altered or changed for the package to be viable going forward.  

      Thank you for this comment. We will continue to maintain AVN and update its dependencies as needed.

      Minor points:

      (1) It might be good to note that WhisperSeg is a different install from AVN. May be hard for novice users, though there's a web interface that's available. 

      We’ve added a line to the methods section making this clear. 

      (2) Figure 6b: Some text in the y-axis labels is overlapping here. 

      This has been fixed. Thank you for bringing it to our attention. 

      (3) The name of the Python language is always capitalized.  

      We’ve fixed this capitalization error throughout the manuscript. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      (1) I recommend that the authors improve the motivation of the chosen tasks and data or choose new tasks that more clearly speak to the optimizations they want to perform. 

      We have included more details about the motivation for our LDA classification analysis, age prediction model and embedding model for similarity scoring in the results of the revised manuscript, as discussed in more detail in the above responses to this reviewer. Thank you for these suggestions. 

      (2) They need to rigorously report the (classification) scores on the test datasets: these are the scores associated with the cost function used during training.  

      Based on this reviewer’s ‘Weaknesses: 3’ comment in the public reviews, we believe that they are referring to a classification score for the triplet loss model. As we explained in response to that comment, this is not a classification task, therefore there is no classification score to report. The loss function used to train the model was a triplet loss function. While we could report these values, they are not informative for how well this approach would perform in a similarity scoring context, as explained above. As such, we prefer to include contrast index and tutor contrast index scores to compare the models’ performance for similarity scoring, as these are directly relevant to the task and are established in the field for said task.

      (3) They need to explain the reasons for the poor performance (or report on the inconsistencies with previous work) and why they prefer a fully automated system rather than one that needs some fine-tuning on bird-specific data.

      We’ve addressed this comment in the public response to this reviewer’s weakness points 3, 5, and 6. 

      (4) They should consider applying their method to data from Japanese and European labs.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 4.

      (5) They need to document the failure modes and report all details about the human annotations.  

      We’ve added additional description of the failure modes for our segmentation and labeling approaches in the results section of the revised manuscript.

      Details: 

      The introduction is very vague, it fails to make a clear case of what the problem is and what the approach is. It reads a bit like an advertisement for machine learning: we are given a hammer and are looking for a nail.  

      We thank the reviewer for this viewpoint; however, we disagree and have decided to keep our Introduction largely unchanged. 

      L46 That interpretability is needed to maximize the benefits of machine learning is wrong, see self-driving cars and ChatGPT.  

      This line states that ‘To truly maximize the benefits of machine learning and deep learning methods for behavior analysis, their power must be balanced with interpretability and generalizability’. We firmly believe that interpretability is critically important when using machine learning tools to gain a deeper scientific understanding of data, including animal behavior data in a neuroscience context. We believe that the introduction and discussion of this paper already provide strong evidence for this claim. 

      L64 What about zebra finches that repeat a syllable in the motif, how are repetitions dealt with by AVN?  

      This is already described in the results section in lines 222-226, and in the methods in the ‘Syntax Features: Repetition Bouts’ section.

      L107 Say a bit more here, what exactly has been annotated?  

      We’ve added a sentence in the introduction to clarify this. Line 113-115. 

      L112 Define spectrogram frames. Do these always fully or sometimes partially contain a vocalization? 

      Spectrogram frames are the individual time bins of the spectrogram, each computed with a short-term Fourier transform. As described in the ‘Methods: Labeling: UMAP Dimensionality Reduction’ section, our spectrograms are computed using ‘The short term Fourier transform of the normalized audio for each syllable […] with a window length of 512 samples and a hop length of 128 samples’. Given that the song files have a standard sampling rate of 44.1 kHz, each time bin represents 11.6 ms of song data, with successive frames advancing in time by 2.9 ms. Each frame therefore contains only a small fraction of a vocalization. 
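
      The frame durations quoted above follow directly from dividing the window and hop lengths by the sampling rate. The sketch below only checks that arithmetic; the function name and the 100 ms example syllable are hypothetical, and the frame count assumes no padding at the syllable edges, which may differ from the actual pipeline.

```python
# Illustrative check of the spectrogram frame arithmetic (not the authors' code).
SAMPLE_RATE_HZ = 44_100   # sampling rate stated in the response
WINDOW_SAMPLES = 512      # STFT window length stated in the response
HOP_SAMPLES = 128         # STFT hop length stated in the response


def samples_to_ms(n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a number of audio samples to a duration in milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz


window_ms = samples_to_ms(WINDOW_SAMPLES)
hop_ms = samples_to_ms(HOP_SAMPLES)
print(f"window: {window_ms:.1f} ms, hop: {hop_ms:.1f} ms")
# prints "window: 11.6 ms, hop: 2.9 ms"

# Hypothetical 100 ms syllable (4410 samples at 44.1 kHz), assuming no padding:
syllable_samples = 4410
n_frames = 1 + (syllable_samples - WINDOW_SAMPLES) // HOP_SAMPLES
print(f"frames in a 100 ms syllable: {n_frames}")
```

      Under these assumptions, even a short syllable spans dozens of overlapping frames, which is why a single frame covers only a small part of a vocalization.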

      L122 The reported TweetyNet score of 0.824 is lower than the one reported in Figure 2a.  

      The center line in the box plot in Figure 2a represents the median of the distribution of TweetyNet V-measure scores. Given that there are a couple of outlying birds with very low scores, the mean (0.824, as reported in the text of the results section) is lower than the median. This is not an error.
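
      To illustrate how a few low outliers pull the mean below the median, here is a minimal sketch using made-up scores. The values are hypothetical and are not the per-bird scores from the paper.

```python
# Hypothetical per-bird scores (NOT data from the paper): most birds score
# high, while two outlying birds score very low.
from statistics import mean, median

hypothetical_scores = [0.95, 0.93, 0.92, 0.91, 0.90, 0.89, 0.88, 0.45, 0.30]

score_mean = mean(hypothetical_scores)
score_median = median(hypothetical_scores)

# The outliers drag the mean down but barely affect the median.
print(f"mean = {score_mean:.3f}, median = {score_median:.3f}")
```

      In a left-skewed distribution like this one, the box plot's center line (the median) sits above the mean reported in the text.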

      L155 Some of the differences in performance are very small, reporting of the P value might be necessary. 

      The validation scores of these methods are unlikely to differ to a statistically significant degree. This doesn’t mean that we cannot use the reported mean/median values to justify favoring one method over another. This is why we’ve chosen not to report p-values here.

      L161 The authors have not really tested more than a single clustering method, failing to show a serious attempt to achieve good performance.  

      We’ve addressed this comment in the public response to this reviewer’s weakness point 2.

      L186 Did isolate birds produce stereotyped syllables that can be clustered? 

      Yes, they did. The validation for clustering of isolate bird songs can be found in Figure 2–figure supplement 4. 

      Fig. 3e: How were the multiple bouts aligned?

      This is described in lines 857-876 in the ‘Methods: Song Timing Features: Rhythm Spectrograms’ section of the paper.

      L199 There is a space missing in front of (n=8).  

      Thank you for bringing this to our attention. It’s been corrected in the updated manuscript. 

      L268 Define classification accuracy.  

      We’ve added a sentence in lines 953-954 of the methods section defining classification accuracy. 

      L325 How many motifs need to be identified, why does this need to be done manually? There are semiautomated methods that can allow scaling, these should be cited here. Also, the mention of bias here should be removed in favor of a more extensive discussion on the experimenter bias (traditionally) vs Texas bias (in this paper).  

      All of the methods cited in this line have graphical user interfaces that require users to select a file containing song and manually highlight the start and end of each motif to be compared. The exact number of motifs required varies depending on the specific context (e.g. more examples are needed to detect more subtle differences or changes in song similarity) but it is fairly standard for reviewers to score 30 – 100 pairs of motifs. 

      We’ve discussed the tradeoffs between full automation and supervised or human-in-the-loop methods in response to this reviewer’s public comment ‘weakness #5 and 6’. Briefly, AVN’s aim is to standardize song analysis, to allow direct comparisons between song features and similarity scores across research groups. We believe, as explained in the paper, that this can best be achieved by having different research groups use the same deep learning models, which perform consistently well across those groups. Introducing semi-automated methods would defeat this benefit of AVN. 

      We’ve also addressed the question of ‘Texas bias’ in response to this reviewer’s public comment ‘Weakness #4’. 

      L340 How is EMD applied? Syllables are points in 8-dim space, but now suddenly authors talk about distributions without explaining how they got from points to distributions. Same in L925.  

      We apologize for the confusion here. The syllable points in the 8-d space are collectively an empirical distribution, not a probability distribution. We referred to them simply as ‘distributions’ to limit technical jargon in the results of the paper, but have changed this to more precise language in the revised manuscript.
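
      As a hedged illustration of how a dissimilarity such as EMD can be applied directly to two sets of points, the sketch below treats each set as an empirical distribution with equal weight on every point. It is not the authors' implementation: the function name is hypothetical, it uses 2-D points for readability (the same code works in 8 dimensions), and it relies on the fact that for two equal-sized, uniformly weighted point sets the optimal transport plan is a one-to-one matching. The brute-force search is only practical for a handful of points; real data would need a dedicated optimal-transport solver.

```python
# Minimal sketch (assumption: equal-sized point sets with uniform weights).
from itertools import permutations
from math import dist


def emd_equal_size(points_a, points_b):
    """Earth mover's distance between two equal-sized empirical distributions.

    Each point carries mass 1/n, so the optimal plan is a one-to-one matching;
    we find it by brute force over all permutations (tiny inputs only).
    """
    if len(points_a) != len(points_b):
        raise ValueError("this sketch requires equal-sized point sets")
    n = len(points_a)
    # Pairwise Euclidean ground distances between the two point sets.
    cost = [[dist(a, b) for b in points_b] for a in points_a]
    best_total = min(
        sum(cost[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )
    return best_total / n  # average distance moved per unit of mass


# Hypothetical syllable embeddings for two birds; bird_b is bird_a shifted by (3, 4).
bird_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
bird_b = [(3.0, 4.0), (4.0, 4.0), (3.0, 5.0)]

print(emd_equal_size(bird_a, bird_a))  # identical sets
print(emd_equal_size(bird_a, bird_b))  # shifted copy
```

      In this toy case the second set is a pure translation of the first, so the distance equals the length of the shift.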

      L351 Why do authors now use 'contrast index' to measure performance and no longer 'classification accuracy'?  

      We’ve addressed this comment in the public response to this reviewer’s weakness points 1 and 2.

      Figure 6 What is the confusion matrix, i.e. how well can the model identify pupil-pupil pairings from pupil-tutor and from pupil-unrelated pairings? I guess that would amount to something like classification accuracy.  

      There is no model classifying comparisons as pupil-pupil vs. pupil-tutor etc. These comparisons exist only to show the behavior of the similarity scoring approach, which consists of a dissimilarity measure (MMD or EMD) applied to low-dimensional representations of syllables generated by the triplet loss model or VAE. This was clarified further in our public response to this reviewer’s weakness points 1 and 2. 

      L487 What are 'song files', and what do they contain?   

      ‘Song files’ are .wav files containing recordings of zebra finch song. They typically contain a single song bout, but they can include multiple song bouts if they are produced close together, or incomplete song bouts if the introductory notes were very soft or the bouts were very long (>30s from the start of the file). Details of these recordings are provided in the ‘Methods: Data Acquisition: UTSW Dataset’ section of the manuscript.

      L497 Calls were only labelled for TweetyNet but not for other tasks.  

      That is correct. The rationale for this is provided in the ‘Methods: Manual Song Annotation’ section of the manuscript. 

      L637 There is a contradiction (can something be assigned to the 'own manual annotation category' when the same sentence states that this is done 'without manual annotation'?) 

      We believe there is confusion here between automated annotation and validation. Any bird can be automatically annotated without the need for any existing manual annotations for that individual bird. However, manual labels are required as a reference against which the automatically generated annotations are compared when validating the method.

      L970 Spectrograms of what? (what is the beginning of a song bout, L972). 

      The beginning of a song bout is the first introductory note produced by a bird after a period without vocalizations. This is standard.

    1. eLife Assessment

      This valuable study tests whether prediction error or prediction uncertainty controls how the brain segments continuous experience into events. The paper uses validated models that predict human behavior to analyze multivariate neural pattern changes during naturalistic movie watching. The authors provide solid evidence that there are overlapping but partially distinct brain dynamics for each signal.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      (1) The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      (2) The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      (3) The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      (4) The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      Weaknesses:

      (1) While the paper raises the possibility that both prediction error and uncertainty could serve as control signals, it does not offer a strong theoretical rationale for why the brain would benefit from multiple (empirically correlated) signals. What distinct advantages do these signals provide? This may be discussed in the authors' prior modeling work, but is left too implicit in this paper.

      (2) Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. The authors should consider whether they can leverage timepoints where the models make different predictions to make a stronger case for brain regions that are responsive to one vs the other.

      (3) The authors refer to a baseline measure of pattern dissimilarity, which their dissimilarity measure of interest is relative to, but it's not clear how this baseline is computed. Since the interpretation of increases or decreases in dissimilarity depends on this reference point, more clarity is needed.

      (4) The authors report an average event length of ~20 seconds, and they also look at +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretations of the reported time courses.

      (5) The authors describe a sequence of neural pattern shifts during each type of boundary, but offer little setup of what pattern shifts we might expect or why. They also offer little discussion of what cognitive processes these shifts might reflect. The paper would benefit from a more thorough setup for the neural results and a discussion that comments on how the results inform our understanding of what these brain regions contribute to event models.