  1. Apr 2026
    1. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

    2. Reviewer #2 (Public review):

      In 'Developmental constraints mediate the reversal of temperature effects on the autumn phenology of European beech after the summer solstice', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I think the experiments are interesting, but note that the treatments are extreme compared to natural conditions. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This article presents useful findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is compelling, but indicates some factors that may temper the authors' claims. The experimental designs offer incomplete support for the current claims: each experiment relies on one population under extreme conditions for only one year, and a confounding effect (time in a chamber) sometimes lacks a control.

      We thank the editor and reviewers for their consideration of our revised manuscript and for their constructive suggestions. In response to the editor’s guidance, we have ensured that: 1) the experimental design is clearly presented as physiological forcing, 2) the Solstice-as-Phenology-Switch concept is explicitly defined, limited, and framed as inferred, 3) conclusions are strictly aligned with the scope of the evidence, and limitations are acknowledged transparently.

      We hope these revisions fully address the remaining concerns and clarify both the conceptual framework and the appropriate scope of inference.

      Public Review:

      Reviewer #1 (Public review):

      The authors identified the summer solstice (June 21) as a phenological "switch point", but the flexibility of this switch point remains poorly understood. A more precise explanation of what "flexibility" means in this context is needed, along with a description of the specific experimental results that would demonstrate this flexibility.

      We agree that the concept of “flexibility” required clearer definition and a more explicit link to the experimental results. In the Introduction, we now explicitly define flexibility as the capacity for the effective timing of the phenological switch to shift earlier or later depending on developmental progression, rather than occurring at a fixed calendar date. This switch occurs at the compensatory point between the antagonistic influences of early-season development [ESD effect] and late-season temperature [LST effect] (L92-98). We have extended and clarified our explanation of the summer solstice’s role in this framework (L69-90). We propose that the solstice acts as an environmental switch that initiates the LST effect, as declining daylengths signal trees to become responsive to late-season cooling (L92-94). The compensatory point then occurs where the advancing ESD effect is balanced by the delaying LST effect. This point should therefore not be fixed to a calendar date but should instead vary with developmental progression each year (L75-95).

      In the Discussion, we clarify that flexibility is demonstrated experimentally by the observation that the magnitude of July cooling effects (LST effect) on autumn phenology depends on prior developmental rate (ESD effect) [a 3.4 times greater delay in late-leafing trees], indicating that the position of the compensatory point is development-dependent rather than fixed to June 21 (L398-410). We have made consistent edits throughout the Discussion, in particular in the ‘Support for the Solstice-as-Phenology-Switch Hypothesis’ subsection (L514-530).

      The experiment did not directly measure the specific date of the phenological switch point. Instead, it was inferred by comparing temperature effects before and after the solstice. The manuscript should clearly state that this switch point remains an inferred conceptual node rather than a directly measured variable.

      We fully agree and have clarified this in the revised manuscript. In the Discussion, we now clearly state that the compensatory point is a conceptual node inferred from responses to cooling before the solstice (June), directly after it (July), or later in the growing season (August) rather than a directly observed phenological event (L352-358 & L405-406).

      In Experiment 1, the effect of bud type (terminal vs. lateral) was inconsistent across the overall model and the different leafing groups. The authors should provide a more thorough discussion of potential reasons for this inconsistency.

      This inconsistency likely reflects biological complexity. In the Discussion, we now expand our interpretation to note that terminal and lateral buds may differ in developmental status, resource allocation and hormonal context. We emphasize that bud-type effects are therefore expected to be context-dependent and to interact with whole-plant developmental state, which plausibly explains why effects differ across leafing groups and models (L390-396).

      In addition, the statistical model for Experiment 1 indicates that the measured variables (summer cooling and leaf emergence date) explain only 23.4% of the variation in bud formation timing. This leaves over 76% of the variation unexplained, suggesting that other important factors are involved. The discussion should address this limitation in greater depth, moving beyond a focus on the measured variables.

      We now discuss the explained and unexplained variance in more detail. We also make it clear that our experiment was designed to test specific mechanistic pathways rather than to fully explain all phenological variability or maximise predictive power (L417-419).

      In the Discussion, we acknowledge that a substantial fraction of variation remains unexplained (L419-421). We discuss the possibility that other physiological mechanisms, such as photosynthetic assimilation, contribute to the unexplained variation (L421-427). However, large inter-individual variability is commonplace in autumn phenology. A low intra-class correlation coefficient (ICC = 0.26; see L276-280 for methods) suggests that much of the remaining variation is attributable to individual-level differences rather than to missing explanatory variables (L429-431). In line with the literature, we suggest that genetic and epigenetic differences likely contributed substantially to inter-individual variation, even within a single provenance population (L431-434). In this context of high individual variability, the fact that leaf-out timing (ESD effect) and summer cooling treatment (LST effect) together explain 23.4% of the variation in bud set timing is biologically meaningful and demonstrates the mechanistic importance of these processes (L438-441). For completeness, we also briefly discuss alternative sources of within-treatment variability (L434-437).

      Reviewer #2 (Public review):

      I think the experiments are interesting, but I found the exact methods of them somewhat extreme compared to how the authors present them.

      We appreciate this concern and have substantially revised the manuscript to clarify the experimental logic. In the Introduction, we now state explicitly that the study uses temperature regimes that were designed as strong physiological forcing treatments, intended to deeply constrain development and isolate mechanisms rather than to simulate natural or future climatic conditions (L113-115).

      In the Methods, we have enhanced our description of the non-linear effects of temperatures below 10°C on physiological processes (L154-158).

      At the start of the Discussion, we have added a dedicated paragraph clarifying the scope of inference: the experiment tests causality and constraint (i.e. whether specific physiological processes can drive phenological shifts), not quantitative responses under realistic climate scenarios (L346-363). Throughout the Discussion, we have revised language that could be read as scenario-based interpretation, replacing it with mechanistic phrasing.

      Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species.

      Given the large individual variation expected in phenological experiments, we used single experimental populations of single-provenance beech saplings to minimise uncontrolled variation arising from genetic differences (L358-360). This allowed us to elucidate mechanisms despite the noisy biological heterogeneity associated with phenology.

      In the last round of revision, we toned down statements of generalisation. In the Discussion, we now go further to clarify what mechanistic understanding can be gleaned directly from our findings and then cautiously suggest how these mechanisms may play out in natural systems. We repeatedly state the intention of the study as mechanistic inference rather than predictive power, e.g. “However, extrapolations to more complex natural ecosystems should be made with caution as our experimental design prioritised mechanistic inference over generalisability and predictive power.” (L417-419). Alongside our previous calls for tests on other species, we now additionally call for tests on other provenances of beech (L511-512).

      I was also very concerned by the revisions.

      If this concern stems from the confusion regarding line numbers and the two submitted versions of the manuscript (with and without tracked changes, as required by eLife), then we hope that situation is now clarified. Otherwise, we do not understand why our previous revisions would be perceived as concerning. Regardless, we have made every attempt to address the remaining comments comprehensively.

      Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      We appreciate this important conceptual point. The Solstice-as-Phenology-Switch hypothesis is central to our conceptual model and therefore requires clear explanation. In concert with our changes in response to Reviewer 1’s comment regarding flexibility, we have substantially revised and improved our description of this hypothesis (L69-108).

      Whilst the summer solstice is fixed to a calendar date (June 21), the timing of when trees change their autumn phenological responses to temperature is not (L88-90 & L515-517). This change occurs when the compensatory point between two antagonistic effects is crossed. Higher early-season development rates (which are driven by temperature) have an advancing (negative) effect on autumn phenology, which we now refer to as the ESD effect (L71-78). Warmer late-season temperatures have a delaying (positive) effect because trees become phenologically susceptible to cooling, i.e. overwintering responses are induced in response to cooling, which we now refer to as the LST effect (L78-82). The point in time when these two effects balance each other out, i.e. the net effect = 0, is the compensatory point (L95-97 & L523-525). This point occurs after the solstice because the LST effect only becomes active when days begin to shorten (L92-94 & L522-523). The solstice acts as an environmental switch, initiating trees’ susceptibility to cooling. The solstice is therefore referenced in the hypothesis because it forms a daylength barrier: in this framework, the compensatory point cannot occur before the solstice because daylengths are still increasing (L517-519).

      In the Introduction and Discussion, we clarify that the solstice is referenced as a biologically meaningful photoperiodic cue, not as a fixed threshold date. We now emphasise that the hypothesis concerns a seasonal reversal in responses to temperature structured around photoperiod, whose effective timing depends on developmental state, rather than a reversal occurring precisely on June 21. To avoid confusion, we have reworded phrases such as “summer solstice effect reversal” to “reversal of phenological responses to temperature after the summer solstice” (L371). In accordance, we have also changed the title to “Developmental constraints mediate the reversal of temperature effects on the autumn phenology of European beech after the summer solstice”.

      The following comments stem from the first round of review. We have previously revised the manuscript in accordance with these comments. For most of these points we do not see further cause for changes except for any overlap with comments above. We therefore predominantly copy our previous responses in quotes for clarity, the exception being the comment regarding the framing of our results in relation to natural systems.

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this as the methods feel quite extreme given the framing of the paper.

      “We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.”

      The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe in which I have worked. For example, a low of 2 deg C at night and 7 deg C during the day through the end of May and then 7/13 deg C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We appreciate the reviewer’s concern regarding the use of relatively extreme temperature treatments and the need to ensure that our conclusions are consistent with the motivation for using them. The manuscript was also revised in this regard in the previous round, and we copy the relevant responses at the bottom of this response. Despite this, we agree that further explanation of how our experimental treatments suited the aims of our study was still required.

      The aim of these treatments was not to reproduce typical ambient conditions, but to act as a mechanistic probe. Such mechanisms are not readily identifiable from observations or mild manipulations, because the expected effects are small relative to natural variability; stronger perturbations are therefore required to generate a diagnostic contrast. By strongly constraining development in the early-season, and by providing a robust cooling signal in the late-season, we sought to reveal the causal structure underlying the observed solstice-related reversal in temperature effects on autumn phenology.

      Temperatures below 10°C strongly slow down cell division and mitotic rates, and these rates then approach 0 rapidly and non-linearly as temperatures drop towards 0°C (Körner, 2021). As reflected in L152-158 of the revised manuscript, we selected a spring cooling regime of 2–7 °C to strongly slow developmental processes while maintaining a clear thermal safety margin that eliminates the risk of frost damage. Although a milder cooling regime (e.g. 5–10 °C) would be less extreme, it would also be expected to produce only a comparatively small reduction in developmental rates, thereby substantially reducing our ability to generate distinct early- and late-developing individuals and to detect carry-over effects on autumn phenology. Applying strong cooling therefore increases the signal-to-noise ratio and allows us to detect the underlying mechanism, which would not be possible with temperature treatments that represent average contemporary climatic variation.

      The use of conditions outside the norm is standard practice for elucidating mechanisms in ecology, where organisms are often pushed to their physiological limits or transplanted into environments fundamentally different from those to which they are adapted (Somero, 2010; Berend et al., 2019). Experiments targeting autumn phenology have used a broad range of environmental conditions, from moderate to extreme manipulations (Tanino et al., 2010). For example, to test the controls of growth cessation and dormancy induction in Prunus species, one study applied a range of treatments, including a constant 9°C temperature and a 24-hour photoperiod between April and July (Heide, 2008).

      Our experimental design aimed to reduce rates of development, cell division and maturation. In the Methods, we describe this aim and clearly state that the experimental design was not intended to mimic natural climatic variation (L154-156 & L181-186). Importantly, our conclusions are framed at the level of direction, timing, and interaction of effects, rather than the magnitude expected under contemporary or future field conditions (L360-363).

      This framing intends to reflect the primary inference of this study, which concerns when and why temperature effects reverse around the solstice, and how this timing depends on developmental state and diel temperature exposure, rather than making quantitative predictions for present-day or future climates. This aligns our conclusions with the experimental design. We have further revised the Discussion to explain these aims and conclusions more clearly, including the addition of a subsection at the beginning titled “Experimental forcing and scope of inference” (L346-363). We have also set up this expectation in the Introduction (L113-115).

      Additionally, we have improved the Discussion in a number of related aspects.

      We explicitly separate mechanistic conclusions and any relation to natural systems, remaining cautious to not overgeneralise or overstate our findings (L417-419).

      We now include a dedicated paragraph explaining that, although these specific conditions are not likely to be found in beech’s range, analogous developmental constraints can arise during cold springs, late cold spells following budburst, or at high-elevation and continental sites where temperatures remain low despite increasing photoperiod (L540-545, L583-588). We further explain that because developmental progression integrates temperature cumulatively over time, even short episodes of strong cooling can exert lasting carry-over effects on seasonal timing, thereby linking the forced experimental responses to processes relevant under natural, fluctuating conditions (L545-550).

      We explicitly state that the decoupling of day and night temperatures was not intended to represent realistic meteorological states (L458-460). We explain that this design was used diagnostically to isolate inherently diel physiological processes (e.g. nocturnal growth, cell division and expansion versus daytime carbon assimilation), and that the observed responses demonstrate the importance of diel timing of temperature exposure rather than the realism of the imposed cycles (L460-468).

      Previous response:

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants. We have added text in the Methods to clarify this aim.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but -- again -- we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods and Discussion.

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions.

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that carbon assimilation is an important component of forest carbon dynamics. However, the primary aim of this study was to identify how developmental state and diel cycles mediate temperature effects on autumn phenology, rather than to quantify carbon assimilation per se. Assessing photosynthetic controls on autumn phenology would require a substantially different experimental design and is therefore beyond the scope of the present study.

      That said, we were able to include measurements of photosynthetic assimilation during pre-solstice cooling (now presented as Fig. S12 for all treatments). These data show that cooling strongly reduced assimilation across all treatments, despite their markedly different phenological outcomes. This supports our interpretation that variation in assimilation alone cannot explain the observed phenological responses, consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1, our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously and highlight the need for further research across species.

      And the referenced response to Reviewer one:

      “We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”) and explicitly call for follow-up studies across species and forest contexts. At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.”

      As described in responses above, we have further clarified what can be directly concluded from our study, avoiding overgeneralisation.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season -- budset, end of radial expansion, leaf coloring etc. -- relates to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which often happens MONTHS before leaf coloring) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may differ with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker. On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, bud set occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are often similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R² = 0.49) between the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech”.

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. Photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced exactly the same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations rather than exerting complete control.

      Following the addition of an analysis of leaf senescence data, we also revised the terminology in places (including the title) from “primary growth cessation/bud set” to the broader term “autumn phenology.” This term is intended to encompass two distinct but related physiological processes—bud set and leaf senescence—both of which are commonly used as markers of autumn phenology and the end of the growing season.

      Somewhat minor comments:

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9) and inferences are not altered. We also report the bud type effects for experiment 1 and experiment 2.

      (2) I didn't fully see how the authors results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end of season timing?

      Our responses to the main comments in this new round of revision have comprehensively covered this topic.

      References

      Berend K, Haynes K, MacKenzie CM. 2019. Common garden experiments as a dynamic tool for ecological studies of alpine plants and communities in northeastern North America. Rhodora 121: 174.

      Heide OM. 2008. Interaction of photoperiod and temperature in the control of growth and dormancy of Prunus species. Scientia Horticulturae 115: 309–314.

      Körner C. 2021. Alpine Plant Life: Functional Plant Ecology of High Mountain Ecosystems. Cham: Springer International Publishing.

      Somero GN. 2010. The physiology of climate change: how potentials for acclimatization and genetic adaptation will determine ‘winners’ and ‘losers’. Journal of Experimental Biology 213: 912–920.

      Tanino KK, Kalcsits L, Silim S, Kendall E, Gray GR. 2010. Temperature-driven plasticity in growth cessation and dormancy development in deciduous woody plants: a working hypothesis suggesting how molecular and cellular function is affected by temperature during dormancy induction. Plant Molecular Biology 73: 49–65.

    1. eLife Assessment

      This important study combined careful computational modeling, a large patient sample, and replication in an independent general population sample to provide convincing evidence in support of a computational account of a difference in risk-taking between people who have attempted suicide and those who have not. It is proposed that this difference reflects a general change in the approach to risky (high-reward) options and a lower emotional response to certain rewards. While the findings advance our understanding of cognitive mechanisms at the group level, the observation that the computational phenotype is predictive of suicidal behavior only in the clinical sample and not in the online sample limits its applicability for individual prediction, early detection and prevention of suicidality.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use a gambling task with momentary mood ratings from Rutledge et al. and compare computational models of choice and mood to identify markers of decisional and affective impairments underlying risk-prone behavior in adolescents with suicidal thoughts and behaviors (STB). The results show that adolescents with STB show enhanced gambling behavior (choosing the gamble rather than the sure amount), and this is driven by a bias towards the largest possible win rather than insensitivity to possible losses. Moreover, this group shows a diminished effect of receiving a certain reward (in the non-gambling trials) on mood. The results were replicated in a general online sample where participants were divided into groups with or without STB based on their self-report of suicidal ideation on one question in the Beck Depression Inventory self-report instrument. The authors suggest, therefore, that adolescents diagnosed with depression or anxiety with decreased sensitivity to certain rewards may need to be monitored more closely for STB due to their increased propensity to take risky decisions aimed at (expected) gains (such as relief from an unbearable situation through suicide) regardless of the potential losses. However, such a result was only found in the clinical sample and cannot be generalized more broadly based on the current findings.

      Strengths:

      (1) The study uses a previously validated task design and replicates previously found results through well-explained model-free and model-based analyses.

      (2) Sampling of adolescents at high risk can help target early preventative diagnoses and treatments for suicide.

      (3) Replication of the results in an online cohort increases confidence in the findings.

      (4) The models considered for comparison are thorough and well-motivated. The chosen models allow for teasing apart which decision and mood sensitivity parameters relate to risky decision-making across groups based on their hypotheses.

      (5) Novel finding of mood (in)sensitivity to non-risky rewards and its relationship with risk behavior in STB.

      Weaknesses:

      (1) The sample size of 25 for the S- group is low-powered, which is explicitly mentioned as a study limitation.

      (2) Modeling in the mediation analysis focused on predicting risk behavior in this task from the model-derived bias for gains and suicidal symptom scores. Thus, the implications of this work are more relevant to a basic-science understanding of the etiology of suicidal behavior than they are useful as a predictor of suicidal behavior, and it is not clear that a psychiatrist or psychologist could use this task to determine who is at higher risk of attempting suicide and must be more closely monitored. Indeed, relationships of task parameters and task behavior with suicidal behavior were limited to the clinical sample with a diagnosis of depression or anxiety disorder, and did not extend to the online sample. Therefore, the claim that these findings provide "computational markers for general suicidal tendency among adolescents" is unwarranted.

    3. Reviewer #2 (Public review):

      Summary:

      This article addresses a very pertinent question: what are the computational mechanisms underlying risky behaviour in patients who have attempted suicide? In particular, it is impressive how the authors find a broad behavioral effect whose mechanisms they can then explain and refine through computational modeling. This work is important because, currently, beyond previous suicide attempts, there has been a lack of predictive measures. This study is the first step towards that: understanding the cognition at the group level, before being able to include it in future predictive studies (based on the cross-sectional data, this study by itself cannot assess the predictive validity of the measure).

      Strengths:

      - Large sample size
      - Replication of their own findings
      - Well-controlled task with measures of behaviour and mood + precise and well-validated computational modeling

      Questions, based on revised manuscript and replies to other reviewers:

      (1) Replies to reviewers in general: Bayes factors have been added; it would be good to also use common verbal terms to describe them (e.g., 'anecdotal', 'moderate'). For example, my reading of Table S8 would be that for gambling rate there is only anecdotal evidence that it does not relate to PSWQ and BDI, and moderate evidence that it does not relate to TAI.

      (2) Reply to reviewer 1 Q2 (Predicting STB): For the regression predicting suicidal ideation, it seems to me that what you did was a regression STB ~ gambling behaviour + approach + mood. Could you clarify? As a test of whether the task can predict STB risk, I had expected something slightly different: a cross-validation (leave-one-out, or perhaps 5-fold in the large sample) of STB ~ gambling behaviour + approach [parameter from model] + mood [parameter from model], then computing predicted STB in the left-out participants and checking the correlation between observed and predicted STB. This would test whether the diverse task measures together predict STB (with the caveat that it is cross-validated rather than tested on a hold-out sample, unless you could train on one sample (in lab) and test on the other (online)).

      (3) Reply to reviewer 2 Q1 (parameter recovery): I'm looking at S3; it still seems to show only the scatter plots and not the correlation matrices, which are now added as text notes. Can you actually show these matrices? An off-diagonal correlation of 0.63 appears quite high. I think it needs to be discussed exactly which parameters those are, and whether that affects the interpretation of the results.

      (4) Reply to reviewer 3 Q3 (mood model): I would have imagined that the response would involve changing the mood equations (equation 8 main text) to include a term for whether the participant gambled or not, independent of the gamble value.
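      To make the suggestion in (4) concrete, the sketch below assumes the standard Rutledge et al. (2014) form of the mood equation, in which mood is a baseline plus exponentially discounted sums of certain rewards (CR), gamble expected values (EV) and reward prediction errors (RPE), and adds a hypothetical categorical term G_j (1 if the participant gambled on trial j, else 0): H(t) = w0 + w_CR Σ γ^(t-j) CR_j + w_EV Σ γ^(t-j) EV_j + w_RPE Σ γ^(t-j) RPE_j + w_G Σ γ^(t-j) G_j. The manuscript's equation 8 may split these terms differently; all names here, including w_gamble, are illustrative only.

```python
from typing import List, NamedTuple


class Trial(NamedTuple):
    gambled: bool   # True if the participant chose the gamble
    certain: float  # certain amount (counts only when the gamble is declined)
    ev: float       # expected value of the gamble
    outcome: float  # realised gamble outcome (counts only when gambled)


def predicted_mood(trials: List[Trial], w0: float, w_cr: float, w_ev: float,
                   w_rpe: float, w_gamble: float, gamma: float) -> List[float]:
    """Predicted mood after each trial.

    Each trace computes sum_j gamma**(t - j) * x_j recursively as
    trace_t = gamma * trace_{t-1} + x_t. w_gamble weights the hypothetical
    value-independent 'chose to gamble' term.
    """
    cr = ev = rpe = g = 0.0
    moods = []
    for t in trials:
        # Decay every trace before adding this trial's contribution.
        cr, ev, rpe, g = cr * gamma, ev * gamma, rpe * gamma, g * gamma
        if t.gambled:
            ev += t.ev
            rpe += t.outcome - t.ev
            g += 1.0  # same increment whatever the amounts at stake
        else:
            cr += t.certain
        moods.append(w0 + w_cr * cr + w_ev * ev + w_rpe * rpe + w_gamble * g)
    return moods


# Only the gamble-choice term is switched on, so predicted mood tracks choices alone.
trials = [Trial(True, 0.0, 1.0, 2.0), Trial(False, 1.0, 0.0, 0.0), Trial(True, 0.0, 1.0, 0.0)]
moods = predicted_mood(trials, w0=0.0, w_cr=0.0, w_ev=0.0, w_rpe=0.0, w_gamble=1.0, gamma=0.5)
```

      Because G_j does not scale with outcome magnitude, a fitted w_G would capture a categorical mood response to choosing to gamble, separately from the value-scaled EV and RPE terms.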

    4. Reviewer #3 (Public review):

      This manuscript investigates computational mechanisms underlying increased risk-taking behavior in adolescent patients with suicidal thoughts and behaviors. Using a well-established gambling task that incorporates momentary mood ratings and previously established computational modeling approaches, the authors identify particular aspects of choice behavior (which they term approach bias) and mood responsivity (to certain rewards) that differ as a function of suicidality. The authors replicate their findings on both clinical and large-scale non-clinical samples.

      The main problem, however, is that the results do not seem to support a specific conclusion with regard to suicidality. The S+ and S- groups differ substantially in the severity of symptoms, as can be seen by all symptom questionnaires and the baseline and mean mood, where S- is closer to HC than it is to S+. The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence. Therefore, the results do not adequately deconfound suicidality from general symptom severity.

      The second main issue is that the relationship between an increased approach bias and decreased mood response to CR is conceptually unclear. In this respect, it would be natural to test whether mood responses influence subsequent gambling choices. This could be done either within the model by having mood moderate the approach bias or outside the model using model-agnostic analyses.
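      A minimal model-agnostic version of this test could compare gambling rates on trials preceded by higher versus lower mood ratings. The sketch below is illustrative: the function name is hypothetical, it assumes a rating logged at trial k is given after trial k's outcome (so it precedes trial k + 1), and it uses a median split; a trial-level logistic mixed model would be the more usual choice in practice.

```python
from statistics import median


def gamble_rate_by_prior_mood(ratings, choices):
    """Gambling rate on trials preceded by above- vs at/below-median mood.

    ratings: list of (trial_index, mood) pairs; a rating at index k is assumed
             to be given after trial k's outcome, i.e. before trial k + 1.
    choices: list of bools per trial (True = gambled).
    Returns (rate_after_high_mood, rate_after_low_mood).
    """
    ratings = sorted(ratings)
    paired = []  # (most recent prior mood, gambled) for each eligible trial
    k = -1
    for i, gambled in enumerate(choices):
        while k + 1 < len(ratings) and ratings[k + 1][0] < i:
            k += 1
        if k >= 0:  # skip trials that precede the first rating
            paired.append((ratings[k][1], gambled))
    cut = median(mood for mood, _ in paired)

    def rate(group):
        return sum(group) / len(group) if group else float("nan")

    high = [g for mood, g in paired if mood > cut]
    low = [g for mood, g in paired if mood <= cut]
    return rate(high), rate(low)


ratings = [(0, 0.2), (3, 0.8)]
choices = [False, False, False, True, True, True]
high_rate, low_rate = gamble_rate_by_prior_mood(ratings, choices)
```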

      Additionally, there is a conceptual inconsistency between the choice and mood findings that partly results from the analytic strategy. The approach bias is implemented in choice as a categorical value-independent effect, whereas the mood responses always scale linearly with the magnitude of outcomes. One way to make the models more conceptually related would be to include a categorical value-independent mood response to choosing to gamble/not to gamble.

      The manuscript requires editing to improve clarity and precision. The use of terms such as "mood" and "approach motivation" is often inaccurate or not sufficiently specific. There are also many grammatical errors throughout the text.

      Claims of clinical relevance should be toned down, given that the findings are based on noisy parameter estimates whose clinical utility for the treatment of an individual patient is doubtful at best.

      Comments on revisions:

      The authors adequately addressed my comments and I find the manuscript substantially strengthened.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study combined careful computational modeling, a large patient sample, and replication in an independent general population sample to provide a computational account of a difference in risk-taking between people who have attempted suicide and those who have not. It is proposed that this difference reflects a general change in the approach to risky (high-reward) options and a lower emotional response to certain rewards. Evidence for the specificity of the effect to suicide, however, is incomplete, which would require additional analyses.

      We thank the editors and reviewers for this important assessment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions explaining gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we also performed a Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicidality.
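      A back-of-the-envelope check on why the first component dominates, under our assumption that the PCA was run on the correlation matrix of four standardized questionnaire scores (four components are reported) and that every pairwise correlation exceeds 0.753. Since all entries are positive, the largest eigenvalue is at least that of a matrix whose off-diagonal entries all equal 0.753 (Perron-Frobenius monotonicity for nonnegative matrices), and the eigenvalues of a correlation matrix sum to p:

```latex
\lambda_1 \;\ge\; 1 + (p-1)\,r_{\min} = 1 + 3(0.753) = 3.259,
\qquad
\frac{\lambda_1}{\sum_{k=1}^{p}\lambda_k} = \frac{\lambda_1}{p} \;\ge\; \frac{3.259}{4} \approx 81.5\%
```

      This lower bound is consistent with the reported 86.95% and shows that the first component mainly reflects variance shared across the questionnaires.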

      Moreover, as Reviewer 3 pointed out, absence of evidence cannot be taken as evidence of absence. We had median-split patients by their general symptom scores (e.g., on the depression- and anxiety-related questionnaires) and found no significant differences in our main measures between the resulting subgroups (Figure S11); we have now additionally conducted Bayesian analyses of gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference; BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results. Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicide, above and beyond depression and anxiety.
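      The response does not state here how BF<sub>01</sub> was computed. As a rough, assumption-laden stand-in for a default-prior Bayes factor, the sketch below uses the BIC approximation BF<sub>01</sub> ≈ exp((BIC<sub>1</sub> − BIC<sub>0</sub>)/2) for a two-group mean comparison, and attaches the verbal labels the reviewer requested (thresholds of 1, 3, 10, 30 and 100 as in Lee and Wagenmakers, 2013, adapted from Jeffreys). Function names are hypothetical.

```python
import math


def bic_bf01(group_a, group_b):
    """BIC approximation to BF01 for a two-group mean difference.

    M0: one common mean; M1: separate group means. Both models also estimate a
    residual variance, which adds the same penalty to each BIC and cancels.
    """
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    grand = sum(pooled) / n
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    rss0 = sum((x - grand) ** 2 for x in pooled)
    rss1 = (sum((x - mean_a) ** 2 for x in group_a)
            + sum((x - mean_b) ** 2 for x in group_b))
    bic0 = n * math.log(rss0 / n) + 1 * math.log(n)  # one mean parameter
    bic1 = n * math.log(rss1 / n) + 2 * math.log(n)  # two mean parameters
    return math.exp((bic1 - bic0) / 2)


# Verbal categories, strongest first.
LABELS = [(100, "extreme"), (30, "very strong"), (10, "strong"),
          (3, "moderate"), (1, "anecdotal")]


def describe_bf01(bf01):
    """Verbal label for BF01; values below 1 are described as evidence for M1."""
    favoured = "M0" if bf01 > 1 else "M1"
    strength = max(bf01, 1 / bf01)
    for cutoff, label in LABELS:
        if strength > cutoff:
            return f"{label} evidence for {favoured}"
    return "no evidence either way"


bf_same = bic_bf01([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
bf_diff = bic_bf01([1, 2, 3], [11, 12, 13])
```

      With identical groups of five, the approximation gives BF<sub>01</sub> = √10, which only reaches "moderate" support for the null; this illustrates why small subgroups rarely yield strong evidence of absence.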

      Beyond these specific findings, this work highlights the broader utility of computational modelling and mood ratings for better understanding behavioral effects, showing how mood and choice data can be combined to better comprehend a psychiatric issue.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use a gambling task with momentary mood ratings from Rutledge et al. and compare computational models of choice and mood to identify markers of decisional and affective impairments underlying risk-prone behavior in adolescents with suicidal thoughts and behaviors (STB). The results show that adolescents with STB show enhanced gambling behavior (choosing the gamble rather than the sure amount), and this is driven by a bias towards the largest possible win rather than insensitivity to possible losses. Moreover, this group shows a diminished effect of receiving a certain reward (in the non-gambling trials) on mood. The results were replicated in an undifferentiated online sample where participants were divided into groups with or without STB based on their self-report of suicidal ideation on one question in the Beck Depression Inventory self-report instrument. The authors suggest, therefore, that adolescents with decreased sensitivity to certain rewards may need to be monitored more closely for STB due to their increased propensity to take risky decisions aimed at (expected) gains (such as relief from an unbearable situation through suicide), regardless of the potential losses.

      Strengths:

      (1) The study uses a previously validated task design and replicates previously found results through well-explained model-free and model-based analyses.

      (2) Sampling choice is optimal, with adolescents at high risk; an ideal cohort to target early preventative diagnoses and treatments for suicide.

      (3) Replication of the results in an online cohort increases confidence in the findings.

      (4) The models considered for comparison are thorough and well-motivated. The chosen models allow for teasing apart which decision and mood sensitivity parameters relate to risky decision-making across groups based on their hypotheses.

      (5) Novel finding of mood (in)sensitivity to non-risky rewards and its relationship with risk behavior in STB.

      Weaknesses:

      (1) The sample size of 25 for the S- group was justified based on previous studies (lines 181-183); however, all three papers cited mention that their sample was low powered as a study limitation.

      We thank the Reviewer for raising this concern. We agree that the sample size for the S<sup>-</sup> group (n=25) is modest, and the prior studies we cited also acknowledged limited power. We wanted to point out that our sample size is comparable to that of a prior study. In the revision, we therefore updated the section justifying this sample size and now acknowledge the limited power of our study in the limitations section. Please see our clarification below:

      Page 32:

      “Third, despite replicating our main results in an independent dataset (n=747), the modest S<sup>-</sup> subgroup size (n=25) limits statistical power.”

      (2) Modeling in the mediation analysis focused on predicting risk behavior in this task from the model-derived bias for gains and suicidal symptom scores. However, the prediction of clinical interest is of suicidal behaviors from task parameters/behavior - as a psychiatrist or psychologist, I would want to use this task to potentially determine who is at higher risk of attempting suicide and therefore needs to be more closely watched rather than the other way around (predicting behavior in the task from their symptom profile). Unfortunately, the analyses presented do not show that this prediction can be made using the current task. I was left wondering: is there a correlation between beta_gain and STB? It is also important to test for the same relationships between task parameters and behavior in the healthy control group, or to clarify that the recommendations for potential clinical relevance of these findings apply exclusively to people with a diagnosis of depression or anxiety disorder. Indeed, in line 672, the authors claim their results provide "computational markers for general suicidal tendency among adolescents", but this was not shown here, as there were no models predicting STB within patient groups or across patients and healthy controls.

      Thank you for these thoughtful comments. Our study focuses on why adolescent patients with suicidality show increased risk behavior, aiming to provide a mechanism-based target for suicide prevention. Therefore, the dependent variable in our mediation model was gambling behavior. We also agree that the clinically relevant question is whether suicidality can be predicted from task-derived behavior/parameters. We therefore used risky behavior and the model-derived parameters to predict STB. Linear regressions showed that gambling behavior, as well as the value-insensitive approach parameter, predicted suicidal symptom scores among patients (former: β = 9.189, t = 2.004, p = 0.048; latter: β = 5.587, t = 2.890, p = 0.005). In healthy controls, these predictions failed (gambling behavior: β = 1.471, t = 0.825, p = 0.411; approach: β = 0.874, t = 1.178, p = 0.241). These results suggest that the clinical relevance of these findings applies exclusively to people with a diagnosis of depression or anxiety disorder. We found the same pattern for the mood parameter (mood sensitivity to certain rewards: patients: β = -28.706, t = -2.801, p = 0.006; healthy controls: β = -2.204, t = -0.528, p = 0.599). In sum, we believe that our statement of “computational markers for general suicidal tendency among adolescents” is now reasonable. Please see our revisions below:

      Page 17:

      “Furthermore, linear regression showed that gambling rate can predict the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048) among patients, but not among HC (β = 1.471, t = 0.825, p = 0.411), suggesting that gambling behavior has patient-specific predictive utility for suicidal symptoms.”

      Page 19:

      “Furthermore, linear regression showed that approach parameter can predict the current suicidal ideation score (β = 5.587, t = 2.890, p = 0.005) among patients, but not among HC (β = 0.874, t = 1.178, p = 0.241), suggesting that value-insensitive approach parameter has patient-specific predictive utility for suicidal symptoms.”

      Page 21:

      “Furthermore, linear regression showed that mood sensitivity to CR can predict the current suicidal ideation score (β = -28.706, t = -2.801, p = 0.006) among patients, but not among HC (β = -2.204, t = -0.528, p = 0.599), suggesting that mood sensitivity to CR has patient-specific predictive utility for suicidal symptoms.”
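      The regressions above are in-sample. The later review round asks whether the task measures jointly predict STB out of sample. A leave-one-out sketch of that check follows; the data are synthetic, the three features merely stand in for gambling rate, the approach parameter and mood sensitivity to CR, and all function names are hypothetical. Ordinary least squares is fitted in pure Python so the example runs without extra packages.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fit(X, y):
    """OLS coefficients via the normal equations (X includes an intercept column)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def loo_predictions(features, target):
    """Predict each participant's score from a model fitted to everyone else."""
    preds = []
    for i in range(len(target)):
        X_train = [[1.0] + f for j, f in enumerate(features) if j != i]
        y_train = [t for j, t in enumerate(target) if j != i]
        coefs = ols_fit(X_train, y_train)
        preds.append(sum(c * x for c, x in zip(coefs, [1.0] + features[i])))
    return preds


# Synthetic data: three hypothetical task measures per participant.
rng = random.Random(0)
features = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
target = [1.0 + 2.0 * f[0] + 0.5 * f[1] - f[2] + rng.gauss(0, 0.5) for f in features]
r_cv = pearson(target, loo_predictions(features, target))
```

      For the train-on-one-sample, test-on-the-other variant, one would call `ols_fit` once on the clinical data and apply the coefficients to the online sample instead of looping.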

      (3) The FDR correction for multiple comparisons mentioned briefly in lines 536-538 was not clear. Which analyses were included in the FDR correction? In particular, did the correlations between gambling rate and BSI-C/BSI-W survive such correction? Were there other correlations tested here (e.g., with the TAI score or ERQ-R and ERQ-S) that should be corrected for? Did the mediation model survive FDR correction? Was there a correction for other mediation models (e.g., with BSI-W as a predictor), or was this specific model hypothesized and pre-registered, and therefore no other models were considered? Did the differences in beta_gain across groups survive FDR when including comparisons of all other parameters across groups? Because the results were replicated in the online dataset, it is ok if they did not survive FDR in the patient dataset, but it is important to be clear about this in presenting the findings in the patient dataset.

      Thank you for raising the important issue of multiple testing and for asking us to clarify exactly which tests were covered by the FDR procedure. In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values. Please see our clarification below:

      Supplementary Page 4:

      “Supplementary Note 8: Clarification for FDR correction.

      In the clinical dataset we conducted a large number of inferential tests (χ<sup>2</sup>, t-tests, ANOVAs, regressions) spanning: (i) group differences in demographic/clinical characteristics; (ii) sanity checks (e.g., anxiety/depression questionnaires); (iii) primary hypotheses (e.g., group differences in risky behavior); (iv) model-based analyses (parameter checks and between-group contrasts); and (v) control/sensitivity analyses. Post-hoc t-tests were performed only when the three-group ANOVA was significant. This yielded >150 p-values. FDR was applied using all these p-values.”
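      The note does not name the FDR procedure. Assuming the common Benjamini-Hochberg step-up method, applying it jointly to all of the p-values could look like the sketch below; the function names and the example p-values are illustrative, not values from the study.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return True where a hypothesis is rejected at FDR level q (step-up rule)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose sorted p-value is <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= k_max
    return rejected


def bh_adjusted(pvalues):
    """BH-adjusted p-values: min over k >= rank of p_(k) * m / k, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # rank in ascending order
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


example = [0.01, 0.02, 0.03, 0.5]
decisions = benjamini_hochberg(example)
adjusted = bh_adjusted(example)
```

      Because the rule is step-up, one small p-value near the end of the sorted list can carry all smaller ones with it; for example, `[0.04, 0.045, 0.046, 0.047]` are all rejected at q = 0.05 even though 0.04 exceeds the first threshold of 0.0125.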

      (4) There is a lack of explicit mention when replication analyses differ from the analyses in the patient sample. For instance, the mediation model is different in the two samples: in the patient sample, it is only tested in S+ and S- groups, but not in healthy controls, and the model relates a dimensional measure of suicidal symptoms to gambling in the task, whereas in the online sample, the model includes all participants (including those who are presumably equivalent to healthy controls) and the predictor is a binary measure of S+ versus S- rather than the response to item 9 in the BDI. Indeed, some results did not replicate at all and this needs to be emphasized more as the lack of replication can be interpreted not only as "the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients" (lines 582-585) - it may also be that this link is not truly there, and without a replication it needs to be interpreted with caution.

      Thank you for these important comments. This study focused on the cognitive and affective computational mechanisms underlying increased risky behavior in STB. Accordingly, we compared patients with STB (S<sup>+</sup>) with patients without STB (S<sup>-</sup>) and healthy controls (HC) to examine the effects of STB on risky behavior. Therefore, a group comparison, rather than a dimensional measure of suicidal symptoms from the Beck Scale for Suicidal Ideation, answers our research questions directly.

      To enhance consistency between the clinical and replication datasets, we included all participants in each dataset when performing the mediation analysis. Given that S<sup>-</sup> and HC did not differ in gambling behavior or the approach parameter in the clinical dataset, we merged these two groups. In the replication dataset, to mirror the S<sup>+</sup> vs. S<sup>-</sup> contrast used clinically, we categorized the general sample into S<sup>+</sup> and S<sup>-</sup> groups based on BDI item 9. The mediation results remained significant in both datasets (the clinical dataset: a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; the replication dataset: a×b = 0.143, 95% CI = [0.016, 0.288], p = 0.031), suggesting that STB is associated with increased risk behavior via stronger approach motivation.
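      The indirect effect a×b and its 95% CI can be illustrated with a simple mediation model (group → approach parameter → gambling rate). The response does not say how the CI was obtained; the sketch assumes a percentile bootstrap, uses synthetic data, and all names are hypothetical.

```python
import random


def indirect_effect(x, m, y):
    """a*b for the simple mediation x -> m -> y (OLS, no covariates)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    dx = [v - mx for v in x]
    dm = [v - mm for v in m]
    dy = [v - my for v in y]
    sxx = sum(a * a for a in dx)
    smm = sum(a * a for a in dm)
    sxm = sum(a * b for a, b in zip(dx, dm))
    sxy = sum(a * b for a, b in zip(dx, dy))
    smy = sum(a * b for a, b in zip(dm, dy))
    a_path = sxm / sxx  # slope of m on x
    b_path = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)  # m's coefficient in y ~ x + m
    return a_path * b_path


def bootstrap_ci(x, m, y, n_boot=2000, seed=1, level=0.95):
    """Percentile bootstrap CI for a*b, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:  # resample contains only one group; draw again
            continue
        estimates.append(indirect_effect(xs, [m[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * n_boot)]
    hi = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


rng = random.Random(0)
x = [i % 2 for i in range(200)]           # hypothetical group code: 1 = S+, 0 = S-/HC
m = [xi + rng.gauss(0, 0.5) for xi in x]  # hypothetical approach parameter
y = [mi + rng.gauss(0, 0.5) for mi in m]  # hypothetical gambling rate
estimate = indirect_effect(x, m, y)
low, high = bootstrap_ci(x, m, y, n_boot=1000)
```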

      We also acknowledge the non-replication of the correlation between gambling behavior and mood sensitivity to certain rewards in the online sample. While this pattern might indicate that the link is specific to suicidal patients, it may also reflect sample-specific or unstable effects; thus, we now state this explicitly and interpret the finding with caution. Please see our revisions below:

      Page 15:

      “We next verified our results in an independent dataset, including the same task and BDI questionnaire in 747 general participants (500 females; age: 20.90±2.41) (46). One item in BDI involves the measurement of STB. In item 9 of BDI, participants chose one option that describes them best: Option 1, “I don't have any thoughts of killing myself.”; Option 2, “I have thoughts of killing myself, but I would not carry them out.”; Option 3, “I would like to kill myself.”; Option 4, “I would kill myself if I had the chance.”. In line with the current definition of S<sup>+</sup>/S<sup>-</sup> in the clinical dataset, we identified S<sup>+</sup> group as choosing Option 2, 3, or 4, while participants selecting Option 1 were categorized as S<sup>-</sup> group.”

      Page 19:

      “Given significant correlations between group, approach parameter, and gambling rate for gain trials (ps < 0.017), we further conducted a mediation analysis testing whether approach motivation mediates the effect of suicidality on risk behavior. Given that we aimed to test the effect of STB, with S<sup>-</sup> and HC as controls, and given that S<sup>-</sup> and HC did not differ in gambling behavior or in the approach parameter, we merged these two groups for the mediation analysis. Results supported our hypothesis (a×b = 0.321, 95% CI = [0.070, 0.549], p = 0.016; Figure 2C), confirming that suicidal thoughts and behavior increase risk behavior through stronger approach motivation.”

      Page 26:

      “However, we did not observe any significant correlation between mood sensitivity to CR and gambling behavior (ps > 0.389), which suggests that the link between mood sensitivity to CR and gambling behavior may be specifically observable in suicidal patients. Alternatively, this non-replicated result may also reflect sample-specific or unstable effects, which needs to be interpreted with caution.”

      (5) In interpreting their results, the authors use terms such as "motivation" (line 594) or "risk attitude" (line 606) that are not clear. In particular, how was risk attitude operationalized in this task? Is a bias for risky rewards not indicative of risk attitude? I ask because the claim is that "we did not observe a difference in risk attitude per se between STB and controls". However, it seems that participants with STB chose the risky option more often, so why is there no difference in risk attitude between the groups?

      Thank you for pointing out the ambiguity. In our manuscript, “motivation” and “risk attitude” are defined at the computational level. Following prior work with this task (Rutledge et al., 2015, 2016), we decompose observed gambling into (i) value-dependent valuation parameters that capture risk attitude (e.g., risk aversion and loss aversion, which scale the subjective value of outcomes), and (ii) value-insensitive, valence-dependent biases that capture approach/avoidance motivation. Accordingly, a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups, which is what we observe for S<sup>+</sup> vs. controls. We have clarified this point in the computational modeling section.

      Pages 12-13:

      “Please note that a higher gambling rate does not imply a change in risk attitude per se: it can arise from an increased value-insensitive approach bias even when risk-attitude parameters are comparable between groups. Risk attitude is conceptualized in economics as the curvature of the utility function (i.e., the subjective value) of the objective outcomes, with concave curves associated with risk aversion and convex curves associated with risk seeking (54,56). By contrast, the approach or avoidance bias applies regardless of outcome values. A possible interpretation of the approach bias is that participants approach the option with the highest possible gain (the lottery) in the gain frame; the avoidance bias would then reflect a tendency to systematically avoid the highest potential losses (the lottery) in the loss frame.”
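      The distinction between risk attitude and approach bias can be seen in a small choice model. The sketch below follows the spirit of the Rutledge et al. approach-avoidance model but is our own illustration: it assumes power utility u(x) = x^α, a logistic choice rule with inverse temperature μ, and gain trials only (loss trials with a separate β<sub>loss</sub> are omitted). Parameter names and offers are hypothetical.

```python
import math


def p_gamble_gain(certain, gain, alpha, mu, beta_gain):
    """P(choose a 50/50 gamble between `gain` and 0 over a sure `certain`).

    alpha     : utility curvature, u(x) = x**alpha (alpha < 1 = risk averse)
    mu        : inverse temperature of the logistic choice rule
    beta_gain : value-insensitive approach (+) / avoidance (-) bias in [-1, 1]
    """
    u_certain = certain ** alpha
    u_gamble = 0.5 * gain ** alpha
    p_value = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))
    if beta_gain > 0:
        # Mix in a constant tendency to gamble, regardless of the amounts.
        return beta_gain + (1.0 - beta_gain) * p_value
    # A negative bias scales gambling down, again regardless of the amounts.
    return (1.0 + beta_gain) * p_value


offers = [(1.0, 2.0), (1.0, 3.0), (2.0, 3.0), (2.0, 5.0)]  # (certain, gain) pairs


def mean_rate(alpha, beta_gain, mu=2.0):
    return sum(p_gamble_gain(c, g, alpha, mu, beta_gain) for c, g in offers) / len(offers)


baseline = mean_rate(alpha=0.8, beta_gain=0.0)
more_approach = mean_rate(alpha=0.8, beta_gain=0.3)      # same curvature, higher bias
more_risk_seeking = mean_rate(alpha=1.2, beta_gain=0.0)  # changed curvature, no bias
```

      Both manipulations raise the average gambling rate, but only the curvature change alters how choices depend on the offered amounts; the approach bias lifts every offer toward gambling by the same mixing weight, which is why the two can be separated by model fitting.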

      Reviewer #2 (Public review):

      Summary:

      This article addresses a very pertinent question: what are the computational mechanisms underlying risky behaviour in patients who have attempted suicide? In particular, it is impressive how the authors find a broad behavioural effect whose mechanisms they can then explain and refine through computational modeling. This work is important because, currently, beyond previous suicide attempts, there has been a lack of predictive measures. This study is the first step towards that: understanding the cognition on a group level. This is before being able to include it in future predictive studies (based on the cross-sectional data, this study by itself cannot assess the predictive validity of the measure).

      Strengths:

      (1) Large sample size.

      (2) Replication of their own findings.

      (3) Well-controlled task with measures of behaviour and mood + precise and well-validated computational modeling.

      Weaknesses:

      I can't really see any major weakness, but I have a few questions:

      (1) I can see from the parameter recovery that the parameters are very well identified. Is it surprising that this is the case, given how many parameters there are for 90 trials? Could the authors show cross-correlations? I.e., make a correlation matrix with all real parameters and all fitted parameters to show not only that the diagonal entries (i.e., the same data as the scatter plots in S3) are high, but also that the off-diagonal entries are low.

      Thank you for raising these thoughtful concerns. The current task consisted of 90 choices and 36 mood ratings. There were 5 choice parameters and 4 mood parameters. The apparently strong identifiability is not unexpected, as 90 choice trials and 36 mood ratings are comparable to those in prior computational modeling literature (Blain & Rutledge, 2022).

      As suggested, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery. Please see our clarifications below:

      Supplementary Pages 2-3:

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all generating (“true”) and recovered (“fitted”) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”

      Page 10:

      “The numbers of choice trials and mood ratings were comparable to those in prior computational modeling studies (34,35).”
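
      For readers who want to run this kind of check, a minimal sketch of a generating-versus-recovered cross-correlation matrix follows. It is not the authors' pipeline: the "recovered" parameters here are simply the generating ones plus Gaussian noise, standing in for an actual refit of the model, and the sizes (200 simulated subjects, 5 parameters, mirroring the five choice parameters) are illustrative assumptions.

```python
import numpy as np

def ranks(a: np.ndarray) -> np.ndarray:
    """Column-wise ranks; no tie handling is needed for continuous draws."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)

def cross_correlation(generating: np.ndarray, recovered: np.ndarray) -> np.ndarray:
    """Spearman rho between every generating parameter (rows) and every
    recovered parameter (columns); inputs have shape (n_subjects, n_params)."""
    n_params = generating.shape[1]
    full = np.corrcoef(np.hstack([ranks(generating), ranks(recovered)]), rowvar=False)
    return full[:n_params, n_params:]   # generating-vs-recovered block

rng = np.random.default_rng(1)
n_subjects, n_params = 200, 5
generating = rng.normal(size=(n_subjects, n_params))
# Stand-in for refitting the model to simulated data: add estimation noise.
recovered = generating + 0.3 * rng.normal(size=(n_subjects, n_params))

cc = cross_correlation(generating, recovered)
diagonal = np.diag(cc)
off_diagonal = cc[~np.eye(n_params, dtype=bool)]
print("min diagonal rho:", round(float(diagonal.min()), 2))
print("max |off-diagonal| rho:", round(float(np.abs(off_diagonal).max()), 2))
```

      A well-identified model shows a strong diagonal and a weak off-diagonal; large off-diagonal entries would flag trade-offs between parameters.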

      (2) Could the authors clarify the result in Figure 2B of a correlation between gambling rate and suicidal ideation score, is that a different result than they had before with the group main effect? I.e., is your analysis like this: gambling rate ~ suicide ideation + group assignment? (or a partial correlation)? I'm asking because BSI-C is also different between the groups. [same comment for later analyses, e.g. on approach parameter].

      Thank you for pointing out the lack of clarity. We performed the group-difference analysis and the correlation analysis with suicidal ideation separately: we first tested group differences to address our hypothesis about STB effects, and then conducted the correlational analysis to further specify these findings.

      (3) The authors correlate the impact of certain rewards on mood with the % gambling variable. Could there not be a more direct analysis by including mood directly in the choice model?

      Thank you for this insightful suggestion. As suggested, we tried to integrate mood into choice models by adding mood bias component(s) in line with previous literature (Vinckier et al., 2018). The first model (mcM1) assumes that mood biases choice, building on cM3 (the winning choice model). cmM2 further separated the mood bias parameter into two components according to participants’ choices.

      However, model comparison using BIC supported cM3 (Table S6), that is, the model without mood. This may be due to the absence of a blocked design in our experiment, unlike, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see our clarifications below:

      Supplementary Pages 3-4:

      “Supplementary Note 6: integration of mood into choice models

      Although we modeled choice and mood separately to examine the cognitive and affective mechanisms underlying increased risk behavior in adolescent suicidal patients, an interesting question is whether mood responses influence subsequent gambling choices and how to model this influence. First, we median-split mood responses (except the final rating) to compare gambling rates. Results showed a trend toward a lower gambling rate when mood was higher (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, assuming that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model).

      Based on our finding of a negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated the β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2).

      Model comparison using BIC supported cM3 (Table S6), that is, the model without mood. The mood bias parameters in neither mcM1 nor cmM2 reached significance (ps > 0.091), which may be due to the absence of a blocked design in our experiment, unlike in Vinckier et al. (2018) and Eldar and Niv (2015).”
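
      The model-agnostic check described above (median-splitting each participant's mood ratings and comparing subsequent gambling) could look roughly like the sketch below. It is illustrative only: the data layout (the choice on the trial following each rating), the toy simulation, and the function names are hypothetical, and the group comparison (the reported F test) is omitted.

```python
import numpy as np

def split_gamble_rates(mood: np.ndarray, next_gamble: np.ndarray) -> tuple[float, float]:
    """One subject: gambling rate on the trial after an above-median mood
    rating vs. after an at-or-below-median rating."""
    high = mood > np.median(mood)
    return float(next_gamble[high].mean()), float(next_gamble[~high].mean())

def paired_t(a: np.ndarray, b: np.ndarray) -> float:
    """Paired-samples t statistic for a - b across subjects."""
    d = a - b
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))

rng = np.random.default_rng(2)
n_subjects, n_ratings = 80, 35            # all ratings except the final one
high_rates, low_rates = [], []
for _ in range(n_subjects):
    mood = rng.normal(size=n_ratings)
    # Toy assumption: higher mood slightly lowers the odds of gambling next.
    p_gamble = 1.0 / (1.0 + np.exp(0.5 * mood))
    next_gamble = (rng.random(n_ratings) < p_gamble).astype(float)
    hi, lo = split_gamble_rates(mood, next_gamble)
    high_rates.append(hi)
    low_rates.append(lo)

t_stat = paired_t(np.array(high_rates), np.array(low_rates))
print("paired t (high - low mood):", round(t_stat, 2))
```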

      (4) In the large online sample, you split all participants into S+ and S-. I would have expected analyses that control for other clinical traits instead. Or, for example, the S- group could include only participants who have high depression scores but low scores on the suicide items.

      Thank you for this insightful suggestion. Following prior suicide-related literature (Tsypes et al., 2024), we controlled for depression by including depression scores as covariates. Note that depression scores were derived from our established bifactor model (Wang et al., 2025), which separates depression from anxiety. The results remained largely significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend in the online sample, it remained significant in our clinical cohort with depression-related questionnaires as covariates (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.

      Please see our clarifications below:

      Page 26:

      “After controlling for depression severity using our established bifactor model (see ref 60 for details), these results remained significant (ps ≤ 0.050), except for a marginally significant effect of group on gambling behavior (p = 0.059). Although this effect was only a trend here, it remained significant in our clinical cohort with depression-related questionnaires as covariates (p = 0.024; Table S8). This suggests that the link between suicidality and risky behavior persists above and beyond general depressive symptoms.”

      Reviewer #3 (Public review):

      This manuscript investigates computational mechanisms underlying increased risk-taking behavior in adolescent patients with suicidal thoughts and behaviors. Using a well-established gambling task that incorporates momentary mood ratings and previously established computational modeling approaches, the authors identify particular aspects of choice behavior (which they term approach bias) and mood responsivity (to certain rewards) that differ as a function of suicidality. The authors replicate their findings on both clinical and large-scale non-clinical samples.

      (1) The main problem, however, is that the results do not seem to support a specific conclusion with regard to suicidality. The S+ and S- groups differ substantially in the severity of symptoms, as can be seen by all symptom questionnaires and the baseline and mean mood, where S- is closer to HC than it is to S+. The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence. Therefore, the results do not adequately deconfound suicidality from general symptom severity.

      Thank you for this important comment. Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), the two groups also differed substantially in symptom severity (see Table 1). To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and anxiety and depression scores as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027; Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we also performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027; Table S9). We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicidality.
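
      A minimal sketch of the PCA-plus-regression control described above, on toy data. The preprocessing (z-scoring each questionnaire) and the choice to enter all components as covariates are assumptions, since the response does not specify them; the group sizes (25 S<sup>-</sup>, 58 S<sup>+</sup>) are taken from the patient sample, but all scores are simulated.

```python
import numpy as np

def pca_scores(questionnaires: np.ndarray):
    """PCA via SVD on z-scored questionnaire totals (n_subjects, n_scales).
    Returns orthogonal component scores and the variance share of each."""
    z = (questionnaires - questionnaires.mean(axis=0)) / questionnaires.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u * s, s**2 / np.sum(s**2)

def ols_t(y: np.ndarray, predictors: np.ndarray):
    """OLS with intercept; returns coefficients and their t statistics."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

rng = np.random.default_rng(3)
group = np.r_[np.zeros(25), np.ones(58)]          # S- = 0, S+ = 1
severity = 0.8 * group + rng.normal(size=group.size)
# Four correlated scales driven by one shared severity factor (toy data).
questionnaires = severity[:, None] + 0.4 * rng.normal(size=(group.size, 4))
gambling = 0.4 + 0.15 * group + rng.normal(scale=0.1, size=group.size)

components, explained = pca_scores(questionnaires)
beta, t = ols_t(gambling, np.column_stack([group, components]))
print("variance explained:", np.round(explained, 3))
print("group coefficient:", round(float(beta[1]), 3), "t =", round(float(t[1]), 2))
```

      When all components are retained, they are an orthogonal rotation of the standardized scales, so the group coefficient is the same as when the standardized questionnaires are entered directly; orthogonality mainly makes the covariates easier to interpret and better conditioned.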

      As the reviewer points out, absence of evidence is not evidence of absence. Although we median-split patients by general symptom scores (e.g., depression- and anxiety-related questionnaires) and found no significant differences between high- and low-severity subgroups (Figure S11), we additionally computed Bayesian statistics for gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M₁), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen in Table S7, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression did not, overall, influence our main results. Taken together, we believe that these analyses provide compelling evidence for the specificity of the effect to suicidality, above and beyond depression and anxiety.
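
      The response does not state how BF<sub>01</sub> was computed (for example, which prior was used). Purely as an illustration, the sketch below swaps in the common BIC approximation, BF<sub>01</sub> ≈ exp((BIC<sub>1</sub> − BIC<sub>0</sub>) / 2), for a two-group comparison fitted by least squares; it is not necessarily the method behind Table S7, and the function names are hypothetical.

```python
import numpy as np

def gaussian_bic(y: np.ndarray, X: np.ndarray) -> float:
    """BIC of a least-squares fit with Gaussian errors, additive constants dropped."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1                    # regression weights + residual variance
    return n * np.log(rss / n) + k * np.log(n)

def bf01_bic(y: np.ndarray, group: np.ndarray) -> float:
    """Approximate BF01: 'no group difference' (M0) vs 'group difference' (M1)."""
    ones = np.ones(len(y))
    bic0 = gaussian_bic(y, ones[:, None])
    bic1 = gaussian_bic(y, np.column_stack([ones, group]))
    return float(np.exp((bic1 - bic0) / 2.0))

# Two median-split halves with identical score distributions:
group = np.repeat([0.0, 1.0], 40)
y = np.tile(np.linspace(-1.0, 1.0, 40), 2)
print(bf01_bic(y, group))   # sqrt(80), about 8.944: only the extra-weight penalty differs
```

      With identical halves the two fits have the same residuals, so BF<sub>01</sub> reduces to the square root of the sample size and favors the null; a real group difference drives it below 1.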

      Please see our revisions below:

      Page 17:

      “Within patients, this group effect on gambling rate remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.024; also see Figure S11, Table S7 and Table S8). Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) to extract the main components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. To further control for anxiety and depression, linear regression using these components as covariates revealed that the group effect on gambling rate remained significant (p = 0.024; Table S9).”

      Pages 18-19:

      “Within patients, this group effect on the approach parameter remained significant after controlling for sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.05), as well as general symptoms (e.g., depression and anxiety; p = 0.027; also see Figure S11, Table S7 and Table S8). Linear regression using PCA components as covariates revealed that the group effect on the approach parameter remained significant (p = 0.027; Table S9).”

      Page 21:

      “Within patients, this group effect on β<sub>CR</sub> remained significant after controlling for gambling rate, earnings, mood-related outcome effect, mood drift effect, sex, illness duration, family history, diagnosis, and use of various medications (ps < 0.032), as well as general symptoms (e.g., depression and anxiety; p = 0.001; also see Figure S11, Table S7 and Table S8). Linear regression using PCA components as covariates revealed that the group effect on this mood parameter remained significant (p = 0.001; Table S9).”

      (2) The second main issue is that the relationship between an increased approach bias and decreased mood response to CR is conceptually unclear. In this respect, it would be natural to test whether mood responses influence subsequent gambling choices. This could be done either within the model by having mood moderate the approach bias or outside the model using model-agnostic analyses.

      Thank you for this important suggestion. As suggested, we examined whether mood responses influence subsequent gambling choices and how to model this influence. First, we median-split mood responses (except the final rating) to compare gambling rates. Results showed a trend toward a lower gambling rate when mood was higher (t = -1.971, p = 0.050). However, there was no significant group difference (F = 0.680, p = 0.507). Second, assuming that mood biases choice, we constructed mcM1 based on cM3 (the winning choice model). Based on our finding of a negative correlation between mood sensitivity to certain rewards and gambling rate in S<sup>+</sup>, we separated the β<sub>Mood</sub> parameter into β<sub>Mood-CR</sub> and β<sub>Mood-GR</sub> (cmM2). Model comparison using BIC supported cM3 (Table S6), that is, the model without mood. This may be due to the absence of a blocked design in our experiment, unlike, e.g., Vinckier et al. (2018) and Eldar & Niv (2015). Please see Supplementary Pages 3-4 (Supplementary Note 6).

      (3) Additionally, there is a conceptual inconsistency between the choice and mood findings that partly results from the analytic strategy. The approach bias is implemented in choice as a categorical value-independent effect, whereas the mood responses always scale linearly with the magnitude of outcomes. One way to make the models more conceptually related would be to include a categorical value-independent mood response to choosing to gamble/not to gamble.

      We apologise for the unclear statement. The approach bias is implemented in choice as a continuous value-independent effect, ranging from -1 to 1.

      It is true that the mood responses always scale with the magnitude of outcomes, since mood ratings were requested after the outcomes. Therefore, the mood parameters and the approach bias were both continuous.
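
      For concreteness, here is a sketch of how a continuous bias bounded in [-1, 1] can act on choice probabilities, following the approach/avoidance formulation of Rutledge et al. (2015) that this response cites. It is not the manuscript's code: the function name, the string trial labels, and the assumption that the loss-trial bias follows the same sign convention as the gain-trial bias are illustrative choices.

```python
def biased_p_gamble(p_pt: float, trial_type: str,
                    beta_gain: float, beta_loss: float) -> float:
    """Apply a value-insensitive approach/avoidance bias to P(gamble).

    p_pt is the prospect-theory probability of gambling. The bias lies in
    [-1, 1]: positive values move P(gamble) that fraction of the way toward 1,
    negative values that fraction of the way toward 0, whatever the lottery values.
    """
    if trial_type == "mixed":
        return p_pt                      # no approach/avoidance term in mixed trials
    beta = beta_gain if trial_type == "gain" else beta_loss
    if not -1.0 <= beta <= 1.0:
        raise ValueError("bias parameter must lie in [-1, 1]")
    if beta >= 0:
        return p_pt + beta * (1.0 - p_pt)
    return p_pt + beta * p_pt

print(biased_p_gamble(0.5, "gain", 0.5, 0.0))   # 0.75
```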

      We also attempted to integrate mood into choice modelling; see our response to Reviewer 3, point 2, for details.

      (4) The manuscript requires editing to improve clarity and precision. The use of terms such as "mood" and "approach motivation" is often inaccurate or not sufficiently specific. There are also many grammatical errors throughout the text.

      Thank you for this important suggestion. We have now explained motivation and mood in the Introduction section and the computational modeling section. Please see our clarifications below:

      Pages 3-4:

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is, by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond option value differences. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured over the timescale of a laboratory session, represents the accumulated impact of multiple events on the scale of minutes(30,32,34-38). The external validity of momentary mood has been demonstrated, e.g., through its association with depression symptoms(37). Mood differs from emotions, which reflect immediate affective reactivity and are more transient (e.g., from surprise to fear)(31-33,39).”
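
      How "the accumulated impact of multiple events" is typically formalized can be sketched as a baseline plus exponentially discounted sums of past trial events, using the CR/EV/RPE/GR terms discussed elsewhere in this response (e.g., the CR+GR and CR+EV+RPE models). This is a generic illustration with hypothetical names and values; the manuscript's exact mood equations (for instance, any mood-drift term) are not reproduced.

```python
import numpy as np

def predicted_mood(w0: float, weights: dict, gamma: float, regressors: dict) -> np.ndarray:
    """Baseline w0 plus, for each term, weight * sum_j gamma**(t - j) * value_j.

    `regressors` maps a term name (e.g. "CR", "GR") to its per-trial values
    (0 on trials where the event did not occur); `weights` maps the same names
    to mood sensitivities. Returns the predicted mood after every trial.
    """
    n_trials = len(next(iter(regressors.values())))
    mood = np.full(n_trials, float(w0))
    for name, values in regressors.items():
        trace = 0.0
        for t in range(n_trials):
            trace = gamma * trace + values[t]   # recursive exponential discounting
            mood[t] += weights[name] * trace
    return mood

# A CR + GR model: certain rewards and gamble outcomes enter as separate terms.
cr = [1.0, 0.0, 0.0, 0.5]        # certain reward received (0 if the subject gambled)
gr = [0.0, -1.0, 2.0, 0.0]       # gamble outcome (0 if the subject took the sure option)
print(predicted_mood(0.5, {"CR": 0.3, "GR": 0.2}, 0.6, {"CR": cr, "GR": gr}))
```

      The recursion trace_t = gamma * trace_(t-1) + value_t gives the same result as summing gamma^(t-j) * value_j over all past trials, without storing the full history.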

      We have corrected grammatical errors throughout the manuscript.

      (5) Claims of clinical relevance should be toned down, given that the findings are based on noisy parameter estimates whose clinical utility for the treatment of an individual patient is doubtful at best.

      Thank you for this comment. We agree that we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, which is outside the scope of this study, and it is indeed possible that the parameter estimates are somewhat noisy. We have therefore toned down the clinical relevance of our results. Please see our revision below:

      Page 32:

      “Next, we did not evaluate the noise in our estimates, e.g., by assessing the test-retest reliability of the task parameters, and it is indeed possible that the parameter estimates are somewhat noisy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Title: I believe "aberrant mood dynamics" is both too general and overstating the results of this study, which did not measure mood dynamics longitudinally. "Aberrant" is also overly pathologizing. I would suggest sticking more directly to the results, for instance, "Insensitivity of momentary mood to non-risky rewards in adolescent suicidal patients".

      Thank you for this suggestion. We have now corrected it.

      (2) Abstract: in line 61, "Our study uncovers the cognitive and affective mechanisms" suggests that these are the only ones, and you uncovered them. Of course, there could be more mechanisms contributing to risk behavior in STB, so I would suggest removing the word "the" or adding "one of the".

      Thank you for this suggestion. We have now corrected it.

      (3) One major weakness of this study is that suicidal thoughts and behaviors were not assessed via a clinical instrument such as the Columbia Suicide Severity Rating Scale - this should be mentioned upfront.

      Thank you for this comment. Based on medical records and on information obtained from family and friends by the researcher and psychiatrists, patients with suicidal thoughts and behaviors were categorized as the suicidal group (S<sup>+</sup>), while patients without suicidal thoughts and behaviors were categorized as the control group (S<sup>-</sup>). Note that the medical records and information came from clinical interviews in which the psychiatrists were vigilant for signs of suicidal ideation and inquired about suicide-related thoughts and behaviors from both the patients and their families. Therefore, the current grouping procedure may be comparable to the Columbia Suicide Severity Rating Scale.

      (4) Table 1: female/male are sex, not gender (gender is man/woman/transgender/non-binary).

      Thank you for this suggestion. We have now corrected it.

      (5) Equation 1: It would be good to clarify what happens in gain-only or loss-only trials (the other value is then 0, but this can be clarified as it is not technically a loss or a gain).

      Thank you for this suggestion. We have now corrected it. Please see below for our revision:

      Page 12:

      “Please note that V<sub>loss</sub> is 0 in gain trials and V<sub>gain</sub> is 0 in loss trials.”
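
      Equation 1 itself is not reproduced in this response. The sketch below assumes a common prospect-theory form for this kind of gambling task (a single curvature parameter alpha, loss aversion lam, inverse temperature mu, all hypothetical names here) and shows how a zero on the absent side of the lottery makes gain-only and loss-only trials special cases of the same formula.

```python
import math

def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def pt_value(x: float, alpha: float, lam: float) -> float:
    """Prospect-theory value: curvature alpha; losses also scaled by loss aversion lam."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def p_gamble_pt(v_gain: float, v_loss: float, v_certain: float,
                alpha: float, lam: float, mu: float) -> float:
    """P(gamble) for a 50/50 lottery between v_gain and v_loss vs. a sure v_certain.
    Gain trials: v_loss = 0. Loss trials: v_gain = 0. Mixed trials: v_certain = 0."""
    u_gamble = 0.5 * pt_value(v_gain, alpha, lam) + 0.5 * pt_value(v_loss, alpha, lam)
    return logistic(mu * (u_gamble - pt_value(v_certain, alpha, lam)))
```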

      (6) Figure 1E: The model prediction is not informative here. Given the linear regression model, there is no other option except that the mean prediction would overlap with the mean empirical measurement (unless the model was specified incorrectly). The same is true in Figure 2A.

      Thank you for this suggestion. We have now removed plots for model prediction.

      (7) Figure 1G: There was no analysis of the differences between groups in terms of earnings, given that the ANOVA was not significant. Still, if the claim is that risky behavior is sometimes suboptimal in this task, it would be good to show that there is a correlation between, say, symptoms of STB across groups and 1) risky behavior and 2) earnings.

      Thank you for this insightful comment. In the patient cohort, risky behavior (gambling rate)—but not earnings—predicted the current suicidal ideation score (BSI-C, β = 9.189, t = 2.004, p = 0.048; earnings, β = 0.001, t = 0.582, p = 0.562). The lack of association for earnings is consistent with the task design, in which there is no stable optimal policy and payouts are only a coarse proxy for decision quality. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB. We have clarified this point below:

      Page 32:

      “Second, although we assumed that increased risky behavior in STB was suboptimal, the current task was not suited to test this, because the outcome of the gambling option was random. Future work in learning paradigms, where optimality is well defined, may be better suited to test earnings-based links to STB.”

      (8) Line 290: "beta_gain: -1-1" is unclear. I believe you meant beta_gain \in [-1,1].

      Thank you for this suggestion. We have now corrected it to make it clear.

      (9) The gain and loss biases are modeled as minimum and maximum probabilities for choosing the gamble. This is a legitimate choice for value-agnostic biases, but it is not the traditional choice (as far as I know). I wonder if the same results would hold with the more traditional formulation of the bias as an added constant to the utility of the gamble, i.e., p(gamble) = 1/(1 + exp(-mu(U_gamble + beta_gain - U_certain))). I believe in this case, you would also not have to specify different equations for positive or negative biases, or to limit the bias to the range of [-1,1] (indeed, the bias would be in reward-equivalent units).

      Thank you for this suggestion. The winning choice model we used here was consistent with previous literature (Rutledge et al., 2015 & 2016), which decomposed the decision process into risk-attitude-driven valuation (e.g., loss and risk aversion) and value-insensitive motivational components. These approach/avoidance parameters are a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond option value differences.

      As suggested, we also compared the traditional bias choice model. Model comparison did not support this. Please see our revision below:

      Supplementary Page 4:

      “We also considered a traditional bias parameter (cM4) instead of the approach/avoidance parameters. We limited the bias to the range [-100, 100], expressed in reward-equivalent units.

      However, model comparison did not support cM4 (Table S6).”
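
      The reviewer's alternative can be sketched as a single equation in which a constant is added to the gamble's utility. This is an illustration of the formula quoted in point (9), not the authors' cM4 implementation; the function name is hypothetical and the [-100, 100] bound is taken from the revision above.

```python
import math

def p_gamble_additive(u_gamble: float, u_certain: float, mu: float, bias: float) -> float:
    """Reviewer-suggested alternative: a constant bias added to the gamble's utility.
    One equation covers positive and negative biases; the bias is in reward units."""
    if not -100.0 <= bias <= 100.0:
        raise ValueError("bias must lie in [-100, 100]")
    z = mu * (u_gamble + bias - u_certain)
    if z >= 0:                                   # numerically stable logistic
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

      Unlike the [-1, 1] approach parameter, which moves P(gamble) a fixed fraction of the way toward 0 or 1, the effect of this additive bias on P(gamble) depends on mu and on the utility difference; for equal utilities, biases of opposite sign give complementary probabilities.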

      (10) Also, for equations 5-8, it seems that 5-6 are identical to 7-8 except for the use of beta_gain versus beta_loss. You might want to consider simplifying by putting beta in the equations and specifying in the text that, depending on the trial type (loss or gain), the relevant beta is used.

      Thank you for this suggestion. We have now simplified it. Please see response to Reviewer 2, point 3.

      (11) It is not clear what equations are applied to mixed trials in cM3.

      Sorry for the confusion. We have now clarified this point.

      Page 12:

      “Approach/avoidance parameters are not applied in mixed trials.”

      (12) Model comparison: the mood models are nested within each other (e.g., mM3 can be derived from mM1 by setting beta_EV = beta_RPE). In this case, model comparison can use the likelihood ratio test instead of BIC, which can be too conservative (and therefore does not support the extra beta parameter for RPE, different from previous results in the literature). I wonder if a likelihood ratio test would lead to results more in line with previous findings with this task?

      Thanks for this suggestion. We agree that mM1 (CR+EV+RPE) and mM3 (CR+GR) are nested. However, our model space also included non-nested models, such as mM5 (CR+GR<sub>better</sub>+GR<sub>worse</sub>). Because likelihood ratio tests apply only to nested pairs, they could not be used to compare all models in our model space.
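
      The distinction can be made concrete: a likelihood-ratio test needs a nested pair (such as mM1 and mM3, where the reviewer notes that mM3 follows from setting beta_EV = beta_RPE), whereas BIC assigns every model its own score and can therefore rank non-nested models alongside the others. The helper names below are hypothetical, and the numbers in the usage note are illustrative rather than taken from the study.

```python
import math

def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_lik

def lrt_p_one_df(ll_restricted: float, ll_full: float) -> float:
    """Likelihood-ratio test for two nested models that differ by one parameter.
    Under the null, 2 * (ll_full - ll_restricted) follows chi-square(1), whose
    survival function is erfc(sqrt(x / 2))."""
    stat = max(0.0, 2.0 * (ll_full - ll_restricted))
    return math.erfc(math.sqrt(stat / 2.0))
```

      For example, `lrt_p_one_df(-50.0, -48.0)` tests a one-parameter gain of 2 log-likelihood units (statistic 4.0, p ≈ 0.046), whereas BIC prefers the larger model only if that gain exceeds half the natural log of the number of observations.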

      (13) Line 346: The replication sample is described as "healthy participants," however, their health (or mental health) status was not assessed, and they may as well have mental health concerns. I would suggest calling this a general sample or an undifferentiated sample - but not a healthy sample.

      Sorry for the confusion. We have now corrected this phrase.

      (14) Line 363: "in addition to the replication of previous findings in the validation dataset" is unclear. Are those tests not two-tailed?

      Sorry for the unclear statement. In the replication analyses, we used one-tailed t-tests because the direction of the effect had been established in the clinical dataset. Please see our clarification below:

      Page 15:

      “For the replication of previous findings in the validation dataset, we used one-tailed tests in line with our clinically motivated directional hypothesis.”

      (15) Line 372: "validating our group manipulation" - the presented work does not have a manipulation. Maybe you meant "validating our grouping of participants"?

      Thank you for this suggestion. We have now corrected it to make it clear.

      (16) Figure 2B: It is not clear how the data were binned for illustration purposes only, and why this binning is necessary (I have not seen it in other papers) - presenting the data from each subject and the correlation line with error margins (as is done here) should be sufficient.

      Thank you for flagging this. For illustration only, we binned the data in proportion to group size: in the patient sample (S<sup>-</sup> n = 25; S<sup>+</sup> n = 58; ≈1:2), we displayed 3 bins for S<sup>-</sup> and 6 bins for S<sup>+</sup>. We agree that binning is not necessary; all statistics were computed on the raw, unbinned data. The binned panel was included solely for visualization, consistent with our prior work (Blain et al., 2023).
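
      The response does not say exactly how bin boundaries were set. One plausible reading, sketched here as an assumption, is equal-count bins after sorting participants by the x-axis variable (3 bins for the 25 S<sup>-</sup> patients, 6 for the 58 S<sup>+</sup> patients); the function name and toy data are hypothetical, and any statistics would still be computed on the unbinned values.

```python
import numpy as np

def bin_for_display(x: np.ndarray, y: np.ndarray, n_bins: int):
    """Sort subjects by x, cut them into n_bins groups of (nearly) equal size,
    and return each bin's mean x and mean y. For plotting only."""
    order = np.argsort(x)
    bins = np.array_split(order, n_bins)
    return (np.array([x[b].mean() for b in bins]),
            np.array([y[b].mean() for b in bins]))

rng = np.random.default_rng(4)
ideation_s_minus = rng.normal(size=25)                     # toy scores
gambling_s_minus = 0.5 + 0.05 * ideation_s_minus + rng.normal(scale=0.1, size=25)
bx, by = bin_for_display(ideation_s_minus, gambling_s_minus, n_bins=3)
```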

      (17) Table 2: delta BIC should be presented per subject (that is, divided by the number of subjects in each group), as the groups are of different sizes, so as presented now, the columns are not comparable across groups.

      Thank you for the helpful suggestion. Our goal in Table 2 is not to compare ΔBIC magnitudes across groups, but to identify the winning model within each group. The ΔBICs are aggregated at the group level solely to rank models for that group. Dividing by the number of participants would rescale each group’s column by a constant and would therefore not affect the within-group ranking or the conclusion that cM3 is the best model in all groups. For this reason, we retain the current presentation and interpret each column within group rather than across groups.
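
      A quick numerical check of this argument, using made-up per-subject BIC values for three models in one group of 25 participants: dividing the group-level ΔBIC column by the group size leaves the model ranking unchanged (per-subject values matter only when magnitudes are compared across groups of different sizes).

```python
import numpy as np

# Hypothetical per-subject BIC values (rows: subjects, columns: models).
rng = np.random.default_rng(0)
per_subject_bic = rng.normal(loc=[100.0, 98.0, 101.0], scale=2.0, size=(25, 3))

summed = per_subject_bic.sum(axis=0)
delta_summed = summed - summed.min()             # group-level delta BIC
delta_per_subject = delta_summed / per_subject_bic.shape[0]

# Dividing by a positive constant cannot change the ordering of models.
same_ranking = np.array_equal(np.argsort(delta_summed), np.argsort(delta_per_subject))
print("same ranking:", same_ranking, "| winning model index:", int(np.argmin(delta_summed)))
```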

      (18) Line 640 - the effect of expectations and prediction errors on mood was not only shown in healthy people, but also in people with depression (Rutledge et al., 2017, https://pubmed.ncbi.nlm.nih.gov/28678984/)

      Thank you for this comment. Indeed, Rutledge et al. (2017) showed evidence for the CR+EV+RPE mood model in adults with depression. However, our study recruited adolescents with depression or anxiety, given that adolescence may provide a developmental window for early intervention in suicidality. Therefore, it is also possible that the current winning model is specific to adolescents. Please see our clarifications below:

      Page 28:

      “It is also possible that the current winning model is specific to adolescents. Given that Rutledge et al. (2017) supported the “CR-EV-RPE model” in adults with depression, our findings in adolescents may suggest a developmental change in mood sensitivity.”

      (19) Supplemental material: Is the R2 section about R-squared? Perhaps you can use superscript on the 2 to make that clearer? For Figure S2, how was model recovery determined? Should I interpret the confusion matrix as suggesting that the winning model for each and every simulated subject was the generating model, or was the winning model determined for the whole simulated population in each of the 100 simulations? Traditionally, confusion matrices use the former measure, but the results of 100% recoverability make me suspect the latter was used here. In Figure S3, should we not be looking at simulated parameters and recovered parameters? What are "real parameters" here?

      Thank you for these important comments. We now consistently denote the coefficient of determination as R<sup>2</sup> (with a superscript 2) throughout the manuscript and Supplementary Materials.

      For the model recovery analysis in Figure S2, we have clarified that the confusion matrix is computed at the population level. Specifically, for each of the 100 simulations we generated a full dataset under each candidate model, fit all models to that dataset, and selected the winning model based on group-level model evidence (BIC). Each cell in the confusion matrix therefore reflects the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. This procedure is appropriate because the winning model is selected on the population-level dataset rather than for individual subjects.

      In Figure S3, the term “real parameters” referred to the parameters used to generate the simulated data. To avoid confusion, we now relabel these as “simulated (generating) parameters” and explicitly describe the figure as showing the relationship between simulated (generating) parameters and recovered parameters. Please see our revisions below:

      Supplementary Pages 2-3:

      “Model recovery: We generated 100 simulated datasets for each model (3 choice models and 8 mood models) using the fitted parameters of each model as the ground truth. Each dataset contained 201 trials and included 3 (or 8) sets of simulated data corresponding to the respective models. For each simulated dataset, we then fit all models and determined the winning model at the population level based on group-level BIC, yielding a confusion matrix in which each entry represents the proportion of simulations in which model j was selected as the best-fitting model when the data were generated by model i. As shown in Figure S2, all models are highly identifiable, indicating excellent recovery performance for both the choice and mood models.”

      “Parameter recovery: Figure S3 shows good parameter recovery for both the choice and mood winning models (choice: rs > 0.91, ps < 0.001; intraclass coefficients > 0.78; mood: rs > 0.90, ps < 0.001; intraclass coefficients > 0.86). Moreover, we computed cross-correlations between all simulated (generating) and recovered (fitted) parameters. The resulting matrix showed high diagonal (choice winning model: rs > 0.91; mood winning model: rs > 0.90) and low off-diagonal (choice winning model: abs(rs) < 0.63; mood winning model: abs(rs) < 0.40) correlations, further supporting parameter recovery.”
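
      The population-level recovery procedure described above can be sketched with two toy models standing in for the actual choice and mood models (which are not reproduced here): simulate a full dataset from each generating model, fit every candidate model to each subject, sum BIC across subjects, and record which model wins. The toy models, sample sizes, and noise levels are assumptions chosen for illustration, and parameters are drawn at random rather than taken from fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SUBJECTS, N_TRIALS, N_SIMS = 30, 40, 100

def simulate(model: int, n_subjects: int, n_trials: int):
    """Toy generative models: model 0 is y = b * x + noise; model 1 is y = b + noise."""
    x = rng.uniform(-1, 1, size=(n_subjects, n_trials))
    b = rng.normal(1.0, 0.2, size=(n_subjects, 1))
    mean = b * x if model == 0 else b * np.ones_like(x)
    return x, mean + rng.normal(0, 0.5, size=x.shape)

def subject_bic(model: int, x: np.ndarray, y: np.ndarray) -> float:
    """Gaussian BIC for one subject; each toy model has 1 weight + 1 variance."""
    design = x if model == 0 else np.ones_like(x)
    b = design @ y / (design @ design)           # one-parameter least squares
    rss = np.sum((y - b * design) ** 2)
    n = len(y)
    return n * np.log(rss / n) + 2 * np.log(n)

n_models = 2
confusion = np.zeros((n_models, n_models))
for i in range(n_models):                        # generating model
    for _ in range(N_SIMS):
        x, y = simulate(i, N_SUBJECTS, N_TRIALS)
        group_bic = [sum(subject_bic(j, x[s], y[s]) for s in range(N_SUBJECTS))
                     for j in range(n_models)]
        confusion[i, int(np.argmin(group_bic))] += 1
confusion /= N_SIMS                              # rows: generating, columns: selected
print(confusion)
```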

      Typos:

      (1) Line 90: original → originate

      (2) Line 596-598 - the same phrase is repeated twice.

      (3) Line 616: on the other word → hand.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):

      For people unfamiliar with interpersonal theory or motivational-volitional model, or three-step theory (lines 105-106), could you briefly explain the key idea of mood and suicide before going to the decision-making tasks? And from this, maybe motivate the predictions in your task? In particular, in the abstract and introduction, the phrasing could be a bit more concise and simpler. In the abstract, sentences were sometimes quite long. In the introduction, some paragraphs are somewhat repetitive. In the discussion, there were some typos.

      Thank you for these suggestions. We have now explained the key idea of mood and suicide before going to the decision-making tasks in the introduction, which can be seen below:

      Pages 4-5:

      “Contemporary theories of suicide converge on the idea that STB initially arises from experiences of low mood. The interpersonal theory of suicide proposes that suicidal desire arises when people simultaneously feel socially disconnected (“thwarted belongingness”) and like a burden on others (“perceived burdensomeness”), experiences that are tightly linked to chronically low mood(25). The motivational–volitional model(26) and the three-step theory(27,28) similarly emphasize that when negative mood and feelings of defeat or entrapment are experienced as inescapable, they can give rise to suicidal ideation, and that the progression from ideation to suicide attempts depends on additional factors such as reduced fear of death, increased pain tolerance, and a tendency to act impulsively under intense affect. Official organizations, such as the National Institute of Mental Health, have also listed mood problems as warning signs(8). Interestingly, within the framework of decision making under uncertainty, gambling on lotteries with a revealed outcome has been found to induce high mood variance(29), providing an opportunity to assess the relationship between deficient mood and increased gambling decisions in STB.”

      We have also refined the wording and corrected typos throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Since many readers might only read the abstract, it is important that it is both informative and accurate. I have two suggestions in this respect. First, for the abstract to be more informative, it may be helpful to indicate already there that these are value-insensitive approach-avoidance parameters, in the sense that they favor/disfavor the gamble regardless of the potential outcomes' magnitude or probability. This issue is also present throughout the text, where the phrases "approach and avoidance motivation" are referred to as if they have established and precise computational definitions. In my view, these terms could just as easily be interpreted as parameters that multiply the value of potential gains or losses, which is not what the authors mean. It would be helpful to clarify this terminology.

      Thank you for these suggestions. In line with previous literature (Rutledge et al., 2015 & 2016), approach and avoidance motivation are indeed defined at the computational level, referring to a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond the difference in option values. We cite these papers in the manuscript and now further clarify the approach and avoidance parameters in the abstract and introduction. Please see our revisions below:

      Page 2 (Abstract):

      “Fitting a prospect theory model enhanced with value-insensitive approach-avoidance parameters revealed that this rise in risky behavior resulted only from a heightened approach parameter in S<sup>+</sup>. Altogether, model-based choice data analysis indicated dysfunction in the approach system in S<sup>+</sup>, leading to a greater propensity for gambling in the gain domain regardless of the lotteries' expected values.”

      Page 3 (Introduction):

      “A growing literature indeed shows that risky behavior can be far better explained after adding value-insensitive approach and avoidance components to prospect theory(18,19), that is, by including a decision bias in favor of the highest gain (approach) and another decision bias against the lowest loss (avoidance), above and beyond the difference in option values. This class of models highlights the important role of value-insensitive motivational components in decision making in addition to risk attitude-driven valuation (e.g., loss/risk aversion)(20).”
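      For readers less familiar with this model class, the approach-avoidance extension can be illustrated with a short Python sketch. It assumes a functional form along the lines of the approach-avoidance model of Rutledge et al. (2015): a 50/50 gamble, a power utility with loss aversion, a logistic choice rule, and a bias parameter in [-1, 1] that shifts the gambling probability independently of option values. The function and parameter names (utility, p_gamble, rho, lam, mu, beta) are hypothetical, and the exact parameterization fitted in the manuscript may differ.

```python
import math


def utility(x, rho, lam):
    """Prospect-theory utility: power curvature for gains, loss aversion (lam) for losses."""
    if x >= 0:
        return x ** rho
    return -lam * (-x) ** rho


def p_gamble(certain, win, lose, rho, lam, mu, beta):
    """Probability of choosing a 50/50 gamble (win or lose) over a certain amount.

    beta is the value-insensitive bias for the current trial type, in [-1, 1]:
    an approach parameter on gain trials, an avoidance parameter on loss trials
    (hypothetical naming; the sign convention is an assumption).
    """
    u_gamble = 0.5 * utility(win, rho, lam) + 0.5 * utility(lose, rho, lam)
    u_certain = utility(certain, rho, lam)
    # Value-based component: logistic choice with inverse temperature mu.
    p = 1.0 / (1.0 + math.exp(-mu * (u_gamble - u_certain)))
    # Value-insensitive component: move p toward 1 (beta > 0) or toward 0 (beta < 0)
    # by a fixed proportion, regardless of the options' values.
    if beta > 0:
        return p + beta * (1.0 - p)
    return p + beta * p


# Gain trial: certain 2 vs a 50/50 gamble between 4 and 0 (equal utility when rho = lam = 1).
print(p_gamble(2, 4, 0, rho=1.0, lam=1.0, mu=1.0, beta=0.0))  # value-based choice only
print(p_gamble(2, 4, 0, rho=1.0, lam=1.0, mu=1.0, beta=0.4))  # with an approach bias
```

      With beta = 0 the sketch reduces to a standard prospect-theory choice rule; a positive beta on gain trials raises the gambling rate even when the gamble and the certain option have equal utility, which is the sense in which the approach parameter is value-insensitive.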

      (2) The statement "our study uncovers the cognitive and affective mechanisms contributing to increased risk behavior in STB" is overstating the findings, as the study may have uncovered some contributing mechanisms, but likely not all of them. Removing the word "the" would fix this issue.

      Thank you for this suggestion. We have now corrected it.

      (3) Since mood is typically defined as lasting hours, it's inappropriate to refer to ratings that only reflect the last few trials as self-reports of mood. To be sure, I view the distinction between emotions and moods as quantitative, not qualitative, so I do not think there is a problem studying the former to understand the latter, but to avoid confusion, the terminology should follow common usage.

      Thank you for this suggestion. We follow previous work and operational definitions of mood (Rutledge et al., 2014, Eldar & Niv, 2015, Vinckier et al., 2018). Emotion is usually a very brief response to a specific stimulus (Emanuel & Eldar, 2023), e.g., a rapid shift from surprise to fear. In contrast, mood is defined as a diffuse state that is not specific to one stimulus. Here, we operationally and computationally define mood as an affective state reflecting the recent history of safe and gamble outcomes. We now clarify this point in the main text. Please see our revision below:

      Page 5:

      “Although mood is thought to persist for hours, days, or even weeks(30-33), momentary mood, measured over the timescale of a laboratory session, represents the accumulated impact of multiple events at the scale of minutes(30,32,34-38). The external validity of momentary mood is supported, e.g., by its association with depression symptoms(37). Mood differs from emotions, which reflect immediate affective reactivity and are more transient (e.g., shifting from surprise to fear)(31-33,39).”
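      The operational definition above, in which momentary mood accumulates the impact of recent events, is commonly formalized as an exponentially discounted sum of past certain rewards (CR), gamble expected values (EV) and reward prediction errors (RPE), following Rutledge et al. (2014). The Python sketch below assumes that three-term form and uses objective monetary values; the names (Trial, predicted_mood, w_cr, w_ev, w_rpe, gamma) are hypothetical, and it is not intended to reproduce any specific candidate model (e.g., mM3 or mM5) from the manuscript.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    chose_gamble: bool
    certain: float                   # certain (safe) amount offered, objective value
    ev: float                        # expected value of the gamble, objective value
    outcome: Optional[float] = None  # realised gamble outcome, only if the gamble was chosen


def predicted_mood(trials: List[Trial], w0: float, w_cr: float, w_ev: float,
                   w_rpe: float, gamma: float) -> float:
    """Momentary mood after the last trial: w0 plus discounted CR, EV and RPE terms."""
    cr = ev = rpe = 0.0
    for trial in trials:
        # Older events are down-weighted by gamma (0 < gamma <= 1) on every new trial.
        cr *= gamma
        ev *= gamma
        rpe *= gamma
        if trial.chose_gamble:
            if trial.outcome is None:
                raise ValueError("a chosen gamble needs a realised outcome")
            ev += trial.ev
            rpe += trial.outcome - trial.ev  # reward prediction error
        else:
            cr += trial.certain              # certain reward
    return w0 + w_cr * cr + w_ev * ev + w_rpe * rpe
```

      In this form, mood sensitivity to certain rewards corresponds to the weight w_cr, so a group difference in that sensitivity would appear as a difference in fitted w_cr values.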

      (4) Line 78: The phrases "increase in risk attitude", "decrease in loss attitude", and "decrease in value-independent choice biases" are unclear to me in terms of their directionality. An attitude might be avoidant or embracing. If it is the former then increasing it would decrease risk-taking.

      Thank you for pointing out the ambiguity. We have now corrected them throughout the manuscript. Please see our revision below:

      Page 4:

      “We therefore hypothesized that heightened approach motivation, or weakened avoidance motivation, would account for increased risk behavior in STB.”

      (5) Line 125: I was not sure why one would expect the mood response to gamble-related quantities (EV and RPE) to be lower in STB and not higher.

      Sorry for the typo. We hypothesized that mood would respond more strongly to gambling-related quantities—expected value (EV) and reward prediction error (RPE)—in adolescents with STB than in controls, given prior evidence that STB is associated with greater risk-taking.

      (6) The text could use proofreading, as there are many typos. These are from the first 100 lines alone:

      a) Abstract: regardless the lotteries -> regardless of the lotteries'.

      b) Line 78: it remains whether.

      c) Line 80: can each -> each can.

      d) Line 90: may original from.

      Sorry for the mistakes. We have now corrected them throughout the manuscript.

      (7) The rationale for focusing on the S+ group for mood model comparison is incorrect. The purpose is to identify parameters that vary as a function of suicidality, and for that, the S- group is just as important.

      Thank you for this comment. We agree that the S<sup>-</sup> group is as important as the S<sup>+</sup> group. A direct comparison was complicated because the winning mood models differed (S<sup>+</sup>: mM3; S<sup>-</sup>: mM5; Table 3). To ensure comparability, we checked results from both model specifications (mM3 and mM5). The conclusions were convergent: mood sensitivity to certain rewards (CR) was lower in S<sup>+</sup> than in S<sup>-</sup> (see Fig. 3 for mM3 and Fig. S8 for mM5).

      (8) There appears to be a contradiction between the inclusion criteria, which include having experienced suicidal thoughts and behaviors, and the definition of the S- group as not having suicidality.

      Thank you for pointing out this mistake. The corrected inclusion criteria appear on Page 7:

      “Patients were included if they met the following criteria: 1) both the researcher and psychiatrists agreed on their group classification; 2) they had a current diagnosis of major depressive disorder (MDD; unipolar depression), generalized anxiety disorder (GAD), or bipolar disorder with depressive episodes (BD), confirmed by two experienced psychiatrists using the Structured Clinical Interview for DSM-IV-TR-Patient Edition (SCID-P, 2/2001 revision; see Supplementary Note 1 for details); 3) they were between 10 and 19 years of age; 4) they had no organic brain disorders, intellectual disability, or head trauma; 5) they had no history of substance abuse; 6) they had no experience of electroconvulsive therapy.”

      (9) It would be helpful to specify whether mood modeling was based on objective or subjective values, and why.

      Thank you for this helpful suggestion. We have now clarified whether mood modeling was based on objective or subjective values, and why. Specifically, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000). Based on this result and for parsimony, we report and interpret the mood modeling results from the objective-value family in the main text. We have added this clarification to the Supplement (see below):

      Supplement Pages 4-5:

      “Supplementary Note 9: Mood model comparison using subjective values.

      To determine whether mood is better modeled using objective or subjective values, we constructed two model families: one in which mood was driven by objective monetary outcomes (objective values) and one in which mood was driven by subjective values derived from each participant’s fitted choice model (subjective values). We then used the VBA_groupBMC function in the VBA toolbox (Daunizeau et al., 2014) to perform family-wise model comparison, with 8 candidate mood models within each family. Consistent with previous literature, the objective-value family provided a clearly superior fit to the data (exceedance probability, EP = 1.000).”

    1. eLife Assessment

      Using single-cell transcriptomic data from mouse inner ear hair cells, the authors compare for the first time gene expression across the four recognized hair cell types in adults, generating information fundamental to understanding the relationships between hair cells of the ancient vestibular compartment and the more recent cochlea. Among the observed differences, compelling evidence is provided that certain ciliary motility-related genes are expressed in vestibular but not cochlear hair cells, suggesting that the kinocilium of vestibular hair cells may function as an active force generator to increase sensitivity.

    2. Reviewer #1 (Public review):

      Summary

      From transcriptomic comparisons of adult mouse cochlear and vestibular hair cells, Xu et al. provide a broad and well-organized overview of differences across 4 established hair cell types (2 cochlear and 2 vestibular). They go on to demonstrate the power of such analyses to provide functional insights by focusing on the differentiated expression of ciliary genes, building to the hypothesis that kinociliary motility occurs in adult vestibular hair cells.

      Background

      Cilia are prominent in sensory receptors, including vertebrate photoreceptors, olfactory neurons and mechanosensitive hair cells of the inner ear and lateral line. Cilia can be motile or nonmotile depending on their axonemal structure: motile cilia require dynein and the inner 2 singlet microtubules of the 9+2 array. Primary cilia, present early in development, are considered to have sensory functions and to be nonmotile (Mill et al., Nature Rev Gen 2023).

      In hair cells, the kinocilium anchors and polarizes the mechanosensitive hair bundle of specialized microvilli. The kinocilium matures from the primary cilium of a newborn hair cell; behind it the bundle of mechanosensory microvilli rises in a descending staircase of rows. During maturation of the mammalian cochlea, all hair cells lose the kinocilium, though not the associated basal body. The consensus for many years has been that most vertebrate kinocilia, and especially mammalian kinocilia, are nonmotile, based largely on the lack of spontaneous motility in excised mammalian vestibular organs, but also on the impression that the rare examples of spontaneous beating motility even in non-mammalian hair cells are associated with deterioration of the preparation (Rüsch & Thurm 1990).

      Strengths

      In comparing RNA expression across the 4 major types of mouse hair cells - 2 cochlear and 2 vestibular - Xu et al. provide rich data sets for exploration of structure-function differences between these highly specialized cell types. The revised paper significantly improves the organization, interpretation and readability of the presentation of overall findings. smFISH and immuno-staining back up key RNA data, and comparisons are made with published data.

      The ciliary motility focus of the rest of the paper is creative and highly interesting. The authors curated the ciliary genes into types associated with different aspects of beating motility, and also investigated the expression of genes typical of primary cilia, which are considered to have sensory and cell signaling functions and to be nonmotile. Their data justify suggesting a role for kinociliary motility (or force generation) in adult mammalian vestibular hair cells, in opposition to a long-held assumption. The results should stimulate investigation of the implications for mechanosensitivity.

      Weaknesses

      Data

      Functional data on kinocilia motility: The technical difficulty in making such measurements in small mouse hair bundles led the authors to work with bullfrog crista bundles. Though not extensively studied here, the ciliary motility shown is convincing. Mouse hair bundle motions are also shown, but the evidence connecting these data to kinociliary motion is more suggestive than convincing. Still, the authors are not dogmatic about these data, and it is reasonable to show them.

      Interpretation

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rüsch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear what role kinociliary motility would play in mature hair bundles. The authors have added a discussion of this question in the revision.

      An underlying rationale for the hypothesis that ciliary motility manifests in mammalian vestibular hair cells seems to rest on the presence of the necessary mRNA and its contrasting absence in cochlear hair cells. Another way to look at this difference could be that evolution acted on cochlear hair cells to shed kinocilia as one of many changes to improve mechanosensitivity at much higher sound frequencies. In vestibular hair cells, kinociliary motion might be useful to enhance mechanostimulation in the developing vestibule (as suggested in this revision) and not so active in maturity. Nevertheless, with their scholarly analysis of the expression of ciliary genes, the authors make a significant argument for further investigation of when and why hair cell kinocilia show active motility.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      Comments on revision:

      I am satisfied with the revision; the authors made an effort to incorporate the changes requested.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      (a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although our study does not include ultrastructural images of mouse vestibular kinocilia, transmission electron micrographs of mouse vestibular kinocilia have been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration, with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia of guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A). We would like to point out that our proposal of a dual primary/motile identity for the kinocilium does not rest on the TEM images alone. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we have moderated our interpretation accordingly in the revision.

      (b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      While these genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we have marked such cytoplasmic or multifunctional genes with red asterisks in both Fig. 5G and Fig. 6D in the revised manuscript.

      Our comparison showed that key genes for the motile machinery were not detected in cochlear hair cells. For example, Dnah6 and Dnah5, which encode axonemal dynein heavy chains of the inner and outer dynein arms, respectively, are not expressed in P2 cochlear hair cells. Importantly, we did not detect expression of CCDC39 and CCDC40, the axonemal molecular rulers that organize the 96-nm repeating interactome, in cochlear hair cells, including at P2. We have revised the text to emphasize these key differences.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rüsch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in the zebrafish early otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. According to Rüsch and Thurm (1990), spontaneous kinociliary motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may disrupt the lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Fluctuations in intracellular ATP levels may also contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Whatever the condition, cilia lacking the molecular machinery required for motility cannot move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether such an autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, has been observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We have revised the Discussion to clarify this important point. Specifically, we emphasize that the ciliary beating we observed under ex vivo conditions may not reflect its properties in the mature in vivo context, but may instead be a byproduct of the motile machinery clearly present in the kinocilia. We speculate that in mature hair cells this machinery could operate in a more subtle mode, modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using loss-of-function approaches will be needed to reveal the unexplored role(s) of kinocilia in vertebrate vestibular hair cells.

      We note that spontaneous activity exists throughout the nervous system, allowing it to maintain baseline activity and interpret signals. Retinal cells are spontaneously active even in the dark, and spiral ganglion neurons also fire spontaneously. Spontaneous hair bundle motion driven by a mechanotransduction-related mechanism has been observed in bullfrog saccular hair cells. We therefore think it unlikely that spontaneous kinociliary beating would interfere with generating temporally faithful representations.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous neural activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We have revised the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We thank Reviewer #2 for these comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations for the authors:

      (1) Figure 1 - Explain how hair cell types are recognized after dissociation. Figure 1 will not be clear in this regard for non-aficionados. Some of the dissociated cells shown appear quite distorted and even unhealthy - e.g., the bottom right crista type II hair cell; the second from left crista type I hair cell; can you address why this doesn't matter for the purposes of this study?

      HC types in Fig. 1C were identified based on their morphological features: type I HCs are flask-shaped with a narrow neck, while type II HCs are cylindrical and short. We have replaced those cells with new images. In our study, HCs were identified based on their marker genes. Although some HCs with distorted morphology, such as those shown in Fig. 3C, were impossible to avoid when preparing single cells for library construction (most previous studies did not examine cell morphology), the quality of the mRNA and sequencing was high, better than that of previously published datasets.

      (2) Line 98 - Explain accessory cells (as opposed to supporting cells).

      We changed “accessory cells” to “other cell types”.

      (3) Line 246 - The primary cilium is...

      Changed.

      (4) Figure 6D - The scale bar is missing. Please use arrows to point to the genes you call out in the text. Also, the genes called out in the text as differently expressed (line 342) are quite faint bands in both cell types. It would be a service to the reader to point them out in the panel.

      A scale bar has been added. We also marked those genes as suggested and edited the text accordingly.

      (5) Figure 7 - mixes frog crista and mouse middle ear images with waveforms and FFTs from frog crista, mouse middle ear, and mouse crista. Related to these still images are 2 videos of frog kinocilium beating (2 hair cells). The mouse images must be underwhelming, or we would have been shown those, yet they were considered adequate to analyze.

      Yes, the spontaneous kinociliary motion of mouse crista HCs is very small. The peak motion is about 40 nm, which is very close to the resolution of our camera. That is why we used a photodiode technique to detect it. A photodiode is more sensitive than the camera, and this technique allows us to record the dynamic response waveform.
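      As an illustration of how a beat frequency can be read from a displacement waveform with an FFT, here is a minimal Python/NumPy sketch. The function name, the 1 kHz sampling rate, the 4 s record, and the synthetic 40 nm sinusoid at an arbitrarily chosen 8 Hz with added noise are all assumptions for illustration; this is not the authors' analysis pipeline.

```python
import numpy as np


def dominant_frequency(trace, fs):
    """Return the frequency (Hz) of the largest non-DC peak in the amplitude spectrum."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()                     # remove the DC offset (resting bundle position)
    x = x * np.hanning(x.size)           # taper the record to reduce spectral leakage
    amplitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    peak = 1 + np.argmax(amplitude[1:])  # skip the 0 Hz bin
    return freqs[peak]


if __name__ == "__main__":
    fs = 1000.0                          # hypothetical sampling rate, Hz
    t = np.arange(4000) / fs             # 4 s of data, giving 0.25 Hz frequency resolution
    rng = np.random.default_rng(0)
    # Synthetic trace in metres: 40 nm sinusoid plus 5 nm Gaussian noise (illustrative only).
    trace = 40e-9 * np.sin(2 * np.pi * 8.0 * t) + 5e-9 * rng.standard_normal(t.size)
    print(dominant_frequency(trace, fs))
```

      The frequency resolution is set by the record length (1 / 4 s = 0.25 Hz here), so a longer recording is needed to resolve small differences in beat frequency.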

      (6) I recommend labeling each figure panel with the tissue of origin to avoid confusion.

      Labeled as suggested.

      (7) I suggest dropping the mouse middle ear data, as they are not directly adequate as a positive control (or no more so than the more beautiful frog data).

      We have kept the waveforms of middle ear cilia movement in Fig. 7, mainly to show the difference in magnitude between airway cilia and kinocilia. Kinociliary movement was at least an order of magnitude smaller than airway ciliary movement. This observation motivated our effort to generate a model that predicts the 96-nm modular repeat and explains why kinociliary movement in mice is much smaller than that of airway cilia and bullfrog kinocilia.

      (8) Focus on the hair bundle motions:

      (a) Show the waveforms for the frog crista hair cells and their FFTs.

      These images were captured many years ago using a camera. The frequency of kinocilia motion is between 5 and 10 Hz. We did not present any waveforms of kinocilia motion because we no longer have access to bullfrogs. Nevertheless, the videos are very effective for visualizing the kinocilia beat of bullfrog saccular HCs.

      (b) Find some way to show us how you measured the mouse hair bundle beating.

      A photodiode technique was used to measure spontaneous kinocilia motion in mice. More details are now included in the text.

      (c) Does EGTA break links between kinocilium and stereocilia? (Could that contribute to the higher beat frequency?) Just applying the same treatment and viewing from above could clarify whether kinocilia dissociate from stereocilia rows. This would likely be more straightforward with an otolith organ.

      All these links (tip links, side links) are sensitive to Ca2+ concentration, and Ca2+-free medium is often used to break them, as shown in many previous studies. Breaking the kinocilia links reduces the load on the kinocilia, which may result in larger kinocilia motion. The beat frequency is inherent to the motile machinery and depends on temperature and intracellular ATP concentration. When viewed from above, the hair bundles in otolith organs do not have good contrast against the HCs in the background. This makes measuring their motion difficult, especially when the motion is small and random and cannot be averaged to improve the signal-to-noise ratio. Moreover, unlike cochlear HCs, whose hair bundles are short and can easily be oriented parallel to the light path, the long hair bundles of vestibular HCs are more difficult to orient and image. For these reasons, we chose crista hair bundles for our measurements, because they can be oriented perpendicular to the light path without interference from background HCs. The lateral motion of the entire bundle is also relatively easy to measure in this preparation.

      (6) Is there no reason to cite McInturff et al. (2018), given that they compared type I and II VHC transcriptomes at P12 and P100? This database is also available on gEAR.

      Their studies are now cited. We also compared their datasets with ours.

      (7) Line 374 - Eatock et al., 1998 citation does not work for this purpose. Eatock & Songer (2011) would be better, or Li, Xue, Peterson (2008): mouse utricle anatomy; significant discussion of relative heights of kinocilia and tallest stereocilia.

      Changed and cited.

      (8) In Figure 3, 2 of the 18 panels in B are missing labels.

      The bar, which applies to all panels, was present at the bottom of Fig. 3B. In the revision, it has been enlarged to make it more visible.

      (9) Line 187 should "Sppl1" be Spp1?

      Corrected.

      (10) Define BBSome on line 244.

      Added.

      (11) Looking at Figure 5, it seems that all the motile genes are expressed in the vestibular hair cells and not the cochlear hair cells. It is surprising that there are any cilia-related genes expressed in these adult cochlear hair cells, given that they do not retain their cilia into adulthood. Could the authors make a comment on this finding in the discussion? Also, are there any ciliopathies that show a vestibular defect but normal hearing in mice or humans? Have you compared the cilia-related gene expression in neonatal/embryonic vestibular hair cells to your dataset?

      Many kinocilia-related genes are still expressed in adult cochlear HCs, so their presence is not surprising. Most of these genes are related to primary cilia structure, including the basal body and ciliary transporters. The basal body is still present in cochlear HCs. Many other primary cilia-related proteins are also expressed in the soma, especially those related to signal transduction, microtubule cytoskeleton, actin cytoskeleton, vesicle transport, metabolic enzymes, protein folding, translation, nuclear transport, ubiquitination, RNA binding, mitochondrial proteins, and transcription factors. Of course, some of them are vestigial. We added a discussion of this in the text. A comparison between neonatal cochlear and vestibular HCs was presented in Fig. 6D, where we compared the genes related to the axonemal repeat (96-nm repeat complex). Because of mRNA quality, previous developmental studies detected far fewer genes overall, and fewer kinocilia-related genes, than our datasets did. While we detected 112 of the 128 genes related to the axonemal repeat, only 90 genes were detected in previous studies (Burns et al., 2015; McInturff et al., 2018). Therefore, we compared neonatal cochlear and vestibular HCs using only their datasets. As far as we know, no ciliopathies with vestibular defects but normal hearing have been reported in mice or humans. We plan to use a Ccdc39 mutant mouse model to examine how loss of function of a key motile cilia signature gene affects kinocilia motility and vestibular function.

      (12) How is "expression level" in the violin plots being calculated? Is this a measure of read count? The normalization is cursorily explained in the methods. Is this value comparable across genes? Did the authors switch to z-score by Figure 6?

      We dissected the auditory and vestibular sensory epithelia from the same groups of mice, then prepared and sequenced the libraries at the same time, with all parameters kept identical. The violin plots are based on the values presented in Supplementary Table 1. Each dot in the plot represents the aggregated number of reads across all cells for a given gene. All values are normalized across the different HC types and biological replicates. Details of the normalization are now provided.

      (13) The authors comment on the 16/128 motile cilia axonemal repeat genes that are not expressed in the vestibular hair cells. Listing these somewhere may be helpful to the readers.

      We thank the reviewer for this helpful suggestion. Most of the 128 motile cilia axonemal repeat genes were listed in Figs 8C and S5, along with known loss-of-function mutations and ciliopathy associations identified in human diseases or observed in animal models. To improve clarity, we have now included Table S2, which provides the complete list of all 128 motile cilia axonemal repeat genes, including those not expressed in vestibular HCs.

      (14) Figure 5D needs some refinement. While the authors used databases, including CiliaCarta, SYSCILIA gold standard, and CilioGenics, to identify the primary cilia-related genes, they have included many genes that are not highly specific to primary cilia function (e.g., HSP90, HSPA8, DNAJA4, GNAS...). Perhaps the authors would be able to do a better job of specifically querying primary cilia function by using genes that are common to these three databases.

      We presented comparisons and analyses based on three major cilia databases, which were generated from proteomics of cilia from different tissues and organisms. In addition, we have provided a more comprehensive list of primary cilia-related genes in Fig. S2. While the majority of cilia-related genes and proteins are highly conserved, some are tissue- or organism-specific. Most of the genes presented in Fig. 5D of our manuscript are shared among all three databases. The cilium is a complex structure composed of proteins involved in the microtubule cytoskeleton, actin cytoskeleton, vesicle transport, metabolism, signaling, and protein folding. It also contains proteins for translation, nuclear transport, ubiquitination, and RNA binding, as well as mitochondrial proteins and transcription factors (https://ciliogenics.com/?page=Home). Proteins such as HSP90 and HSPA8 are important for protein folding. HSPA8 also functions as an ATPase in the disassembly of clathrin-coated vesicles during transport of membrane components through the cell. GNAS is part of a G protein complex that transmits signals. DNAJA4 is one of the high-confidence cilia proteins (mean score of 1.26, expression rank 938). These proteins are detected in cilia according to CilioGenics (https://ciliogenics.com/?page=Home). They are not highly specific to cilia and are expressed in the soma as well. Most of these signaling proteins, such as those in the WNT pathway (Supplementary Fig. 2), are detected in both cilia and soma.

      (15) The authors state, "Furthermore, we observed robust spontaneous kinocilia motility in bullfrog crista HCs and small spontaneous bundle motion in mouse crista HCs." This statement should be moderated by acknowledging that this motility was observed in only some cells. The authors favor the hypothesis that the lack of motility in some crista HCs is due to depolarization or damage to the sample. The authors should also acknowledge the possibility that there may be cell-to-cell variability in the motility of the kinocilia.

      We addressed these issues in our response to the public reviews and modified the statement as suggested.

      (16) The first few pages of the Results section include many lists of genes. Readability may be improved if this is curtailed modestly.

      Changed as suggested. We removed comparison among different types of HCs and replotted Fig. 2B. This has reduced the number of genes mentioned in the text.

    1. eLife Assessment

      This important work delineates layered glucose-responsive neuropeptidergic mechanisms that regulate sugar intake. Using a combination of genetic, physiological, and behavioral experiments, the authors convincingly show that Hugin- and Allatostatin A-releasing neurons suppress sugar feeding by reducing the sensitivity of Gr5a-expressing gustatory neurons. They further demonstrate that Neuromedin U neurons share key physiological properties with fly Hugin neurons, highlighting conserved peptide functions across animal phyla.

    2. Reviewer #1 (Public review):

      This revised manuscript by Qin and colleagues delineates an important neural mechanism that suppresses the intake of sugar solution in response to internal glucose level (the "brake" mechanism for sugar consumption). They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons, primarily in response to elevated levels of circulating glucose. First, neurons that release the neuropeptide Hugin (an insect homolog of vertebrate Neuromedin U (NMU)) are activated by a high concentration of hemolymph glucose, which is directly sensed by Hugin-releasing neurons through a cell-autonomous mechanism. Next, Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of the Hugin receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses the sugar response in sweet-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of the sugar response in Gr5a-expressing neurons reduces the fly's motivation to consume sugar. They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of circulating glucose, independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity downstream of NMU-expressing neurons in the rNST. These data suggest that the function of Hugin neuropeptides in the fly is analogous to the function of NMU in the mouse.

      The authors have provided multiple lines of compelling evidence generated through rigorous and comprehensive experiments, which spans genetic abrogation, neuronal manipulation, pharmacology, and functional imaging. The authors are also receptive to the critiques and reframed the central message, such that their conclusions are soundly supported by the presented data. Importantly, the parallel study in mice adds a unique comparative perspective that makes the paper of interest to a wide range of readers.

    3. Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations, and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest in both fed and starved flies, suggesting that glucose intake through Glut1 may only be part of the mechanism. The authors address this in their discussion.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this revised manuscript, Qin and colleagues aim to delineate a neural mechanism that is engaged specifically in sated flies to suppress the intake of sugar solution (the "brake" mechanism for sugar consumption). They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release the neuropeptide Hugin (an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation depends on the cell-autonomous function of Hugin-releasing neurons, which sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of the Hugin receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses the sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through the AstA-R1 receptor. Suppression of the sugar response in Gr5a-expressing neurons reduces the fly's motivation to consume sugar. They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity downstream of NMU-expressing neurons in the rNST. These data suggest that the function of the Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      The shift of the narrative, which focuses specifically on the hugin-AstA axis as the "brake" on the satiety signal and feeding behavior, clarified the central message of the presented work. The authors have provided multiple lines of compelling evidence generated through rigorous experiments. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers.

      While I deeply appreciate the authors' efforts to substantially restructure the manuscript, I have a few suggestions for further improvement. First, it remains open to discussion whether the "brake" function of the hugin-AstA axis is truly satiety state-dependent. The fact that neural activation (Fig. Supp. 8), peptide injection (Fig. 3A, 4A), receptor knockdown (Fig. 3C,G, 4E), and receptor mutants (Fig. Supp. 10, 12) all robustly modulate PER irrespective of feeding status suggests that the hugin-AstA axis influences feeding behavior in both sated and hungry flies. Additionally, the new data (Fig. Supp. 13B, C) show that synaptic transmission from hugin-releasing neurons is necessary for completely suppressing feeding even in sated flies. If the hugin-AstA axis were engaged specifically in the sated (high-glucose) state, disrupting this neuromodulatory system would be expected to have relatively little effect in starved flies (in which the "brake" is already disengaged).

      We thank the reviewer for pointing out this inconsistency. We have corrected this interpretation. Specifically:

      (1) We removed statements suggesting that the circuit is fully disengaged during starvation.

      (2) We now state that endogenous hugin activity is reduced during starvation, but the circuit retains modulatory capacity when experimentally perturbed.

      (3) The Discussion now emphasizes that the system operates as a state-modulated inhibitory tone rather than a strictly fed-state switch.

      We believe this revised framing resolves the discrepancy.

      In this context, it is intriguing that the knockdown of PK2-R2 hugin receptor modestly but consistently decreases proboscis extension reflex specifically in starved flies (Fig. 3D, H). The manuscript does not discuss this interesting phenotype at all. Given the heterogeneity of hugin-releasing neurons (Fig. Supp. 7), there remains a possibility that a subset of hugin-releasing neurons and/or downstream neurons can provide a complementary (or even opposing) effect on the feeding behavior.

      We agree that this is an important observation. Although the effect size is modest, it is reproducible and suggests that hugin signaling may not operate as a strictly linear pathway.

      To address this:

      (1) We added a paragraph in the Results acknowledging the PK2-R2-dependent phenotype.

      (2) We included a discussion noting the potential functional heterogeneity of hugin neurons.

      (3) The schematic model (now Figure Supplementary 17, previously Figure Supplementary 16) includes a dashed line indicating a possible parallel PK2-R2-dependent branch.

      Given these intriguing yet unresolved issues, it is important to acknowledge that whether this system is "selectively engaged in fed states to dampen sweet sensation (in Discussion)" requires further functional investigation. The consistent effects of manipulating the hugin-AstA system across multiple experimental approaches underscore the importance of this molecular circuit axis for controlling feeding behavior. Moderating the conclusions to accommodate alternative interpretations of the data will help the field determine the precise mechanism that controls feeding behavior in future studies.

      We fully agree with the reviewer. Our original description of the circuit as a “satiety brake” implied exclusive engagement in fed states, which is not strictly supported by the behavioral data. Although endogenous hugin activity is elevated under fed conditions (as shown by CaMPARI), experimental manipulations demonstrate that the circuit retains functional capacity to modulate feeding behavior across feeding states.

      To address this concern, we have:

      (1) Removed the term “satiety-specific brake” throughout the manuscript.

      (2) Reframed the circuit as a glucose-responsive, state-modulated inhibitory module.

      (3) Revised the Discussion to explicitly state that the hugin–AstA pathway biases sweet sensitivity according to circulating glucose levels rather than functioning as an on/off switch.

      (4) Substantially revised Supplementary Figure 17 to reflect graded modulation across metabolic states rather than binary state engagement.

      These changes better align our conclusions with the experimental observations.

      Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations, and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest in both fed and starved flies, suggesting that glucose intake through Glut1 may only be part of the mechanism.

      We agree that the modest PER phenotype suggests that Glut1-mediated glucose uptake represents one component of glucose sensing in hugin neurons. We have clarified this in the Discussion and now explicitly state that additional glucose-sensing mechanisms may contribute to hugin activation.

      Additionally, many of the manipulations testing the "brake" circuitry throughout the study show similar effects in both fed and starved flies. This suggests that the focus of the discussion and Supplemental Figure 16 on a satiety-specific "brake" mechanism may not be fully supported by the data.

      We fully agree that the previous framing overstated state specificity.

      As described above, we have:

      (1) Removed “satiety-specific brake” terminology.

      (2) Reframed the circuit as a glucose-responsive inhibitory module.

      (3) Revised the Discussion to explicitly acknowledge modulation across feeding states.

      (4) Updated the schematic model (Figure Supplementary 17, formerly Figure Supplementary 16) accordingly.

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the authors):

      Both the reviewers and I agree that the conclusion about a "satiety-dependent" brake needs to be modified to discuss the phenotypes that are also observed under starved conditions. Reviewer 1 would further like to emphasize that the authors are not required to follow through with the specific recommendations suggested by them. Modifying the conclusion and Supplementary Figure 16 should suffice.

      We sincerely thank the Reviewing Editor for the clear guidance. We fully agree that our previous framing of the hugin–AstA circuit as a strictly “satiety-dependent” brake may have overstated the state specificity of the system.

      In response to this recommendation, we have:

      (1) Revised the Abstract, Results, and Discussion to moderate the conclusion and explicitly acknowledge the phenotypes observed under starved conditions.

      (2) Reframed the circuit as a glucose-responsive, state-modulated inhibitory module, rather than a satiety-exclusive brake.

      (3) Supplementary Figure 17 (formerly Figure Supplementary 16) has been substantially revised to illustrate graded modulation across metabolic states rather than binary engagement.

      We appreciate the clarification that no additional experiments were required and are grateful for the opportunity to improve the conceptual framing of our work.

      Please include full statistical reporting in the main manuscript (e.g., figure legends or results).

      We have revised all figure legends to include full statistical reporting.

      Reviewer #1 (Recommendations for the authors):

      By reframing their findings as the "brake" mechanism on satiety-induced suppression of feeding behavior and sensitivity to sweet taste, the authors substantially improved the clarity of their findings and their significance. The additional data (Fig. Supp. 13B, C) allow "apples-to-apples" comparisons of the behavioral data. I support the publication of this manuscript with no further experiments, although I have several suggestions for the text.

      As I wrote in the public review, I have a reservation about the authors' argument that the hugin-AstA system is the "'satiety brake' - that is selectively engaged in fed states to dampen sweet sensation (lines 392-394)". Manipulations of both the hugin system (Fig. 2C, Fig. 3A, C, D, G, Fig. Supp. 8A, C, Fig. Supp. 10A-C, Fig. Supp. 13B, C) and the AstA system (Fig. 4A, E, Fig. Supp. 8C, D, Fig. Supp. 12A-C, Fig. Supp. 13D) all indicate that the hugin-AstA system suppresses feeding regardless of satiety state. Specifically, Fig. Supp. 13B shows that synaptic blockade further increases PER, contradicting the authors' statements ("silencing hugin+ neurons led to enhanced sweet-driven feeding behavior (line 299-300)" and "...further silencing has little additional effect (line 402)"). The CaMPARI data (Fig. 1J) provide the link between the activity levels of hugin-releasing neurons and satiety state. However, the fact that eliminating the hugin-AstA signal can promote further PER in starved flies suggests that this brake is not completely satiety-dependent. I ask the authors to at least discuss this apparent discrepancy between their data and conclusions.

      Also, the authors' finding that PK2-R2 reduction actually suppresses PER specifically in starved flies (Fig. 3D, H), albeit with a relatively small effect size, suggests that the hugin-AstA axis is not a single, linear pathway as the authors suggest in Fig. Supp. 16. While delineating the PK2-R2-dependent pathway is beyond the scope of this study, at least a line of discussion would be helpful.

      Minor comments:

      (1) Fig. Supp. 8 (dTRPA1 activation of hugin and AstA neurons) and Fig. Supp. 13B-D (inhibition of hugin and AstA neurons) should be in the main figures given their relevance to the narrative of this manuscript.

      We agree with the reviewer regarding their importance. The key behavioral panels from these figures have now been moved to the main figures to strengthen the narrative flow.

      (2) Fig. Supp. 11 (PER and imaging using decapitated heads only), despite its creativity, leaves me wondering what PER of fly heads looks like. It is a highly artificial and invasive experiment. Supplementary movies would be helpful.

      We apologize for the lack of clarity in our description. In this experiment, flies were not decapitated. Instead, we surgically severed the connection between the brain and the ventral nerve cord (VNC), while keeping the body and proboscis musculature intact. Thus, the flies remained physically intact, and PER was measured using the same behavioral protocol as in intact animals.

      We have revised the figure legend to clarify this point and avoid confusion. Because the behavioral procedure was identical to standard PER assays and the flies retained normal proboscis motor function, we did not include supplementary videos.

      (3) Expression patterns of PK2-R1 and AstA-R2 in the proboscis are mentioned in the text, but no data are shown (lines 229 and 279). I strongly encourage the authors to show images.

      We have now included the relevant expression images in the revised manuscript.

      (4) A citation for the "previous study (line 486)" describing PER method is required.

      The appropriate citation has been added.

    1. eLife Assessment

      This important study develops a sensitive and robust new sensor for TDP-43 activity that should strongly impact the field's ability to monitor whether TDP-43 is functional. The evidence, though limited to cell culture, is compelling and is the first demonstration that a GFP on/off system can be used to assess genetic TDP-43 mutants as well as loss of soluble TDP-43.

    2. Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Previously, most methods employed western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or downregulated. The problem with these approaches is their limited sensitivity. This approach uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it operates at extremely sensitive TDP-43 levels, requiring only picomolar TDP-43 knockdown for detection. This reporter should supersede the current TDP-43 activity assays: it is cost-effective, rapid, and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples, statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes (i.e., PCR and GFP fluorescence) was viewed as complementary, adding rigor. The final major strength I'll add is the very clever approach of tethering TDP-43 to the loss-of-function cassette so that, when TDP-43 is inactive, the construct autoregulates and induces wild-type TDP-43. This has many implications for other genes, not just TDP-43, including other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      Admittedly, the sensor must first be characterized, and the use of cell lines is an obvious advantage for that, but it raises the question of whether the sensor will work in neurons. Additional future experiments in primary neurons will be needed. The bulk analysis of GFP-positive cells is somewhat crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to obtain more accurate, homogeneous data. This is especially relevant because the GFP signal is quite heterogeneous in the image panels (for example, Figure 1C), meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented more clearly, possibly split into additional graphs, since they contain too many outputs. The image panels in Sup Figure 2A would benefit from labels, because it is difficult to tell which antibodies or fluorophores were used. The same applies to Figure 4B.

      Figure 3 is an important addition to this manuscript and is generally convincing in showing that TDP-43 loss-of-function mutants can alter the sensor. However, wild-type endogenous TDP-43 is still present in these cells, and it is unclear whether the 5FL mutant acts as a dominant negative to deplete the total TDP-43 pool, as the data would suggest. This could have been clarified. Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      Overall, the authors achieved their goals by developing a very sensitive readout of TDP-43 function. The results are convincing and rigorous, and they support the main conclusions. There are some minor weaknesses listed above, chief of which is the lack of flow sorting to improve the data analysis. Regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (i.e., neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, this type of sensor is a major boost to the field and will enable detection of sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      Comments on revisions:

      In the revised version, most of the reviewer's comments have been appropriately addressed with the exception of 1) the use of flow sorting to improve the data analysis and 2) testing this sensor in primary neurons. The latter is the focus of an ongoing separate study. Though flow sorting would significantly strengthen this study and help others in the field to use this sensor, it is still an impactful and innovative study without it.

    3. Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well-described, unique resource, validated in multiple cell types as a sensitive readout of TDP-43 loss of function, that would be of high interest and utility to a number of researchers. Future studies validating the utility of this biosensor in models of TDP-43 loss of function (e.g. disease iPSNs) that do not rely on TDP-43 knockdown will be of further interest.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      We thank the editor and reviewers for their thoughtful and constructive feedback, which has enabled us to greatly strengthen the manuscript. We apologize for the delay in resubmitting, which was due to a large turnover in the lab following trainee graduations. We have carefully revised the text, figures, and supplementary materials in response to these comments. Below, we summarize the key revisions, followed by a point-by-point response to the reviewers’ critiques.

      (1) Performed CUTS analyses in human neuronal systems: In the revised manuscript, we include new data demonstrating that the CUTS system can be applied to additional cellular models, specifically neuronal cells (Figure 5, Figure S4). To address whether CUTS functions effectively in neuronal contexts, we generated stable CUTS-expressing BE(2)-C and ReN VM lines and differentiated both into neurons (Figure 5A-D, Figure S4A-C). To restrict expression to neurons, we developed a new Tet-On3G construct in which the Tet-On3G transactivator protein is driven by the SYN1 promoter, ensuring neuron-specific inducible expression in these experiments.

      (2) Defined the relationship between CUTS and endogenous/physiological cryptic exon inclusion: To evaluate how well the CUTS system reflects physiological cryptic exon regulation, we performed RT-PCR analysis of several cryptic exons previously reported by us and evaluated CUTS activation at the RNA level in parallel (Figure S2E). CUTS is sensitive to low-to-mild reductions in TDP-43 levels, whereas the tested endogenous cryptic exons exhibit variable responses to TDP-43 knockdown.

      (3) Defining stress-induced TDP-43 loss of function: We include new data demonstrating that the CUTS system can detect TDP-43 loss of function induced by acute sodium arsenite (NaAsO₂) treatment in HEK cells (Figure 3D–I). We have also tested additional stressors as part of a separate ongoing study that will expand on this work (Xie et al., 2025). We selected this paradigm because TDP-43 loss of function in response to acute NaAsO₂ treatment is also supported by work from other labs (Huang et al., 2024).

      (4) Implications of using a TDP-43 loss-of-function sensor for therapeutic applications: In the revised manuscript, we clarify that CUTS-TDP43 is autoregulated, and we highlight two potential therapeutic applications: (i) TDP-43 knockdown-and-replacement: CUTS-TDP43 provides a strategy for simultaneously depleting pathological TDP-43 species while enabling autoregulated re-expression of wild-type TDP-43. By restoring TDP-43 within a self-limiting regulatory network that maintains homeostatic control, this design mitigates the risk of supraphysiologic overexpression, a known liability of conventional replacement approaches. (ii) Aggregation-independent correction: Because CUTS is autoregulatory, it can be repurposed to regulate alternative downstream effectors, including splicing modifiers or TDP-43 functional interactors, without expressing TDP-43 itself. This approach provides a potential aggregation-independent strategy to compensate for TDP-43 loss of function (LOF) by restoring downstream splicing. We are evaluating this approach in a follow-up study (Xie et al., 2025). In this ongoing work, we show that CUTS-regulated expression of splicing proteins in response to TDP-43 loss restored a subset of cryptic exon events (24 of 28 events evaluated). These findings suggest that CUTS is a versatile tool for both autoregulated TDP-43 replacement and trans-regulatory therapeutic correction. We expand on this concept in the Discussion of the revised manuscript. We also note that autoregulatory TDP-43 biosensor strategies have been proposed in related systems, including TDP-REG, underscoring broader interest in self-regulated TDP-43 systems (Wilkins et al., 2024).

      (5) Clarified the mechanism by which TDP-43 5FL causes strong loss of function: TDP-43 5FL exhibits reduced RNA-binding capacity, and we previously showed that the lack of RNA binding promotes aberrant homotypic phase separation of TDP-43 (Mann et al., 2019). Expression of RNA-binding-deficient TDP-43 variants induces nuclear “anisomes” (Yu et al., 2021), which evidence suggests sequester endogenous TDP-43 protein into insoluble structures. We expand on this in the Results section of the revised manuscript.

      (6) Improved figure clarity and data presentation: To enhance clarity and organization, we maintained the main structure of the manuscript while reorganizing figures and improving data visualization. Some examples include:

      Figure 1: We revised the schematic layout for greater clarity and simplicity. The figure now focuses more specifically on the CUTS data, with additional data on the UNC13A-TS and CFTR-TS moved to Figure S1. To improve readability, titles were added to all schematic panels. Visual consistency was also improved by refining the color labelling for each sensor in Figures 1C and 1D and adjusting the corresponding bar graphs accordingly.

      Figure 2: We reorganized the figure to clearly distinguish between protein and mRNA analyses for greater clarity. In the revised layout, western blot quantifications of TDP-43 and CUTS (GFP) signals are shown in Figures 2D and 2E, respectively, while the corresponding qPCR analyses are presented in Figures 2H and 2I. Minor edits include removing the percentage knockdown and fold-change annotations from the graphs and incorporating these values into a mini-table in Figure S2E.

      The original Figures 2D and 2G were reincorporated as reference panels in Figure S2A–B, and new graphs showing CUTS protein-level changes as a function of TDP-43 knockdown were added (Figure S2C–D). We also incorporated new data showing the behavior of endogenous cryptic exons under low siTDP-43 treatment (Figure S2E).

      Figure 3: We added new data demonstrating the application of the CUTS system to detect TDP-43 loss of function induced by stress. Specifically, we show that sodium arsenite (NaAsO₂) treatment leads to TDP-43 functional impairment that is detectable by CUTS and supported by RT-PCR of endogenous cryptic exons (Figure 3D-I).

      Figure 5 and Figure S4: We introduced a new figure that demonstrates the effective application of the CUTS system in differentiated neuronal systems, thereby extending its usability to disease-relevant cell types.

      Figures S2A and 4B were edited to include the corresponding labels on the side of each image for clarity. Sup Figure 2A was moved to Sup Figure 3A, while Figure 4B remains in its original configuration.

      We thank the reviewers again for their insightful critiques and helpful suggestions, which have enabled us to substantially improve the manuscript. Please find our detailed response to each review below:

      Reviewer #1 (Public review):

      Summary:

      The authors create an elegant sensor for TDP-43 loss of function based on cryptic splicing of CFTR and UNC13A. The usefulness of this sensor primarily lies in its eventual use in high-throughput screening and in vivo models. The TDP-43 loss-of-function sensor was also used to express TDP-43 upon reduction of its levels.

      Strengths:

      The validation is convincing: the sensor was tested in models of TDP-43 loss of function, knockdown, and models of TDP-43 mislocalization and aggregation. The sensor responds to a minimal decrease in TDP-43 and, unlike most of the tests currently employed, can be read out at the protein level.

      Weaknesses:

      Although the LOF sensor described in this study may be a primary readout for high-throughput screens, ALS/TDP-43 models typically employ primary readouts such as protein aggregation or mislocalization. The information in the two following points would assist users in making informed choices.

      (1) Testing the sensor in other cell lines

      We thank the reviewer for raising this important point. In agreement with this suggestion, we generated ReN VM cell lines and used a neuroblastoma cell line model (BE(2)-C) expressing the TetOn3G CUTS system under a human synapsin I (hSYN1) promoter. In this construct, the transactivator protein is under the control of the neuron-specific hSYN1 promoter, whereas the classical TetOn3G system uses a CMV-like promoter. Several studies have reported reduced activity or silencing of CMV- and PGK-driven transgenes in neurons. Therefore, for our neuronal experiments, we replaced this promoter to generate a new version of the doxycycline-inducible CUTS system in which the Tet-On 3G transactivator is driven by the hSYN1 promoter, so that CUTS is expressed in response to doxycycline treatment. In this improved construct, we also replaced mCherry with mScarlet to enhance the fluorescent signal.

      To test this neuron-adapted system, we established stable CUTS expression in undifferentiated BE(2)-C cells, a subclone of the SK-N-BE(2) neuroblastoma line that has been used to study TDP-43–dependent splicing function (Brown et al., 2022). This model can be differentiated into neuron-like cells within 10 days, as shown in Supplementary Figure 4A. Using this model, we confirmed that TDP-43 knockdown leads to robust activation of the CUTS system (Figure 5B-E). We additionally tested this in stable polyclonal ReN VM cells following differentiation into cortical-like neurons (Figure 5D, Figure S4B-C).

      (2) Establishing a correlation between the sensor's readout and the loss of function (LOF) in the physiological genes would be useful given that the LOF sensor is a hybrid structure and doesn't represent any physiological gene. It would be beneficial to determine if a minor decrease (e.g., 2%) in TDP-43 levels is physiologically significant for a subset of exons whose splicing is controlled by TDP43.

      We agree with the reviewer that correlating the sensor’s readout with physiological TDP-43 splicing targets is essential to validate its biological relevance. To this end, we complemented our sensor expression profile with endogenous cryptic exons (CEs) sensitive to TDP-43 depletion. We tested a panel of five physiological cryptic exons regulated by TDP-43 (LRP8, EPB41L4A, ARHGAP32, HDGFL2, and ACBD3). To address the reviewer’s concern, we performed RT-PCR on samples from the low-dose siTDP-43 experiment shown in Figure S2E.

      The endogenous CEs used in the panel were selected based on our own and others’ preliminary observations. Among these, HDGFL2 showed a particularly robust increase in cryptic exon inclusion at very low siTDP-43 concentrations (38 pM), while untreated samples showed almost no CE inclusion. This finding strongly supports a direct mechanism linking mild TDP-43 reduction to loss of physiological splicing control.

      (3) Considering that most TDP-43 LOF pathologically occurs due to aggregation and/or mislocalization, and that in most cases the endogenous TDP-43 gene is functional but the protein becomes non-functional, the use of the loss-of-function sensor as a switch to produce TDP-43, and its eventual use as a gene therapy, would have to contend with the fact that the protein produced may also become non-functional. This would be easy to test in one of the aggregation modes that were used to test the sensor. However, as the authors suggest, this is a very interesting system to deliver other genetic modifiers of TDP-43 proteinopathy in a regulated and timely fashion.

      We thank the reviewer for this thoughtful point and agree that in the disease-relevant context where endogenous TDP-43 is intact but TDP-43 function is lost due to mislocalization and/or aggregation, a re-supply of TDP-43 risks sequestration and loss of activity. In our manuscript, the CUTS-TDP43 module was presented as a control circuit proof-of-concept rather than a stand-alone approach: it demonstrates that CUTS can (i) sense LOF with high dynamic range and proportionality, and (ii) drive a payload under negative feedback such that total TDP-43 remains near baseline while partially rescuing a splicing readout (CFTR minigene) under knockdown conditions.
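
      To make the negative-feedback logic concrete, the following is a toy steady-state model in Python. It is not the authors' model: the repressive Hill function, the parameter values (x_max, k_half, hill_n), and all function names are hypothetical choices for illustration only. It captures just the qualitative idea described above: when endogenous TDP-43 falls, CUTS-driven payload expression rises and partially restores total TDP-43, and payload expression shuts off again as TDP-43 approaches baseline, limiting overexpression.

```python
def payload_expression(total_tdp43, x_max, k_half, hill_n):
    """Hypothetical CUTS-driven payload output: high when TDP-43 is low.

    Modeled as a repressive Hill function because TDP-43 represses the
    cryptic exon, so inclusion (and payload expression) rises as TDP-43 falls.
    """
    return x_max / (1.0 + (total_tdp43 / k_half) ** hill_n)


def steady_state_tdp43(remaining_fraction, baseline=1.0, x_max=1.0,
                       k_half=0.5, hill_n=4, tol=1e-9):
    """Solve T = remaining_fraction * baseline + payload(T) by bisection.

    The residual T - endogenous - payload(T) increases monotonically with T
    (payload falls as T rises), so it has exactly one root in [lo, hi].
    """
    endogenous = remaining_fraction * baseline
    lo, hi = 0.0, endogenous + x_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        residual = mid - endogenous - payload_expression(mid, x_max, k_half, hill_n)
        if residual < 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Arbitrary illustrative knockdown levels (fraction of endogenous TDP-43 left).
    for remaining in (1.0, 0.5, 0.2):
        total = steady_state_tdp43(remaining)
        print(f"endogenous {remaining:.1f} -> total with payload {total:.2f}")
```

      With these arbitrary parameters, total TDP-43 after an 80% loss of endogenous protein settles well above the 20% that would remain without the circuit, yet below the no-knockdown steady state; real circuit behavior would depend on measured splicing and expression kinetics.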

      Importantly, we evaluated CUTS in aggregation/mislocalization-prone contexts: ΔNLS, 5FL, and ΔNLS+5FL variants trigger CUTS activation (ref), allowing us to quantify LOF arising from these aggregation modes. This confirms that CUTS can operate precisely in the very settings where sequestration is likely to occur.

      To directly address the reviewer’s suggestion, in the revision we (i) clarify in the Discussion that CUTS-TDP43 is a circuit demonstration and not our proposed monotherapy in aggregation-dominant disease; and (ii) expand our therapeutic framing into two approaches:

      (i) Knockdown-and-replacement: concurrently deplete aggregation-prone/endogenous pathologic TDP-43 species (i.e., mutant TDP-43) while using CUTS to re-deliver wild-type TDP-43 under autoregulation.

      (ii) Aggregation-independent correction: use CUTS to deliver modifiers that bypass TDP-43 sequestration (e.g., downstream effectors or splicing correctors that restore LOF consequences without expressing TDP-43 itself).

      (4) I don't think the quantity of siRNA is directly proportional to the degree of TDP-43 knockdown/extent of TDP-43 loss. Therefore, to enhance the utility of the dose-response curves, I'd suggest using TDP-43 levels as the variable on the x-axis rather than the amount of siRNA administered; even just adding such a plot alongside the current plots would enable readers to quickly evaluate LOF response levels relative to the protein. While I understand that the limited sensitivity of Western blots for quantification might be why the authors have not created the graphs in this manner, having this information would be useful.

      We appreciate the reviewer’s insightful comment. As noted, in the original version of the graph, we incorporated the percentage of TDP-43 knockdown corresponding to each siTDP-43 concentration (indicated in red text). However, we agree that this format was not easy to interpret, given the amount of information presented. To address this, we generated two new plots in which the x-axis represents remaining TDP-43 levels, quantified as (i) TDP-43 protein pixel intensity or (ii) TDP-43 mRNA level, and the y-axis shows the fold change in CUTS signal. These new plots are now included as Supplementary Figures 2C–D and allow a clearer visualization of the CUTS readout in relation to actual TDP-43 levels rather than siRNA dose. As the reviewer anticipated, we did not originally present the data in this format because at low siTDP-43 concentrations the fold change is minimal and more difficult to quantify by Western blot. We have also observed batch effects across siRNA lots, which further limit the usefulness of siRNA dose as the x-axis variable. We believe this revised format enhances the clarity of the results and strengthens the interpretation of the dose–response relationship.
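
      The re-expression of the dose-response data described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: it assumes each sample's TDP-43 and CUTS signals (band intensities or mRNA levels) have already been normalized as needed, and the Sample class and replot_points function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    sirna_pm: float      # siRNA dose in pM, kept only as a label
    tdp43_signal: float  # TDP-43 band intensity or mRNA level (arbitrary units)
    cuts_signal: float   # CUTS (GFP) band intensity or cryptic-exon mRNA level


def replot_points(control: Sample, treated: list[Sample]) -> list[tuple[float, float]]:
    """Return (percent TDP-43 remaining, CUTS fold change) pairs relative to control."""
    if control.tdp43_signal <= 0 or control.cuts_signal <= 0:
        raise ValueError("control signals must be positive")
    points = []
    for s in treated:
        percent_remaining = 100.0 * s.tdp43_signal / control.tdp43_signal
        fold_change = s.cuts_signal / control.cuts_signal
        points.append((percent_remaining, fold_change))
    # Order by measured TDP-43 level so the x-axis reflects protein or mRNA, not dose.
    return sorted(points)
```

      Plotting these pairs puts the measured TDP-43 level, rather than the nominal siRNA dose, on the x-axis, which also sidesteps lot-to-lot differences in siRNA potency.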

      (5) p3 line 74: one of the reasons cited as a pitfall of using endogenous cryptic exons is that they exhibit variable responses to TDP-43 loss and may be cell type-specific. Has the sensor been used in different cell lines?

      We tested the CUTS system in two differentiated neuronal models, BE(2)-C and ReN VM cells. The results are presented in Figure 5 and Figure S4 of the revised manuscript.

      (6) The order of the text describing 1A and 1B is confusing. The text starts describing the TS cassettes in 1A using as an example the CUTS cassettes, which haven't been introduced yet. I'd suggest reorganizing this section. The graph, also in 1A, showing readout proportional to GFP should be taken out, or the figure legend should state that it is theoretical.

      We agree with the reviewer’s point. In the original schematic (Figure 1A), we included the CUTS system as an example to introduce the TS cassette design, since it contains the three possible sensor configurations. However, we recognize that this could be confusing. Therefore, we have removed the CUTS cassette from Figure 1A, along with the theoretical graph showing GFP readout proportional to the degree of TDP-43 LOF. In agreement with this change, we also restructured Figure 1. As the focus is the CUTS system, we have moved the Western blot and quantification of UNC13A-TS and CFTR-TS to Supplementary Figure 1.

      Reviewer #2 (Public review):

      Summary:

      The authors' goal is to develop a more accurate system that reports TDP-43 activity as a splicing regulator. Prior to this, most methods employed Western blotting or qPCR-based assays to determine whether targets of TDP-43 were up- or down-regulated. The problem with that is sensitivity. The new sensor uses an ectopically delivered construct containing splicing elements from CFTR and UNC13A (two known splicing targets) fused to a GFP reporter. Not only does it report TDP-43 function well, but it is extremely sensitive to TDP-43 levels, requiring only picomolar concentrations of TDP-43 siRNA for detection. This reporter should supersede current TDP-43 activity assays; it's cost-effective, rapid, and reliable.

      Strengths:

      In general, the experiments are convincing and well designed. The rigor, number of samples and statistics, and gradient of TDP-43 knockdown were all viewed as strengths. In addition, the use of multiple assays to confirm the splicing changes was viewed as complementary (i.e., PCR and GFP fluorescence), adding additional rigor. The final major strength I'll add is the very clever approach of tethering TDP-43 to the loss-of-function cassette such that when TDP-43 is inactive, the cassette would autoregulate and induce wild-type TDP-43. This has many implications for the use of other genes, not just TDP-43, but also other protective factors that may need to be re-established upon TDP-43 loss of function.

      Weaknesses:

      (1) Admittedly, one needs to initially characterize the sensor and the use of cell lines is an obvious advantage, but it begs the question of whether this will work in neurons. Additional future experiments in primary neurons will be needed.

      We thank the reviewer for highlighting the importance of validating the sensor in neuronal models, given the central role of TDP-43 dysfunction in ALS/FTD and related neurodegenerative disorders. While initial characterization in established cell lines provides experimental control and scalability, we agree that demonstrating functionality in neuronal systems is essential. To address this, we adapted the CUTS platform for neuronal application by incorporating the human synapsin-1 (hSYN1) promoter into the Tet-On 3G system to enable inducible, neuron-specific expression. We validated this configuration in differentiated BE(2)-C cells (Figures 5A-C, S4A-C), where CUTS retained robust responsiveness to TDP-43 perturbation. In parallel, we generated stable CUTS-expressing ReN VM neural progenitor cells and differentiated them for three weeks prior to functional assessment (Figures 5A-C, S4A-C). In both neuronal models, CUTS was functional and responsive to TDP-43 siRNA. We are currently optimizing promoter selection and expression paradigms for fully differentiated iPSC-derived neuronal models, which will be the subject of future studies.

      (2) The bulk analysis of GFP-positive cells is a bit crude. As mentioned in the manuscript, flow sorting would be an easy and obvious approach to get more accurate homogenous data. This is especially relevant since the GFP signal is quite heterogeneous in the image panels, for example, Figure 1C, meaning the siRNA is not fully penetrant. Therefore, stating that 1% TDP-43 knockdown achieves the desired sensor regulation might be misleading. Flow sorting would provide a much more accurate quantification of how subtle changes in TDP-43 protein levels track with GFP fluorescence.

      We thank the reviewer for this thoughtful suggestion. We agree that flow cytometry and sorting of GFP-positive populations would provide a higher-resolution, single-cell–level relationship between TDP-43 abundance and sensor output. Such an approach would reduce heterogeneity arising from incomplete siRNA penetrance and allow more precise quantification of how incremental changes in TDP-43 protein levels track with GFP fluorescence. In the present study, our goal was to establish proof-of-principle functionality of the CUTS circuit and to demonstrate that graded TDP-43 depletion produces a proportional sensor response at the population level. While GFP signal heterogeneity is visible in the imaging panels, we attribute this variability most likely to known differences in siRNA uptake and transfection efficiency rather than to instability of the circuit itself. Importantly, bulk measurements consistently demonstrated dose-dependent sensor regulation across independent experiments, supporting the robustness of the system despite cellular heterogeneity. Furthermore, we were able to quantify CUTS activation in HeLa TARDBP-/- cells. We also note that CUTS was developed as a practical tool for rapid assessment of TDP-43 LOF in standard laboratory settings. Although flow cytometry increases resolution, the ability to detect functional perturbation using bulk fluorescence measurements supports the utility of the system for routine and high-throughput applications.

      We agree that flow cytometry would provide a more refined analysis of the dynamic range and sensitivity of CUTS, particularly for defining thresholds such as the minimal TDP-43 knockdown required for measurable activation. Specifically, we have implemented FACS of CUTS-expressing cells in a parallel study in which we are conducting a CRISPR knockout screen to identify modifiers of TDP-43 splicing function. In this screen, TDP-43 knockdown is followed by FACS to stratify cells based on CUTS activation. This strategy enables direct evaluation of the relationship between the extent of TDP-43 LOF and CUTS sensor activation. These analyses are ongoing; they will provide a more quantitative link between TDP-43 depletion and CUTS activation, address the reviewer’s concern regarding heterogeneity in bulk measurements, and will be included in a future study.

      (3) Some panels in the manuscript would benefit from additional clarity to make the data easier to visualize. For example, Figure 2D and 2G could be presented in a more clear manner, possibly split into additional graphs since there are too many outputs.

      We thank the reviewer for this suggestion. In response, we have split the graphs previously shown in Figures 2D and 2G to improve clarity, as we agree that these panels contained an extensive amount of data. Specifically, we split Figure 2D into two separate graphs showing TDP-43 and GFP pixel intensity from Western blots on the y-axis, plotted against low siTDP-43 treatment on the x-axis. Please see these data in Figures 2D and 2E of the revised manuscript.

      Similarly, we split Figure 2G into graphs showing the fold change in TDP-43 mRNA and CUTS cryptic exon mRNA, plotted against low siTDP-43 treatment on the x-axis. Please see these data in Figures 2H and 2I of the revised manuscript. We have retained the previous graphs in Supplementary Figure 2 to preserve the full dataset for reference.

      (4) Sup Figure 2A image panels would benefit from being labeled, its difficult to tell what antibodies or fluorophores were used. Same with Figure 4B.

      We appreciate the reviewer’s careful observation. Both figures show mCherry and GFP signals. In the revised version, we have added the corresponding labels to the side of each image for clarity. Note that Sup Figure 2A has been moved and is now Sup Figure 3A, while Figure 4B remains in its original configuration.

      (5) Figure 3 is an important addition to this manuscript and in general is convincing in showing that TDP-43 loss-of-function mutants can alter the sensor. However, there is still wild-type endogenous TDP-43 in these cells, and it's unclear whether the 5FL mutant is acting as a dominant negative to deplete the total TDP-43 pool, which is what the data would suggest. This could have been clarified.

      The TDP-43 5FL variant exhibits reduced RNA-binding capacity, and we previously demonstrated that impaired RNA binding promotes aberrant homotypic phase separation of TDP-43. Consistent with this mechanism, expression of RNA-binding–deficient TDP-43 variants induces the formation of nuclear “anisomes” which have been shown to sequester endogenous TDP-43 into insoluble fractions via dominant-negative mechanisms (Cohen et al., 2015; Keating et al., 2023; Mann et al., 2019; Yu et al., 2021). These findings support a model in which disruption of RNA engagement alters TDP-43 biophysical behavior and promotes functional depletion through self-association. We have expanded this mechanistic explanation in the Results section of the revised manuscript to better contextualize the behavior of the 5FL construct and its impact on endogenous TDP-43.

      (6) Additional treatment with stressors that inactivate TDP-43 could be tested in future studies.

      We appreciate this suggestion and agree with this important point. Because methods to directly induce endogenous TDP-43 aggregation and loss of function are lacking, stressors have become a partial solution to this issue. In line with this, our group has tested several stressors in follow-up research, including sodium arsenite (NaAsO₂), puromycin, KCl, MG132, sorbitol, and tunicamycin, using HEK cells expressing the CUTS system (Xie et al., 2025). We observed a dose-response relationship in relative GFP intensity under these conditions, with sodium arsenite showing the strongest effect, consistent with previous reports (Huang et al., 2024). To provide additional relevant findings in the current manuscript, we expanded this analysis by testing sodium arsenite in the CUTS system while also including endogenous cryptic exons. We therefore added a new figure showing the effect of sodium arsenite on the CUTS system, including GFP intensity measurements, qPCR using CUTS cryptic exon primers, and three endogenous cryptic exon reporters (ATG4B, GPSM2, and KCNQ2).

      Overall, the authors definitely achieved their goals by developing a very sensitive readout for TDP-43 function. The results are convincing, rigorous, and support their main conclusions. There are some minor weaknesses listed above, chief of which is the use of flow sorting to improve the data analysis. But regardless, this study will have an immediate impact for those who need a rapid, reliable, and sensitive assessment of TDP-43 activity, and it will be particularly impactful once this reporter can be used in isolated primary cells (i.e., neurons) and in vivo in animal models. Since TDP-43 loss of function is thought to be a dominant pathological mechanism in ALS/FTD and likely many other disorders, having these types of sensors is a major boost to the field and will change our ability to see sub-threshold changes in TDP-43 function that might otherwise not be possible with current approaches.

      (7) Regarding the methods, they seem a bit sparse and would benefit from additional detail. For example, I do not see a section in the methods where microscopy images were quantified (%GFP positive cells for example). This information is important and is lacking in the current form.

      We thank the reviewer and have added the following information to the Methods section: For live imaging quantification, we measured the mean GFP signal intensity for each group. The values were averaged, and the fold change was calculated and plotted. For immunofluorescence imaging, we first created maximum intensity projection images. We then applied masks to the GFP, mCherry, and Hoechst signals. By overlapping the GFP and mCherry masks, we identified the number of GFP-positive cells. Similarly, by overlapping the mCherry mask with the Hoechst mask, we identified the CUTS-expressing cells. We then calculated the ratio of GFP-positive cells to CUTS-expressing cells and plotted it as the percentage of GFP-positive cells. All analyses were performed using Nikon NIS software. This information is included in the Methods of the revised manuscript.
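
      The immunofluorescence quantification steps described above can be sketched in Python with NumPy. This is not the authors' pipeline, which used Nikon NIS software; the fixed intensity thresholds, the counting of 4-connected objects in each overlap mask as a proxy for cell number, and all function names are simplifying assumptions for illustration. Object counting of this kind only approximates cell counts when cells are well separated.

```python
from collections import deque

import numpy as np


def max_projection(stack):
    """Collapse a (z, y, x) image stack into a 2-D maximum intensity projection."""
    return np.asarray(stack).max(axis=0)


def count_objects(binary):
    """Count 4-connected foreground objects in a 2-D boolean mask (BFS flood fill)."""
    binary = np.asarray(binary, dtype=bool)
    seen = np.zeros_like(binary)
    n_rows, n_cols = binary.shape
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if binary[r, c] and not seen[r, c]:
                count += 1
                seen[r, c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and binary[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count


def percent_gfp_positive(gfp_stack, mcherry_stack, hoechst_stack, thresholds):
    """Percentage of CUTS-expressing cells (mCherry & Hoechst) that are GFP-positive."""
    gfp = max_projection(gfp_stack) > thresholds["gfp"]
    mch = max_projection(mcherry_stack) > thresholds["mcherry"]
    nuc = max_projection(hoechst_stack) > thresholds["hoechst"]
    gfp_positive = count_objects(gfp & mch)      # GFP signal within CUTS cells
    cuts_expressing = count_objects(mch & nuc)   # nuclei of CUTS-expressing cells
    if cuts_expressing == 0:
        return float("nan")
    return 100.0 * gfp_positive / cuts_expressing
```

      In practice, per-nucleus segmentation (for example, watershed on the Hoechst channel) would be more robust than counting connected regions of overlap masks, which merge touching cells.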

      Reviewer #3 (Public review):

      The DNA and RNA binding protein TDP-43 has been pathologically implicated in a number of neurodegenerative diseases including ALS, FTD, and AD. Normally residing in the nucleus, in TDP-43 proteinopathies, TDP-43 mislocalizes to the cytoplasm where it is found in cytoplasmic aggregates. It is thought that both loss of nuclear function and cytoplasmic gain of toxic function are contributors to disease pathogenesis in TDP-43 proteinopathies. Recent studies have demonstrated that depletion of nuclear TDP-43 leads to loss of its nuclear function characterized by changes in gene expression and splicing of target mRNAs. However, to date, most readouts of TDP-43 loss of function events are dependent upon PCR-based assays for single mRNA targets. Thus, reliable and robust assays for detection of global changes in TDP-43 splicing events are lacking. In this manuscript, Xie, Merjane, Bergmann and colleagues describe a biosensor that reports on TDP-43 splicing function in real time. Overall, this is a well-described, unique resource that would be of high interest and utility to a number of researchers. Nonetheless, a couple of points should be addressed by the authors to enhance the overall utility and applicability of this biosensor.

      (1) While the rationale for selecting UNC13A CE as the reporting CE species is understood given the relevance to disease, could the authors please comment on whether other CE sequences would behave similarly or as robustly? This is particularly critical given the multitude of different splicing changes that can occur as a result of TDP-43 loss of function (i.e., cryptic exons of differing sensitivity, skiptic exons, premature polyadenylation).

      We thank the reviewer for this question regarding generalizability beyond the UNC13A CE. While UNC13A was selected due to its strong disease relevance and well-characterized sensitivity to TDP-43 loss-of-function (LOF), our platform is not intrinsically restricted to this sequence. In the manuscript, we directly compared three architectures: UNC13A-TS, CFTR-TS, and the combined CUTS sensor incorporating additional UG motif optimization. Under matched conditions in stable HEK293 lines, CUTS demonstrated superior specificity and sensitivity, exhibiting near-zero baseline activity and a proportional, log-linear response across low-dose siTDP43 (38–1200 pM) (Figures 1–2). Importantly, this head-to-head comparison demonstrates that sensor performance can be engineered and optimized beyond a single CE species.

      TDP-43 LOF is known to induce a spectrum of RNA processing defects, including cryptic exons with differing sensitivities and cell-type dependence, premature polyadenylation events (e.g., STMN2), and, under conditions of excess nuclear TDP-43, exon skipping (“skiptic exons”). This diversity suggests that alternative CE elements, or other TDP-43-regulated RNAs, could be incorporated into the same sensor backbone and tuned for specific biological scenarios (cell type, specific stress responses, etc.). Consistent with this, the recently described TDP-REG system (Wilkins et al., 2024) used designed and AI-generated de novo CE sequences to express reporters or gene payloads, and screened multiple candidates to identify the RNA elements required for this response. These findings demonstrate that CE sequences beyond UNC13A can serve as robust TDP-43 sensing elements when optimized. Our results complement this work by demonstrating that CUTS achieves tight baseline control and a steep dynamic range (>110,000-fold induction over baseline in HEK293 cells), while maintaining compatibility across both non-neuronal and neuronal model systems, as shown in the revised manuscript.

      In the revised manuscript, we show direct comparisons indicating that CUTS outperforms single-CE sensors such as UNC13A-TS and CFTR-TS under identical conditions. This is consistent with independent work from other groups showing that alternative CE sequences can be engineered into effective sensors, depending on the paradigm and model system. We have clarified this in the revised Discussion and now note that CUTS is adaptable to alternative CE inserts.

      (3) Could the authors provide evidence of the utility of their biosensor in disease relevant systems that do not rely on TDP-43 KD? For example, does this biosensor report on TDP-43 loss of function in C9orf72 iPSNs in a time-dependent manner? Alternatively, groups have modeled TDP-43 proteinopathy in wildtype iPSNs via MG132 treatment.

      We thank the reviewer for this important suggestion. We agree that demonstrating CUTS responsiveness in disease-relevant models independent of artificial TDP-43 knockdown would further strengthen its translational relevance. In the current study, our primary objective was to establish the sensitivity, dynamic range, and autoregulatory properties of the CUTS circuit under controlled perturbation of TDP-43 levels. siRNA-mediated depletion provides a reliable approach to establish the relationship between graded TDP-43 LOF and CUTS sensor sensitivity/specificity. That said, CUTS is designed to detect functional TDP-43 loss irrespective of the upstream cause. As the reviewer notes, disease-relevant systems, such as C9orf72 iPSC-derived neurons and proteotoxic stress paradigms (e.g., MG132-induced impairment of TDP-43 nuclear function), are important for future studies. We are currently evaluating CUTS in iPSC-derived neuronal models of TDP-43 proteinopathy and are still optimizing the induction system, promoters, and timing. It should be noted that C9orf72 iPSC neurons do not exhibit TDP-43 LOF using standard differentiation protocols. Regarding pharmacological stress, we have shown that acute sodium arsenite treatment can activate CUTS (Figure 3). In a concurrent study under revision, we show that MG132 similarly causes TDP-43 LOF and CUTS activation (Xie et al., 2025). Notably, none of these treatments induce complete nuclear loss of TDP-43; instead, they show nuclear TDP-43 retention or modest mislocalization. This suggests that TDP-43 LOF may also result from nuclear redistribution and dysfunction under these stress conditions, rather than from complete nuclear loss. We look forward to presenting these ongoing studies in the future.

      References

      Brown A-L, Wilkins OG, Keuss MJ, Kargbo-Hill SE, Zanovello M, Lee WC, Bampton A, Lee FCY, Masino L, Qi YA, Bryce-Smith S, Gatt A, Hallegger M, Fagegaltier D, Phatnani H, NYGC ALS Consortium, Newcombe J, Gustavsson EK, Seddighi S, Reyes JF, Coon SL, Ramos D, Schiavo G, Fisher EMC, Raj T, Secrier M, Lashley T, Ule J, Buratti E, Humphrey J, Ward ME, Fratta P. 2022. TDP-43 loss and ALS-risk SNPs drive mis-splicing and depletion of UNC13A. Nature 603:131–137. doi:10.1038/s41586-022-04436-3

      Cohen TJ, Hwang AW, Restrepo CR, Yuan C-X, Trojanowski JQ, Lee VMY. 2015. An acetylation switch controls TDP-43 function and aggregation propensity. Nat Commun 6:5845. doi:10.1038/ncomms6845

      Huang W-P, Ellis BCS, Hodgson RE, Sanchez Avila A, Kumar V, Rayment J, Moll T, Shelkovnikova TA. 2024. Stress-induced TDP-43 nuclear condensation causes splicing loss of function and STMN2 depletion. Cell Rep 43:114421. doi:10.1016/j.celrep.2024.114421

      Keating SS, Bademosi AT, San Gil R, Walker AK. 2023. Aggregation-prone TDP-43 sequesters and drives pathological transitions of free nuclear TDP-43. Cell Mol Life Sci 80:95. doi:10.1007/s00018-023-04739-2

      Mann JR, Gleixner AM, Mauna JC, Gomes E, DeChellis-Marks MR, Needham PG, Copley KE, Hurtle B, Portz B, Pyles NJ, Guo L, Calder CB, Wills ZP, Pandey UB, Kofler JK, Brodsky JL, Thathiah A, Shorter J, Donnelly CJ. 2019. RNA Binding Antagonizes Neurotoxic Phase Transitions of TDP-43. Neuron 102:321-338.e8. doi:10.1016/j.neuron.2019.01.048

      Wilkins OG, Chien MZYJ, Wlaschin JJ, Barattucci S, Harley P, Mattedi F, Mehta PR, Pisliakova M, Ryadnov E, Keuss MJ, Thompson D, Digby H, Knez L, Simkin RL, Diaz JA, Zanovello M, Brown A-L, Darbey A, Karda R, Fisher EMC, Cunningham TJ, Le Pichon CE, Ule J, Fratta P. 2024. Creation of de novo cryptic splicing for ALS and FTD precision medicine. Science 386:61–69. doi:10.1126/science.adk2539

      Xie L, Zhu Y, Hurtle BT, Wright M, Robinson JL, Mauna JC, Brown EE, Ngo M, Bergmann CA, Xu J, Merjane J, Gleixner AM, Grigorean G, Liu F, Rossoll W, Lee EB, Kiskinis E, Chikina M, Donnelly CJ. 2025. Context-dependent Interactors Regulate TDP-43 Dysfunction in ALS/FTLD. BioRxiv. doi:10.1101/2025.04.07.646890

      Yu H, Lu S, Gasior K, Singh D, Vazquez-Sanchez S, Tapia O, Toprani D, Beccari MS, Yates JR, Da Cruz S, Newby JM, Lafarga M, Gladfelter AS, Villa E, Cleveland DW. 2021. HSP70 chaperones RNA-free TDP-43 into anisotropic intranuclear liquid spherical shells. Science 371. doi:10.1126/science.abb4309

    1. eLife Assessment

      This valuable study addresses mechanisms of feedback inhibition between planar cell polarity protein complexes during convergent extension movements in Xenopus embryos. The authors propose a conceptually new model, in which non-canonical Wnt ligand stimulates transition of Dishevelled from its complex with Vangl to Frizzled, with essential roles of Prickle and Ror in this process. The main observations supporting molecular interactions rely on modest but significant changes in protein association in response to Wnt11. While the study is limited due to insufficient phenotypic analysis at the cellular level and the use of exogenously supplied proteins, this work is convincing and will be of broad interest to cell and developmental biologists.

    2. Reviewer #1 (Public review):

      Summary:

      Planar cell polarity core proteins Frizzled (Fz)/Dishevelled (Dvl) and Van Gogh-like (Vangl)/Prickle (Pk) are localized on opposite sides of the cell and engage in reciprocal repression to modulate cellular polarity within the plane of static epithelium. In this interesting manuscript, the authors explore how the anterior core proteins (Vangl/Pk) inhibit the posterior core protein (Dvl). The authors propose that Pk assists Vangl2 in sequestering both Dvl2 and Ror2, while Ror2 is essential for Dvl to transition from Vangl to Fz in response to non-canonical Wnt signaling.

      Strengths:

      The strengths of the manuscript lie in a very interesting and new concept, together with supportive data, for a model of how non-canonical Wnt induces Dvl to transition from Vangl to Fz, with an opposing role for Pk and Vangl2 in suppressing Dvl during convergent extension movements. Ror is a key player required for the transition and antagonizes Vangl.

      Weaknesses:

      In addition to general whole embryo morphology that is used as evidence for CE defects, two forms of data are presented: co-expression and IP, as well as IF of exogenously expressed proteins. The microscopy would benefit from super-resolution microscopy since in many cases the differences in protein localization are not very pronounced, and Western analysis data often show relatively subtle differences. Thus, future work will be needed to determine the strength of the interactions proposed in the model.

      Major points.

      Overexpression conditions

      A possible concern is that most analyses were performed with overexpression conditions. PCP core proteins (Vangl2, Pk, Dvl, and Fz receptors) are known to display polarized subcellular localization in both the neural epithelium and DMZ explants (Ref: PCP and Septins govern the polarized organization of the actin cytoskeleton during convergent extension, Current Biology, 2024). However, in this study, overexpressed PCP core proteins failed to show polarized localization. Thus, one must be careful in interpreting data.

      Subtle effects

      Several of the reported results show quite modest changes in imaging and immunoprecipitation analyses, which are supportive of the proposed molecular model, but future experiments will be needed to robustly test the model.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      Public Review:

      Reviewer #1 (Public review):

      The weaknesses are in the clarity and resolution of the data that form the basis of the model. In addition to general whole embryo morphology that is used as evidence for CE defects, two forms of data are presented, co-expression and IP, as well as a strong reliance on IF of exogenously expressed proteins. Thus, it is critical that both forms of evidence be very strong and clear, and this is where there are deficiencies: 1) for the vast majority of experiments, general morphology and LWR were used as evidence of effects on convergent extension movements rather than Keller explants or actual cell movements in the embryo; 2) the microscopy would benefit from super-resolution microscopy since in many cases the differences in protein localization are not very pronounced; 3) the IP and Western analysis data often show very subtle differences, and in some cases no apparent difference.

      Major points.

      (1) Assessment of CE movement

      The authors conducted an analysis of the subcellular localization of PCP core proteins, including Vangl2, Pk, Fz, and Dvl, within animal cap explants (ectodermal explants). The authors primarily used the length-to-width ratio (LWR) to evaluate CE movement as a basis for their model. However, LWR can be influenced by multiple factors and is not sufficient to directly and clearly represent CE defects. While the authors showed that Prickle knockdown suppresses animal cap elongation mediated by Activin treatment, they did not test their model using standard assays such as animal cap elongation or dorsal marginal zone (DMZ) Keller explants. Furthermore, although various imaging analyses were performed in Wnt11-overexpressing animal caps and DMZ explants, the Wnt11-overexpressing animal caps did not undergo CE movement. Given that this study focuses on the molecular mechanisms of Vangl2 and Ror2 regulation of Dvl2 during CE, the model should be validated in more appropriate tissues, such as DMZ explants.

      (2) Overexpression conditions

      Another concern is that most analyses were performed with overexpression conditions. PCP core proteins (Vangl2, Pk, Dvl, and Fz receptors) are known to display polarized subcellular localization in both the neural epithelium and DMZ explants (Ref: PCP and Septins govern the polarized organization of the actin cytoskeleton during convergent extension, Current Biology, 2024). However, in this study, overexpressed PCP core proteins failed to show polarized localization. Previous studies, such as those from the Wallingford lab, typically used 10-30 pg of RNA for PCP core proteins, whereas this study injected 100-500 pg, which is likely excessive and may have created artificial conditions that confound the imaging results.

      (3) Subtle and insufficient effects

      Several of the reported results show quite modest changes in imaging and immunoprecipitation analyses, which are not sufficient to strongly support the proposed molecular model. For example, most Dvl2 remained localized with Fz7 even under Vangl2 and Pk overexpression (Fig. 4). Similarly, Wnt11 overexpression only slightly reduced the association between Vangl2 and Dvl2 (Sup. Fig. 8), and the Ror2-related experiments also produced only subtle effects (Fig. 8, Sup. Fig. 15).

      We thank reviewer 1 for careful reading of our revised manuscript and additional constructive criticisms. Since the two reviewers had divergent opinions of our revised manuscript, we think that it might be more productive to request a Version of Record at this point, and have our proposed model debated/tested by others in the field. We will keep the reviewer’s suggestions in mind while designing ongoing studies. We would like to address the criticisms collectively below:

      (1) The primary goal of our current manuscript is to build a mechanistic model for non-canonical Wnt signaling by elucidating the functional relationships between Dvl, Vangl, PK and Ror during CE. Each has been studied extensively in prior literature using DMZ-injected embryos and DMZ, Keller and animal cap explants, so there is little doubt that the reduced LWR following their over-expression or knockdown in the DMZ is due to disruption of CE. In the current manuscript, we primarily co-injected them in different combinations to distinguish synergistic from antagonistic relationships, and in the majority of cases we relied on epistasis to draw conclusions (e.g. Fig. 1; Fig. 2h, i; Suppl. Fig. 6; Suppl. Fig. 14). Nevertheless, we followed the reviewer’s suggestion and used animal cap elongation as an additional assay to confirm that Pk and Vangl2 synergize to disrupt CE, and that their synergy can be blocked by Dvl2 co-overexpression; the new data are added to Fig. 1 (Fig. 1h, h’). Therefore, given the prior literature, our new animal cap explant data, and the specific scope of our current study, we feel that the LWR measurement is a reasonable assay to determine CE phenotype in this manuscript. We fully agree with the reviewer that our model will need to be tested at the cellular level through live imaging of DMZ explants; this is indeed the direction of our future study, but is beyond the scope of the current manuscript.

      (2) A salient feature of non-canonical Wnt signaling is that loss or over-expression of any component can often cause identical CE defects at the tissue/embryo level. We used many co-injection experiments to demonstrate that this is due, at least in part, to a counterbalance between Dvl/Ror and Vangl/PK (e.g. Fig. 1; Fig. 2h, i; Suppl. Fig. 6; Suppl. Fig. 14). It is in this context that we planned the imaging and biochemical experiments to determine the possible molecular mechanisms underlying their functional interaction, and we feel that the moderate over-expression used is reasonable for building the first integrated model. We do plan to test our model using lower expression in the future. To acknowledge the limitations of our study, we also added the following sentences to the Discussion:

      “We acknowledge, however, that our model explains primarily the potential molecular actions underlying the regulation of CE at the tissue level. Whether and how our model may explain the cellular behavior during CE, such as polarized remodeling of cell junction or extension of cell protrusions, will require further study.”

      (3) The Wnt11-induced reduction of Dvl2-Vangl2 co-IP (Suppl. Fig. 8, 15) may be moderate, but it is statistically significant and reproducible, and we have reported similar findings in two other publications (DOI: 10.1093/hmg/ddx095; DOI: 10.1038/s41467-025-57658-0). Given the limitations of co-IP, we had to rely on high-level over-expression to make the experiments feasible. We are building proximity-based assays such as NanoBRET, and plan to verify the result with lower-level expression in the future.

      Reviewer #2 (Public review):

      We thank the reviewer for the encouraging comments and the suggestion to clarify the description related to Suppl. Fig. 15. We revised the manuscript according to the reviewer’s suggestion, and added Suppl. Fig. 16 to further examine the effect of Ror2 knockdown on the steady-state interaction between Dvl2 and Vangl2 using an imaging approach.

    1. eLife Assessment

      In this important study, a new multi-scale imaging workflow promises to accelerate and democratize comparative connectomics, with projectome-level data informing synapse-level connectivity. While the pipeline and time savings are convincing, the evidence for the segmentation methodology as a reusable community resource is incomplete, with key metrics like error rates, annotation times, and proof-reading times not reported. Furthermore, the evidence on the utility of projectome-level information for analysing brains appears misleading. By clarifying the findings and ensuring that the complete software pipeline is available in online open source repositories alongside precise documentation, the authors would deliver on their vision to enable any laboratory to map and analyse brain connectomes.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents an end-to-end pipeline, intended to accelerate EM-based connectomics by combining low-resolution imaging for large volumes with synapse-level imaging only in selected regions of interest. In principle, this strategy can substantially reduce imaging time, computational demands, analysis time, and overall cost.

      General note:

      Overall, I found the manuscript interesting and valuable, particularly as a description of how one laboratory has assembled and applied a practical workflow to reconstruct and analyze the central complex across multiple insect species. In that sense, the work is compelling as an account of a real, functioning strategy for comparative connectomics, and I appreciated reading it. My main reservation is not about the relevance of the biological problem or the utility of the pipeline in the authors' own hands, but about whether the manuscript, in its current form, fully meets the expectations of a paper that is focused on tools and resources. The expectation would be that such a paper serves as a vehicle for sharing new techniques, software tools, datasets, and other resources intended to be usable by the community. Here, because much of the pipeline appears to build on existing methods and software, the key value added should be a particularly clear demonstration of how these components were adapted, integrated, validated, and documented for this specific use case in a way that others could realistically reproduce and adopt. At present, that translational and reproducibility-oriented component does not yet seem sufficiently developed, despite the clear promise of the overall approach.

      Major comments:

      (1) The work is valuable as a practical integration and application of multiple existing tools into a coherent pipeline, together with a new multi-resolution imaging strategy. However, the manuscript at times reads as though it introduces an entirely novel workflow. I would encourage the authors to clarify the contribution more explicitly: which components are genuinely new (for example, the acquisition strategy and the end-to-end integration/validation), and which are adaptations of already established methods or software. This would make the scope and novelty of the paper easier to assess.

      (2) The most distinctive element is the multi-resolution acquisition strategy. However, as described, the selection of high-resolution regions seems to be decided a priori based on anatomy (guided by xCT localization of the CX), rather than being determined automatically from the data (i.e., ROI placement is anatomy-driven rather than data-driven). A more data-driven or machine learning-guided ROI strategy would strengthen the methodological contribution and the adaptability to new scenarios, along the lines of approaches such as SmartEM [1].

      (3) The manuscript emphasizes open-source availability and reduced barriers to entry, but the current software release, as referenced, does not yet appear to support straightforward external reuse. Since much of the pipeline builds on existing methods, the main added value lies in how these technologies were adapted, combined, and validated for the present problem. A clear and complete explanation of this adaptation is therefore essential, but is currently missing. I would suggest the following concrete improvements:
      a) Provide a single landing page or umbrella repository that links each pipeline step in the paper to the corresponding codebase, including version tags/commits and expected inputs/outputs for each step.
      b) Include step-by-step tutorials for each component.
      c) Provide an example dataset together with a full reproduction walkthrough in a controlled environment.
      d) Clearly explain the required parameters and configuration for each step, including how they should be adjusted for other datasets or scenarios.
      e) Follow packaging and distribution best practices (for example, PyPI/conda releases, Docker containers, and version pinning).

      (4) In my own attempt to set up and run parts of the released code, I encountered issues that currently limit reproducibility. For example, when creating an environment for EMalign (https://github.com/Heinze-lab/EMalign), the required Python version is not specified, and installation did not succeed under Python 3.12 due to dependency constraints. Additionally, synful_312 (https://github.com/Heinze-lab/synful_312) and SegToPCG (https://github.com/Heinze-lab/SegToPCG) appear to be empty despite being referenced in the manuscript. These are fixable issues, but addressing them is important if the paper is to deliver on its "low entry cost" claim.

      (5) Table 1 reports acquisition times, which is helpful. However, the multi-resolution approach adds processing steps that are specific to this strategy (e.g., "XY alignment high-res" and "high-res to low-res alignment"). Please include registration/alignment (and other major post-processing) runtimes and resource requirements, such as storage, in a comparable table so readers can assess the true end-to-end cost.
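      To make the anatomy-driven versus data-driven distinction in comment (2) concrete: a data-driven strategy would score every tile of the low-resolution overview and re-image only the highest-scoring tiles at synaptic resolution. The NumPy sketch below is a hypothetical illustration of that selection step only, not SmartEM's method and not part of the manuscript; the per-tile score map (for example, a segmentation model's uncertainty on the low-resolution data) and the tile budget are assumed inputs.

```python
import numpy as np


def select_roi_tiles(score_map, budget):
    """Return (row, col) indices of the `budget` highest-scoring tiles.

    score_map: 2D array with one score per low-resolution tile (hypothetical,
    e.g. model uncertainty); a higher score means high-res imaging is more useful.
    """
    scores = np.asarray(score_map, dtype=float)
    budget = min(budget, scores.size)
    # argsort of the negated, flattened scores gives tiles in descending order
    flat = np.argsort(-scores, axis=None, kind="stable")[:budget]
    return [tuple(int(v) for v in np.unravel_index(i, scores.shape)) for i in flat]


uncertainty = np.array([[0.1, 0.9, 0.2],
                        [0.7, 0.3, 0.8]])
print(select_roi_tiles(uncertainty, budget=2))  # [(0, 1), (1, 2)]
```

      A fixed budget mirrors the time-saving logic of the multi-resolution approach: imaging time is spent where the low-resolution data are least informative rather than on a region chosen in advance.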

      References:

      [1] Meirovitch, Y., et al. "SmartEM: machine learning-guided electron microscopy." Nature Methods (2025).

    3. Reviewer #2 (Public review):

      Summary:

      The paper proposes a workflow to accelerate EM connectomics by combining multi-scale imaging with image processing and analysis (image alignment, registration, neuron tracing, automated segmentation and synapse prediction, proof-reading) to derive a brain region connectome. The paper argues and (partially) demonstrates that this approach facilitates comparative connectomics.

      The data acquisition pipeline uses a well-established sample preparation protocol, µCT-guided acquisition, and SBEM imaging at cellular and synaptic resolution.

      Data processing and analysis combine existing state-of-the-art components and focus on the alignment and complementary analysis of the two SBEM resolution levels. The paper applies the workflow to the central complex of six different insects and performs some preliminary analysis based on this (which is acceptable for a resource/tool).

      Disclaimer for the rest of the review: I am an expert in image analysis and segmentation, so I have mainly focused on these aspects as I am not qualified to analyze the details of image acquisition.

      Strengths:

      The paper addresses an important problem and promises an acceleration and democratization of comparative connectomics. The time savings of the imaging approach are well-motivated and derived. The methods used for image alignment, segmentation, synapse detection, and proofreading are state-of-the-art.

      Weaknesses:

      I see two major weaknesses in the paper:

      (1) The paper introduces the (approximate) equivalence of the projectome and connectome in the insect brain very prominently in the introduction and uses it as a central motivation for the multi-resolution image acquisition protocol. But, to me, it is unclear how this principle is actually used in the analysis presented in the final results section, and whether this assumption is evaluated at all. Specifically, Figure 4a shows the anatomical neuron reconstructions (from cellular-resolution SBEM), while d-g show connectome-level analysis from the synaptic-resolution data. The only link I can see between the two is that the neural processes in the synapse-resolution data can be mapped to the neurons from the cellular-resolution data, thanks to the image alignment. This is certainly important, BUT it is only tangentially related to the projectome vs. connectome claim from the introduction. This claim implies that a tentative connectome is derived from projectome-level data (e.g. by assuming a uniform probability of synapse formation given surface or distance between projections) that is then validated by the "true" connectome data from synaptic resolution. Instead, what is actually solved, to my understanding, is mapping the local connectome to the projectome. While related, these are different things, and the current framing of the paper and the quite brief description of the section on comparative connectomics (also no corresponding Methods section) leave this claim inadequately supported.

      (2) Reporting on segmentation and proofreading is purely qualitative. Given that this is claimed as a core contribution of the paper (e.g. statement in line 497 and following), I would expect substantially more reporting and evaluation of this claim:
      a) Report the actual time needed for proofreading the segmentations in CAVE. I could not find any numbers on this.
      b) Report the initial segmentation quality of the model: How many errors does it make? Note: There is a brief mention of VoI-based quantification in Methods (around line 1060), but the results are not reported.

      What should be done: Report the error rates (with an accurate measure such as skeleton VoI) independently for all 6 volumes. Given that the authors have the proofread versions, this is feasible. Only then can the claims made here be evaluated. Note that the F1-score of synapse prediction is quantified. This is a good starting point, but could also be extended to further species in order to assess the actual transferability. Furthermore, none of the data from the study seems to be available. The training data for the network has to be made available. If possible, high-resolution data should be proofread too.
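      For readers less familiar with the metric requested here: variation of information (VoI) compares two labelings of the same voxels and decomposes into a split term, H(segmentation | ground truth), and a merge term, H(ground truth | segmentation). Skeleton VoI applies the same formula to the segment IDs sampled at ground-truth skeleton node positions. The NumPy sketch below is a minimal illustration, not the implementation used in the manuscript; it uses natural logarithms (some tools report bits) and omits the background masking that evaluation scripts usually apply.

```python
import numpy as np


def conditional_entropy(x, y):
    """H(X | Y) in nats for two equally sized integer label arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    n = x.size
    # Joint counts of each distinct (x, y) label pair
    pairs, joint_counts = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    # Marginal count of each pair's y label (y_vals is sorted, so searchsorted finds it)
    y_vals, y_counts = np.unique(y, return_counts=True)
    y_of_pair = y_counts[np.searchsorted(y_vals, pairs[:, 1])]
    p_joint = joint_counts / n
    # H(X|Y) = -sum p(x, y) * log p(x | y)
    return float(-np.sum(p_joint * np.log(joint_counts / y_of_pair)))


def voi(segmentation, ground_truth):
    """Return (split, merge, total) variation of information."""
    split = conditional_entropy(segmentation, ground_truth)  # one GT object cut into pieces
    merge = conditional_entropy(ground_truth, segmentation)  # several GT objects fused
    return split, merge, split + merge


gt = np.array([1, 1, 1, 1])
seg = np.array([1, 1, 2, 2])   # the single true object is split in half
print(voi(seg, gt))            # split = ln 2, merge = 0
```

      Reporting split and merge separately per volume would show whether errors transfer across species mainly as over-segmentation (cheap to fix in proofreading) or as false merges.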

      Further points:

      (1) Why isn't reconstruction at the cellular level addressed with ML? This is surely possible and should be easier than the full connectome analysis. Similar to before, the actual times needed for tracing with CATMAID are not reported; the manuscript only states that this can be done in minutes for a neuron, but it's unclear if this is the best or average case. It would help to have quantitative numbers to assess whether automation would bring any benefits.

      (2) Finally, regarding the underlying software. I did not try this myself due to time constraints, but did check the repositories. They seem to be in an ok state with some documentation in a README. However, given the central role of the software contribution, I would expect a centralized doc page that explains how to use the different parts of the software, including a full example with sample data. Without this, application by other labs - a central claim - will be difficult.
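      The projectome-versus-connectome distinction raised in the first weakness can be made concrete. Deriving a tentative connectome from projectome-level data would mean, for instance, scoring each neuron pair by how often their arbors come close to each other and then testing those scores against synapse counts from the synaptic-resolution volume. A minimal NumPy sketch of such a proximity score follows, assuming each neuron is given as an array of skeleton node coordinates (in nm) and using an arbitrary distance threshold; neither the functions nor the threshold come from the manuscript.

```python
import numpy as np


def contact_count(nodes_a, nodes_b, max_dist):
    """Number of node pairs (one from each skeleton) closer than max_dist."""
    a = np.asarray(nodes_a, dtype=float)[:, None, :]   # (Na, 1, 3)
    b = np.asarray(nodes_b, dtype=float)[None, :, :]   # (1, Nb, 3)
    dists = np.linalg.norm(a - b, axis=-1)              # (Na, Nb) pairwise distances
    return int(np.count_nonzero(dists < max_dist))


def proximity_matrix(skeletons, max_dist):
    """Connectivity proxy: contact counts for every ordered pair of neurons."""
    n = len(skeletons)
    m = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j:
                m[i, j] = contact_count(skeletons[i], skeletons[j], max_dist)
    return m


# Hypothetical skeletons (coordinates in nm)
skeletons = [
    np.array([[0, 0, 0], [100, 0, 0]]),
    np.array([[50, 0, 0], [1000, 0, 0]]),
    np.array([[5000, 0, 0]]),
]
m = proximity_matrix(skeletons, max_dist=200)
print(m)
```

      Correlating such scores with proofread synapse counts from the high-resolution data would be one way to test whether projectome-level proximity predicts connectivity.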

    4. Author Response:

      Public Review:

      On behalf of all authors I would like to thank the reviewers for highly constructive and helpful comments, which, once addressed fully, will make the paper stronger and more useful as a tools and resources contribution.

      Besides addressing all minor issues that were pointed out by the reviewers, we see three main lines of changes we will need to pursue in order to address all major concerns. We plan to do all of these as fast as possible. Given that new alignments, segmentation and tracing are needed, this will take between one and three months.

      (1) Availability of code, software documentation and accessibility of pipeline. 

      Both reviewers and the editorial summary agreed that we need to improve the availability of our code, provide more instructions and examples of how to use the code, and make our methods more reusable to outsiders. To achieve this we will follow the suggestions made by the reviewers, in particular the list presented by reviewer 1 (point three of weaknesses in the public review).

      First, we would like to apologize for the faulty link to the SegToPCG (https://github.com/Heinze-lab/SegToPCG) repository (the correct name and link are LSDtoPCG and https://github.com/Heinze-lab/LSDtoPCG) as well as the missing code in the https://github.com/Heinze-lab/synful_312 repository; these issues have already been fixed and will be included in an updated bioRxiv version.

      Second, we will generate an overarching umbrella page that will serve as a go-to site for any user who would like to implement our pipeline. To enable implementation, we will expand the documentation, provide detailed instructions, and include an example dataset with these instructions.

      (2) Quantification of analysis steps, including segmentation, alignment and manual tracing, to validate our claims of increased efficiency and transferability across species.

      As for point 1, both reviewers as well as the editorial summary highlighted the need for more comprehensive quantification of the workflow, especially with respect to segmentation quality as well as time investment into manual tracing and high-resolution alignments. In particular, these data should validate the transferability of the segmentation models across species, and support the claims made about the time savings resulting from using our multi-resolution workflow compared to a whole-sample synaptic-resolution approach.

      To this end, we will generate all analyses according to the reviewer suggestions and incorporate the resulting data in new figures and tables. To make the data fully comparable across species, we will apply the latest version of our alignment and segmentation scripts to at least one high-resolution data stack of each species, quantify manual tracing of a comparable, defined set of neurons in each species, and perform VoI analyses of each species' segmentation against manually traced neurons in identically sized test volumes in each dataset. Additionally, we will proofread identical branches of homologous neurons in each species and quantify the number of edits required to go from raw segmentation output to completion.

      As the segmentation pipeline has evolved over recent years, a fair comparison between all datasets requires a fresh analysis based on the latest version of our machine learning models, which cannot be done with the existing data. This will therefore take a few weeks.
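      For readers unfamiliar with the variation of information (VOI) metric mentioned above, a minimal Python sketch follows. It is not the authors' evaluation code: it assumes that the automated segmentation and the manual tracing have been rasterized onto the same voxels and flattened into equal-length label sequences, and the function name is hypothetical. VOI is reported here as two terms: a split term, H(seg | gt), which grows when one traced neuron is broken into several segments, and a merge term, H(gt | seg), which grows when one segment spans several traced neurons. Lower is better for both.

```python
from collections import Counter
from math import log2
from typing import Hashable, Sequence, Tuple


def _entropy(counts: Counter, n: int) -> float:
    """Shannon entropy (bits) of a distribution given as label counts."""
    return -sum((c / n) * log2(c / n) for c in counts.values())


def variation_of_information(seg: Sequence[Hashable],
                             gt: Sequence[Hashable]) -> Tuple[float, float]:
    """Return the (split, merge) VOI terms of a segmentation vs. ground truth.

    split = H(seg | gt): over-segmentation of traced neurons.
    merge = H(gt | seg): false merges between traced neurons.
    Both inputs must label the same voxels in the same order (assumed setup).
    """
    n = len(seg)
    if n == 0 or len(gt) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    h_joint = _entropy(Counter(zip(seg, gt)), n)
    h_seg = _entropy(Counter(seg), n)
    h_gt = _entropy(Counter(gt), n)
    split = h_joint - h_gt  # H(seg, gt) - H(gt) = H(seg | gt)
    merge = h_joint - h_seg  # H(seg, gt) - H(seg) = H(gt | seg)
    return split, merge


if __name__ == "__main__":
    traced = [1, 1, 1, 1]          # one manually traced neuron (toy example)
    oversegmented = [7, 7, 9, 9]   # the same voxels split into two segments
    print(variation_of_information(oversegmented, traced))  # (1.0, 0.0)
```

      Because VOI depends only on how voxels are grouped, renaming segment IDs does not change the score, which is what makes it usable across independently segmented datasets.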

      (3) Clarification of aims for multi-resolution pipeline and how projectomes and connectomes inform each other

      Reviewer 2 highlighted that the aims of combining projectome and connectome were not sufficiently clear. Judging from the reviewer's comment, we may have inadvertently given the impression that we aimed to predict a connectome from projectome data by using the spatial proximity of neurons as a proxy for connectivity. In fact, our data show that this is not possible and that projection-level data cannot predict connectivity. For instance, in the head direction system, the projectome data suggest identical circuits for bees and flies (except at the edges of the ring), but the connectivity data show that the components of the ring attractor form circuits that are distinctly different between the species, even though the same neurons with the same projection patterns are involved.

      What we aim to do is slightly different. We define global patterns of information flow using the projectome, and then define circuits in a part of this global circuit at the synaptic level. We then extrapolate the global connectivity by assuming that the circuits identified in one or two computational units (columns) are repeated in each column. This rests on the assumption that the same neurons form the same connections in each repeated module, as long as the cellular repertoire is identical (verified by the projectome), but it does not use proximity data to predict connectivity. This method therefore applies only to brain regions that consist of repeated computational modules, i.e. where we can assume that knowing the connectivity in one module allows extrapolation to the entire brain region. While this is a simplification, data from the Drosophila CX have, in principle, confirmed this assumption.
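      The extrapolation logic described above can be summarized in a few lines. The Python sketch below is only an illustration under stated assumptions: cell types and column names are hypothetical, connectivity is expressed at the level of cell types rather than individual synapses, and a column counts as "matching" only if its projectome-derived cell-type repertoire is identical to that of the synaptically mapped reference column. No spatial proximity information is used.

```python
from typing import Dict, FrozenSet, Tuple

# (presynaptic cell type, postsynaptic cell type) -> synapse count
Edge = Tuple[str, str]


def extrapolate_column_wiring(
    reference_edges: Dict[Edge, int],
    reference_types: FrozenSet[str],
    column_types: Dict[str, FrozenSet[str]],
) -> Dict[str, Dict[Edge, int]]:
    """Predict type-level wiring for every column of a repeated neuropil.

    Assumption: the same cell types form the same connections in every module
    whose repertoire matches the reference column. Columns with a different
    repertoire (e.g. at the edges of a ring) receive no prediction.
    """
    predicted: Dict[str, Dict[Edge, int]] = {}
    for column, types in column_types.items():
        if types == reference_types:
            # Copy so that later edits to one column do not alias the others.
            predicted[column] = dict(reference_edges)
    return predicted


if __name__ == "__main__":
    reference = {("A", "B"): 12, ("B", "C"): 3}  # hypothetical local connectome
    columns = {
        "col1": frozenset({"A", "B", "C"}),
        "col2": frozenset({"A", "B", "C"}),
        "edge": frozenset({"A", "B"}),  # repertoire differs, so it is skipped
    }
    print(sorted(extrapolate_column_wiring(reference, frozenset({"A", "B", "C"}), columns)))
```

      Skipping mismatched columns, rather than guessing their wiring, mirrors the restriction stated above that the approach applies only where the cellular repertoire is verified to be identical.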

      We will generate a new figure in which we illustrate the process of combining local connectomes and global projectomes using examples from our data, but illustrating this schematically also for other brain regions, e.g. the insect optic lobe or the cerebral cortex of mammals. We will also carefully rewrite the relevant text passages to avoid misunderstandings.

      Overall, we would like to thank the reviewers again for their thorough and detailed comments, which will help to make our connectomics workflow more accessible and reproducible.

    1. eLife Assessment

      This manuscript demonstrates the feasibility and potential value of using functional MRI in awake, behaving mice, enabling assessment of distributed brain activity during ongoing behavior in a manner analogous to human fMRI. The valuable findings suggest that the periaqueductal gray (PAG), a midbrain structure classically linked to threat processing and aversive learning, also contributes to reversal learning. If supported, this result would carry theoretical and practical implications for our subfield by expanding the computational roles attributed to the PAG and motivating cross-species circuit-level investigations. However, the strength of evidence is, at present, incomplete, and several key claims are only partially supported by the current analyses.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to determine the neural networks involved in updating behaviour by training mice on a 'go / no go' odour discrimination task, and measuring their brain activity using functional MRI.

      Strengths:

      The use of the translationally relevant 'go / no go' task is a major strength, as this is a task that can be used as readily in humans as in animals such as mice. The use of fMRI in awake, behaving mice is also a major strength, as this allows the activation of multiple brain regions to be measured while behaviour is ongoing, and also facilitates comparison to human studies. The computational modelling approaches further support these translational aims, again being as readily applied to human data as to animal data.

      Weaknesses:

      The major weakness of the paper - and one that is potentially addressable - is that the key analysis of the paper, showing that the periaqueductal gray (PAG) is recruited for reversal learning, is only partially supported by the data presented in the paper as it stands. The authors have used a sophisticated way of analysing the behavioural data using 'signal detection theory', in which they collected behavioural data showing correct 'go' responses ('hits'), correct 'no go' responses ('correct rejections'), missed 'go' responses ('misses') and go responses when mice should have withheld a response ('false alarms'). The data presented showing a double dissociation in the activation of the nucleus accumbens for 'hits' but not 'correct rejections' and the PAG for 'correct rejections' but not 'hits' is very interesting; however, it is confounded by the fact that the nucleus accumbens may activate when the animal makes a response, and the PAG when the animal withholds a response. If the authors also included the analysis of nucleus accumbens and PAG activation for 'misses' and 'false alarms', this would allow them to determine whether the activation of these regions reflects the behavioural response or the expectation of reinforcement from the response.

      Thus, the paper includes very interesting data and is impressive in its approach to analysing behaviour in a manner that is highly translatable between species. The additional analyses would markedly strengthen the paper and would add depth to the finding that the PAG appears to be involved in behavioural flexibility.
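      The four signal-detection outcomes discussed in this review can be made concrete with a short Python sketch. It is illustrative only and not the authors' analysis: the trial encoding as (cue_is_go, licked) pairs is hypothetical, and the log-linear correction (adding 0.5 to each cell) is an assumed convention for avoiding infinite z-scores. The grouping shows why hits and false alarms (both licks) and correct rejections and misses (both withheld responses) together allow response-related activity to be separated from reinforcement expectation.

```python
from collections import Counter
from statistics import NormalDist
from typing import Iterable, Tuple


def classify_trial(cue_is_go: bool, licked: bool) -> str:
    """Map one go/no-go trial onto the four signal-detection outcomes."""
    if cue_is_go:
        return "hit" if licked else "miss"
    return "false_alarm" if licked else "correct_rejection"


def sdt_summary(trials: Iterable[Tuple[bool, bool]]):
    """Return outcome counts, d' and criterion c for (cue_is_go, licked) trials.

    A log-linear correction (+0.5 per cell) keeps rates away from 0 and 1 so
    the inverse normal CDF stays finite; this correction is an assumption.
    """
    counts = Counter(classify_trial(go, lick) for go, lick in trials)
    hits, misses = counts["hit"], counts["miss"]
    fas, crs = counts["false_alarm"], counts["correct_rejection"]
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (fas + 0.5) / (fas + crs + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return counts, d_prime, criterion
```

      A d' near zero indicates chance discrimination, while the criterion captures a bias toward licking (negative c) or withholding (positive c), independent of discrimination.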

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors test the hypothesis that whole-brain functional magnetic resonance imaging in behaving mice, coupled with reinforcement-learning modeling, can dissociate neural substrates of initial cue-reward acquisition versus contingency reversal, and potentially reveal underappreciated contributors to cognitive flexibility. Using a head-fixed go/no-go odor discrimination task with subsequent rule reversal in a subset of mice, they model trial-by-trial state-action values with a model-free Q-learning algorithm (hierarchical Bayesian fit) and use the model-derived decision variable as a parametric regressor in whole-brain analyses. They report that acquisition-related signals prominently involve ventral and dorsal striatal regions, whereas reversal learning additionally recruits the periaqueductal gray (negative correlation with the decision variable) and shows an apparent double dissociation between nucleus accumbens and periaqueductal gray responses for hit versus correct-rejection outcomes during reversal.
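      The way a model-derived decision variable becomes a trial-by-trial regressor can be sketched as follows. This is a simplified stand-in, not the authors' model: it replays the observed choices with a single fixed learning rate (rather than hierarchical Bayesian estimates), omits the choice rule used during fitting, assumes the decision variable is the go-minus-no-go value difference, and treats reward omission as a reward of 0, consistent with the absence of explicit punishment. Convolution with a hemodynamic response function before entry into the GLM is not shown.

```python
from typing import Dict, List, Tuple

Trial = Tuple[str, str, float]  # (cue, action in {"go", "nogo"}, reward)


def q_learning_regressor(
    trials: List[Trial], alpha: float, q0: float = 0.0
) -> Tuple[List[float], Dict[Tuple[str, str], float]]:
    """Replay observed choices through a model-free Q-learner.

    Returns the per-trial decision variable, taken here as
    Q(cue, go) - Q(cue, nogo) before the outcome, plus the final Q-table.
    Only the value of the chosen action is updated (delta rule).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    cues = {cue for cue, _, _ in trials}
    q = {(cue, a): q0 for cue in cues for a in ("go", "nogo")}
    decision_variable = []
    for cue, action, reward in trials:
        decision_variable.append(q[(cue, "go")] - q[(cue, "nogo")])
        prediction_error = reward - q[(cue, action)]
        q[(cue, action)] += alpha * prediction_error
    return decision_variable, q


if __name__ == "__main__":
    # Three rewarded licks to cue A, then one reversal trial in which licking is unrewarded.
    trials = [("A", "go", 1.0)] * 3 + [("A", "go", 0.0)]
    dv, q = q_learning_regressor(trials, alpha=0.5)
    print(dv)              # [0.0, 0.5, 0.75, 0.875]
    print(q[("A", "go")])  # 0.4375
```

      The last trial illustrates the reversal regime: the decision variable is still high when the outcome arrives, so the negative prediction error is large, which is the kind of signal a parametric regressor is meant to track.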

      Strengths:

      (1) The reversal manipulation is implemented without explicit punishment, targeting suppression of previously rewarded actions under reward omission - an underexplored regime for midbrain contributions beyond canonical threat/pain framing.

      (2) The manuscript provides a credible MR-compatible olfactory/licking platform with synchronized sniff/lick/valve/reward timing and high-field imaging, supporting feasibility and broader utility for mesoscale systems neuroscience in rodents.

      (3) Trial-by-trial value estimates from a Q-learning variant are fit via hierarchical Bayesian inference and explicitly integrated into subject-level general linear models with a mouse hemodynamic response function, which is appropriate for leveraging within-subject dynamics in small-N rodent fMRI.

      (4) The decision-variable maps during acquisition recover expected basal ganglia involvement (including nucleus accumbens and dorsal striatum), providing face validity; the reversal-stage map yields an interpretable set of cortical/striatal/pallidal regions plus periaqueductal gray/hippocampus.

      (5) The finite impulse response analysis stratified by behavioral outcomes (hit, false alarm, correct rejection, miss) adds interpretability beyond the model regressor alone, and the reported crossover interaction between nucleus accumbens and periaqueductal gray is potentially impactful if robust.

      Weaknesses:

      (1) The core claim regarding selective periaqueductal gray engagement rests on a subset of n = 6 mice for reversal. With permutation-based whole-brain inference and very small cluster sizes, the robustness of the periaqueductal gray effect to reasonable analytic perturbations is not yet convincing. I would suggest providing leave-one-animal-out analyses for the periaqueductal gray cluster/ROI effects and reporting how often the key findings survive.

      (2) The authors note that due to temporal resolution and hemodynamics, they cannot separate stimulus, choice, and feedback and therefore model "whole trials." This limitation creates ambiguity about whether periaqueductal gray signals reflect value updating, action inhibition (no-lick), reward omission, autonomic arousal, or motor preparation/withholding, especially given the strong hit versus correct-rejection opponency. I would suggest adding targeted analyses that disambiguate "withholding" from "reversal-related updating".

      (3) ROIs are defined from the whole-brain decision-variable maps and then interrogated by outcome type; the manuscript acknowledges this non-independence, which can inflate apparent dissociations. It would be better if the authors defined ROIs independently (anatomical periaqueductal gray/nucleus accumbens masks, or split-half ROI definition with held-out data) and repeated the key ROI analyses.

      (4) The reversal group is a subset of the acquisition cohort and also experiences a different task phase structure and additional sessions; the paper attempts to address exposure differences descriptively. I would suggest that the authors formally test whether periaqueductal gray effects are explained by session count, time-in-scanner, or learning rate differences (e.g., include these as covariates, or match sessions more strictly).

      (5) The platform records sniffing and licking, but the imaging models described include motion, global, and ventricle regressors and do not clearly include trialwise lick/sniff covariates. Given the periaqueductal gray's known autonomic and defensive coordination roles, physiological state confounding is a major concern. Could the authors incorporate sniff and lick metrics (and their derivatives) as nuisance regressors and show whether the periaqueductal gray effects persist?
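      The leave-one-animal-out check suggested in point (1) can be sketched in Python as below. This is a hedged illustration, not the study's inference: it uses a simple one-sample t statistic per fold as a stand-in for the permutation-based whole-brain statistics, operates on one ROI effect estimate per mouse, and the example values and threshold are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def leave_one_animal_out(effects: Sequence[float],
                         t_threshold: float) -> Tuple[List[float], float]:
    """One-sample t statistic of the group effect with each animal dropped in turn.

    `effects` holds one ROI effect estimate per mouse (e.g. a PAG beta).
    Returns the per-fold t values and the fraction of folds with |t| >= threshold.
    A fold whose remaining values are all identical raises ZeroDivisionError.
    """
    n = len(effects)
    if n < 4:
        raise ValueError("need at least 4 animals so each fold keeps 3")
    t_values = []
    for left_out in range(n):
        fold = [e for i, e in enumerate(effects) if i != left_out]
        se = stdev(fold) / math.sqrt(len(fold))
        t_values.append(mean(fold) / se)
    surviving = sum(abs(t) >= t_threshold for t in t_values) / n
    return t_values, surviving


if __name__ == "__main__":
    # Hypothetical per-mouse effects (n = 6), not data from the study.
    pag_effects = [-0.40, -0.30, -0.50, -0.20, -0.35, -0.45]
    # About 2.776 is the two-sided 5% critical t for df = 4 (5 mice per fold).
    t_values, fraction = leave_one_animal_out(pag_effects, t_threshold=2.776)
    print(fraction)
```

      Reporting the full list of fold statistics, not only the surviving fraction, makes it easy to see whether a single animal drives the group effect.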

    1. eLife Assessment

      This multi-omics study provides a comprehensive characterization of the context-dependent roles of the JAK-STAT pathway (JSP) across different cellular compartments within the breast cancer microenvironment. The authors present convincing evidence that high JSP activity paradoxically drives anti-tumor cytotoxicity in T cells but promotes malignancy and immunosuppression in tumor epithelial cells, leading to the fundamental discovery that broad JAK-STAT inhibition could be therapeutically counterproductive. Ultimately, the identification of the immune-related JSP score and the STAT4 axis as predictive biomarkers for anti-PD-1 immunotherapy response, particularly in triple-negative breast cancer, offers critical insights for precise patient stratification and targeted therapeutic interventions.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript, Zhou and colleagues present a detailed look at how the JSP functions differently in the various cells of a breast tumor. The authors have effectively shown that the JSP acts as a double-edged sword, as it helps T cells fight cancer but also allows tumor cells to grow and avoid ferroptosis. These findings are important because they identify a useful biomarker to predict how TNBC patients might respond to PD-1 inhibitors.

      Strengths:

      This work is important because it provides a clear explanation for the conflicting roles of the JSP in the tumor environment. The evidence is solid, as it combines data from thousands of patients with single-cell analysis and lab experiments to confirm the role of STAT4 in cancer progression and immunity.

      Weaknesses:

      However, there are areas for improvement in the scope of the review, the depth of analysis, and the potential for broader clinical implications. The authors are encouraged to address these issues to enhance the scientific and clinical impact of the study.

      Major Issues:

      (1) The authors demonstrate that STAT4 upregulates SLC47A1, but this is currently supported only by expression correlation and western blot data. To confirm a direct link, the authors are encouraged to perform ChIP-qPCR or luciferase reporter assays to show that STAT4 binds directly to the SLC47A1 promoter.

      (2) The conclusion that the MIF-CD74 axis drives immunosuppression is based on computational inference. To support this, the authors could consider mining publicly available breast cancer spatial transcriptomics data to show the co-localization of MIF and CD74. Alternatively, performing simple dual-color immunofluorescence staining on a few clinical sections would effectively demonstrate the physical proximity of these cells.

      (3) TNBC is highly heterogeneous and includes subtypes like mesenchymal and immunomodulatory groups. The authors should analyze whether the JSP score or STAT4 levels vary significantly between these subtypes, as this could further refine the selection of patients for JAK1 inhibitors.

      (4) While the JSP score works well in the current datasets, the authors should consider validating its predictive accuracy in additional independent immunotherapy cohorts, such as the TONIC trial, to ensure the biomarker is robust across different treatment settings.

      Minor Issue:

      The manuscript mentions a U-shaped trajectory of JSP activity during tumor transition. A more detailed biological explanation of why the pathway activity initially drops and then rises would add depth to the discussion.

    3. Reviewer #2 (Public review):

      Summary:

      The JAK-STAT pathway (JSP) exhibits cell-type-specific functional heterogeneity in breast cancer. This study investigates the JSP in breast cancer and its response to anti-PD‑1 immunotherapy. JSP displays distinct cell‑type heterogeneity: it promotes malignant phenotypes and immunosuppression in tumor cells, while enhancing cytotoxicity and reducing exhaustion in T cells. Elevated JSP expression correlates with improved immunotherapy responses, especially in triple‑negative breast cancer. These findings highlight the paradoxical roles of JSP, indicating that broad inhibition may compromise anti‑tumor immunity.

      Strengths:

      The major strengths of this study include the comprehensive characterization of JSP heterogeneity across epithelial, tumor, and T cells in breast cancer. The identification of JSP and STAT4 as predictive biomarkers for immunotherapy response, particularly in triple‑negative breast cancer, provides clinically relevant insights for patient stratification.

      Weaknesses:

      The findings rely heavily on public dataset analyses.

    4. Reviewer #3 (Public review):

      Summary:

      This multi-omics study by Zhou et al elucidates the context-dependent roles of the Janus kinase-signal transducer and activator of transcription (JAK-STAT) pathway (JSP) across different cellular compartments in the breast cancer tumor microenvironment. While bulk JSP activity is associated with a favorable prognosis, single-cell analysis reveals a paradoxical landscape: high JSP in T cells drives anti-tumor cytotoxicity and reduces exhaustion, whereas high activity in tumor epithelial cells promotes malignancy and immunosuppression via the MIF-CD74 signaling axis. The JSP score (immune-related) serves as a robust predictive biomarker for response to anti-PD-1 immunotherapy, particularly in triple-negative breast cancer (TNBC). Furthermore, the study identifies the STAT4/SLC47A1 axis as a critical mechanism through which tumor cells resist ferroptosis, facilitating disease progression. These findings suggest that broad JAK-STAT inhibition may be counterproductive in cancer therapeutics; instead, therapeutic success depends on precise modulation and carefully timed interventions to preserve its T-cell-associated functions. This study may inspire future studies to explore specific factors that selectively modulate JAK-STAT activity in immune cells to achieve favorable therapeutic outcomes.

      Strengths:

      Significant therapeutic implications.

      Weaknesses:

      Limited molecular mechanisms.

    1. eLife Assessment

      It remains unclear how human antibody-secreting cells (ASCs) differentiate. In this study, the authors discovered a CD30⁺ intermediate subset that appears during the transition from B cells to ASCs, providing a potential ontogeny for extra-germinal center B cell differentiation. This study is useful because it identifies novel intermediate markers that enable tracking of human ASC ontogeny, offering new insights into ASC development. However, the evidence is incomplete, and we see three major limitations: (1) the data are largely representative, requiring additional reproducibility; (2) the bioinformatics analysis is limited; and (3) step-wise phenotypic validation would require lineage-tracing experiments on sorted populations.

    2. Reviewer #1 (Public review):

      Summary:

      Fields et al. investigated the heterogeneity and kinetics of human antibody secreting cell (ASC) differentiation by analyzing ex vivo tonsil samples and using in vitro differentiation modeling. They discovered that a CD30+ intermediate subset emerges in transition from B cell to ASC in both contexts, but not from germinal centers, and they identified cytokines that promote this state. They also identified an isoform of CD44, CD44v9, that is expressed on some ASCs.

      Strengths:

      The strengths are the novelty of the findings and the identification of two new markers that may be useful for tracking ASC heterogeneity.

      Weaknesses:

      However, some of this work seems preliminary and would need to be further validated. Some of the data presented were only representative, with limited controls and biological repeats, limiting the interpretation. For example, the role of Mef2c in CD30 expression was not robustly demonstrated. It was not clear whether the Figure 1 scRNA-seq/ATAC-seq data came from multiple donors or just one. Future studies may extend these novel findings and determine the functional relevance of these factors, CD30, and CD44v9 for ASC differentiation and physiology.

    3. Reviewer #2 (Public review):

      Summary:

      Bhattacharya and colleagues here use cell culture, single-cell RNA and ATAC sequencing of such in vitro cultures and of ex vivo isolated B-lineage cells to infer an ontogeny for extra-germinal centre B cell differentiation. The manuscript presents a useful potential ontogeny for plasma cells, wherein in vitro cultured naïve human B cells enter a CD30+ intermediate state before moving in subsequent days through a CD44v9+ state before ultimately obtaining a 'mature' antibody-secreting plasma cell phenotype. Ex vivo isolated germinal centre B cells obtain the plasma cell state without expressing CD30 in their development. Phenotype analysis of tonsillar B-lineage cells supports the same phenotype conversion in vivo, although the intermediate cell population was smaller in vivo. The link to CD44v9 expression on developing plasma cells is inferred to be for extra-GC (T-independent) responses, but the data presented leave this equivocal, and the functional importance of developing via a CD30+CD44v9+ intermediate is not investigated.

      Strengths:

      The article presents a solid potential ontogeny for PC development, wherein some differentiating B cells acquire a CD30+ state, transition through a CD44v9+CD30+ state, then downmodulate CD30 before obtaining canonical CD38+ 'PC' status. A strength is the integration of in vitro cultured B cell results with tonsillar B-lineage cell data sets, and careful flow cytometry of the in vitro cultures over several days to infer lineage. The data provide reasonable support for the concept. CD30+ cells are shown to develop readily from naïve B cells in culture, but uncommonly from GC B cell cultures. A nice piece of data is Figure 6B, which shows reasonably strong correlative changes in phenotype through the assumed ontogeny, and this fits with the expected trajectory of maturation.

      Weaknesses:

      The most important weakness throughout is the non-absolute nature of the relationship. For example, the sorted ex vivo GC B cells also give rise to the 'extra-GC' phenotype of plasma cell, suggesting that while the profile is enriched, it is not absolute. A further weakness is that, although cultures are run for several days, division-associated shifts in PC phenotype are not mapped. Such mapping would greatly strengthen the argument and would reveal phenotype shifts associated with division, which is currently an uncontrolled parameter. For example, in the MEF2C A388 inhibition experiments, demonstrating a by-division peak increase in the CD30+ population in the early days of culture would be strong evidence that the pathway/process contributes.

      Some basic sort experiments are performed (e.g. 3C-3F), and these show that CD30+ cells preferentially give rise to PC. What is missing, however, is the step-wise phenotype shifts in these sorted populations, which should support the trajectory shown in Figure 3B and (the in vitro equivalent of) 6B. If the inferred maturation trajectory is true, sorting cells based on CD30, CD44v9, CD27, and CD20 expression and following their phenotypes 24-48 hours later would emphatically support it.

      There are also specific weaknesses in the bioinformatics: while the analyses are likely appropriate, the argument necessarily relies on data that are not presented. For example, Figure 1C shows bubble plots for two plasma cell sets, yet, of archetypal PC-expressed genes, only IRF4 is shown to confirm that they are true PC, and the gene is not universally expressed in cells in the clusters. For this figure, it would help to expand the bubble plot to show J-CHAIN, XBP-1, CIITA and PRDM1 or other appropriate PC-demarcating molecules. Similarly, in Fig 2B, more evidence of a bifurcation in state is needed than that CD44v9 distinguishes PC1 from PC2 clusters. This is the stated conclusion, but 2A depicts that 50% of PC1 relatively weakly express CD44, while <25% of PC2 express it. Demonstrating additional molecules or genes distinguishing the clusters would improve veracity. Figure 2F shows clonal lineages, but it would be helpful to see somatic hypermutation burdens and learn whether they differ between the demarcated subsets. I also find the pseudotime analyses of limited value, as some of the branches follow trajectories that are biologically unrealistic, so less weight should be placed on the pathways to which they do or do not point (i.e., the notion that GC B cells do or do not give rise to particular PC subsets).

      Statistically, some of the experiments are single wells from single donors, so there is a low level of confidence and no reproducibility demonstrated for some aspects of the study, which is a weakness.

      Paradoxical to the argument that it is the TI response process being modelled, it is presented that CpG stimulation, plus proxy T cell help (CD40L), drives the CD30+ phenotype best with the addition of the GC-associated cytokine IL-21. This should be carefully considered and discussed.

      Overall, in addition to presenting more contextual information from the bioinformatics, the best way to solidify the data set, in my view, would be to revisit the hypothesis with two additional experimental approaches: (1) to incorporate division tracing into the ontogeny studies and (2) to perform lineage tracing on sort-purified populations at different stages of the maturation process.

    1. eLife Assessment

      This important study offers insights into the anatomical and physiological features of cold-selective lamina I spinal projection neurons. The evidence supporting the authors' claims is convincing, although including a larger sample size and more quantification would have strengthened the study, and the claims of monosynaptic connectivity would benefit from further experimental evidence. The work will interest those in the field of somatosensory biology, especially researchers studying spinal cord dorsal horn circuits and projection neuron cell types.

    2. Reviewer #1 (Public review):

      Summary:

      Spinal projection neurons in the anterolateral tract transmit diverse somatosensory signals to the brain, including touch, temperature, itch, and pain. This group of spinal projection neurons is heterogeneous in their molecular identities, projection targets in the brain, and response properties. While most anterolateral tract projection neurons are multimodal (responding to more than one somatosensory modality), it has been shown that cold-selective projection neurons exist in lamina I of the spinal cord dorsal horn. Using a combination of anatomical and physiological approaches, the authors discovered that the cold-selective lamina I projection neurons are heavily innervated by Trpm8+ sensory neuron axons, with calb1+ spinal projection neurons primarily capturing these cold-selective lamina I projection neurons. These neurons project to specific brain targets, including the PBNrel and cPAG. This study adds to the ongoing effort in the field to identify and characterize spinal projection neuron subtypes, their physiology, and functions.

      Strengths:

      (1) The combination of anatomical and physiological analyses is powerful and offers a comprehensive understanding of the cold-selective lamina I projection neurons in the spinal cord dorsal horn. For example, the authors used detailed anatomical methods, including EM imaging of Trpm8+ axon terminals contacting the Phox2a+ lamina I projection neurons. Additionally, they recorded stimulus-evoked activity in Trpm8-recipient neurons, carefully selected by visual confirmation of tdTomato and GFP juxtaposition, which is technically challenging.

      (2) This study identifies, for the first time, a molecular marker (calb1) that labels cold-selective lamina I projection neurons. Although calb1+ projection neurons are not entirely specific to cold-selective neurons, using an intersectional strategy combined with other genes enriched in this ALS group or cold-induced FosTRAP may further enhance specificity in the future.

      (3) This study shows that cold-selective lamina I projection neurons specifically innervate certain brain targets of the anterolateral tract, including the NTS, PBNrel, and cPAG. This connectivity provides insights into the role of these neurons in cold sensation, which will be an exciting area for future research.

      Weaknesses:

      (1) The sample size for the ex vivo electrophysiology conducted on the calb1+ lamina I projection neurons (Figure 5) is limited to a total of six recorded neurons. Given the difficulty and complexity of the preparation, this is understandable. Notably, since approximately 87% of lamina I projection neurons heavily innervated by Trpm8+ terminals are calb1+, these six recordings of such neurons in Figure 4E could also be calb1+.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors took advantage of a semi-intact ex vivo somatosensory preparation that includes hindlimb skin to characterize the response of projection neurons in the dorsal horn of the spinal cord to peripheral stimulation, including cold thermal stimuli. The main aim was to characterize the connectivity between peripheral afferents expressing the cold sensing receptor TRPM8 and a set of genetically tagged neurons of the anterolateral system (ALS). These ALS neurons expressed high levels of the calcium binding protein calbindin 1.

      In addition, combining different viral tracing methods, the authors could identify the anatomical targets of this specific subset of projection neurons within the brainstem and diencephalon.

      Strengths:

      The use of a relatively new (previously seldom used) transgenic line to label TRPM8-expressing afferents, combined with the genetic characterization of a previously identified subset of projection neurons, adds specificity to the characterization. The transgenic line appears to capture the subpopulation of Trpm8-expressing neurons well.

      In addition, the use of electron microscopy techniques makes the interpretation of the structural contacts more compelling.

      The writing is clear and the presentation of findings follows a logical flow.

      Overall, this study provides solid, novel information about the brain circuits involved in cold thermosensation.

      Weaknesses:

      In the characterization of recorded neurons with or without close contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors acknowledge that, technically, this is a very difficult preparation with a very low yield of successful recordings. Moreover, the tissue needs to be maintained at room temperature, which is obviously not ideal when characterizing cold thermoreceptors due to the unavoidable effects of low temperature on cold-activated receptors.

    4. Reviewer #3 (Public review):

      Summary:

      Razlan and colleagues provide a detailed anatomical characterization of lamina I projection neurons in the mouse spinal cord that are densely innervated by primary afferents activated by cooling of the skin. The authors validate a Trpm8-Flp mouse line, show synaptic contacts between Trpm8⁺ boutons and projection neurons at the ultrastructural level, and demonstrate at the physiological level that these neurons specifically respond to cooling stimuli. Next, by taking advantage of previous transcriptomic analysis of ALS neurons, the authors identify calbindin as a marker for cold-activated lamina I projection neurons and map their ascending projections to the rostral lateral parabrachial area, caudal periaqueductal gray, and ventral posterolateral thalamus, well-known thermosensory and thermoregulatory centers. Altogether, these findings provide strong anatomical and functional evidence for a direct line of transmission from Trpm8⁺ sensory afferents through Calb1⁺ lamina I neurons to key supraspinal centers controlling perception of cold and thermoregulatory responses.

      Strengths:

      The combination of mouse genetics, electron microscopy, ex vivo physiology, optogenetics and viral tracing provides convincing evidence for a direct cold pathway. The work validates the Trpm8-Flp line by extensive anatomical and molecular characterization. Integration with previous transcriptomic and anatomical data neatly links the cold-selective lamina I neurons to a molecularly defined cluster of ALS neurons, strengthening the bridge between molecular identity, anatomy, and physiological function.

      Weaknesses:

      The main limitation remains the relatively small number of neurons that could be recorded electrophysiologically. While understandable given the complexity of the preparation, this necessarily limits generalization.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The sample size for the ex vivo electrophysiology is small. Given the difficulty and complexity of the preparation, this is understandable. However, a larger sample size would have strengthened the authors' conclusions.

      We appreciate that the sample size is small, but this was limited by the technical difficulty and relatively low yield with this preparation. From a total of 16 experiments, we were able to obtain successful recordings in 6 cases, and these provided the characterisation of the 11 cells reported in Figure 4. We believe that this is sufficient to “strongly suggest” that the cells with dense Trpm8 input correspond to cold-selective cells. We have toned down the statements in the abstract (line 23) and the Results section (line 246).

      (2) The authors used tdTomato expression to identify brain targets innervated by these cold-selective lamina I projection neurons. Since tdTomato is a soluble fluorescent protein that fills the entire cell, using synaptophysin reporters (e.g., synaptophysin-GFP) would have been more convincing in revealing the synaptic targets of these projection neurons.

      As the Reviewer says, tdTomato labelling fills the entire cell. However, examination at high magnification reveals numerous varicosities along the labelled axons, presumably corresponding to synaptic boutons. We now illustrate this in Figure 6–figure supplement 2F.

      In addition, we have provided further evidence that these varicosities correspond to (glutamatergic) synaptic boutons by immunostaining sections through the LPB for the postsynaptic density protein Homer1, and showing Homer1 puncta apposed to varicosities (Figure 6–figure supplement 2 G,H). This new information now appears in the Results section (lines 374-380).

      (3) The summary cartoon shown in Figure 7 can be misleading because this study did not determine whether these cold-selective lamina I projection neurons have collateral branches to multiple brain targets or if there are anatomical subtypes that may project exclusively to specific targets. For example, a recent study (Ding et al., Neuron, 2025) demonstrated that there are PBN-projecting spinal neurons that do not project to other rostral brain areas. Furthermore, based on the authors' bulk labeling experiments, the three main brain targets are NTS, PBNrel, and cPAG. The VPL projection is very sparse and almost negligible.

      We agree that branches to different brain nuclei may originate from specific subsets of ALS3 neurons and this is now stated in the figure legend. It is true that there are projections to other brain regions (including NTS). These are not included in the diagram, because their circuitry in relation to cold-sensing is less well understood. Although the projection to VPL from lumbar cord is sparse, this is likely to be explained by the very low proportion of lamina I projection neurons with axons that reach the thalamus. Our retrograde tracing data (e.g. Figure 6–figure supplement 4) had already revealed many cells in the C7 segment that were densely coated with Trpm8 afferents and retrogradely labelled from the lateral thalamus. We have carried out additional experiments in which AAV1.CreON.tdTomato was injected into the cervical enlargement of Calb1Cre mice. This resulted in much denser labelling in the VPL and PoT thalamic nuclei, supporting the suggestion that cold-selective lamina I neurons in the cervical enlargement project to these nuclei. This is now described in lines 381-387 and illustrated in Figure 6–figure supplement 3.

      Reviewer #2 (Public review):

      (1) In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      We fully accept that the sample size is small (please see response to Reviewer 1 above). We also accept that the thermal stimulation was not well controlled. Unfortunately, commercially available probes for controlling skin temperature are too large to apply to the skin in this preparation. For this reason, we have used application of hot and cold saline, as in our previous studies with this preparation.

      (2) The authors could provide some sense of the effort needed to record from the 6 cold-activated neurons described. How many preparations were needed, etc?

      We now state that 6 out of 16 experiments resulted in successful recordings for this part of the study (lines 858-861).

      Reviewer #3 (Public review):

      (1) While anatomical evidence for direct synaptic connectivity between Trpm8+ afferents and lamina I projection neurons is compelling, a physiological demonstration of strict monosynaptic transmission is not shown. The conclusion that these inputs are exclusively monosynaptic should be toned down. Similarly, the statement that "Lamina I ALS neurons that are surrounded by Trpm8 afferents are cold-selective" should also be toned down as only a few neurons have been tested and it cannot be excluded that other neurons with similar characteristics may be polymodal.

      We have now carried out optogenetic experiments by expressing channelrhodopsin in Trpm8 afferents and retrogradely labelling ALS neurons with tdTomato. This has allowed us to directly demonstrate monosynaptic input. This is described in the Results section (lines 180-202) and the Methods section has been updated. As noted above, we have toned down the statement about lamina I neurons surrounded by Trpm8 afferents being cold-selective (line 246).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The patch innervation of Trpm8+ sensory neurons in lamina I of the spinal cord dorsal horn is interesting. Do they occupy specific areas within lamina I along the mediolateral axis, or are their placements random? Quantifying the distribution of these terminals in lamina I might be worthwhile.

      Although we have not studied the mediolateral distribution systematically, the locations of the patches along the mediolateral axis appear to be random, and they could be seen in medial, central and lateral parts of lamina I (as shown in Figure 2). We have added a comment to this effect in the Results section (lines 114-116). Quantifying Trpm8 terminals would be very labour-intensive, and we do not feel that this would be of great benefit.

      (2) Quantification of the percentage of Trpm8+ boutons contacting Phox2a+ neurons that are vGlut3+.

      The main purpose of this part of the study was to provide a possible explanation for the finding by Li et al (2015) that some lamina I cells were associated with Vglut3-immunoreactive boutons. We found that the percentages of Trpm8+ boutons that contained Vglut3 varied considerably from cell to cell, and this is now stated in the text (lines 133-134). However, because knowing the exact proportions was not an important aspect of the study, we have not carried out a detailed analysis.

      (3) Quantification of the percentage of PBN projection neurons densely innervated by Trpm8+ axons that are calb1+.

      As requested, we have carried out immunohistochemistry to determine the proportion of lamina I ALS neurons with dense Trpm8 input that are calbindin-immunoreactive. We examined 31 neurons from 3 different mice and found that all but 4 (i.e. 87%) were immunoreactive. This is now described (lines 287-293) and illustrated (Figure 5–figure supplement 1). We have now put the electrophysiological characterisation that was in this figure into a separate supplement (Figure 5–figure supplement 2).

      (4) It might be helpful to confirm the brain projection targets of Calb1+ lamina I projection neurons using AAV1-CreON-Synaptophysin-GFP (or other fluorescent proteins) injections.

      Please see our response to Public review Reviewer 1 comment 2 above. We have provided further evidence that the brain regions that received input from the Calb1+ cells contain axonal boutons (lines 374-380 and Figure 6–figure supplement 2F-H).

      (5) Figure 6 - Figure Supplements 3 and 4 are duplicated.

      We apologise for this duplication, which was made in error in the version originally submitted to eLife. This has now been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) As mentioned, in the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low, some recorded in current clamp, a few in voltage clamp. This prevents any solid statistical evaluation of the findings.

      Please see our response to the first point made by Reviewer 1 in the Public reviews. As stated above, we have toned down the statement about the relationship between cells with dense Trpm8 input and cold-selective cells (line 246).

      (2) In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the synaptic connection between afferents and ALS projection neurons.

      Please see our response to the Public review comment made by this Reviewer.

      (3) Line 35. In the description of the anterolateral system and the effects of lesions, the species(s) should be specified since rodents and humans have a different anatomical distribution of spinal tracts.

      We now state that while ALS axons ascend in the anterolateral quadrant in humans, they are located in the dorsolateral white matter in rodents (lines 40-42).

      (4) To describe the semi-intact preparation used for recording and stimulation from the periphery, the authors cite a study by Julien Allard (reference 25). However, that study describes an in vivo preparation. I believe there is an error in the citation.

      We thank the Reviewer for pointing this out – it has now been corrected.

      (5) Line 726. Dorsal horn recordings were performed at 25 °C. What is the temperature of the skin? How would this low temperature affect the excitability of cold afferents and their axons? Perhaps a comment about this issue would be appropriate.

      The skin temperature in this preparation is the same as that of the spinal cord (25 °C). At this temperature, Trpm8 afferents would be active, but are likely to have adapted during the course of the experiment. Since this temperature is below 37 °C, it is likely that the conduction velocity of these afferents will be slower than in the in vivo situation. We have added a comment to this effect (lines 818-821).

      (6) Line 401. The authors could not detect Trpv1-immunoreactivity in the central terminals of Trpm8Flp;RCE:FRT mice. Could they detect Trpv1 immunoreactivity in any central terminal? Do they have positive evidence that their immunostaining worked?

      Trpv1 was readily detected in central terminals with the Trpv1 antibody. An example showing lack of detectable Trpv1-immunoreactivity in GFP-labelled (Trpm8-expressing) afferents is now shown in Figure 2–figure supplement 1K-M.

      (7) Line 437. What is the expected anterograde transport time for YFP from the lumbar cord to the brainstem? Are 2-3 weeks not sufficient based on the literature? I noticed the authors are using longer survival times after intraspinal injections.

      In preliminary experiments for a previous study ("Substance P-expressing excitatory interneurons in the mouse superficial dorsal horn provide a propriospinal input to the lateral spinal nucleus", Brain Structure and Function), we had found that a 2 week survival time after injection of AAV1.CreON.GFP into the lumbar spinal cord of Tac1Cre mice was not sufficient to label axons in the brain, although at 4 weeks we saw brain labelling. We have also found that extending survival times from 4 to 6 weeks gives greatly improved labelling, especially in the thalamus.

      (8) Figure 5A. Many of the labelled cells appear to have the somas in the white matter, which makes little sense. It seems the reference section to plot the cells is not optimal.

      The placement of cells is accurate. Many spinal projection neurons are present outside the main region of grey matter (i.e. laminae I-X). These cells are found in 2 main regions: the lateral spinal nucleus (LSN) and the lateral reticulated part of lamina V. These two regions are intermediate between grey and white matter, i.e. they contain scattered cell bodies amongst a dense collection of axons. For this reason they appear outside the grey/white border as it is conventionally shown on diagrams of this type. This has been reported in numerous studies, e.g. see Figure 2 in "The cells of origin of the spinothalamic tract of the rat: a quantitative reexamination".

      (9) Recent transcriptomic studies suggest the presence of more than one subpopulation of Trpm8-expressing DRG or trigeminal neurons. It is unclear to what extent the Trpm8-Flp line is capturing this diversity.

      We are aware that there are at least 3 transcriptomic subsets of Trpm8-expressing primary sensory neurons. However, we are not aware of any suitable molecular markers that would allow us to discriminate between them, and we are therefore unable to address this point.

      (10) Could the patchy distribution of Trpm8 afferents in lamina I reflect incomplete recombination; the empty spaces could be occupied by unmarked afferents?

      In theory it could, but this seems unlikely. The Trpm8Flp line (crossed with RCE:FRT) captures ~83% of Trpm8-positive cell bodies, and it seems very unlikely that the remaining 17% of Trpm8-expressing afferents would fill the spaces between GFP bundles that we see in lamina I. This is now stated in the Results section (lines 116-120).

      Reviewer #3 (Recommendations for the authors):

      (1) It would be a nice addition to the validation of the Trpm8-Flp line to specify what ages (if multiple) have been analysed and whether there are any differences. In addition, is labelling different at different levels of the spinal cord, and is there any labeling in supraspinal regions?

      The tissue used for this part of the study was obtained from mice aged 5-9 weeks and this is now stated (lines 78-79). We did not observe any differences with age, but we did not look at this in detail. Labelling was similar at different levels of the spinal cord, and this is stated (lines 108-109). We have added a brief account of the distribution of GFP labelling in the brain (lines 140-144).

      (2) Line 169. It is not clear how ALS neurons are labeled. It is explained in the material and methods (I believe it is AAV9.mCherry into the LPB or CVLM). Although I could not find a mention of a tdTomato AAV, maybe I missed it. In any case, it would be great to have the experimental strategy briefly explained in the text. For the same reason, I would recommend moving Figure 4 Supplement 1A and 1B schematics to the main figure, very helpful for understanding the experiment.

      We thank the Reviewer for this suggestion. We now explain in the Results section how the ALS neurons were labelled (lines 209-212), and as the Reviewer recommends we have put the schematic diagrams from Figure 4–figure supplement 1 into the main Figure. As noted in the text, the tdTomato labelling resulted from injection of an AAV coding for Cre into mice that contained the Ai9 allele. We have also updated the descriptions of brain injections in the Methods section to cover the new experiments (optogenetics, and calbindin immunohistochemistry).

      (3) Line 184. "Figure 4" would be good to specify the panels; I believe it should be 4A-C. Same for line 194, 4D-F?

      We apologise that this was omitted from the original version – we have now specified the panels.

      (4) Line 179. It would be great to specify in the text and figures the temperature used for hot and warm water. In addition, would the responses be different using different temperatures? Can you test ramps? These would go a great way to compare with responses shown in vivo by Ran and colleagues.

      We now specify the hot and cold saline temperatures used to stimulate the skin in the semi-intact preparation in the legend for Figure 4 and in the Results section (lines 222-223). As noted above, it is difficult to use more accurate thermal stimuli in this preparation. Please see response to Reviewer 2 public comment 1.

      (5) Figure 4-Figure supplement 1F. It looks like these are very slow responses (1 sec?) for monosynaptic connectivity.

      In this figure (now part 1D) the action potential frequency was determined from counts of APs in 1 sec bins, and this is now stated in the legend. This might have given the impression of slow responses.

      (6) Line 203. I would tone down the statement, as only 6 cells "that were clearly associated with numerous GFP-labelled afferents" have been tested. Thus, it cannot be excluded that other cells with similar anatomical characteristics may also respond to other stimuli.

      As requested, we have toned down this statement (line 246).

      (7) Line 230. Here AAV11.CreON.tdTomato is used, in previous retrograde experiments, AAV9 has been used (Figure 4), why the switch to 11? Is the tropism the same? Is it possible that because you are using a different serotype, you are targeting different neurons?

      We have found that although AAV9 coding for fluorescent proteins is very good for retrograde labelling, AAV9 coding for Cre-dependent constructs (e.g. AAV.CreON.tdTomato) gives very poor recombination in spinal projection neurons, for reasons that we do not understand. We recently became aware of the AAV11 serotype, which was recommended as being suitable for retrograde transport ("AAV11 enables efficient retrograde targeting of projection neurons and enhances astrocyte-directed transduction", Nature Communications). We have found that this works very well for labelling ALS cells throughout the spinal cord when using Cre-dependent constructs. We have added a reference to this paper at this point in the text. We are not able to say whether tropism is the same or different, but in each case many ALS neurons (including many of those in lamina I) are captured.

      (8) Line 234. Is there any positional organization for the "tdTomato-labelled cells densely innervated by Trpm8 afferents", do they preferentially cluster in some position of lamina I?

      These cells are found throughout the mediolateral extent of the dorsal horn, and this is now stated (lines 279-280).

      (9) Line 237. The actual number of cells/mm would be informative.

      This would be difficult to estimate, as the sections were cut in the horizontal plane, which means that lamina I can appear on a variable number of sections.

      (10) Line 249. From the figures, the action potentials of the Calb+ neurons seem to have a delayed onset (at the end of cold saline treatment, Figure 5, Supplement 1l) compared to lamina I ALS neurons recorded in Figure 4, Supplement 1f. If real, it is an interesting difference in the time-course of response that could indicate different coding properties e.g., response to cooling (general neurons) vs. response to absolute temperature (calb + neurons).

      As for Fig 4-figure supplement 4 (see response to point #5 above), action potential frequency was determined from APs counted in 1 sec bins, and this is now stated in the legend.

      (11) Figure 7. In the model, the disynaptic pathway should also be shown

      We have added a comment to the legend stating that there may also be indirect (“polysynaptic”) input from Trpm8 afferents to ALS3 neurons.

    1. eLife Assessment

      This study offers valuable insights into the anatomical and physiological features of cold-selective lamina I spinal projection neurons. The evidence supporting the authors' claims is convincing, although including a larger sample size and more quantification would have strengthened the study further, and the claims of monosynaptic connectivity would benefit from being stated more cautiously. The work will interest those in the field of somatosensory biology, especially researchers studying spinal cord dorsal horn circuits and projection neuron cell types.

    2. Reviewer #1 (Public review):

      Summary:

      Spinal projection neurons in the anterolateral tract transmit diverse somatosensory signals to the brain, including touch, temperature, itch, and pain. This group of spinal projection neurons is heterogeneous in their molecular identities, projection targets in the brain, and response properties. While most anterolateral tract projection neurons are multimodal (responding to more than one somatosensory modality), it has been shown that cold-selective projection neurons exist in lamina I of the spinal cord dorsal horn. Using a combination of anatomical and physiological approaches, the authors discovered that the cold-selective lamina I projection neurons are heavily innervated by Trpm8+ sensory neuron axons, with calb1+ spinal projection neurons primarily capturing these cold-selective lamina I projection neurons. These neurons project to specific brain targets, including the PBNrel and cPAG. This study adds to the ongoing effort in the field to identify and characterize spinal projection neuron subtypes, their physiology, and functions.

      Strengths:

      (1) The combination of anatomical and physiological analyses is powerful and offers a comprehensive understanding of the cold-selective lamina I projection neurons in the spinal cord dorsal horn. For example, the authors used detailed anatomical methods, including EM imaging of Trpm8+ axon terminals contacting the Phox2a+ lamina I projection neurons. Additionally, they recorded stimulus-evoked activity in Trpm8-recipient neurons, carefully selected by visual confirmation of tdTomato and GFP juxtaposition, which is technically challenging.

      (2) This study identifies, for the first time, a molecular marker (calb1) that labels cold-selective lamina I projection neurons. Although calb1+ projection neurons are not entirely specific to cold-selective neurons, using an intersectional strategy combined with other genes enriched in this ALS group or cold-induced FosTRAP may further enhance specificity in the future.

      (3) This study shows that cold-selective lamina I projection neurons specifically innervate certain brain targets of the anterolateral tract, including the NTS, PBNrel, and cPAG. This connectivity provides insights into the role of these neurons in cold sensation, which will be an exciting area for future research.

      Weaknesses:

      (1) The sample size for the ex vivo electrophysiology is small. Given the difficulty and complexity of the preparation, this is understandable. However, a larger sample size would have strengthened the authors' conclusions.

      (2) The authors used tdTomato expression to identify brain targets innervated by these cold-selective lamina I projection neurons. Since tdTomato is a soluble fluorescent protein that fills the entire cell, using synaptophysin reporters (e.g., synaptophysin-GFP) would have been more convincing in revealing the synaptic targets of these projection neurons.

      (3) The summary cartoon shown in Figure 7 can be misleading because this study did not determine whether these cold-selective lamina I projection neurons have collateral branches to multiple brain targets or if there are anatomical subtypes that may project exclusively to specific targets. For example, a recent study (Ding et al., Neuron, 2025) demonstrated that there are PBN-projecting spinal neurons that do not project to other rostral brain areas. Furthermore, based on the authors' bulk labeling experiments, the three main brain targets are NTS, PBNrel, and cPAG. The VPL projection is very sparse and almost negligible.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors took advantage of a semi-intact ex vivo somatosensory preparation that includes hindlimb skin to characterize the response of projection neurons in the dorsal horn of the spinal cord to peripheral stimulation, including cold thermal stimuli. The main aim was to characterize the connectivity between peripheral afferents expressing the cold-sensing receptor TRPM8 and a set of genetically tagged neurons of the anterolateral system (ALS). These ALS neurons expressed high levels of the calcium-binding protein calbindin 1.

      In addition, combining different viral tracing methods, the authors could identify the anatomical targets of this specific subset of projection neurons within the brainstem and diencephalon.

      Strengths:

      The use of a relatively new (seldom used previously) transgenic line to label TRPM8-expressing afferents, combined with the genetic characterization of a previously identified subset of projection neurons, adds a specificity to the characterization. The transgenic line appears to capture well the subpopulation of Trpm8-expressing neurons.

      In addition, the use of electron microscopy techniques makes the interpretation of the structural contacts more compelling.

      The writing is clear, and the presentation of findings follows a logical flow.

      Overall, this study provides solid, novel information about the brain circuits involved in cold thermosensation.

      Weaknesses:

      In the characterization of recorded neurons in close contact or in the absence of this contact with TRPM8 afferents, the number of recorded neurons is relatively low. In addition, the strength of thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors could provide some sense of the effort needed to record from the 6 cold-activated neurons described. How many preparations were needed, etc?

    4. Reviewer #3 (Public review):

      Summary:

      Razlan and colleagues provide a detailed anatomical characterization of lamina I projection neurons in the mouse spinal cord that are densely innervated by primary afferents activated by cooling of the skin. The authors, building on their previous anatomical work, validate a Trpm8-Flp mouse line, show synaptic contacts between Trpm8⁺ boutons and projection neurons at the ultrastructural level, and demonstrate at the physiological level that these neurons specifically respond to cooling stimuli. Next, by taking advantage of their previous transcriptomic analysis of ALS neurons, they identify calbindin as a marker for cold-activated lamina I projection neurons and map their ascending projections to the rostral lateral parabrachial area, caudal periaqueductal gray, and ventral posterolateral thalamus, well-known thermosensory and thermoregulatory centers. Altogether, these findings provide strong anatomical and functional evidence for a direct line of transmission from Trpm8⁺ sensory afferents through Calb1⁺ lamina I neurons to key supraspinal centers controlling perception of cold and thermoregulatory responses.

      Strengths:

      The combination of mouse genetics, electron microscopy, ex vivo physiology, and viral tracing provides convincing evidence for a direct cold pathway. The work validates the Trpm8-Flp line by extensive anatomical and molecular characterization. Integration with previous transcriptomic and anatomical data neatly links the cold-selective lamina I neurons to a molecularly defined cluster of ALS neurons, strengthening the bridge between molecular identity, anatomy, and physiological function.

      Weaknesses:

      While anatomical evidence for direct synaptic connectivity between Trpm8+ afferents and lamina I projection neurons is compelling, a physiological demonstration of strict monosynaptic transmission is not shown. The conclusion that these inputs are exclusively monosynaptic should be toned down. Similarly, the statement that "Lamina I ALS neurons that are surrounded by Trpm8 afferents are cold-selective" should also be toned down as only a few neurons have been tested and it cannot be excluded that other neurons with similar characteristics may be polymodal.

    1. eLife Assessment

      This study presents data suggesting that excitatory cholecystokinin (CCK)-expressing neurons in hippocampal area CA3 influence hippocampal-dependent memory using multiple methods to manipulate excitatory CCK-expressing CA3 neurons. The study is valuable, particularly considering that most past studies of CCK-expressing neurons have focused on those neurons that co-express CCK and GABA. Currently, the strength of evidence is incomplete, but it would improve if evidence of specificity was provided and other concerns were addressed. If this is not possible, the conclusions, particularly those requiring evidence of specific targeting of excitatory neurons, should be modified accordingly.

    2. Reviewer #1 (Public review):

      Summary:

      CCK is the most abundant neuropeptide in the brain, and many studies have investigated the role of CCK and inhibitory CCK interneurons in modulating neural circuits, especially in the hippocampus. The manuscript presents interesting questions regarding the role of excitatory CCK+ neurons in the hippocampus, which has been much less studied compared to the well-known roles of inhibitory CCK neurons in regulating network function. The authors adopt several methods including transgenic mice and viruses, optogenetics, chemogenetics, RNAi, and behavioral tasks to explore these less-studied roles of excitatory CCK neurons in CA3. They find that the excitatory CCK neurons are involved in hippocampal-dependent tasks such as spatial learning and memory formation, and that CCK-knockdown impairs these tasks.

      However, these questions are very dependent on ensuring that the study is properly targeting excitatory CCK neurons (and thus their specific contributions to behavior).

      There needs to be much more characterization of the CCK transgenic mice and viruses to confirm the targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Strengths:

      This field has focused mainly on inhibitory CCK+ interneurons and their role in network function and activity, and thus this manuscript raises interesting questions regarding the role of excitatory CCK+ neurons, which have been much less studied.

      Weaknesses:

      (1a) This manuscript is dependent on ensuring that the study is indeed investigating the role of excitatory CCK-expressing neurons themselves and their specific contribution to behavior. There needs to be much more characterization of the CCK-expressing mice (crossed with Ai14 or transduced with various viruses) to confirm the excitatory-cell targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      (2) The methods and figure legends are still extremely sparse, still leading to many questions regarding methodology and accuracy. More details would be useful in evaluating the tools and data, and the lack of proper quantification is still prevalent throughout the paper. In many places, only % values are noted, or only images are presented, and the number of cells counted is almost never reported.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors have demonstrated, through a comprehensive approach combining electrophysiology, chemogenetics, fiber photometry, RNA interference, and multiple behavioral tasks, the necessity of projections from CCK+ CAMKIIergic neurons in the hippocampal CA3 region to the CA1 region for regulating spatial memory in mice. Specifically, authors have shown that CA3-CCK CAMKIIergic neurons are selectively activated by novel locations during a spatial memory task. Furthermore, authors have identified the CA3-CA1 pathway as crucial for this spatial working memory function, thereby suggesting a pivotal role for CA3 excitatory CCK neurons in influencing CA1 LTP. The data presented appear to be well-organized and comprehensive.

      Strengths:

      (1) This work combined various methods to validate the excitatory CCK neurons in the CA3 area; these data are convincing and solid.

      (2) This study demonstrated that CA3-CCK CAMKIIergic neurons are involved in spatial memory tasks. These are interesting findings, which suggest that these neurons are important targets for interventions in memory-related diseases.

      (3) This manuscript also measured endogenous CCK release from CA3-CCK CAMKIIergic neurons, showing that these neurons can release CCK under certain conditions.

      Weaknesses:

      In summary, this work can be formally accepted after revision. As a remaining limitation, the distinct neural effects of the cholecystokinin (CCK) receptors (CCK-1R, CCK-2R, and CCK-3R) on hippocampal function have not been fully elucidated. Recent studies indicate that CCK-2R can modulate hippocampal activity at CA3-Schaffer collateral synapses; however, the roles of CCK-1R and CCK-3R in hippocampal function remain poorly characterized, with limited experimental evidence supporting their involvement. Overall, this study provides an interesting and novel perspective on the role of excitatory CCK signaling in hippocampus-dependent navigation learning.

    4. Reviewer #3 (Public review):

      Summary:

      Fengwen Huang et al. used multiple neuroscience techniques (transgenic mice, immunohistochemistry, bulk calcium recording, a neural sensor, hippocampal-dependent tasks, optogenetics, chemogenetics, and RNA interference) to elucidate the role of excitatory cholecystokinin-positive pyramidal neurons in the hippocampus in regulating hippocampal functions, including navigation and neuroplasticity.

      Strengths:

      (i) The authors provided the distribution profile of excitatory cholecystokinin-positive neurons in the dorsal hippocampus using transgenic mice (Ai14::CCK Cre mice), immunohistochemistry, and retrograde AAV.

      (ii) The authors used the neural sensor and light stimulation to monitor the CCK release from the CA3 area, indicating that CCK can be secreted by activation of the excitatory CCK neurons.

      (iii) The authors showed that the activity of the excitatory CCK neurons in CA3 is necessary for navigation learning.

      (iv) The authors demonstrated that inhibition of the excitatory CCK neurons and knockdown of the CCK gene expression in CA3 impaired the navigation learning and the neuroplasticity of CA3-CA1 projections.

      Weaknesses:

      (i) The causal relationship between navigation learning and CCK secretion remains nebulous; answering this question will require a more sensitive CCK-BR sensor in future work.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      CCK is the most abundant neuropeptide in the brain, and many studies have investigated the role of CCK and inhibitory CCK interneurons in modulating neural circuits, especially in the hippocampus. The manuscript presents interesting questions regarding the role of excitatory CCK+ neurons in the hippocampus, which has been much less studied compared to the well-known roles of inhibitory CCK neurons in regulating network function. The authors adopt several methods, including transgenic mice and viruses, optogenetics, chemogenetics, RNAi, and behavioral tasks to explore these less-studied roles of excitatory CCK neurons in CA3. They find that the excitatory CCK neurons are involved in hippocampal-dependent tasks such as spatial learning and memory formation, and that CCK-knockdown impairs these tasks.

      However, these questions are very dependent on ensuring that the study is properly targeting excitatory CCK neurons (and thus their specific contributions to behavior). There needs to be much more characterization of the CCK transgenic mice and viruses to confirm the targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Strengths:

      This field has focused mainly on inhibitory CCK+ interneurons and their role in network function and activity, and thus, this manuscript raises interesting questions regarding the role of excitatory CCK+ neurons, which have been much less studied.

      Weaknesses:

      (1a) This manuscript is dependent on ensuring that the study is indeed investigating the role of excitatory CCK-expressing neurons themselves and their specific contribution to behavior. There needs to be much more characterization of the CCK-expressing mice (crossed with Ai14 or transduced with various viruses) to confirm the excitatory-cell targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Thank you for this constructive comment. Indeed, the current study lacks comprehensive strategies to unequivocally distinguish excitatory CCK neurons from heterogeneous CCK neuronal populations. Nevertheless, we provide multiple lines of evidence supporting the distribution of CaMKIIα/Vglut1-expressing CCK+ neurons in the hippocampus (Figure 1F), using complementary approaches including transgenic mouse models as well as viral and antibody-based labeling (Figure 1A, Figure 1H-I). In addition, we demonstrate that 635 nm light reliably evokes field excitatory postsynaptic potentials (fEPSPs) at CA3-Schaffer collateral synapses expressing DIO-CaMKIIα-ChrimsonR in vitro (Figure 2A-F). Importantly, these light-evoked excitatory synaptic responses are abolished by AMPA and NMDA receptor antagonists (CNQX and APV), confirming the excitatory nature of the DIO-CaMKIIα-ChrimsonR-expressing synapses. To outline future work that could further support our findings and conclusions, we have added the following to the Discussion section of the revision:

      “Due to technical limitations at the current stage, we were unable to perform whole-cell recordings or pharmacological manipulations using CCK receptor antagonists. In future studies, the application of these approaches to directly record and selectively block EPSPs from excitatory CCK neurons in the hippocampus will further strengthen and validate our conclusions.” (Line 265 - line 269 in the revision).

      (1b) For the experiments that use a virus with the CCK-IRES-Cre mouse, there is no information or characterization on how well the virus targets excitatory CCK-expressing neurons. (Additionally, it has been reported that CaMKIIa-driven viral expression can be seen in both pyramidal and inhibitory cells.)

      We thank the reviewer for this insightful comment regarding the specificity of viral targeting in CCK-IRES-Cre mice.

      To address this concern, we performed additional characterization of viral expression in CA3. We found that DIO-CaMKIIα-mCherry expression showed a high degree of colocalization with CaMKIIα immunoreactivity, indicating preferential targeting of excitatory neurons (sFigure 1A-B; sFigure 2A-B; sFigure 3A-B). We provide an example confirming the high specificity of the AAV for infecting excitatory CCK neurons in the CA3 area.

      We also acknowledge prior reports showing that CaMKIIα-driven viral expression can, in some cases, be detected in a small subset of inhibitory neurons. However, because CA3-Schaffer collateral projections to CA1 arise exclusively from excitatory CA3 pyramidal neurons, any expression in inhibitory CCK+ interneurons is unlikely to contribute directly to the recorded CA1 synaptic responses in our electrophysiological experiments. That said, we cannot fully exclude the possibility that a minor population of inhibitory CCK⁺ neurons could indirectly modulate CA3 pyramidal neuron activity via local circuit mechanisms, particularly in experiments involving optogenetic manipulation or shRNA expression. We now explicitly acknowledge this limitation in the revised manuscript:

      “Importantly, to further improve cell-type specificity, we propose an intersectional genetic strategy using CCK-IRES-Cre × VGlut1-Flp mice combined with a Cre-On/Flp-On (Con/Fon) AAV, which would restrict expression exclusively to excitatory CCK-expressing neurons and eliminate potential contributions from inhibitory CCK+ cells. This approach will be implemented in future studies to refine circuit specificity.” (Line 269 - line 273 in the revision).

      (2) The methods and figure legends are extremely sparse, leading to many questions regarding methodology and accuracy. More details would be useful in evaluating the tools and data. Additionally, further quantification would be useful, e.g., in some places only % values are noted, or only images are presented.

      Thank you for these constructive comments. We have expanded the methodological descriptions in both the Methods section and the figure legends to provide sufficient detail for evaluating the experimental tools and data accuracy. In addition, we have added quantitative analyses where previously only representative images or percentage values were shown. Specifically, quantification has now been included for each AAV condition in the corresponding figures in the revised manuscript.

      (3) It is unclear whether the reduced CCK expression is merely correlated with, or directly causes, the impairments in hippocampal function. Does the CCK-shRNA have detrimental effects beyond reducing CCK expression (e.g., does it also affect some other essential, non-CCK-related aspect of the neuron itself)? Is there any histological comparison between the shRNA and the scrambled shRNA?

      Recent studies from our lab demonstrated that knocking out the CCK gene significantly attenuates hippocampal-dependent spatial learning and CA3-CA1 LTP, indicating that CCK plays a critical role in modulating hippocampal functions[1,2]. Additionally, neither CCK-shRNA nor CCK-scramble significantly affected excitatory synaptic transmission in the CA3-CA1 projections, suggesting that CCK-shRNA has no obvious adverse effect on other neural components.

      Finally, we have provided a histological comparison between the shRNA and the scrambled shRNA regarding the expression level of the CCK protein (Pro-CCK) in the revision. Our result shows that CCK-shRNA (left panel) significantly reduced CCK expression in CA3<sup>CCK</sup>-positive neurons compared with the CCK-Scramble group (right panel).

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      https://doi.org/10.7554/eLife.109001.1.sa2

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors have demonstrated, through a comprehensive approach combining electrophysiology, chemogenetics, fiber photometry, RNA interference, and multiple behavioral tasks, the necessity of projections from CCK+ CAMKIIergic neurons in the hippocampal CA3 region to the CA1 region for regulating spatial memory in mice. Specifically, the authors have shown that CA3-CCK CAMKIIergic neurons are selectively activated by novel locations during a spatial memory task. Furthermore, the authors have identified the CA3-CA1 pathway as crucial for this spatial working memory function, thereby suggesting a pivotal role for CA3 excitatory CCK neurons in influencing CA1 LTP. The data presented appear to be well-organized and comprehensive.

      Strengths:

      (1) This work combined various methods to validate the excitatory CCK neurons in the CA3 area; these data are convincing and solid.

      (2) This study demonstrated that CA3-CCK CAMKIIergic neurons are involved in spatial memory tasks. These are interesting findings, which suggest that these neurons are important targets for interventions in memory-related diseases.

      (3) This manuscript also measured endogenous CCK release from CA3-CCK CAMKIIergic neurons, showing that these neurons can release CCK under certain conditions.

      Weaknesses:

      (1) The authors do not mention which CCK receptors modulate these processes.

      We thank the reviewer for raising this important question. Based on our recent work, CCK-B receptors are the primary mediators of CCK function in the hippocampus at both the synaptic plasticity and behavioral levels (Su et al., 2023; Zhang et al., 2024; Wang et al., 2025). To clarify this mechanism, we have added the following content to the revised manuscript:

      “Based on our recent work, CCK signaling in the hippocampus is predominantly mediated by CCK-B receptors, which play a critical role in regulating synaptic plasticity and spatial memory-related behaviors.” (Line 105 - line 106 in the revision).

      (2) The authors do not test CCK gene knockout mice or CCK receptor knockout mice in these neural processes.

      Thank you for this insightful comment. We performed these experiments in earlier studies. Our results showed that high-frequency electrical stimulation failed to induce significant LTP in the CA3-CA1 pathway in both CCK gene knockout (CCK-KO) mice and CCK-B receptor knockout (CCK-BR-KO) mice in vitro (Su et al., 2023; Zhang et al., 2024; Wang et al., 2025). These findings indicate that CCK mediates its synaptic effects predominantly through CCK-B receptors in the CA3-CA1 pathway. Accordingly, we have added this description to the revised manuscript:

      “Additionally, high-frequency electrical stimulation fails to induce LTP in the CA3-CA1 pathway in both CCK-KO and CCK-BR-KO mice, indicating that CCK-dependent synaptic plasticity in this circuit is primarily mediated by CCK-B receptors.” (Line 170 - line 173 in the revision).

      (3) The authors do not test the source of CCK release during the behavioral tasks.

      We thank the reviewer for raising this important point. In our previous work, we directly monitored CCK release in the hippocampus during an object-exploration task using a GPCR-based CCK-BR sensor combined with fiber photometry (Su et al., 2023). During object exploration, we observed a rapid and robust increase in CCK-BR sensor fluorescence, indicating activity-dependent CCK release in the hippocampus. Based on these findings, we inferred that hippocampal CCK release plays a critical role in hippocampus-dependent behavioral tasks.

      We acknowledge that hippocampal neurons receive CCK-positive projections from multiple brain regions, making it technically challenging to isolate and monitor the precise source of CCK release in the CA1 area during behavioral tasks in vivo. One potential strategy to address this limitation is selective overexpression of CCK in CA3 neurons (e.g., AAV-CCK delivery), followed by assessment of CCK-BR sensor responses during hippocampal-dependent behaviors. We have added this discussion to the revised manuscript to clarify the source and functional relevance of CCK release during behavioral tasks.

      “Besides, using a GPCR-based CCK-BR sensor combined with fiber photometry, our previous work demonstrated rapid, activity-dependent CCK release in the hippocampus during object-exploratory behavior, supporting a functional role for hippocampal CCK signaling in cognitive tasks (Su et al., 2023). Given that hippocampal neurons receive CCK-positive projections from multiple brain regions, it remains technically challenging to precisely identify the cellular source of CCK release in CA1 during behavior. Future studies employing selective CCK overexpression in CA3 neurons, together with CCK-BR sensor recordings, may help further delineate the contribution of CA3-derived CCK to hippocampal-dependent behaviors.” (Line 313 - line 321 in the revision).

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      (3) Su, J., Huang, F., Tian, Y., Tian, R., Qianqian, G., Bello, S. T., ... & He, J. (2023). Entorhinohippocampal cholecystokinin modulates spatial learning by facilitating neuroplasticity of hippocampal CA3-CA1 synapses. Cell Reports, 42(12).

      https://doi.org/10.7554/eLife.109001.1.sa1

      Reviewer #3 (Public review):

      Summary:

      Fengwen Huang et al. used multiple neuroscience techniques (transgenic mice, immunohistochemistry, bulk calcium recording, a neural sensor, hippocampal-dependent tasks, optogenetics, chemogenetics, and RNA interference) to elucidate the role of excitatory cholecystokinin-positive pyramidal neurons in the hippocampus in regulating hippocampal functions, including navigation and neuroplasticity.

      Strengths:

      (1) The authors provided the distribution profile of excitatory cholecystokinin-positive neurons in the dorsal hippocampus using transgenic mice (Ai14::CCK Cre mice), immunohistochemistry, and retrograde AAV.

      (2) The authors used the neural sensor and light stimulation to monitor the CCK release from the CA3 area, indicating that CCK can be secreted by activation of the excitatory CCK neurons.

      (3) The authors showed that the activity of the excitatory CCK neurons in CA3 is necessary for navigation learning.

      (4) The authors demonstrated that inhibition of the excitatory CCK neurons and knockdown of the CCK gene expression in CA3 impaired the navigation learning and the neuroplasticity of CA3-CA1 projections.

      Weaknesses:

      (1) What is the causal relationship between navigation learning and CCK secretion?

      Thank you for pointing out this important issue. Previous studies have shown that CCK can be rapidly secreted during exploratory behaviors, as detected by the CCK-BR sensor. In parallel, CCK-positive neurons have been demonstrated to play a critical role in the precise execution of hippocampus-dependent spatial learning. Together, these findings suggest that exploratory behavior induces CCK secretion, which in turn contributes to the accuracy of hippocampal-dependent learning and memory processes. Based on this evidence, we propose that CCK secretion serves as a functional link between behavioral exploration and spatial learning. We have added these explanations in the revised manuscript to better clarify the causal relationship between behavioral exploration and CCK secretion:

      “Besides, using a GPCR-based CCK-BR sensor combined with fiber photometry, our previous work demonstrated rapid, activity-dependent CCK release in the hippocampus during object-exploratory behavior, supporting a functional role for hippocampal CCK signaling in cognitive tasks (Su et al., 2023). Given that hippocampal neurons receive CCK-positive projections from multiple brain regions, it remains technically challenging to precisely identify the cellular source of CCK release in CA1 during behavior. Future studies employing selective CCK overexpression in CA3 neurons, together with CCK-BR sensor recordings, may help further delineate the contribution of CA3-derived CCK to hippocampal-dependent behaviors.” (Line 313 - line 321 in the revision).

      (2) What is the effect of overexpressing the CCK gene on hippocampal functions?

      We thank the reviewer for this comment. In fact, an earlier study from our laboratory demonstrated that intraperitoneal injection of exogenous CCK-4 significantly improved performance in hippocampus-dependent spatial learning tasks in both CCK gene knockout (CCK-KO) mice and Alzheimer’s disease (AD) mouse models. These findings suggest that enhancing CCK signaling can ameliorate hippocampal dysfunction at both the behavioral and synaptic plasticity levels (Zhang et al., 2024; Wang et al., 2025). Accordingly, although direct genetic overexpression of CCK in the hippocampus has not yet been extensively characterized, the observed benefits of exogenous CCK delivery support the notion that increased CCK availability positively modulates hippocampal function and spatial learning. We have cited this study in the revised manuscript to support this interpretation.

      “Interestingly, an earlier study demonstrated that intraperitoneal injection of exogenous CCK-4 significantly improved performance in hippocampus-dependent spatial learning tasks in both CCK gene knockout (CCK-KO) mice and Alzheimer’s disease (AD) mouse models (Zhang et al., 2024). These findings suggest that enhancing CCK signaling can ameliorate hippocampal dysfunction at both the behavioral and synaptic plasticity levels.” (Line 291 - line 297 in the revision).

      (3) What are the functional differences between the excitatory and inhibitory CCK neurons in the hippocampus?

      In the hippocampus, CCK-expressing neurons consist of two major populations with distinct functions: excitatory (glutamatergic) and inhibitory (GABAergic) neurons. Excitatory CCK neurons are relatively sparse and intermingled with pyramidal cells. By releasing glutamate, they directly contribute to excitatory transmission and are thought to participate in synaptic plasticity and information processing related to learning and memory. In contrast, inhibitory CCK neurons are more abundant and include well-characterized interneuron subtypes such as CCK-positive basket cells. These neurons release GABA and primarily target the perisomatic region of pyramidal neurons, providing strong control over neuronal firing. Notably, inhibitory CCK interneurons are highly sensitive to neuromodulatory signals, particularly endocannabinoids via CB1 receptors, enabling dynamic regulation of inhibitory tone and network activity. Together, excitatory CCK neurons mainly support hippocampal excitation and plasticity, whereas inhibitory CCK neurons regulate network dynamics and spike timing. As the focus of the present study is on excitatory CCK neurons, a detailed comparison between these two populations was not included in the original manuscript.

      (4) Does the CCK released during high-frequency electrical stimulation come from local CA3 neurons or from the entorhinal cortex (EC)?

      Thank you for this insightful comment. Our data indicate that the CCK detected during high-frequency stimulation originates from CA3 neurons rather than the entorhinal cortex (EC). As shown in Figure 2, we used an optogenetic approach combined with a GPCR-based CCK sensor to selectively examine CCK release from the CA3-CA1 pathway. ChrimsonR was specifically expressed in CA3 neurons projecting to CA1, restricting light stimulation to CA3 axon terminals. In parallel, the CCK sensor was locally expressed in CA1, allowing real-time detection of CCK release at CA3 presynaptic sites. High-frequency light stimulation robustly evoked CCK signals in CA1, demonstrating activity-dependent CCK release from CA3 terminals. Importantly, EC inputs were neither genetically targeted nor optically stimulated in this experiment, excluding the EC as a source of the detected CCK. Together, these results support the conclusion that CCK released during high-frequency stimulation is derived from local CA3 projections to CA1. Similarly, as the focus of the present study is on excitatory CCK neurons in the CA3 area, a detailed comparison between these two CCK sources was not included in the original manuscript.

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      (3) Su, J., Huang, F., Tian, Y., Tian, R., Qianqian, G., Bello, S. T., ... & He, J. (2023). Entorhinohippocampal cholecystokinin modulates spatial learning by facilitating neuroplasticity of hippocampal CA3-CA1 synapses. Cell Reports, 42(12).

    1. eLife Assessment

      Using isolated frog brainstem preparations, pharmacological manipulation of excitability, systematic extracellular unit mapping, and focal microinjections, this study provides important findings on whether the buccal rhythm generator is a discrete anatomical nucleus or a distributed, state-dependent network. The question is conceptually significant and of interest to researchers working on respiratory neurobiology and rhythmogenesis in general, and the preparation and experimental strategy are generally appropriate. However, the evidence for the strongest architectural claims is incomplete, with pseudoreplication in pooled unit-mapping analyses, inconsistent statistical reporting, and limited controls in necessity/sufficiency experiments. Overall, although the data are largely convincing, substantial revision and more nuanced interpretation of the results are required before claims of state-dependent architectural reorganization can be considered well-supported.

    2. Reviewer #1 (Public review):

      Summary:

      The authors test whether the frog buccal ventilatory rhythm generator behaves as a discrete, anatomically localized oscillator or as a distributed, state-dependent network. They combine reduced preparations (segment/subsegment work), systematic extracellular unit surveys over a defined grid, and local AMPA/GABA microinjections in a hemisected brainstem preparation. Based on these approaches, the authors conclude that mild global excitation (bath AMPA) broadens the distribution of rhythmically active units and renders a previously defined "buccal area" functionally non-identifiable as a unique necessary/sufficient locus.

      The central idea is plausible, and the overall experimental strategy is appropriate for the question being asked. However, in its current form, the manuscript overstates the strength of inference supporting the "expansion" and "loss of necessity/sufficiency" conclusions. This is primarily due to (a) statistical treatment of unit-mapping data that does not respect clustering by preparation/animal, (b) inconsistent statistical reporting across sections, and (c) limited interpretability of focal inhibitory perturbations under a globally excited state.

      Strengths:

      (1) The manuscript addresses a clear mechanistic question with broader relevance: whether rhythm generation is best conceptualized as a localized kernel or as an emergent distributed property that changes with excitatory state.

      (2) The authors use convergent approaches (reduced preparations, mapping, and necessity/sufficiency-style pharmacological perturbations), which is appropriate for circuit-level inference.

      (3) A strong element is the within-unit analysis supporting state-dependent changes in phase coupling for a subset of units ("lung" units adopting a buccal-like pattern). The authors' offline PCA-based spike sorting (with cluster-quality selection via silhouette score) provides some reassurance that the reported pre/post injection changes are not simply driven by unit misidentification.

      Weaknesses:

      (1) Pseudoreplication in unit-survey statistics undermines the main mapping inference. The Methods state that "Units were pooled from multiple preparations" and that chi-squared tests were used to compare proportions across conditions (baseline vs 60 nM AMPA). The Results similarly report proportion changes (e.g., 110 units pooled from three preparations vs 137 units pooled from three additional animals) analyzed with chi-squared tests. Because many units come from the same preparation/animal, independence is unlikely to hold; therefore, inference about state-dependent reorganization at the systems level should be made at the preparation/animal level or via hierarchical models that explicitly account for clustering.

      (2) Statistical methods are inconsistently described and need harmonization. In the segment dose-response "Analysis," values are described as compared to zero using a "One-sample t-test." Yet Table 1 is titled as using a "Wilcoxon One-sample Test." These discrepancies must be resolved throughout (Methods, Results, figure legends, and tables), including clear reporting of the unit of n and exact test statistics.

      (3) Unit classification and operational definitions raise interpretational concerns. The unit classification scheme defines "buccal units" as those firing during buccal bursts as well as lung bursts, and explicitly notes that "no units were found which fired only during buccal bursts." This is a consequential result, and it currently reads more like a limitation of detection/classification (or state-space sampled) than a robust biological conclusion. Without additional evidence, it weakens claims about a distinct buccal rhythmogenic module and complicates the interpretation of "buccal identity" changes under excitation.

      (4) Microinjection mapping: high exclusion rate and alternative explanations for 'loss of necessity' under excitation. The manuscript reports that 15 experiments were conducted, but 9 were excluded because the buccal area was not found or the preparation was "overdriven." This exclusion rate is too high to leave implicit; it raises concerns about selection bias and demands transparent accounting. Moreover, under baseline conditions, GABA (or AMPA-GABA) microinjections reliably reduce/abolish buccal bursts, but under bath 60 nM AMPA, the same injections produce no significant change in instantaneous frequency. This pattern can be interpreted as network redistribution, but it can also reflect state-dependent changes in gain, dynamic range, or local pharmacological impact (e.g., inhibition being comparatively underpowered in the globally excited state). Additional controls/analyses are required to distinguish these explanations.
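      To make point (1) concrete, the following is a minimal sketch, under stated assumptions, of a comparison that treats the preparation rather than the unit as the unit of n. It uses an exact permutation test on per-preparation proportions of rhythmically active units; this is one option among several (a mixed-effects logistic model with a random intercept per preparation is another), not the authors' analysis. The per-preparation counts below are hypothetical and chosen only so that the totals match the 110 and 137 pooled units; the function name is likewise illustrative. With three preparations per condition there are only C(6,3) = 20 possible relabellings, so even a perfect separation of the two conditions yields a two-sided exact p of 2/20 = 0.1, which illustrates how a pooled chi-squared test over units can overstate the evidence.

      ```python
      from itertools import combinations
      from statistics import mean


      def preparation_level_permutation_test(group_a, group_b):
          """Exact two-sided permutation test on per-preparation proportions.

          Each group is a list of (n_rhythmic_units, n_units_recorded) tuples,
          one tuple per preparation, so the preparation (not the unit) is the
          unit of n. Returns (mean proportion of b minus a, exact p-value).
          """
          props = [k / n for k, n in group_a] + [k / n for k, n in group_b]
          n_a = len(group_a)
          observed = mean(props[n_a:]) - mean(props[:n_a])

          n_extreme = 0
          n_splits = 0
          # Enumerate every relabelling of preparations into groups of the original sizes.
          for idx_a in combinations(range(len(props)), n_a):
              chosen = set(idx_a)
              a = [props[i] for i in idx_a]
              b = [props[i] for i in range(len(props)) if i not in chosen]
              diff = mean(b) - mean(a)
              # Small tolerance so floating-point ties with the observed split count as extreme.
              if abs(diff) >= abs(observed) - 1e-12:
                  n_extreme += 1
              n_splits += 1
          return observed, n_extreme / n_splits


      # Hypothetical per-preparation counts (illustrative only, not the study's data).
      baseline = [(10, 37), (12, 36), (9, 37)]    # 110 units across 3 preparations
      ampa_60nm = [(25, 45), (28, 46), (24, 46)]  # 137 units across 3 preparations

      diff, p = preparation_level_permutation_test(baseline, ampa_60nm)
      print(f"difference in mean proportion = {diff:.3f}, exact p = {p:.2f}")
      ```

      In this hypothetical example every AMPA preparation has a higher proportion than every baseline preparation, yet the exact preparation-level p-value is 0.1; a hierarchical model could borrow strength across units, but it would likewise need to respect the clustering by preparation.
      
      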

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the response of the amphibian respiratory rhythm generator under varying excitability conditions. They use pharmacological agents to increase and/or decrease synaptic excitability and demonstrate the resilience of buccal rhythms under different conditions. They employ these results to formulate their primary thesis, that there is no obligatory locus of the buccal respiratory rhythm in the frog, and that their respiratory rhythmogenic mechanisms should be considered diffuse and anatomically distributed across a larger brainstem region.

      Strengths:

      This manuscript is well written, with a sufficiently large number of experiments, for which the authors should be congratulated.

      Weaknesses:

      The presented results do not support the authors' main conclusions, and the interpretation of the data is heavily biased toward their hypothesis. This results in an unsubstantiated narrative throughout the Abstract, Introduction, and Discussion of this manuscript, which must be reexamined with the following points in consideration:

      (1) The authors seem to confuse degeneracy with redundancy. For instance, at line 54, they state, "These findings support the broader hypothesis that respiratory rhythm-generating circuits can switch to being diffuse and redundant, with discrete oscillators quickly drowning in a sea of excitations."

      Redundancy means having the same component repeated multiple times to buffer the failure of any single component, whereas degeneracy means different functional components that compensate for one another under perturbations (Goaillard and Marder, ARN 2021).

      Since premotor lung units are converted to buccal units under high excitability, this suggests a degenerate mechanism for respiratory rhythm generation, rather than a redundant mechanism, in which there should be multiple buccal units that are recruited under different excitability conditions.

      (2) Line 83, "but the essential requirement for a discrete, rudimentary buccal oscillator is also lost".

      This statement is not supported by the data presented in this study. How does the expansion of the buccal unit imply that the essential requirement for discreteness is lost? Under increased excitability, does the burst/rhythm initiation zone also expand? Or does it still remain centered around the location of buccal units under physiological conditions? Increased excitability can lead to recruitment of a larger area, without a change in the location of the rhythmogenic kernel.

      (3) Line 86, "... oscillators should be viewed as promiscuous flexible functional entities that expand or contract...".

      Oscillators can be regarded as promiscuous only if they switch positions under physiological conditions. Under high excitability, only the flexibility argument holds, and this has been established in mammals before (e.g., CA Del Negro, K Kam, JA Hayes, JL Feldman, The Journal of Physiology 587(6), 1217-1231; CA Del Negro, C Morgado-Valle, JL Feldman, Neuron 34(5), 821-830; NA Baertsch, LJ Severs, TM Anderson, JM Ramirez, Proceedings of the National Academy of Sciences 116(15), 7493-7502; NA Baertsch, HC Baertsch, JM Ramirez, Nature Communications 9(1), 843).

      Results:

      (4) Interpretation of data in Figure 6.

      How do buccal activity and the L2 power stroke change with 60 nM AMPA (in CN5)? Are the increase in buccal neurons and the decrease in power-stroke neurons also reflected in CN5 activity? Also see comments on Figure 9 data below.

      (5) Interpretation of data in Figure 7.

      Here, classifying buccal neurons solely by spiking may obscure the fact that the 'silent' neurons under baseline conditions were part of the rhythmic network but could not spike due to subthreshold inputs. 60 nM AMPA increased their firing in response to previously subthreshold synchronous inputs during the buccal burst. Intracellular recordings are required to negate this possibility and establish that the neuronal classification is robust.

      (6) Interpretation of data in Figure 8.

      "Lung units can transform into buccal units under excitation".

      CN5 buccal and lung bursts need to be compared before and after AMPA injection. From Figure 8A-D, it is apparent that the activity of example Unit2 increases during buccal bursts after AMPA injection. However, its spikes are also present in buccal bursts before AMPA, albeit less frequently.

      It is striking that the pre-AMPA epoch (panel A) is less than half as long as the post-AMPA epoch. This alone would bias the estimate of how many lung units are active during buccal bursts under baseline conditions.

      For Figure 8G, a meta-analysis of lung units spiking during baseline buccal bursts is warranted to interpret the main claim of this figure. Similarly, an analysis of spiking per lung burst in the post-AMPA condition is essential for comparing the lung units' contribution under high excitability.

      (7) Interpretation of data in Figure 9.

      "Buccal area loses importance under increased excitation."

      This interpretation is not fully supported by the data presented in this manuscript. Under 60 nM AMPA, does the ratio of lung bursts to buccal bursts change in CN5? This analysis is crucial for determining whether the lung units are indeed converted into buccal units at the expense of lung activity or whether their appearance during buccal bursts is incidental to increased excitability. At baseline, there are 4-5 buccal bursts per lung burst, whereas under high excitability there are 2-3 buccal bursts per lung burst (Figure 9A-B). This seems inconsistent with the conclusion that increased excitability converts lung units into buccal units (Figures 6 & 7).

      Could the authors comment on the connectivity between the lung and the buccal units? Results in Figure 9A-B indicate that lung units may receive an efference copy of buccal units, and under high excitability, their spikes may generate negative feedback onto the buccal units, terminating their bursts. This could explain the decrease in the buccal-to-lung burst ratio under high-AMPA conditions. This type of circuit interaction resembles the mammalian breathing CPG, in which the parafacial/RTN (which controls the abdominal muscles) and preBötC (which controls the diaphragm) interact and cross-inhibit each other.

      (8) Line 382.

      "Buccal-like bursting produced from two independent slices".

      The two "independent" slices have portions of the same anatomical kernel, the buccal rhythm generator. This experiment is like the sandwich slice preparation of preBötC (Del Negro Lab), in which two thinner slices exhibit rhythmic activity. Thus, the two slices are not independent; they are anatomically adjacent and functionally overlapping.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses isolated frog brainstem preparations to test whether inspiratory rhythm generation is confined to a narrowly defined neural center or instead reflects the activity of a distributed and adaptable network. Building on prior rodent work, the authors examine structural and functional parallels between the frog Buccal Area and the mammalian preBötzinger complex. By increasing excitatory drive, they assess whether a localized rhythmogenic region can expand into a broader network that participates in buccal rhythm generation, providing insight into how respiratory circuits are dynamically reconfigured across physiological states.

      Strengths:

      The work presents compelling evidence that ventilatory rhythm generation is supported by a flexible, state-dependent network rather than a fixed anatomical locus. The experimental preparation is well-suited to address these questions, and the data are generally of high quality. The demonstration that increased excitation recruits a more distributed network parallels observations in mammalian systems and strengthens the translational relevance of the findings. Overall, the analyses are thoughtful, and the interpretations are largely well supported by the results.

      Weaknesses:

      Some issues limit the strength of the conclusions. First, the study does not address the transition from eupnea to gasping in mammals, which could provide important physiological context for the observed AMPA-induced network reorganization. Second, the reported transformation of lung-active neurons into buccal-active neurons would benefit from additional analyses to clarify whether neurons switch identities or acquire dual activity. Finally, the necessity and sufficiency experiments in Figure 9 require further support, particularly through AMPA dose-response analyses and more comprehensive GABA manipulations, to confirm that network expansion does not obscure the continued functional importance of the core buccal region.

    5. Author response:

      Reviewer #1 (Public review):

      Hierarchical Inference (Unit Survey)

      We agree that pooling units across preparations can overstate the strength of inference if preparation-level clustering is ignored. We will therefore reanalyze the unit-survey dataset using a hierarchical approach in which the preparation/animal is treated as the unit of inference. Our pooled dataset was derived from three chunk preparations exposed to AMPA and three baseline preparations, allowing us to report per-preparation proportions and variability as requested.

      A preliminary reanalysis of the buccal segment preparations is summarized below. In this analysis, the unit of inference is shifted from individual recorded units to the preparation level (n = 3 baseline; n = 3 at 60 nM AMPA), thereby accounting for potential within-preparation dependence.

      Author response table 1.

      The distribution of units for each of the three preparations per condition is as follows:

      Using the proportion of buccal units per preparation as the dependent variable:

      Baseline (n = 3): mean proportion of buccal units = 6.5% (SD 5.7%).

      60 nM AMPA (n = 3): mean proportion of buccal units = 53.2% (SD 6.0%).

      Absolute difference in proportions = 46.7% (95% CI 33.4% to 59.8%).

      Independent-samples t-test on per-preparation proportions: t(4) = 9.77, p = 0.0006.

      Thus, this preliminary hierarchical reanalysis indicates that the observed recruitment is consistent across preparations and is not driven by outlier data from a single animal. These results support substantial expansion of the buccal oscillator with excitation.
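      A minimal Python sketch of the preparation-level comparison, assuming a pooled-variance (Student's) two-sample t-test, which is consistent with the reported t(4). Because the per-preparation proportions are not reproduced in this text, the sketch works from the reported means and SDs only; the function name pooled_t_from_summary is hypothetical, and 2.776 is the standard tabulated two-sided 95% critical value of t for 4 degrees of freedom. With rounded inputs it recovers t(4) of about 9.77 and a confidence interval close to, but not identical with, the reported 33.4% to 59.8%.

```python
import math


def pooled_t_from_summary(m1, s1, n1, m2, s2, n2, t_crit):
    """Hypothetical helper: Student's two-sample t-test (pooled variance)
    computed from summary statistics.

    Returns (difference, t, df, (ci_low, ci_high)) for the difference m2 - m1.
    t_crit is the two-sided critical value of t for df = n1 + n2 - 2,
    supplied by the caller (e.g. from a t table).
    """
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = m2 - m1
    t = diff / se
    half_width = t_crit * se
    return diff, t, df, (diff - half_width, diff + half_width)


# Reported summary values: percent buccal units per preparation,
# baseline (n = 3) versus 60 nM AMPA (n = 3).
diff, t, df, ci = pooled_t_from_summary(6.5, 5.7, 3, 53.2, 6.0, 3, t_crit=2.776)
print(f"diff = {diff:.1f}%, t({df}) = {t:.2f}, 95% CI {ci[0]:.1f}% to {ci[1]:.1f}%")
```

      Small differences from the reported interval are expected because the means and SDs above are rounded. If SciPy is available, `scipy.stats.ttest_ind_from_stats` with `equal_var=True` performs the same test and also returns a p-value.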

      Statistical Standardization: In the revision, we will better justify our use of parametric and non-parametric versions of the one-sample tests and review usage in the Methods, Table 1, and figure legends for consistency.

      Exclusion criteria for microinjection experiments: We will extend the description of these experiments by including a flow diagram summarizing the 15 attempted microinjection experiments and documenting the technical reasons for the 9 exclusions. These exclusions reflected the technical requirements of the preparation: (a) the buccal area had to be localized before AMPA excitation so that the effects of buccal-area manipulation during excitation could be interpreted reliably, which was not always possible; and (b) preparations had to exhibit sufficiently sustained periods of consecutive buccal bursting to permit quantification of buccal burst frequency, whereas some preparations expressed motor patterns dominated by lung bursts.

      Pharmacological Potency and Necessity: We will revise the wording of this section to make the causal interpretation more precise. Our data already show that local GABA microinjections can reverse the excitatory effects of local AMPA microinjections, providing an internal control for local pharmacological efficacy of GABA when the local network is excited. Notably, the local AMPA concentration used in these experiments (5 µM) is nearly two orders of magnitude greater than the 60 nM concentration used in bath application. We therefore interpret the failure of focal GABA inhibition to abolish rhythm during global excitation as being consistent with expansion of rhythmogenic capacity beyond the spatial reach of the local injection, rather than with failure of the GABA manipulation itself.

      Finding an inhibitory site that remains sensitive under bath-applied AMPA would be an interesting experiment, but it would require identifying the anatomical substrate of a non-ventilatory brainstem circuit in Rana that is guaranteed not to undergo reconfiguration with AMPA. This is beyond the scope of the current manuscript; based on our work to identify the neuronal substrate for ventilation in Rana, this would take at least five years to complete. In addition, having identified such a circuit, there would be no guarantee that AMPA would not cause reconfiguration in this case too. With regard to transection boundaries and location of injections, we agree these would be useful refinements. We used the location of nerves as reliable landmarks to guide transections. We located the buccal area using stereotactic coordinates to guide micropipette insertion, and functional criteria (AMPA and GABA sufficiency and necessity tests) based on our previous work to locate its exact position.

      Unit Classification: We will review the nomenclature we use to define units to ensure it does not cause confusion and provide more explicit criteria for unit classes. This will include clarification of the absence of “buccal-only” units as currently defined. Specifically, when both buccal and lung rhythms are present, units active during buccal bursts are also active during lung bursts in our preparation. This does not conflict with the multiple interacting oscillator model we have proposed previously. Rather, recruitment of buccal-area neurons during lung bursts is consistent with a model in which the lung oscillator excites the buccal oscillator. It is also consistent with prior evidence that lung bursts persist after buccal-area ablation. In addition, burst frequency during lung episodes exceeds buccal burst frequency during intervening buccal periods. We will revise the text to make this logic clearer.

      Reviewer #2 (Public review):

      (1) Degeneracy vs. Redundancy

      We agree that degeneracy is the more precise term for the phenomenon our data demonstrate, in which structurally and functionally distinct neurons (lung units) acquire the capacity to participate in buccal rhythm generation under excitation. The Discussion already uses this language (e.g., "necessity and sufficiency may not work in a large degenerate network where rhythm generation is distributed across many elements"), but we used the word "redundant" in the Key Points Summary and Abstract in the broader sense of distributed robustness that a wider readership could grasp. Nonetheless, we recognize the distinction drawn by Goaillard and Marder (2021) and, considering the reviewer's concerns, we will revise the Abstract and Key Points to adopt the degeneracy framework consistently.

      (2) Loss of Essential Requirement for a Discrete Oscillator

      The reviewer asks whether expansion of the rhythmically active region necessarily implies loss of the rhythmogenic kernel. We believe our necessity and sufficiency experiments (Figure 9) directly address this. Under baseline conditions, GABA microinjection into the buccal area reliably abolishes buccal bursting; under 60 nM bath AMPA, the same injection at the same location and volume has no significant effect on buccal frequency. If the kernel remained essential and the surrounding recruitment were merely supplementary, local inhibition of the kernel should still slow or abolish the rhythm. It does not. We interpret this as evidence that the essential requirement for the discrete buccal area is lost under excitation, not merely that a larger area has been recruited around a still-critical core. We acknowledge, however, that the word "lost" could be read as implying permanent elimination rather than state-dependent suspension, and we will temper this language in the revision.

      (3) Novelty Relative to Mammalian Studies

      We appreciate the reviewer drawing attention to the cited mammalian literature (Del Negro et al., 2002, 2009; Baertsch et al., 2018, 2019), which we discuss in detail in the manuscript. However, we respectfully note that our findings extend this literature in several ways that the public review does not acknowledge. First, Baertsch et al. demonstrated recruitment of tonic or silent neurons to become phasically active during inspiration; we show that neurons already assigned to one oscillator phase (lung) can be dynamically reassigned to another (buccal), which represents a qualitatively different form of reconfiguration. Second, we developed a novel approach to functionally ablate motor neuron pools using high-frequency nerve stimulation, enabling the unit survey to be interpreted at the premotor level, which was not achieved in the mammalian studies cited. Third, our data provide the first demonstration of state-dependent oscillator expansion in a non-mammalian tetrapod, offering evolutionary context that strengthens the generality of the principle. We will revise the term "promiscuous" if it overstates the claim, but we maintain that our data support the conclusion that oscillator boundaries are flexible, which goes beyond what has been shown in mammals.

      (4) Figure 6, CN5 Output Under AMPA

      The reviewer asks whether the shift in premotor unit composition is reflected in CN5 motor output. This is a reasonable question. As noted in the manuscript, 60 nM AMPA produces only minor changes in the overt motor pattern as recorded from CN5, which is precisely why we interpret the premotor changes as a reorganization of the network's internal architecture that is not readily apparent from motor output alone. This is in sharp contrast to observations of substantive network reconfiguration in mammals in which eupnea is replaced by the pathological condition of gasping. We will add quantification of CN5 burst parameters (amplitude, duration, frequency) under baseline and 60 nM AMPA to make this point explicit.

      (5) Subthreshold Recruitment vs. Network Expansion

      The reviewer suggests that neurons classified as newly rhythmic under AMPA may have been part of the rhythmic network all along, receiving subthreshold inputs at baseline. We are grateful to the reviewer for highlighting this and hope they would agree that the literature clearly demonstrates that all respiratory neurons receive subthreshold phasic inputs of one kind or another, perhaps providing a clue that reconfiguration is a common feature of respiratory networks generally. Regardless of the implications for other animals, we agree this is likely the mechanism at work in the frog, and indeed our manuscript states that "this increase in the number and proportion of premotor buccal units is due in part to recruitment of sub-threshold buccal neurons that, under low excitability, only fire during lung bursts," citing intracellular evidence from Kogo and Remmers (1994) that lung neurons in this region receive subthreshold buccal-timed input. We note that this observation does not diminish our conclusion and likely explains the mechanism by which network expansion occurs. Whether one calls these neurons "newly recruited" or "pushed above threshold," the functional consequence is the same: a larger population of neurons is now rhythmically active during buccal bursts, and the necessity of the original buccal area is lost. We will clarify this reasoning in the revision and acknowledge the limitation that additional intracellular recordings from our preparation would be needed to fully characterize the subthreshold dynamics.

      (6) Figure 8, Epoch Length and Meta-analysis

      The reviewer notes that the pre-AMPA epoch appears shorter than the post-AMPA epoch in Figure 8A, which could bias unit classification. We will address this in the revision by reporting epoch durations explicitly and assessing the implications for spike counts where appropriate. Regarding the request for meta-analysis of lung unit spiking during baseline buccal bursts: this analysis is part of the rationale for the phase-recruitment panels, and we will expand Figure 8 to include the requested cross-condition comparisons (lung unit activity during baseline buccal bursts, and during post-AMPA lung bursts) as also suggested by Reviewer 3.

      (7) Figure 9, Buccal-to-Lung Burst Ratio

      The reviewer observes that the ratio of buccal to lung bursts decreases from approximately 4-5:1 under baseline to 2-3:1 under 60 nM AMPA, and suggests this is inconsistent with conversion of lung units into buccal units. We do not believe this is inconsistent. The buccal-to-lung burst ratio reflects the overt motor pattern, which is determined by the interaction of multiple oscillators and is influenced by AMPA at both buccal and lung levels. A change in this ratio does not speak to whether individual premotor units have acquired buccal-timed activity; the unit survey and the single-unit transformation data (Figure 8) address that question directly. Regarding the alternative model involving efference copy and cross-inhibition: this is an interesting hypothesis, but it is speculative and not tested by the current dataset. We are happy to discuss lung-buccal interactions more fully in the revision, including the parallels to parafacial/preBötC interactions in mammals, but we note that our data on unit transformation are better explained by network reconfiguration than by a feedback model that remains to be tested.

      (8) "Independent" Slices

      The reviewer compares our Level 2 transection to the preBötC sandwich slice preparation and argues the two resulting slices are not independent. We take the reviewer's point that "independent" may be read as implying no shared developmental or functional origin, which is not our intent. By "independent" we mean that the two physically separated slices can each generate rhythmic output without being synaptically connected to each other. This is, in fact, our central point: rhythmogenic capacity is distributed across a region broad enough to endow two separated slices with independent rhythm-generating capability when excited. We note that the analogy to the sandwich slice is imperfect because in our Level 1 cuts, only the rostral slice containing the buccal area generates rhythm (the caudal slice does not), whereas Level 2 cuts that bisect the buccal area produce rhythmicity in both halves, consistent with distributed capacity specifically within the buccal region. We will revise the wording to clarify what we mean by "independent" in this context.

      Reviewer #3 (Public review):

      Physiological Parallels: We will expand the Discussion to place these findings in a broader comparative context, including the eupnea-to-gasping transition in mammals as an example of state-dependent reconfiguration of respiratory networks. This will also allow us to clarify two advances that may otherwise be missed when comparing our work to that in mammals: (a) we developed a novel approach to functionally eliminate motor neurons, allowing mapped units to be interpreted as premotor; and (b) the state-dependent reconfiguration of the buccal oscillator occurred without qualitative changes in the overt lung-buccal motor pattern.

      Unit Transformation Analysis: We will revise Figure 8 to improve clarity around the observed lung-to-buccal transformation by expanding the phase-recruitment panels as suggested and will revisit the operational definitions of lung and buccal unit identity to reduce ambiguity. The central observation is that some units active only during lung bursts under one condition become active during buccal bursts when network excitation is increased.

      Saturation vs. Network Expansion: We will directly address the possibility that 60 nM bath-applied AMPA simply pushes the network toward a frequency ceiling. Two observations strongly argue against this interpretation: (a) 60 nM global AMPA produced only mild changes in buccal frequency, whereas local AMPA injection at much higher concentrations produced larger effects; and (b) local GABA was sufficient to reverse the effects of high-concentration local AMPA microinjections but insufficient to abolish rhythm during low-concentration global AMPA application. Together, these findings are more consistent with global AMPA endowing the network with distributed rhythm-generating capacity than with simple saturation of a discrete local oscillator. Notwithstanding these arguments, we will attempt to extend the AMPA/GABA dose-response experiments as suggested, or note the absence of such experiments as a caveat to our interpretation.

      Figure 9C Correction: We will correct the statistical markings in Figure 9C to align with the text in the Results regarding the significance of frequency changes under 60 nM AMPA.

      In total, we believe these revisions will improve the rigor and clarity of the manuscript while preserving the central conclusion supported by the data: that the organization of the frog respiratory rhythmogenic network is state dependent and becomes more distributed under excitation.

    1. eLife Assessment

      This valuable study addresses a timely question regarding the contribution of transposable elements to splice isoform diversity in the Drosophila brain, directly engaging with recent conflicting findings in the field. The work provides convincing evidence that TE-gene chimeric transcripts are detectable and that prior discrepancies largely arise from methodological differences in computational pipelines and experimental design. The combination of reanalysis, methodological clarification, and targeted validation represents a technical contribution that will be of interest to researchers studying transcriptome complexity and transposable elements. However, the strength of evidence would be further enhanced by increased methodological transparency, more rigorous experimental controls, and a more cautious interpretation of functional implications.

    2. Reviewer #1 (Public review):

      Summary:

      Choucri and Treiber have reassessed their previous study on TE-gene chimeric transcripts in neural genes in response to Azad et al. (2024). Azad and colleagues argued that, contrary to Choucri and Treiber's findings, chimeric TE-mRNAs are relatively infrequent, and they cautioned that further optimization of bioinformatics pipelines is needed to detect TE insertions from RNA-seq accurately. In this short response, Choucri and Treiber clearly demonstrate that differences in the tools used in their study and in that of Azad et al. likely account for the contrasting results, along with RT-PCR primers that were not designed to match the chimeric transcript and the use of different Drosophila lines. The authors emphasize the need for uniform, standardized criteria in such analyses, which would ultimately strengthen and advance the field.

      Strengths:

      The addition of a ratio that compares the number of splice reads specific to the chimeric transcript with the number of exon-exon splice reads is really interesting, because it opens the door to finally quantifying the contribution of chimeric TEs to overall gene expression, although this is beyond the scope of the present article. The clear dissection of chimeric transcripts, along with the results from Azad et al., allows us to understand the differences between the two studies confidently. Finally, the discussion of Drosophila lines is essential, given that lines, and even individuals, show high TE polymorphism.

      Weaknesses:

      I think it is necessary to add more detail to this article; for instance, the differences between TEChim and TIDAL could be laid out more precisely. Regarding the roo example, one of the caveats of this family, along with others, is the presence of simple repeats. It would be important to show that the simple repeats are not interfering with the read mapping. Regarding the experiments, if we are looking for a standardized protocol, then we should have a detailed Materials and Methods section, with every experiment, replicate, and PCR temperature clearly defined. Finally, and in my opinion more importantly, the use of RT-negative controls on the RT-PCRs, along with DNA PCRs to show the presence of the insertion, is mandatory for testing the presence of chimeric genes. Of course, water negative PCR controls are also needed and, unfortunately, are absent from Figure 3.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Choucri and Treiber aims to directly address a recent critique regarding the role of transposable elements (TEs) in diversifying the neural transcriptome of Drosophila. The authors seek to demonstrate that TEs are not merely genomic "noise" but are frequently and reliably "exonized" into brain-specific mRNA. By introducing an upgraded computational pipeline, TEChim, and conducting precise experimental validations, the authors set out to show that TE-mediated splicing represents a genuine biological phenomenon that expands the molecular repertoire of the nervous system.

      Strengths:

      The study's primary strength lies in its rigorous technical "forensic" analysis of previous failed replication attempts. The authors convincingly demonstrate that the lack of signal in the opposing study stemmed from a fundamental methodological mismatch: the software used by the critics (TIDAL) is logically incapable of detecting splice sites located within TE sequences. Importantly, the authors complement this computational clarification with definitive experimental evidence through an effective "experimental rescue." By employing correctly designed primers and matching the genetic backgrounds of the fly strains, thereby accounting for genomic polymorphisms, they successfully validated all seven loci that were previously reported as undetectable. This dual-pronged strategy, addressing both algorithmic bias and experimental design, establishes a more robust technical benchmark for the detection and validation of TE-derived exons in neural tissues.

      Weaknesses:

      While the technical rebuttal is highly convincing, the scope of the study remains primarily defensive. As a response to a prior critique, the work focuses on establishing the existence and detectability of chimeric TE-derived transcripts rather than exploring their broader functional consequences. As a result, there is limited new insight into how these TE-modified isoforms influence neural circuit function or organismal behavior. In addition, the detection and validation of these events remain technically demanding, requiring deep sequencing and specialized bioinformatic expertise, which may limit broader adoption by laboratories without dedicated computational resources.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Choucri and Treiber responds to a recent paper by Azad et al., which in turn responds to a paper by Treiber and Waddell (Genome Research, 2020). The controversy relates to the detection of transcripts with transposable elements (TEs) spliced into them in the Drosophila brain.

      Strengths:

      The authors now argue convincingly that these transcripts exist using an improved, updated version of their pipeline. They also validate some of their findings using RT-PCR and explain why Azad et al. failed to detect these transcripts due to methodological errors. Overall, I am convinced that these transcripts exist and that the TE-derived transcripts described by Choucri and Treiber are real.

      Weaknesses:

      The authors should mention that combining PCR-amplified cDNA generation with short-read sequencing is suboptimal for detecting TE-fusion transcripts. Recently, direct long-read ONT RNA sequencing, which does not require amplification and spans the entire transcript, has been used to detect similar transcripts in human stem cells and the human brain (PMID: 40848716; Garza et al., bioRxiv). Had the authors used this technology to validate their findings, there would be no question about these transcripts. If they do not perform such experiments, they should at least discuss this possibility and the advantages of the approach.

    1. eLife Assessment

      This study presents an important methodological advance (Liver-CUBIC combined with multicolor metallic nanoparticle perfusion) that enables high-resolution 3D visualization of the liver's complex multi-ductal architecture. The identification of the Periportal Lamellar Complex (PLC) as a novel perivascular structure with distinct cellular composition and low-permeability characteristics is convincing, supported by rigorous imaging data. The observed scaffolding role during fibrosis offers intriguing biological insights, though the functional claims would benefit from direct experimental validation.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the minor comments raised in the previous round of review.]

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing a critical bottleneck: the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns and comments.

    3. Reviewer #3 (Public review):

      Xu, Cao and colleagues aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the adult liver portal veins.

      Using available scRNAseq data, the authors assessed the CD34⁺Sca-1⁺ cells' expression profile, highlighting mRNA presence of genes linked to neurodevelopment, bile acid transport, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl4-induced liver fibrosis, implying PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.

      The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists. The importance of the CD34+/Sca1+ endothelial cell population and claims based on transcriptomic re-analysis require future assessment by functional experimental approaches to decipher the functional molecules involved in PLC formation, maintenance, and the involvement in injury response before establishing their role in biliary, arterial, and neural liver systems.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - PLCs.

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell sub-population for PLC formation and function was not tested and warrants further validation.

    4. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing a critical bottleneck: the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns and comments.

      We thank the reviewer for the positive evaluation and helpful feedback.

      Reviewer #3 (Public review):

      Xu, Cao and colleagues aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the adult liver portal veins.

      Using available scRNAseq data, the authors assessed the CD34⁺/Sca-1⁺ cells' expression profile, highlighting mRNA presence of genes linked to neurodevelopment, bile acid transport, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl4-induced liver fibrosis, implying PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.

      The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists. The importance of the CD34+/Sca1+ endothelial cell population and claims based on transcriptomic re-analysis require future assessment by functional experimental approaches to decipher the functional molecules involved in PLC formation, maintenance, and the involvement in injury response before establishing their role in biliary, arterial, and neural liver systems.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - PLCs.

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell sub-population for PLC formation and function was not tested and warrants further validation.

      We thank the reviewer for the valuable comment regarding the potential role of the CD34⁺/Sca-1⁺ endothelial cell sub-population in PLC function.

      We agree that direct functional validation would be a crucial next step to confirm the contribution of this specific sub-population to PLC formation and function. The focus of the present study remains on the spatial localization and reproducible characterization of PLC structures based on 3D imaging, as well as the relevant transcriptional features revealed by single-cell analysis.

      To avoid overinterpretation, we have revised the Discussion section accordingly, providing a more focused and cautious description of the related findings.

      Comments on revisions:

      I appreciate the authors' efforts to revise the text so it more rigorously adheres to the presented evidence. Following a thorough read of the revised text, a few remaining minor issues were identified in the Discussion.

      (1) What is the hard evidence for the PLC being a stem cell niche, as claimed in the following two statements?

      This suggests that the PLC may not only provide structural support but also serve as a perivascular stem cell niche specific to the portal region, potentially involved in hematopoiesis and tissue regeneration.

      The PLC serves as a directional scaffold for ductal growth, a specialized stem cell niche, and a potential site of neurovascular coupling.

      We thank the reviewer for this important comment. We agree that the term “stem cell niche” may imply functional evidence for direct stem cell regulation, which was not demonstrated in this study. Our conclusions were based on the spatial enrichment and transcriptional features of CD34⁺/Sca-1⁺ endothelial populations expressing hematopoiesis-related genes in the portal region.

      To avoid overinterpretation, we have revised the sentence to remove the term “stem cell niche” and instead describe the PLC as being enriched in perivascular endothelial cell populations with hematopoiesis-related gene expression features. The revised text now reads:

      “These results suggest that, beyond structural support, the PLC in the portal region is enriched with perivascular endothelial cell populations exhibiting hematopoiesis-related gene expression features.”

      We have also modified the corresponding statement later in the Discussion. It now reads:

      “The PLC serves as a directional scaffold for ductal growth, displays distinct perivascular endothelial transcriptional features in the portal region, and may represent a potential site of neurovascular coupling.”

      We believe this wording more accurately reflects the descriptive and transcriptomic nature of our data without implying functional niche activity.

      (2) In the following paragraph, I lack references to the previously published evidence of liver innervation guidance mechanisms, such as the mesenchyme-mediated guidance (CD31- population) Gannoun et al., 2023 https://doi.org/10.1242/dev.201642, an important context for your finding.

      Further analysis showed significant upregulation of genes involved in neurodevelopment and axonal guidance in the CD34⁺/Sca-1⁺ cluster, along with activation of neuronal signaling pathways. Immunostaining confirmed the presence of TH⁺ sympathetic nerve fibers wrapping around the PLC in a "beads-on-a-string" pattern (Fig. 6), consistent with a classic neurovascular unit (Adori et al., 2021). Previous studies have shown that sympathetic nerves enter the liver along collagen fibers of Glisson's capsule and interact with hepatic arteries, portal veins, and bile duct epithelium, supporting the PLC as a scaffold for intrahepatic neurovascular integration.

      We thank the reviewer for highlighting the importance of previously published evidence regarding liver innervation guidance mechanisms. We agree that these studies provide important context for interpreting the neurodevelopmental and axon guidance–related transcriptional signatures observed in our dataset. Accordingly, we have revised the Discussion section to incorporate a reference to mesenchyme-mediated axon guidance mechanisms in the portal region during liver development (Gannoun et al., 2023). This addition better situates our findings within the existing literature.

      (3) Several sentences have issues with a lack of space between words.

      We have carefully re-examined the entire manuscript for spacing and formatting inconsistencies and corrected minor typographical issues to ensure uniform formatting throughout the text.

    1. eLife Assessment

      This manuscript presents a valuable study of the activity and functional relevance of different circuits in the dentate gyrus of mice performing a pattern separation task. Solid evidence is presented to support the paper's central conclusions. The study is likely to be of interest to those studying the subregional organization and cell type-specific functions of the dentate gyrus.

    2. Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      (3) Chemogenetic inhibition of abDGCs was performed in AsclCreER;hM4 mice; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      Comments on revisions:

      I appreciate the authors' careful and thorough revisions. They have addressed all of my previous concerns satisfactorily, and the manuscript is now significantly strengthened. I have no further concerns.

    3. Reviewer #2 (Public review):

      In this study, the authors investigate how increasing cognitive demand shapes activity patterns in the dorsal dentate gyrus (DG). Using a touchscreen-based TUNL task combined with TRAP/c-Fos tagging, birth-dating of adult-born granule cells (abDGCs), and chemogenetic inhibition, they show that higher task demand increases mature granule cell (mGC) recruitment and enhances suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Functionally, mGC inhibition reduces overall activity and impairs performance without disrupting blade bias, whereas inhibition of ≤7-week-old abDGCs increases mGC activity, abolishes blade bias, and impairs discrimination under high-demand conditions. These findings suggest that effective pattern separation depends not only on overall DG activity levels but also on the spatial organization of recruited ensembles.

      The integration of touchscreen TUNL with temporally controlled activity tagging and birth-dated cohorts is technically strong. Quantification of SB-IB bias and radial/apical distributions adds anatomical precision beyond bulk activity measures. The comparison between mGC and abDGC inhibition is conceptually compelling and supports dissociable functional roles. Overall, the data convincingly demonstrate that increasing cognitive demand amplifies blade-biased DG recruitment and that mGCs and abDGCs differentially contribute to both behavioral performance and network organization.

      However, how abDGCs are integrated into the mGC network under high cognitive demand remains unresolved. Additional experiments are needed to clarify how abDGCs shape spatial recruitment patterns and whether they directly inhibit or indirectly regulate mGC activity to maintain high performance.

      Furthermore, the authors frame "high cognitive demand" as a multidimensional construct encompassing broad behavioral challenge. It would strengthen the work to delineate how local abDGC-mGC circuit interactions regulate specific task components in real time. This will require higher temporal resolution approaches, as TRAP and c-Fos labeling integrate activity over prolonged windows and primarily reflect sustained engagement rather than moment-to-moment computations.

      The central conclusion that dentate function depends on coordinated spatial recruitment rather than total activity magnitude is supported by the data, although mechanistic interpretations should be tempered given methodological limitations.

      Overall, this work advances models of adult neurogenesis by emphasizing a critical-period modulatory role of abDGCs in organizing DG network activity during high-demand discrimination. The combined behavioral and circuit-level framework is likely to be influential in the field.

    4. Reviewer #3 (Public review):

      This study examines the role of dentate gyrus neuronal populations, reflecting neurogenesis and anatomical location (suprapyramidal vs infrapyramidal blade), in a mnemonic discrimination task that taxes the pattern separation functions of the dentate. The authors measure dentate gyrus activity resulting from cognitive training and test whether adult neurogenesis is required for both the anatomical patterns of activity and performance in the cognitive task. The authors find that more cognitively challenging variants of the task evoked more dentate activity, but also distinct patterns of activity (more activity in the suprapyramidal blade, less in the infrapyramidal blade). Using chemogenetic approaches they silence mature vs immature dentate gyrus neurons and find that only mature neurons (either the general population or specifically mature adult-born neurons), and not immature adult-born neurons, are required for the difficult version of the task. Inhibition of mature adult-born neurons furthermore increased overall activity in the dentate and reduced the biased pattern of activity across the blades, consistent with evidence that adult-born neurons broadly regulate dentate gyrus activity.

      Comments on revisions:

      I appreciate the efforts the authors have taken to revise this manuscript. I have only minor concerns with this revised version of the manuscript:

      Methods state that significance is defined as P<0.05 but some results are interpreted as significant when P=0.05. Either the alpha value needs to change or the interpretation needs to change.

      I believe the statistical results for group and blade effects for the ANOVAs, in Figs 2,3 & 4, appear to be switched (blade should be significant, not group).

      I appreciate that sometimes there is not a perfect overlap between immunohistochemical signals, but I continue to believe that the spatially non-overlapping TRAP and EdU signals in Fig 3 are caused by these 2 markers being in different cells. A Z-stack or orthogonal projection could verify or disprove this concern.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      In Figures 1G and 1H we report TRAP+ abDGCs as a percentage rather than density because we are analyzing colocalization of the two markers, which are very sparse in this population. Given the very low number of double-labeled abDGCs, calculating density would not be practical. In the revised manuscript we have clarified the rationale for using these measures. As noted in the current text, we did not observe abDGCs co-expressing TRAP and c-Fos; we have made this point more explicit to guide interpretation of these data.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      The sections shown in Figure 2 were obtained from the dorsal dentate gyrus (see Methods, “Histology and imaging”: stereotaxic coordinates −1.20 to −2.30 mm relative to bregma, Paxinos atlas). From a feasibility standpoint, it is not possible to analyze the entire longitudinal extent of the hippocampus with these low-throughput histological approaches. We therefore focused on the dorsal DG, for which there is a strong functional rationale. A large body of work indicates that the dorsal hippocampus, and specifically the dorsal DG, is preferentially involved in spatial memory and in the fine contextual discrimination that underlies pattern separation. The dorsal hippocampus is critical for encoding and distinguishing similar spatial representations, a core component of the high-cognitive demand task used here. In contrast, the ventral DG is more strongly associated with emotional regulation and affective memory processing and is less implicated in high-resolution spatial encoding. For these reasons, the present study was designed to assess TRAP+ cell distributions specifically in the dorsal DG.

      (3) Chemogenetic inhibition of abDGCs was performed in AsclCreER;hM4 mice; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      We agree that prolonged tamoxifen administration results in labeling a heterogeneous population of abDGCs spanning approximately 0 to 5–7 weeks of age, rather than a precisely birth-dated cohort. This is a limitation of this approach and we have included discussion of this in more detail in the revised manuscript.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      We agree that mCitrine is not a marker that allows localization of hM4Di, as mCitrine is known to be expressed in a Cre-independent manner in this mouse line. As suggested, we have removed the figure that showed mCitrine and have performed immunohistochemical localization of the DREADD with an antibody against the HA tag. This is now shown in Figure 5.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      The goal of this study was to examine activity patterns of adult-born versus mature granule cells, rather than to assess maturation state. The adult-born neurons analyzed were 25–39 days old, an age at which point most cells have progressed beyond the DCX⁺ stage and are expected to express NeuN based on prior work. We therefore do not think that including DCX or NeuN quantification would provide additional information relevant to the aims or interpretation of this study.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      We have updated Figure 2B, the Methods, and the main text to state more explicitly that this measure refers to the boundary between the subgranular zone (SGZ) and the hilus.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      We have now added the cell number information to the figure legends. In Figures 2B and 2C, each point corresponds to a single cell, with an equal number of mice per group. The total number of TRAP⁺ cells per mouse is shown in Figure 1F, which reports TRAP⁺ cell densities by group.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      We made the DG-hilus boundaries clearer in the sample images to improve visualization and interpretation.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      We apologize for the confusion here. The protocol used in Figure 6 is the same tamoxifen chow–based approach as in Figure 5, differing only in the duration of tamoxifen exposure. Mice in Figure 5 received tamoxifen chow for 7 weeks, whereas mice in Figure 6 received it for 4 weeks, restricting labeling to a younger and narrower cohort of adult-born DGCs. Thus, the population targeted in Figure 6 is younger than that in Figure 5 and does not correspond to mature 6–7-week-old neurons. By contrast, the experiment in Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells, which are Dock10-positive and express Cre endogenously, allowing selective manipulation of this later-stage population.

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, the authors combine an automated touchscreen-based trial-unique nonmatching-to-location (TUNL) task with activity-dependent labeling (TRAP/c-Fos) and birth-dating of adult-born dentate granule cells (abDGCs) to examine how cognitive demand modulates dentate gyrus (DG) activity patterns. By varying spatial separation between sample and choice locations, the authors operationally increase task difficulty and show that higher demand is associated with increased mature granule cell (mGC) activity and an amplified suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Using chemogenetic inhibition, they further demonstrate dissociable contributions of abDGCs and mGCs to task performance and DG activation patterns.

      The combination of behavioral manipulation, spatially resolved activity tagging, and temporally defined abDGC perturbations is a strength of the study and provides a novel circuit-level perspective on how adult neurogenesis modulates DG function. In particular, the comparison across different abDGC maturation windows is well designed and narrows the functionally relevant population to neurons within the critical period (~4-7 weeks). The finding that overall mGC activity levels, in addition to spatially biased activation patterns, are required for successful performance under high cognitive demand is intriguing.

      Major Comments

      (1) Individual variability and the relationship between performance and DG activation.

      The manuscript reports substantial inter-animal variability in the number of days required to reach the criterion, particularly during large-separation training. Given this variability, it would be informative to examine whether individual differences in performance correlate with TRAP+ or c-Fos+ density and/or spatial bias metrics. While the authors report no correlation between success and TRAP+ density in some analyses, a more systematic correlation across learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB) could strengthen the interpretation that DG activity reflects task engagement rather than performance only.

      As mentioned, we previously reported no correlation between task success and TRAP+ density. We have now performed additional analyses examining correlations between behavioral measures (learning rate, final performance) and DG activation patterns (mGC vs abDGC, SB vs IB), and found no significant relationships. Because none of these measures correlated with DG activation, the most parsimonious interpretation remains that DG activity primarily reflects task engagement rather than performance level.

      (2) Operational definition of "cognitive demand".

      The distinction between low (large separation) and high (small separation) cognitive demand is central to the manuscript, yet the definition remains somewhat broad. Reduced spatial separation likely alters multiple behavioral variables beyond cognitive load, including reward expectation, attentional demands, confidence, engagement, and potentially motivation. The authors should more explicitly acknowledge these alternative interpretations and clarify whether "cognitive demand" is intended as a composite construct rather than a strictly defined cognitive operation.

      We agree that reducing spatial separation between stimuli likely engages multiple behavioral and cognitive processes beyond a single, strictly defined operation. We have now clarified this point in the manuscript and explicitly state that our use of the term “cognitive demand” reflects a multidimensional behavioral challenge rather than a singular cognitive process (see Discussion).

      (3) Potential effects of task engagement on neurogenesis.

      Given the extensive behavioral training and known effects of experience on adult neurogenesis, it remains unclear whether the task itself alters the size or maturation state of the abDGC population. Although the focus is on activity and function rather than cell number, it would be useful to clarify whether neurogenesis rates were assessed or controlled for, or to explicitly state this as a limitation.

      While the primary goal of this study was to examine activity and functional recruitment of adult-born granule cells, we also quantified the survival of birth-dated neurons at the end of behavioral training. Density measurements of BrdU⁺ and EdU⁺ cells revealed no differences across experimental groups, indicating that engagement in the pattern separation task, under both low and high cognitive demand conditions, did not significantly alter survival of adult-born neurons. In addition, we examined the spatial distribution of BrdU⁺ and EdU⁺ neurons between the suprapyramidal and infrapyramidal blades of the dentate gyrus. The proportion of newborn neurons was consistent across all groups, with approximately 60% located in the suprapyramidal blade and 40% in the infrapyramidal blade. These findings indicate that behavioral training did not alter the baseline distribution of adult-born neurons. We have now clarified these points in the manuscript (see Results).

      (4) Temporal resolution of activity tagging.

      TRAP and c-Fos labeling provide a snapshot of neural activity integrated over a temporal window, making it difficult to determine which task epochs or trial types drive the observed activation patterns. This limitation is partially acknowledged, but the conclusions occasionally imply trial-specific or demand-specific encoding. The authors should more clearly distinguish between sustained task engagement and moment-to-moment trial processing, and temper interpretations accordingly. While beyond the scope of the current study, this also motivates future experiments using in vivo recording approaches.

      We agree and have made changes to the manuscript to discuss these points (see Discussion and Limitations).

      (5) Interpretation of altered spatial patterns following abDGC inhibition.

      In the abDGC inhibition experiments, Cre+ DCZ animals show delayed learning relative to controls. As a result, when animals are sacrificed, they may be at an intermediate learning stage rather than at an equivalent behavioral endpoint. This raises the possibility that altered DG activation patterns reflect the learning stage rather than a direct circuit effect of abDGC inhibition. Additional clarification or analysis controlling for the learning stage would strengthen the causal interpretation.

      We agree that differences in learning stage could in principle confound the interpretation of DG activation patterns. However, although Cre+ DCZ-treated mice exhibited delayed learning, they ultimately reached the same performance criterion as control animals. Thus, adult-born DGC inhibition did not prevent learning but increased the time required to reach criterion, indicating that these neurons are beneficial for learning efficiency rather than strictly necessary for task acquisition. Importantly, all animals were sacrificed only after reaching the predefined success criterion. Therefore, the immunohistochemical analyses were performed at the same behavioral endpoint for Cre+ DCZ and control groups, even though the number of training days differed. Consequently, the observed differences in DG activation reflect circuit recruitment at equivalent task mastery rather than differences in learning stage.

      (6) Relationship between c-Fos density and behavioral performance.

      The study reports that abDGC inhibition increases c-Fos density while impairing performance, whereas mGC inhibition decreases c-Fos density and also impairs performance. This raises an important conceptual question regarding the relationship between overall activity levels and task success. The authors suggest that both sufficient activity and appropriate spatial patterning are required, but the manuscript would benefit from a more explicit discussion of how different perturbations may shift the identity, composition, or coordination of the active neuronal ensemble rather than simply altering total activity levels.

      We agree that our findings highlight that successful performance is not determined solely by the overall level of dentate gyrus activity, but rather by the composition and spatial organization of the active neuronal ensemble. In our study, inhibition of abDGCs increased overall mGC activity while disrupting the spatially organized, blade-biased activation pattern and impaired performance. In contrast, direct inhibition of mGCs reduced global excitability but preserved the relative spatial organization of active neurons in animals that continued to perform the task. These findings suggest that different perturbations alter task performance by shifting the identity and coordination of the active neuronal ensemble, rather than simply increasing or decreasing total activity levels. We have now expanded the Discussion to more explicitly address how dentate gyrus computations may depend on the structured recruitment of granule cell ensembles and how distinct manipulations differentially disrupt this organization.

      Reviewer #3 (Public review):

      Summary:

      The authors used genetic models and immunohistochemistry to identify how training in a spatial discrimination working memory task influences activity in the dentate gyrus subregion of the hippocampus. Finding that more cognitively challenging variants of the task evoked more and distinct patterns of activity, they then investigated whether newborn neurons in particular were important for learning this task and regulating the spatial activity patterns.

      Strengths:

      The focus on precise anatomical locations of activity is relatively novel and potentially important, given that little is known about how DG subregions contribute to behavior. The authors also use a task that is known to depend on this memory-related part of the brain.

      Weaknesses:

      Statistical rigor is insufficient. Many statistical results are not stated, inappropriate tests are used, and sample sizes differ across experiments (which may underlie some null results). The chemogenetic approach to inhibit adult-born neurons also does not appear to target these neurons, as judged by their location in the DG.

      Please refer to the updated statistical analyses in response to the recommendations below.

      Recommendations for the authors:

      Reviewing Editor Comments

      Please note that reviewers agreed that appropriate revisions are needed to increase the strength of evidence for the paper's claims. Concerns were raised about a lack of rigor in the statistical analyses. Results of statistical tests were not consistently provided (i.e., statistic applied, value of statistic, degrees of freedom, p-value), and seemingly inappropriate statistical tests were used in some instances. Also, some comparisons had lower statistical power than others. When clarifying the statistical approaches used in the manuscript, we also encourage you to consider reading this article that outlines common statistical mistakes (Makin TR, Orban de Xivry JJ. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019 Oct 9;8:e48175. doi: 10.7554/eLife.48175.), such as the importance of not basing conclusions on a significant p-value for one pairwise comparison vs a non-significant p-value for another pairwise comparison (i.e., groups that are being compared should be included in the same statistical analysis, and interaction effects should be reported when appropriate). We hope that you find this information to be helpful should you decide to submit a revised manuscript to eLife.

      Reviewer #1 (Recommendations for the authors):

      (1) Standardize TRAP+ quantification across Figure 1.

      Please report TRAP+ cell numbers using consistent metrics (e.g., density or percentage) to enable comparison across cell types. In addition, extend the TRAP+ reactivation analysis in Figure 1H to include abDGCs so that reactivation dynamics can be compared directly between mGCs and abDGCs.

      Reply in Public Review

      (2) Clarify whether dorsal or ventral DG was analyzed in Figure 2.

      The differing anatomical distributions of TRAP+ cells under low- and high-demand conditions raise important questions about DG axis specificity. Please indicate whether analyses were performed in dorsal DG, ventral DG, or both, and provide data or justification accordingly.

      Reply in Public Review

      (3) Acknowledge limitations of the tamoxifen-chow labeling strategy in AsclCreER; hM4 experiments.

      Since tamoxifen chow administered over 4-7 weeks labels a heterogeneous abDGC population spanning a broad age range, this approach does not generate birth-dated cohorts. This limitation should be clearly addressed in the text and interpretations, particularly related to cell age-dependent effects, should be tempered.

      Reply in Public Review

      (4) Revise DREADD quantification using HA rather than mCitrine.

      The hM4 mouse line requires HA immunostaining to accurately identify Ascl-lineage cells expressing the DREADD receptor. Because mCitrine is not specific to adult-born neurons and does not reliably reflect hM4 expression, quantification based on mCitrine should be revised.

      Reply in Public Review

      (5) Include markers to assess abDGC maturation state.

      Adding quantification of DCX and NeuN would help define the developmental stage of abDGCs in key experiments and improve the interpretation of cell-age-dependent effects.

      Reply in Public Review

      (6) Clarify DG layer boundaries and terminology in Figure 2.

      If the metric labeled "Distance from the hilus" corresponds to the subgranular zone (SGZ), using SGZ terminology would prevent confusion. Additionally, please provide clearer delineation of DG and hilus borders in sample images.

      Reply in Public Review

      (7) Provide missing cell number data for Figures 2B and 2C.

      Reply in Public Review

      (8) Clarify the tamoxifen administration protocol in Figure 6.

      Please describe how the protocol selectively targets 6-7-week-old abDGCs and how it differs from the chow-based approach. This will help readers understand the intended specificity of the manipulation.

      Reply in Public Review

      Reviewer #2 (Recommendations for the authors):

      (1) EdU birth-dating timeline

      The manuscript would benefit from a clearer description of the EdU birth-dating timeline, ideally with a schematic similar to that provided for BrdU in Supplementary Figure 1.

      We appreciate the suggestion. However, we did not include a separate schematic for EdU because its use and birth-dating logic are identical to BrdU (both are thymidine analogs administered systemically and incorporated during S-phase). Therefore, the timeline shown in Supplementary Figure 1 applies equally to both markers. We have clarified this point in the Methods section to avoid confusion.

      (2) Clarity of TUNL task description.

      The description of the TUNL task, particularly for readers unfamiliar with touchscreen-based paradigms, is difficult to follow without consulting prior literature. A simplified schematic or a clearer step-by-step explanation in the main text or supplementary material would improve accessibility.

      We note that the main steps of the TUNL protocol are illustrated in Figure 1A and Supplementary Figures 2A and 2B. Nevertheless, we agree that the description in the text can be made clearer for readers less familiar with touchscreen-based tasks. We have therefore revised the Methods section to provide a clearer step-by-step description of the TUNL task.

      (3) Influence of outliers in Figure 1G.

      In Figure 1G, the reported trend that ~1% of 25-39-day-old abDGCs are TRAP+ during LS trials appears to be driven by a small number of outliers. This should be acknowledged, and the wording of the conclusion moderated to reflect the variability in the data.

      We agree with the reviewer. The apparent outliers reflect the inherent sparsity of TRAP labeling in this population: in absolute terms, the data correspond to between 0 and 2 TRAP⁺ 25–39-day-old abDGCs per mouse, so the presence or absence of a small number of labeled cells can appear as an outlier when expressed as a percentage. We have revised the text to acknowledge this (see Results).

      (4) Presentation of learning curves.

      Rather than focusing primarily on "days before criterion" (DBC), it would be helpful to show full learning curves across the entire training period. This would provide a clearer picture of acquisition dynamics and inter-animal variability.

      We agree that learning curves can be informative in many behavioral paradigms. However, in our protocol, mice do not undergo the same number of training days because training stops individually once each animal reaches criterion. As a result, plotting full learning curves would produce trajectories of different lengths, making group comparisons difficult and visually cluttered. For this reason, we aligned animals based on days before criterion (DBC), which allows direct comparison of learning dynamics relative to task acquisition. We also consider the cumulative probability of reaching criterion to be the most appropriate way to summarize learning progression across animals in this context, and these plots are included in the figures.
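      The cumulative-probability summary described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes every animal eventually reaches criterion and that the criterion day is recorded as a 1-based training-day integer. The function name and the example values are hypothetical.

```python
def cumulative_fraction_at_criterion(days_to_criterion, max_day):
    """Fraction of animals that have reached criterion by each training day.

    days_to_criterion: one entry per animal, the 1-based training day on which
    that animal met criterion. Because training stops at criterion, each animal
    contributes a single value rather than a full learning curve.
    Returns a list whose element d-1 is the fraction reached by day d.
    """
    n = len(days_to_criterion)
    if n == 0:
        raise ValueError("need at least one animal")
    return [sum(1 for d in days_to_criterion if d <= day) / n
            for day in range(1, max_day + 1)]


# Hypothetical cohort of four mice reaching criterion on days 3, 5, 5 and 8.
print(cumulative_fraction_at_criterion([3, 5, 5, 8], 8))
```

      Summarizing each animal by a single event time sidesteps the unequal-length trajectories that full learning curves would produce, which is the rationale given in the response.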

      (5) Clarification of Figure 3B labeling

      In Figure 3B, the identity of the orange-labeled group above the LS condition is unclear. Clarification in the figure legend would improve interpretability.

      Figure 3B includes two experimental groups. One group performed both the large- and small-separation conditions; this group is shown in orange and labeled LS. Within this group, the upper orange trace corresponds to performance in the large-separation condition, while the lower orange trace corresponds to performance in the small-separation condition. The second group is a control group that performed only the large-separation configuration, and therefore only a single green trace is shown. We agree that this distinction was not sufficiently clear and have revised the figure legend and text to clarify the identity of each trace.

      Reviewer #3 (Recommendations for the authors):

      (1) Please label figures and, even better, put the legends on the same page.

      (2) Just to confirm, in establishing the task, mice performed above 70% for the small separation trials in one of the sessions on 2 consecutive days, for each criterion? Performance seems to be below 70%.

      Yes. To meet the criterion, each mouse had to reach ≥70% correct performance in at least one of the two daily sessions on two consecutive days. We then averaged the performance across both sessions for each of those days. As a result, if one session was ≥70% but the other was lower, the daily average could fall below 70%. The values shown in the figure correspond to these daily averages, further averaged across mice.
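      The criterion and the subsequent averaging step can be made concrete with a short sketch. The session scores below are hypothetical, chosen only to show how a mouse can meet criterion while both of its criterion-day averages fall below 70%; the function names are illustrative and not part of the authors' pipeline.

```python
def meets_criterion(sessions_by_day, threshold=70.0):
    """Return the 0-based index of the day on which criterion is met, or None.

    Criterion as described above: at least one daily session scores
    >= threshold (% correct) on each of two consecutive days. The returned
    index is the second of those two days.
    """
    passed = [any(score >= threshold for score in day) for day in sessions_by_day]
    for i in range(1, len(passed)):
        if passed[i - 1] and passed[i]:
            return i
    return None


def daily_average(sessions):
    """Mean % correct across the sessions of one day (the plotted value)."""
    return sum(sessions) / len(sessions)


# Hypothetical mouse: two sessions per day over three days.
days = [[55.0, 62.0], [72.0, 60.0], [75.0, 62.0]]
print(meets_criterion(days))          # criterion reached on the third day
print(daily_average(days[1]), daily_average(days[2]))
```

      In this example each criterion day contains one session at or above 70%, yet the daily averages are 66% and 68.5%, which is why the plotted values can sit below the threshold.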

      (3) mGC needs to be explicitly defined. Am I correct in assuming that, according to the authors, any non-birth-dated GC is an mGC? (This would mean it is unknown whether they are in fact mature, though most of them likely are.)

      In this study, “mature granule cells” (mGCs) refer operationally to granule cells that are not birth-dated with BrdU or EdU and therefore are not classified as adult-born neurons within the defined labeling window. We agree that this population is not directly age-defined, and that while the majority are expected to be mature based on their birth timing relative to the labeling period, we cannot exclude the possibility that a small fraction may include younger, unlabeled neurons. We have now explicitly defined this usage of mGCs in the Methods and clarified this point in the text to avoid ambiguity.

      (4) Methods state that Kruskal-Wallis tests were used when more than 3 groups were compared, but I don't see these stats presented (e.g., for TRAP data in Figure 1, blade x task TRAP experiment in Figure 3 (should be 2-way RM ANOVA here and elsewhere), etc.) or any corrections for multiple comparisons. I appreciate that the mean rates of TRAPed abGCs are higher in the S and LS groups than in the shaping group, but most mice do not have any BrdU+ cells that are also TRAPed, and there are no statistics here to support the claim. I don't think there is enough sampling to accurately quantify activation of abGCs. Also, there are no stats to support the claim that TRAPing increases at the "tip of the SB after the more demanding LS task".

      We agree with this comment. We have now systematically tested all datasets for normality (by group) and applied parametric tests when the data met normality assumptions, and non-parametric tests otherwise. The statistical analyses have been revised accordingly. We added the appropriate tests (including two-way ANOVA where relevant, such as for blade × group comparisons) and now report full statistics in the figure legends and results sections. For the TRAP analyses in adult-born DGCs, we explicitly acknowledge the very low number of BrdU⁺/TRAP⁺ cells, which limits statistical power and, in some cases, precludes robust statistical testing. These limitations are now clearly stated in the Results and Discussion, and the corresponding interpretations have been tempered. For all Kruskal–Wallis tests, post hoc pairwise comparisons were performed using Dunn’s test, with Bonferroni correction for multiple comparisons, as now specified in the Methods section. We also expanded the Methods to describe the statistical workflow in detail. In addition, we have added the previously missing statistical analysis for Figure 2C. Comparisons were performed between the 0–50% and 50–100% portions of the blade, where 0% corresponds to the apex and 100% corresponds to the distal tip of the blade.

      (5) Figure 3I: I can't figure out which effect is statistically significant here (what does the asterisk signify?). Why no individual data points in this graph?

      We agree that the absence of individual data points reduced interpretability, and we have now updated the figure to include individual data points to better illustrate data distribution and variability.

      (6) The gradient of activity (shaping < S < LS) could be due to how long they've been trained on a given stage (e.g., less activity during shaping because they have habituated, and neurons encoding that task phase have already been selected).

      We agree that task duration and habituation could, in principle, influence activity levels. Under this interpretation, higher activity would primarily reflect task novelty rather than cognitive demand. However, our data do not support this explanation. Specifically, we found no correlation between the number of training days required to reach criterion and c-Fos–positive or TRAP-positive cell density within a given stage. Thus, animals that reached criterion rapidly did not show higher activity levels than animals that required more days of training and were presumably more habituated to the task demands. This suggests that the observed activity gradient (shaping < S < LS) is not driven by exposure duration or habituation, but rather reflects differences in cognitive demand across task stages.

      (7) The TRAP+ EDU+ cell in Figure 3 looks odd because the BrdU signal is (a lot) larger than the TRAP signal, but BrdU is in the nucleus and should be smaller.

      We agree that the example in Figure 3 is not optimal. In dividing cells, BrdU/EdU signals can sometimes appear broader or closely apposed, which may affect their apparent size.

      (8) For the Ascl-hM4Di experiment, hM4Di appears to be expressed in all of the areas of the granule cell layer where abGCs are NOT located (i.e., no expression in the deep cell layer, near the SGZ). This is problematic because it suggests that abGCs may not be inhibited as expected.

      As noted in our response to Reviewer #1, we did not use mCitrine to localize the DREADD receptor, because mCitrine has been shown to be expressed in a Cre-independent manner and its expression does not correlate with hM4Di expression. In the revised manuscript, we include a representative image in which we performed immunostaining with an HA antibody to directly visualize hM4Di and confirm its expression in adult-born granule cells (Figure 5).

      (9) Line 267: "6-7 week old neurons by themselves do not influence either the performance of mice in the task". I don't think this is fair because this experiment wasn't designed with as much power to detect an effect. The group trends are in the same direction, but there are many fewer mice in this experiment (n=6/group) than in the ≤7-week experiment (n=11/group), where the effect just reached statistical significance.

      We apologize for this confusion, which arose from an incorrect version of the text. The experiment shown in Figure 6 does not target 6–7-week-old neurons specifically. It uses the same tamoxifen chow–based protocol as Figure 5, but with a shorter exposure (4 weeks vs. 7 weeks), thereby labeling a younger and more restricted cohort of adult-born DGCs. By contrast, Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells (Dock10+).

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

    1. eLife Assessment

      This paper describes Unbend - a new method for measuring and correcting motions in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamellae and whole cells. The method, which fits a B-spline model using cross-correlation-based local patch alignment of micrograph frames, represents an important tool for the cryo-EM community. The authors elegantly use 2D template matching to provide convincing evidence that Unbend outperforms the previously reported method of Unblur by the same authors. Comparison to alternative programs for motion correction shows smaller gains, but with interesting differences between data sets.

    2. Reviewer #1 (Public review):

      Kong et al.'s work describes a new approach that does exactly what the title states, "Correction of local beam-induced sample motion in cryo-EM images using a 3D spline model." It is, therefore, a more elaborate approach than current methods in the field for the "movie alignment" stage. Additionally, the work uses 2DTM (2D Template Matching)-related measurements to quantify the improvement of the new method compared to other methods in the field. I find both parts (the new method and the testing approach) very compelling.

      On a "focused" view, the strengths of the work rest on presenting a better approach for motion correction and on measuring its performance at the 2D level in a compelling manner.

      On a more "general" view, the authors introduce the important notion that even one of the most worked-out steps in the processing workflow can still be done better in a measurable way, and that this could lead to better results beyond the 2DTM metrics used for testing, reflected in better results along the processing pipeline (although the manuscript does not explore this notion further).

      On the "usability" side, the method is still CPU-based and is slower than standard tools in the field. This may pose significant limitations in practical work, although the authors are aware of this issue and are working on it.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a new method, Unbend, for measuring motion in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamellae and whole cells (which can be more prone to overall motion and/or variability in motion across a field of view). Building on their previous approach of full-frame alignment (Unblur), they now perform full-frame alignment followed by patch alignment, and then use these outputs to generate a 3D model of the motion. This model allows them to estimate a continuous, per-pixel shift field for each movie frame that aims to better describe complex motions and so ultimately generate improved motion-corrected micrographs. Performance of Unbend is evaluated using the 2D template matching (2DTM) method developed previously by the lab, and results are compared to using full-frame correction alone and to the leading local motion correction methods. Several different in situ samples are used for evaluation, covering a broad range that will be of interest to the rapidly growing in situ cryo-EM community.

      Strengths:

      The method appears to be an elegant way of describing complex motions in cryo-EM samples, and the authors present sound data that Unbend generally improves the SNR of aligned micrographs and increases detection of particles matching the 60S ribosome template, compared both to full-frame correction alone and, following revision, to the leading local motion correction methods. The authors also give interesting insights into how different areas of a lamella behave with respect to motion by using Unbend on a montage dataset collected previously by the group. There is growing interest in imaging larger areas of in situ samples at high resolution, and these insights contribute valuable knowledge. Additionally, the availability of data collected in this study through the EMPIAR repository will be much appreciated by the field.

      Weaknesses:

      A major weakness was that the method was initially compared to full-frame approaches only, but the authors have since addressed this during review by comparing Unbend to MotionCor2, MotionCor3, CryoSPARC, and Warp. The improvements here are smaller: Unbend generally performs on par with these methods, but there are significant gains for certain samples (e.g., the M. pneumoniae sample). A comment from this reviewer about using an adaptive approach to decide whether to proceed to the full Unbend pipeline, rather than full-frame alignment alone, has been addressed by the authors.

    4. Reviewer #3 (Public review):

      Summary

      Kong and coauthors describe and implement a method to correct local deformations due to beam-induced motion in cryo-EM movie frames. This is done by fitting a 3D spline model to a stack of micrograph frames using cross-correlation-based local patch alignment to describe the deformations across the micrograph in each frame, and then computing the value of the deformed micrograph at each pixel by interpolating the undeformed micrograph at the displacement positions given by the spline model. A graphical interface in cisTEM allows the user to visualise the deformations in the sample, and the method is shown to be successful through improvements in 2D template matching (2DTM) results on the corrected micrographs from five in situ samples.

      Impact

      This method has great potential to further streamline the cryo-EM single particle analysis pipeline by shortening the required processing time as a result of obtaining higher quality particles early in the pipeline, and is applicable to both old and new datasets, therefore being relevant to all cryo-EM users.

      Strengths

      (1) The key idea of the paper is that local beam induced motion affects frames continuously in space (in the image plane) as well as in time (along the frame stack), so one can obtain improvements in the image quality by correcting such deformations in a continuous way (deformations vary continuously from pixel to pixel and from frame to frame) rather than based on local discrete patches only. 3D splines are used to model the deformations: they are initialised using local patch alignments and further refined using cross-correlation between individual patch frames and the average of the other frames in the same patch stack.

      (2) Another strength of the paper is using 2DTM to show that correcting such deformations continuously using the proposed method does indeed lead to improvements, as evidenced by the number of particles found and the quality of the detections (measured using 2DTM SNR). This is shown using five in situ datasets, where local motion is quantified using statistics based on the estimated motions of ribosomes. The same analysis is performed using other deformation correction tools, with Unbend showing superior performance in terms of particles detected or 2DTM SNR of the detections.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments. A central concern raised is the comparison of performance with existing motion-correction methods. In response, we performed motion correction using several widely used approaches and compared results using the number of particles detected by 2DTM and their associated SNR. To minimize potential bias, we selected parameters to give each method a comparable level of model flexibility so that the results are as directly comparable as possible. Overall, Unbend performs the best. We note that extensive, method-specific parameter optimization could further affect absolute performance, and a comprehensive benchmarking study is therefore beyond the scope of this work.

      Public Reviews:

      Reviewer #1 (Public review):

      Kong et al.'s work describes a new approach that does exactly what the title states: "Correction of local beam-induced sample motion in cryo-EM images using a 3D spline model." I find the method appropriate, logical, and well-explained. Additionally, the work suggests using 2DTM-related measurements to quantify the improvement of the new method compared to the old one in cisTEM, Unblur. I find this part engaging; it is straightforward, accurate, and, of course, the group has a strong command of 2DTM, presenting a thorough study.

      However, everything in the paper (except some correct general references) refers to comparisons with the full-frame approach, Unblur. Still, we have known for more than a decade that local correction approaches perform better than global ones, so I do not find anything truly novel in their proposal of using local methods (the method itself, Unbend, is new, but many others have been described previously). In fact, the use of 2DTM is perhaps a more interesting novelty of the work, and here, a more systematic study comparing different methods with these proposed well-defined metrics would be very valuable. As currently presented, there is no doubt that it is better than an older, well-established approach, and the way to measure "better" is very interesting, but there is no indication of how the situation stands regarding newer methods.

      Regarding practical aspects, it seems that the current implementation of the method is significantly slower than other patch-based approaches. If its results are shown to exceed those of existing local methods, then exploring the use of Unbend, possibly optimizing its code first, could be a valuable task. However, without more recent comparisons, the impact of Unbend remains unclear.

      We thank the reviewer for this important point. We agree that comparing against modern local motion-correction approaches is a valuable task. To address this, we added a new benchmarking section (pp. 17–18, lines 444–492, Fig. 8, Fig. 8—figure supplement 1) that compares Unbend against widely used patch-based local correction methods, including MotionCor2, MotionCor3, Warp, and CryoSPARC. Using the same 2DTM-based metrics described in the manuscript (detections per micrograph and SNR distributions for commonly detected particles), we find that Unbend provides the most stable performance across the tested datasets and, in most cases, yields higher detection counts and higher SNR than the alternative methods.
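      The two 2DTM metrics named here can be sketched as follows. Detections per micrograph is simply the length of each detection list; for the SNR comparison, a hypothetical helper pairs detections from two motion-corrected versions of the same micrograph by nearest-neighbour matching within a distance tolerance and then summarizes the SNR of the commonly detected particles. The greedy matching rule, the tolerance `tol`, and the `(x, y, snr)` tuple format are assumptions, not the authors' released package.

```python
import math
import statistics


def match_common(dets_a, dets_b, tol):
    """Greedily pair detections (x, y, snr) from two corrections of one micrograph.

    Each detection in dets_a is paired with the nearest unused detection in
    dets_b lying within distance tol. Returns a list of (snr_a, snr_b).
    """
    used = set()
    pairs = []
    for xa, ya, snr_a in dets_a:
        best_j, best_d = None, tol
        for j, (xb, yb, _) in enumerate(dets_b):
            if j in used:
                continue
            d = math.hypot(xa - xb, ya - yb)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((snr_a, dets_b[best_j][2]))
    return pairs


def summarize(pairs):
    """Count common detections and compare their SNR between methods A and B."""
    if not pairs:
        return {"n_common": 0, "fraction_improved": 0.0}
    improved = sum(1 for a, b in pairs if b > a)
    return {
        "n_common": len(pairs),
        "fraction_improved": improved / len(pairs),
        "mean_snr_a": statistics.fmean(a for a, _ in pairs),
        "mean_snr_b": statistics.fmean(b for _, b in pairs),
    }
```

      Restricting the SNR comparison to matched particles keeps the comparison paired: a method cannot raise its mean SNR simply by detecting a different subset of targets.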

      Regarding runtime, the current implementation is CPU-based and is therefore slower than some optimized GPU-accelerated packages. We now clarify this limitation in the manuscript (lines 498–499). Our primary goal in this study is to improve motion-correction accuracy and quantify its impact using 2DTM-based measures. Importantly, higher-quality motion-corrected micrographs can reduce downstream processing cost (e.g., by increasing particle detection efficiency and reducing ambiguous candidates), so modest additional compute times at the motion-correction stage can be offset later in the workflow. We also note that GPU acceleration and additional code-level optimizations are planned for future releases (lines 501–503); however, they are not required to evaluate the methodological contribution and the benchmarking results presented here.

      Reviewer #2 (Public review):

      Summary:

      The authors present a new method, Unbend, for measuring motion in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamellae and whole cells (that can be more prone to overall motion and/or variability in motion across a field of view). Building on their previous approach of full-frame alignment (Unblur), they now perform full-frame alignment followed by patch alignment, and then use these outputs to generate a 3D cubic spline model of the motion. This model allows them to estimate a continuous, per-pixel shift field for each movie frame that aims to better describe complex motions and so ultimately generate improved motion-corrected micrographs. Performance of Unbend is evaluated using the 2D template matching (2DTM) method developed previously by the lab, and results are compared to using full-frame correction alone. Several different in situ samples are used for evaluation, covering a broad range that will be of interest to the rapidly growing in situ cryo-EM community.

      Strengths:

      The method appears to be an elegant way of describing complex motions in cryo-EM samples, and the authors present convincing data that Unbend generally improves SNR of aligned micrographs as well as increases detection of particles matching the 60S ribosome template when compared to using full-frame correction alone. The authors also give interesting insights into how different areas of a lamella behave with respect to motion by using Unbend on a montage dataset collected previously by the group. There is growing interest in imaging larger areas of in situ samples at high resolution, and these insights contribute valuable knowledge. Additionally, the availability of data collected in this study through the EMPIAR repository will be much appreciated by the field.

      Thank you for this positive assessment.

      Weaknesses:

      While the improvements with Unbend vs. Unblur appear clear, it is less obvious whether Unbend provides substantial gains over patch motion correction alone (the current norm in the field). It might be helpful for readers if this comparison were investigated for the in situ datasets. Additionally, the authors are open that in cases where full-frame motion correction already does a good job, the extra degrees of freedom in Unbend can perhaps overfit the motions, making the corrections ultimately worse. I wonder if an adaptive approach could be explored, for example, using the readout from full-frame or patch correction to decide whether a movie should proceed to the full Unbend pipeline, or whether correction should stop at the patch estimation stage.

      We thank the reviewer for suggesting an adaptive criterion to decide whether to proceed with patch alignment. We agree that such an approach could be valuable for efficiency and for avoiding unnecessary model flexibility. However, our results indicate that a simple criterion based on the magnitude of estimated local patch motion is unlikely to be sufficient. For example, in the BS-C-1 cell lysate dataset (see lines 412–417 on page 16), we observe minimal local motion (Figure 4b), with mean patch shifts of only 0.7 Å, and full-frame alignment already yields comparable detection counts; yet local correction still produces a measurable SNR gain (from 13.84 ± 0.04 to 14.25 ± 0.04, 3%) and improves SNR for ~70% of the commonly detected targets (Figure 6c). This suggests that residual local distortion can remain even when overall local motion appears small. Establishing a robust, dataset-agnostic stopping rule would therefore require a dedicated, systematic benchmarking study across many samples and acquisition conditions.

      Reviewer #3 (Public review):

      Summary

      Kong and coauthors describe and implement a method to correct local deformations due to beam-induced motion in cryo-EM movie frames. This is done by fitting a 3D spline model to a stack of micrograph frames using cross-correlation-based local patch alignment to describe the deformations across the micrograph in each frame, and then computing the value of the deformed micrograph at each pixel by interpolating the undeformed micrograph at the displacement positions given by the spline model. A graphical interface in cisTEM allows the user to visualise the deformations in the sample, and the method is shown to be effective through improvements in 2D template matching (2DTM) results on the corrected micrographs of five in situ samples.
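      The final resampling step described here can be illustrated with a short sketch that evaluates a frame at positions displaced by a shift field. It assumes bilinear interpolation, zero fill outside the frame, and a caller-supplied `shift_fn(x, y)` returning `(dx, dy)` in pixels (for example, built from a spline model); the interpolation kernel and sign convention actually used in cisTEM may differ, and `bilinear_sample` and `warp_frame` are illustrative names.

```python
import math


def bilinear_sample(img, x, y, fill=0.0):
    """Bilinearly interpolate img (list of rows) at continuous position (x, y)."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xx, yy):
        # Pixels outside the frame contribute the fill value
        if 0 <= xx < w and 0 <= yy < h:
            return img[yy][xx]
        return fill

    return ((1 - fx) * (1 - fy) * px(x0, y0)
            + fx * (1 - fy) * px(x0 + 1, y0)
            + (1 - fx) * fy * px(x0, y0 + 1)
            + fx * fy * px(x0 + 1, y0 + 1))


def warp_frame(img, shift_fn):
    """Resample every output pixel (x, y) at the displaced input position."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = shift_fn(x, y)
            row.append(bilinear_sample(img, x + dx, y + dy))
        out.append(row)
    return out
```

      Because the shift is evaluated independently at every pixel, any smooth, spatially varying field can be applied this way, which a single rigid shift per patch cannot represent.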

      Impact

      This method has great potential to further streamline the cryo-EM single particle analysis pipeline by shortening the required processing time as a result of obtaining higher-quality particles early in the pipeline, and it is applicable to both old and new datasets, making it relevant to all cryo-EM users.

      Strengths

      (1) One key idea of the paper is that local beam-induced motion affects frames continuously in space (in the image plane) as well as in time (along the frame stack), so one can obtain improvements in the image quality by correcting such deformations in a continuous way (deformations vary continuously from pixel to pixel and from frame to frame) rather than based on local discrete patches only. 3D splines are used to model the deformations: they are initialised using local patch alignments and further refined using cross-correlation between individual patch frames and the average of the other frames in the same patch stack.

      (2) Another strength of the paper is using 2DTM to show that correcting such deformations continuously using the proposed method does indeed lead to improvements. This is shown using five in situ datasets, where local motion is quantified using statistics based on the estimated motions of ribosomes.

      Thank you for this positive assessment.

      Weaknesses

      (1) While very interesting, it is not clear how the proposed method using 3D splines for estimating local deformations compares with other existing methods that also aim to correct local beam-induced motion by approximating the deformations throughout the frames using other types of approximation, such as polynomials, as done, for example, in MotionCor2.

      We thank the reviewer for this suggestion. We agree that positioning Unbend relative to existing local motion-correction methods is important. In the revised manuscript, we added a dedicated benchmarking section comparing Unbend with widely used local correction approaches, including MotionCor2, MotionCor3, Warp, and CryoSPARC, using the same 2DTM-based metrics (Fig. 8, Fig. 8—figure supplement 1). This section is included on pp. 17–18, lines 444–492. To make the comparison as fair as possible, we matched nominal model flexibility across methods and otherwise used default parameters to reduce method-specific tuning. This expanded comparison provides a direct baseline against current patch-/spline-based approaches and shows that Unbend performs consistently across the in situ datasets evaluated here, with improvements in detection counts and/or SNR in multiple cases.

      (2) The use of 2DTM is appropriate, and the results of the analysis are enlightening, but one shortcoming is that some relevant technical details are missing. For example, the 2DTM SNR is not defined in the article, and it is not clear how the authors ensured that no false positives were included in the particles counted before and after deformation correction. The Jupyter notebooks where this analysis was performed have not been made publicly available.

      We agree that these technical details improve clarity and reproducibility. We have therefore made three changes.

      (1) Definition of 2DTM SNR. We added an explicit definition of the 2DTM SNR (Section “2DTM provides a one-step verification for motion correction”, pp. 11, lines 277–287). Briefly, at each image location we compute cross-correlation values over the searched orientation space and define the 2DTM SNR as the maximum per-location z-score across orientations.

      (2) False-positive control / detection threshold. We clarified how detection thresholds were set to control false positives (pp. 11, lines 285–287). Specifically, we used the standard 2DTM statistical framework in which the threshold is chosen using the one-false-positive (1-FP) criterion (or equivalently, a specified expected false-positive rate). We applied the same thresholding procedure consistently across all motion-corrected micrographs. This ensures that particle counts before/after correction reflect changes in signal recovery.

      (3) Reproducibility of the analysis. We have made the script used for the benchmarking and figure generation publicly available (pp. 24 line 622-623), and we provide a link in the Data Availability statement (pp. 25 line 650). The repository includes sample .star files and a Python package that computes detections per micrograph, commonly detected particles, and SNR comparisons.
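      The SNR definition in point (1) can be written as a few lines of code. This sketch assumes that, for each image location, the cross-correlation values over all searched orientations are available as a list, and that the z-score uses the mean and population standard deviation of those values at that location; `snr_map` is an illustrative name, and cisTEM's exact normalization may differ.

```python
import statistics


def snr_map(cc_by_location):
    """Return {location: 2DTM SNR} from per-orientation cross-correlations.

    cc_by_location maps an (x, y) location to a list of cross-correlation
    values, one per searched orientation (assumed input layout).
    """
    snr = {}
    for loc, values in cc_by_location.items():
        mu = statistics.fmean(values)      # mean CC over orientations
        sigma = statistics.pstdev(values)  # population std over orientations
        # z-score of the best-matching orientation; 0.0 if the values are flat
        snr[loc] = (max(values) - mu) / sigma if sigma > 0 else 0.0
    return snr
```

      For example, a location whose four orientation scores are `[0.0, 0.0, 0.0, 4.0]` has mean 1 and population standard deviation √3, so its SNR is 3/√3 = √3 ≈ 1.732.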

      (3) It is also not clear how the proposed deformation correction method is affected by CTF defocus in the different samples (are the defocus values used in the different datasets similar or significantly different?) or if there is any effect at all.

      We thank the reviewer for raising this point. In the revised manuscript, we now report the defocus ranges used for each dataset (Table 1) and clarify that all motion-correction comparisons were performed within each dataset using the same CTF estimation and 2DTM settings (pp. 23 line 615-618). Across the five datasets, four were collected at similar defocus ranges (1.0 µm to 1.5 µm), whereas one dataset includes near-focus (0.4 µm) micrographs (Table 1). Because Unbend operates on frame alignment/warping rather than CTF modeling, we do not expect a defocus-specific effect beyond indirect influences through image SNR and the reliability of cross-correlation-based alignment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The obvious recommendation would be to use their 2DTM approach for a comparison of their new method with other currently used ones.

      We agree and added a new comparison section (pp. 17–18, lines 444–492). Addressed above in Response to Reviewer #1 Public Review.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 29, typo. 3 ~ 8% > 3 - 8%.

      Corrected.

      (2) Lines 220 and 226. Should this be e-/Angstrom squared for the exposure?

      Corrected to e<sup>-</sup>/Å<sup>2</sup> (Now pp. 9 lines 230, 236).

      (3) Figure 2 c-d. These are good for instinctively seeing the movement, but I found the legend confusing, as a 10 x 10 pixel array is mentioned, yet the schematics show a higher sampling (30 x 30 pixels? in c-e).

      Thank you for pointing this out. The “10×10” annotation refers to the physical scale, whereas the grid represents pixel sampling. We removed the “10×10” label and now show only the pixel grid to avoid confusion. The caption has been updated to state that the grid corresponds to a 30×30 pixel sampling. (Fig. 2c, d; pp. 31, line 766)

      (4) Figure 4. It would be good if the n of movies analyzed was given in the figure legend.

      Thank you for noticing this. We report the number of movies per dataset in the corresponding summary table (Table 1).

      (5) Figure 5. X/Y axes labels missing (assume pixels). Also, suggest changing the strain scale to % to match the main text description of this figure.

      We added X/Y axis labels, changed the strain scale to % (Figure 5), and specified that the strains are per pixel on pp. 14 line 367. Correspondingly, the X/Y labels and strain scale in the strain plots in Figure 4—figure supplements 1 to 5 have also been changed.

      (6) Unify labelling of Figure 4 and 6 (i.e., Bacteria vs. M. pneumoniae, etc.).

      Corrected. Sample labels are now consistent across figures. (Figures 4 and 6)

      Reviewer #3 (Recommendations for the authors):

      Some recommendations related to the points mentioned in the 'Weaknesses' section in the public review:

      (1) If feasible, it would be useful to see a comparison with other existing methods that estimate local deformations (e.g., MotionCor2), at least on some of the datasets. For example, does the proposed method lead to better 2DTM SNR in the detected particles compared to other methods, or higher detection numbers? Alternatively, if such a comparison would require too much additional work and the authors have good reasons to believe that the results are evident, it would be helpful to include a discussion about why the proposed method is expected to perform better, both in terms of the general approach and specific implementation details.

      We agree that this comparison is important. (pp. 17–18, lines 444–492). Addressed above in Response to Reviewer #3 Public Review (1).

      (2) It would be useful to define the 2DTM SNR in the main text of the paper, as well as to address the point about false positives in the picked particles.

      We added an explicit definition of 2DTM SNR and clarified the detection thresholding/false-positive control used in our analysis (pp. 11, lines 277–287). Addressed above in Response to Reviewer #3 Public Review (2.1 and 2.2).
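      The one-false-positive criterion can be sketched under the usual simplifying assumption that, in the absence of a target, normalized cross-correlation values behave like independent standard normal draws. The threshold is then the value that a single draw exceeds with probability `expected_fp / N`, where N is the number of locations times the number of orientations searched. Treating these as independent trials (and ignoring, for example, defocus planes) is an assumption of this sketch, and `one_false_positive_threshold` is an illustrative name.

```python
from statistics import NormalDist


def one_false_positive_threshold(n_locations, n_orientations, expected_fp=1.0):
    """SNR threshold expected to admit `expected_fp` false positives per search.

    Assumes a standard normal null and independent trials; both are
    simplifications of a real 2DTM search.
    """
    n_trials = n_locations * n_orientations
    # Solve n_trials * P(Z > t) = expected_fp for t
    return NormalDist().inv_cdf(1.0 - expected_fp / n_trials)
```

      With only 1,000 trials the threshold is about 3.09; a realistic search over millions of locations and orientations pushes it much higher, which is why 2DTM thresholds are well above conventional significance cutoffs.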

      (3) Regarding the results shown in Figures 4 and 6: do the authors have any insight about how the CTF defocus affects the deformation estimation and correction across the different sample types?

      We now report the defocus ranges used for each dataset (Table 1). We have addressed this problem in Response to Reviewer #3 Public Review (3).

      (4) Will the Jupyter notebooks used for the 2DTM analysis be made publicly available?

      Yes. We have deposited a Python script used for the 2DTM benchmarking and figure generation in a public repository and added the link in the Data Availability statement. (pp. 23 line 622, pp. 25 line 650). Addressed above in Response to Reviewer #3 Public Review (2.3).

      (5) I would also appreciate a few words about the implementation details of the 3D spline model (e.g., what libraries have been used, if any, or if the authors have implemented their own code for this).

      The 3D spline model and warping code were implemented by us (no external spline library was used) and the relevant implementation details are described in the “Sample distortion modeling and correction” section (pp. 7–10, lines 174–246). For optimization, we used the L-BFGS implementation provided by the dlib library, which is now explicitly cited (pp. 10, line 264).

      Some comments regarding the presentation of the work:

      (1) I found the mathematical background on splines on pages 7-9 a little distracting from the main ideas of the paper, and I believe it could be moved to the methods section. A short description of this in the main text of the paper would suffice, and it would be useful to state clearly when this is background material and when it is the authors' contribution.

      We appreciate the suggestion. Because Unbend includes an in-house spline implementation (no external spline library) and it is the central part of this work, we retained the spline description to support reproducibility. (pp. 7–10, lines 174–246).

      (2) More generally, I found the whole method very interesting, but understanding exactly what all the steps involved were was a bit cumbersome, as they are spread across different sections of the main text. I think it would be useful to have a dedicated section giving the exact steps taken in the algorithm, possibly pointing to the relevant section in the text for more details about each step. This could be, for example, in the form of an 'Algorithm' box or a flowchart.

      We added an Algorithm box as a Figure 2 figure supplement, summarizing the end-to-end workflow and pointing to the relevant sections for details (Figure 2—figure supplement 1 Algorithm, pp. 4, line 96–103, pp. 32 line 799). This is intended to make the sequence of steps easier to follow.

      (3) In Figure 3, panels (b) and (c), the difference between the two micrographs, before and after correction, is not very noticeable, particularly the Thon rings in the spectra. I don't know if this is due to the image quality in the paper or if a better example could be shown. For example, the differences are clear in some of the supplementary figures.

      Thank you for the suggestion. We revised the figure by adding annotations to show the recovered Thon rings. This figure shows a vertex motion and is intended not only to show improvement but also to illustrate complex, spatially varying deformation patterns that motivate the 3D spline model (pp. 12, lines 304–308). The supplementary figures display the micrographs with the highest motion in each sample type, so the recovered high-frequency Thon rings in the motion-corrected micrographs are more obvious there. We also refer readers to the supplementary examples where the differences are more pronounced (pp. 12, lines 310–312).

    1. eLife Assessment

      This is a valuable study that integrates behavioral and molecular approaches to identify neuromodulators influencing blood-feeding behavior in the disease vector Anopheles stephensi. Through gene expression analyses across blood-seeking life stages and RNA interference experiments, the authors present solid evidence that co-knockdown of the neuromodulators short Neuropeptide F and RYamide affects blood-seeking states in A. stephensi. However, evidence demonstrating that these neuropeptides are sufficient to promote host-seeking is lacking.

    2. Reviewer #2 (Public review):

      Summary:

      In this study, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour after imbibing an initial bloodmeal. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides (particularly in the brain, but not in the abdomen since knockdown outside the brain did not affect feeding behaviour) appear to promote blood-feeding while having no impact on sugar feeding. Interestingly, when either of these two neuropeptide gene transcripts were reduced independently by RNAi, the proportion of females acquiring a blood meal was not affected, whereas simultaneous knockdown of both sNPF and RYa led to a reduction in blood feeding behaviour but did not impact sugar feeding.

      Given that the two neuropeptide genes were found to be expressed mostly in non-overlapping brain neurons, these two neuropeptides may elicit at least partially complementary actions promoting blood feeding in A. stephensi. Indeed, their putative receptors appear to be colocalized within several neurons within the brain, which could explain why knockdown of both sNPF and RYa transcripts was required to affect blood feeding behaviour (although the authors could not confirm whether either of these neuropeptides acts independently, as only partial knockdown was achieved in the brain). Finally, while sNPF was mapped to brain neurons and midgut enteroendocrine cells (EECs), RYa was mapped only in the brain; qPCR detected RYa expression in the abdomen, but it was not localized to the midgut EECs (unlike sNPF). Therefore, the source of RYamide in the abdomen remains unknown in this mosquito species, but could involve the abdominal ganglia where this neuropeptide has been localized in Ae. aegypti.

      Strengths and/or weaknesses:

      Overall, the manuscript was effectively communicated. Previous concerns and requested clarifications have been addressed in the revised manuscript. While advanced cell-specific tools are lacking in this mosquito species, one weakness here is that peptides could have been applied ectopically in attempts to rescue the deficit in blood feeding behaviour following knockdown by RNAi. Further insight in this regard may be provided in future studies by this and other research groups.

      Reviewing editor comment:

      Inclusion of a schematic in Supplementary Figure S9B addresses the point raised by reviewer 1 in the previous round.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of this crucially understudied vector, but also uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating and oviposition status. The authors are careful and validate their experimental paradigm in the better-studied Ae. aegypti, and are able to recapitulate the results of prior studies which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find that A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host-cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host-cues for the blood-fed and mated individuals which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host-cues while navigating in flight, but something much more exciting happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood feeding stages of the mosquito's life cycle, arriving at a list of 9 candidates that have a role in regulating the host-seeking status of A. stephensi. Then, through gene knockdown of these candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed and the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up rich lines of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article I continued to think how many crucial details I may have missed if I were the scientist conducting these experiments. That attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first-principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      I believe the authors have adequately addressed all of my concerns; however, I think an accompanying figure to match the explained methods of the tissue-specific knockdown would help readers. The methods are now explicitly written for the timing and concentrations required to achieve tissue-specific knockdown, but seeing the data as a supplement would be especially reassuring given the critical nature of tissue-specific knockdown to the final interpretations of this paper.

      We thank the reviewer for the suggestion and have now incorporated a schematic in Supplementary Figure S9B, explaining our methodology for achieving tissue-specific knockdowns.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding) although the impact was observed only after both neuropeptide genes underwent knockdown.

      While the authors have addressed most of the concerns of the original manuscript, a few issues remain. Particularly, the following two points:

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer's point or there has been a misunderstanding. In Figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      NEW-

      In both the dsRNA treatments where animals were fed, neither was significantly different from control. Therefore, there is no change, and indeed this is confirmed by the author's labelling of the figure stats in panel 4D.

      We agree with the reviewer and thank them for pointing it out. We have now revised the figure legend and the text to reflect these results (see lines 351-354).

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,...

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.

      NEW-

      The authors are claiming that there is no variation between individual qPCR experiments (particularly in their controls)? Normally, one uses a known standard value (or calibrator) across multiple experiments/plates so that variation across biological replicates can be assessed. This has an impact on statistical analyses since there is no variation in the control data. Indeed, this impacts all figures/datasets in the manuscript where qPCR data is presented. All the controls have zero variation!

      We are truly thankful to this reviewer for insisting on this point. It has made us revisit what we thought we understood and now realise we were doing wrong (though many in the literature do it this way!). We were incorrectly setting each control to 1 and calculating relative fold changes for each replicate independently. While this is often seen in the literature, we now realise that it is incorrect. We have revisited all our analyses and normalized all samples to the mean ΔCt of the control group, which captures biological variation in both control and experimental groups. All data are now re-plotted to show individual data points for both control and experimental groups, and the error bars on controls represent the biological variation across replicates (Figure 4D, 4F, 4G, S8, S9). Statistical analyses were also revised accordingly, and, importantly, they do not change any conclusions. Please note that the abdominal expression of sNPF and RYa is so low that the controls show highly variable baseline expression values.
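      To make the corrected normalization concrete, here is a minimal Python sketch of the 2^-ΔΔCt calculation in which every sample, controls included, is referenced to the mean ΔCt of the control group rather than to its own paired control. The function name, argument layout, and Ct values are hypothetical and for illustration only; this is not the authors' analysis code.

```python
import statistics


def relative_expression(target_ct, ref_ct, is_control):
    """Return 2^-ΔΔCt per sample, referenced to the mean control ΔCt.

    target_ct, ref_ct: Ct values for the gene of interest and the reference
    gene, one entry per biological replicate (hypothetical layout).
    is_control: True for replicates belonging to the control group.
    """
    # ΔCt: normalize the target gene to the reference gene within each sample.
    delta_ct = [t - r for t, r in zip(target_ct, ref_ct)]
    # One shared baseline: the mean ΔCt across all control replicates.
    control_mean = statistics.mean(d for d, c in zip(delta_ct, is_control) if c)
    # ΔΔCt against that shared baseline, converted to a fold change.
    return [2.0 ** -(d - control_mean) for d in delta_ct]


# Hypothetical Ct values: three control replicates, one treated replicate.
fold = relative_expression([25, 26, 27, 28], [20, 20, 20, 20], [True, True, True, False])
print(fold)  # [2.0, 1.0, 0.5, 0.25]
```

      Because each control replicate keeps its own deviation from the group mean, the control values spread around 1 (their geometric mean is exactly 1) instead of all being fixed at 1, which is what makes error bars and statistical tests on the control group meaningful.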

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (2) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (3) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (4) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated and some conclusions appear premature based on the current data. The support for this conclusion would be strengthened with functional validation using peptide injection or genetic manipulation.

      (2) The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      (3) Some important caveats, such as variation in knockdown efficiency and the possibility of off-target effects, are not adequately discussed.

      These comments were addressed in the previous round.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Awesome paper everyone. A delight to read and review.

      Thank you very much! We appreciated your comments too!

    1. eLife Assessment

      This study presents valuable findings, obtained with modern analytical approaches, on how the transient absence of visual input (darkness) affects tactile encoding in the rat somatosensory cortex (S1). The evidence supporting the authors' claims is solid, as population-level neural activity recorded in S1 and decoded by a CNN carries more discriminable texture information in darkness. The underlying basis of this effect remains only partly resolved, however, because it is still unclear which neural features drive the CNN's decoding and whether visual interference, which could be mistaken for a true change in neural representation, has been appropriately accounted for.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate how short-term visual deprivation influences tactile processing in the primary somatosensory cortex (S1) of sighted rats. They justify the study based on previous studies that have shown that long-term blindness can enhance tactile perception, and aim to investigate the change in neural representations underlying rapid, short-term cross-modal effects. The authors recorded local field potentials from S1 as rats encountered different tactile textures (smooth and rough sandpaper) under light and dark conditions. They used deep learning techniques to decode the neural signals and assess how tactile representations changed across the four different conditions. Their goal was to uncover whether the absence of visual cues leads to a rapid reorganization of tactile encoding in the brain.

      Strengths:

      The study effectively integrates high-density local field potential (LFP) recordings with convolutional neural network (CNN) analysis. This combination allows for decoding high-dimensional population-level signals, revealing changes in neural representations that traditional analyses (e.g., amplitude measures) failed to detect. The custom treadmill paradigm permits independent manipulation of visual and tactile inputs under stable locomotion conditions. Gait analysis confirms that motor behavior was consistent across conditions, strengthening the conclusion that neural changes are due to sensory input rather than movement artifacts.

      Weaknesses:

      (1) While the study interprets the emergence of more distinct texture representations in the dark as evidence of rapid cross-modal plasticity, the claim rests on correlational data from a short-term manipulation and decoding analysis. The authors show that CNN-derived feature embeddings cluster more clearly by texture in the dark, but this does not directly demonstrate plasticity in the classical sense (e.g., synaptic or circuit-level reorganization). The authors have noted this as a limitation and have clarified that the observed changes reflect functional reorganization rather than structural plasticity.

      (2) Although gait was controlled, changes in arousal or exploratory behavior in light versus dark conditions might play a role in the observed neural differences. The authors have controlled for various factors in relation to locomotion, but future studies would benefit from more direct behavioural readouts of arousal states (e.g., via pupillometry or cortical state indicators).

      (3) It should be noted that the time course of the observed changes (within 10 minutes) is quite rapid, and while intriguing, the study does not include direct evidence that the underlying circuits were reorganized, only that population-level signals become more discriminable. The authors have adequately discussed this as an avenue for more mechanistic future research.

      (4) The authors have adequately discussed that, while these findings are consistent with somatotopy and context-dependent dynamics, they do not provide strong independent evidence for novel spatial or temporal organization.

      (5) The authors have also discussed that, while the neural data suggest enhanced tactile representations, the study does not assess whether rats' actual tactile perception improved. Future studies that include a behavioral readout (e.g., discrimination accuracy) would be insightful.

      (6) The authors' discussion of the implications for sensory rehabilitation, including Braille training and haptic feedback enhancement, was somewhat premature, but they have amended this, and the translational potential remains interesting to explore in future studies.

      (7) While the CNN showed good performance, more transparent models (e.g., linear classifiers or dimensionality reduction) do not appear to exceed chance level. This implies that the LFPs contain a complex underlying structure that has yet to be fully uncovered at the mechanistic level, which will be important for pushing the findings forward in future studies.

      Therefore, while the authors raise interesting hypotheses around rapid plasticity, somatotopic dynamics, and rehabilitation, the evidence for each is indirect. Stronger claims will require future causal experiments, behavioral readouts, and mechanistic specificity beyond what the current data provide. However, the work represents an interesting starting point toward a more mechanistic understanding in the future.

    3. Reviewer #2 (Public review):

      Summary:

      Yamashiro et al. investigated how the transient absence of visual input (i.e., darkness) impacts tactile neural encoding in the rat primary somatosensory cortex (S1). They recorded local field potentials (LFPs) using a 32-channel array implanted in the forelimb and hindlimb regions of the primary somatosensory cortex while rats walked on smooth or rough textures under illuminated and dark conditions. Employing a convolutional neural network (CNN), they successfully decoded both texture and lighting conditions from the LFPs. The authors conclude that subtle differences in LFP patterns underlie the tactile representation of surface roughness and become more distinct in darkness, suggesting a rapid cross-modal reorganization of the neural code for this sensory feature.

      Strengths:

      • The manuscript addresses a valuable question regarding how sensory cortices dynamically adapt to changes in sensory context.
      • The use of machine learning (CNNs) enables the analysis to go beyond conventional amplitude-based metrics, potentially uncovering subtle but meaningful effects.
      • The authors have substantially improved the manuscript with clearer figures, additional statistical analyses (including permutation tests and cross-validation), and greater methodological transparency.

      Weaknesses:

      • The new analyses (grand-average LFPs, correlation maps, wavelet decompositions, attribution-score correlations) improve transparency but do not yet clarify which specific neural features the CNN exploits, leaving the central interpretability question unresolved.
      • A plausible alternative explanation for the increased discriminability in darkness remains insufficiently ruled out: visually driven activity in the light condition (e.g., ambient illumination changes or self-motion-induced visual input) could contaminate S1 LFPs and account for the effect without reflecting a true neural representational change.
      • Behavioural and order controls have been improved but remain somewhat limited in sample size.

      Overall assessment:

      The revised manuscript is clearer, more transparent, and technically strengthened. However, the true nature of the signal changes underlying the observed differences in discriminability remains unclear, limiting the scientific strength of the conclusions. The possibility that visual interference contributes to the observed effects remains a plausible and untested alternative interpretation. Additional experiments or analyses quantifying visually evoked activity in S1 would be required to confirm the claim of genuine reorganization of neural representation depending on the illumination condition.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) While the study interprets the emergence of more distinct texture representations in the dark as evidence of rapid cross-modal plasticity, the claim rests on correlational data from a short-term manipulation and decoding analysis. The authors show that CNN-derived feature embeddings cluster more clearly by texture in the dark, but this does not directly demonstrate plasticity in the classical sense (e.g., synaptic or circuit-level reorganization).

      Thank you for this insightful comment. We acknowledge that our claim of “rapid cross-modal plasticity” is based on correlational evidence and does not directly address synaptic or circuit-level reorganization, which would require more invasive methods. Our study instead focuses on changes in the representational structure of tactile stimuli when visual input is temporarily removed, highlighting the adaptability of sensory coding to environmental context. We agree that this distinction is important and have revised the manuscript to clarify that the observed changes reflect functional reorganization rather than structural plasticity, as indicated by the enhanced separability of texture representations in S1 during darkness.

      (2) Although gait was controlled, changes in arousal or exploratory behavior in light versus dark conditions might contribute to the observed neural differences. These factors are acknowledged but not directly measured (e.g., via pupillometry or cortical state indicators).

      Thank you for your insightful comment. We agree that arousal and exploratory behavior could influence neural differences and have considered these factors in our study. While gait was controlled, we did not directly measure arousal (e.g., via pupillometry or cortical indicators).

      To partially address this, we reviewed locomotor-speed traces (Supplementary Figure 1), which showed no significant differences between light and dark conditions, suggesting that movement speed did not drive the neural differences. We also reversed the order of the light and dark conditions; although texture separability did not differ significantly between conditions in these sessions, this result further supports the view that motivation did not confound our results.

      However, we acknowledge that arousal may still affect cortical dynamics, especially in the dark condition, where the lack of visual input might alter exploratory behavior. Due to technical limitations, we could not directly measure arousal states, and this is now discussed in the revised manuscript. While we cannot rule out the influence of arousal, the enhanced separability of texture representations suggests that sensory reorganization due to visual deprivation likely played a substantial role.

      (3) Moreover, the time course of the observed changes (within 10 minutes) is quite rapid, and while intriguing, the study does not include direct evidence that the underlying circuits were reorganized, only that population-level signals become more discriminable. As such, the term "plasticity" may overstate the conclusions and should be interpreted with caution unless validated by additional causal or longitudinal data.

      Thank you for your important comment. We agree that the term "plasticity" may overstate our conclusions, as our study focuses on population-level signal changes rather than direct evidence of circuit-level reorganization.

      To address this, we have revised the manuscript to clarify that while the observed changes in neural separability suggest functional reorganization of sensory representations, they do not confirm structural plasticity. We have updated the wording throughout the manuscript to emphasize that these findings reflect functional reorganization in response to short-term visual input loss, rather than structural or long-term plasticity.

      We also updated the discussion to highlight the need for future research with more invasive approaches to validate the causal mechanisms behind these rapid changes in neural dynamics.

      (4) The study highlights the forelimb region of S1 and a post-contact temporal window as particularly important for decoding texture, based on occlusion and integrated gradient analyses. However, this finding may be somewhat circular: The LFPs were aligned to forelimb contact, and the floor textures were sensed primarily via the forelimbs, making it unsurprising that forelimb electrodes were most informative. The observed temporal window corresponds directly to the event-aligned epoch, and while it may shift slightly in duration in the dark, this could reflect general differences in sensory gain or arousal, rather than changes in stimulus-specific encoding. Thus, while these findings are consistent with somatotopy and context-dependent dynamics, they do not provide strong independent evidence for novel spatial or temporal organization.

      Thank you for your insightful comment. We understand your concern that the finding of forelimb electrodes being most informative might seem circular, given that the LFPs were aligned to forelimb contact, and the floor textures were primarily sensed by the forelimbs. This design choice was intentional, as the task focused on texture perception through the forelimb, and the forelimb subregion of S1 is naturally expected to play a dominant role in this process. While this somatotopic specificity may make the results predictable, our aim was to emphasize the changes in temporal dynamics of neural processing under visual deprivation.

      We observed a shift in the temporal window's duration in the dark condition, which we interpret as a change in how texture information is processed without visual input. While this could reflect sensory gain or arousal differences, the lack of significant differences in locomotor speed or other behavioral measures (Supplementary Figure 1) suggests that these changes are more likely due to functional reorganization of sensory processing.

      We have clarified in the discussion that the shift in the temporal window is consistent with previous research on sensory reorganization involving both spatial and temporal cortical adjustments. While we do not claim novel spatial or temporal organization, we emphasize that the shift in temporal dynamics suggests adaptation in encoding strategy for texture perception in the absence of visual input. Future studies measuring arousal states (e.g., pupil diameter or cortical state markers) would help distinguish the contributions of arousal versus sensory reorganization to these dynamics.

      (5) While the neural data suggest enhanced tactile representations, the study does not assess whether rats' actual tactile perception improved. Without a behavioral readout (e.g., discrimination accuracy), claims about perceptual enhancement remain speculative.

      Thank you for raising this important point. We agree that while the neural data suggest enhanced separability of tactile representations in the dark condition, we do not directly assess whether these changes translate into improved tactile perception behaviorally.

      However, the primary aim of our study is not to claim perceptual enhancement, but to demonstrate that neural representations in the somatosensory cortex can rapidly reorganize in response to visual deprivation. To clarify this distinction, we have revised the manuscript to emphasize that the observed neural changes in S1 are consistent with functional reorganization of tactile representations, rather than a direct indication of perceptual improvement.

      Future studies will be crucial to directly test whether the enhanced separability of tactile representations in S1 correlates with improved tactile perception in a behavioral task. We have highlighted this as an avenue for future research to better understand the link between neural changes and perceptual outcomes.

      (6) In addition to point 4, the authors discuss implications for sensory rehabilitation, including Braille training and haptic feedback enhancement. However, the lack of actual chronic or even more acute pathological sensory deprivation, behavioral data, or subsequent intervention in this study limits the ability to draw translational conclusions. It remains unknown whether the more distinct neural representations observed actually translate into better tactile performance, discriminability, or perception. Additionally, extrapolating from rats walking on sandpaper in the dark to human rehabilitative contexts is speculative without a clearer behavioral or mechanistic bridge. The potential is certainly there, but the claim is currently aspirational rather than empirically grounded.

      Thank you for raising this important point. Upon careful consideration, we have decided to remove the discussion of sensory rehabilitation implications from the revised manuscript. We have refocused the manuscript to concentrate solely on the neural findings related to tactile encoding reorganization in response to short-term sensory deprivation, avoiding speculative extrapolation to human rehabilitative contexts. This revised approach ensures that the manuscript emphasizes the empirical findings without overstating the translational potential.

      (7) While the CNN showed good performance, details on generalization robustness and validation (e.g., cross-validation folds, variance across animals) are not deeply discussed. Also, while explainability tools were used, interpretability of CNNs remains limited, and more transparent models (e.g., linear classifiers or dimensionality reduction) could offer complementary insights.

      We appreciate the reviewer’s valuable feedback. In response to the concern about generalization robustness and validation, we have now conducted 5-fold cross-validation to assess the model's performance within animals (Figure 6C). We have also added supplementary information on the average silhouette scores across the different folds and animals (Supplementary Table 1, 2). These details are provided in the Methods section and discussed in the Results to offer a clearer picture of the model's robustness and consistency across rats.

      Regarding the interpretability of CNNs, we acknowledge that deep learning models can lack transparency. We also attempted classification using more transparent models such as PCA and SVM, but their performance did not exceed chance level (Supplementary Figure 2). This indicates that while these simpler models are more interpretable, they cannot capture the complex representations in the LFPs, making deep learning models like CNNs necessary for extracting these insights.

      Reviewer #2 (Public review):

      (1) Despite applying explainability techniques to the CNN-based decoder, the study does not clearly demonstrate the precise "subtle, high-dimensional patterns" exploited by the CNN for surface roughness decoding, limiting the physiological interpretability of the results. Additional analyses (e.g., detailed waveform morphology analysis on grand averages, time-frequency decompositions, or further use of explainability methods) are necessary to clarify the exact nature of the discriminative activity features enabling the CNN to decode surface roughness and how these change with the sensory context (i.e., in light or darkness).

      Thank you for your insightful comment. We recognize the importance of clarifying the exact nature of the high-dimensional neural patterns that the CNN exploits for surface roughness decoding. In response, we have performed additional analyses to provide a more detailed explanation of the CNN's decision-making process and the discriminative features it learned:

      Grand-Average LFP Waveforms Analysis: We calculated the grand-average LFP waveforms for each texture × lighting condition (Figure 4A). While visual inspection did not reveal distinct features in the averaged waveforms, we explored the channel-wise correlations between textures under both light and dark conditions (Figure 4B). We found that the correlation between textures was lower in the dark condition, suggesting that LFPs become more distinct between textures when visual input is absent, which aligns with the CNN’s output.

      Time-Frequency Decomposition (Wavelet Analysis): We also performed time-frequency decomposition of the LFPs using wavelet transforms (Figure 4D). No prominent differences emerged across texture × lighting conditions in the spectral domain. However, upon computing differences in wavelet features between light and dark conditions and analyzing the relationship with the CNN's attribution scores (Supplementary Figures 5A-C), we observed a negative correlation in the 50-60 Hz range and a positive correlation in the 80-90 Hz range. This suggests frequency-specific modulation in LFP activity that may contribute to texture representations, providing further support for the CNN’s learned features.
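      The channel-wise comparison between textures described above can be sketched as follows, assuming it correlates the grand-average waveforms of the two textures channel by channel (Pearson's r over time) within one lighting condition. The `(trials, channels, time)` array layout and the function name are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def channelwise_texture_correlation(smooth, rough):
    """Pearson correlation, per channel, between grand-average waveforms.

    smooth, rough: arrays of shape (trials, channels, time) holding
    contact-aligned LFP epochs for each texture (hypothetical layout).
    Returns an array of shape (channels,).
    """
    mean_smooth = smooth.mean(axis=0)  # grand average: (channels, time)
    mean_rough = rough.mean(axis=0)
    return np.array([
        np.corrcoef(mean_smooth[ch], mean_rough[ch])[0, 1]
        for ch in range(mean_smooth.shape[0])
    ])


# Synthetic check: identical waveforms give r = 1 on every channel.
t = np.linspace(0.0, 1.0, 100)
epochs = np.tile(np.sin(2 * np.pi * 5 * t), (10, 4, 1))  # 10 trials, 4 channels
print(channelwise_texture_correlation(epochs, epochs.copy()))
```

      Running it separately on light and dark trials gives one correlation per channel per condition; lower values in the dark would correspond to the more distinct texture waveforms reported above.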

      (2) The claim regarding cross-modal representation reorganization heavily relies on a silhouette analysis (Figure 5C), which shows a modest effect size and borderline statistical significance (p≈0.05 with n=9+2). More rigorous statistical quantification, such as permutation tests and reporting underlying cluster distances for all animals, would strengthen confidence in this finding.

      Thank you for your thoughtful comment. We appreciate your suggestion to strengthen the statistical rigor of our analysis regarding the cross-modal representation reorganization. In response, we have implemented several additional analyses to more rigorously quantify the separability of neural representations between light and dark conditions:

      (1) Permutation Test for Cluster Separability: We performed a permutation test to assess whether the observed differences in cluster separability between light and dark conditions were statistically significant or could have arisen by chance. The results showed that the silhouette scores for the dark condition consistently exceeded the 95th percentile of the null distribution (Supplementary Figure 4). This permutation test strengthens the validity of our findings, indicating that the enhanced separability in darkness is a systematic reorganization of neural representations, not due to random fluctuations.

      (2) Reporting Cluster Distances: To address concerns about the modest effect size and borderline significance, we have explicitly reported the underlying cluster distances in the form of silhouette scores for each individual animal (Supplementary Table 1, 2). These values are derived from Euclidean distances within and between clusters in each rat, providing a clearer understanding of the separability observed.

      (3) Additional Statistical Analysis on Silhouette Scores: To further enhance the rigor of our statistical analysis, we recalculated the silhouette scores using 5-fold cross-validation within each animal, ensuring that our results are robust across multiple data splits (Figure 6C).
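      For readers unfamiliar with the label-permutation approach in point (1), here is a minimal numpy sketch. It computes the silhouette score of the embeddings under the true texture labels, then builds a null distribution by shuffling the labels and recomputing the score. It assumes Euclidean distances and the standard silhouette definition (the same convention as scikit-learn's `silhouette_score`); the function names, the number of permutations, and the synthetic data are hypothetical, and the authors' exact procedure may differ.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (n_samples, n_features)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def silhouette_from_distances(D, labels):
    """Mean silhouette score; requires at least two distinct labels."""
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        n_same = same.sum() - 1            # exclude the point itself
        if n_same == 0:
            continue                       # singleton cluster: s(i) = 0
        a = D[i, same].sum() / n_same      # D[i, i] == 0, so self adds nothing
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


def silhouette_permutation_test(X, labels, n_perm=1000, seed=0):
    """Observed silhouette, 95th percentile of the null, and p-value."""
    labels = np.asarray(labels)
    D = pairwise_distances(np.asarray(X, dtype=float))  # computed once
    observed = silhouette_from_distances(D, labels)
    rng = np.random.default_rng(seed)
    null = np.array([silhouette_from_distances(D, rng.permutation(labels))
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, np.percentile(null, 95), p_value


# Synthetic embeddings: two well-separated "texture" clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(10.0, 1.0, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
observed, threshold, p = silhouette_permutation_test(X, labels, n_perm=200)
print(f"silhouette={observed:.2f}, null 95th pct={threshold:.2f}, p={p:.3f}")
```

      Shuffling labels leaves pairwise distances unchanged, so the distance matrix is computed once and only the label assignment varies across permutations. The add-one correction keeps the p-value from being exactly zero with a finite number of permutations.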

      By incorporating these additional analyses and reporting detailed cluster distances, we believe we have significantly strengthened the confidence in our claim of cross-modal reorganization.

      (3) While the authors recorded in the somatosensory cortex, primarily known for its tactile responsivity, I would be cautious not to rule out a priori the presence of crossmodal (visual) responses in the area. In this case, the stronger texture separation in darkness might be explained by the absence of some visually-evoked potentials (VEPs) rather than genuine cross-modal reorganization. Clarification is needed to rule out visual interference and this would strengthen the claim.

      Thank you for raising this important point. In response to your concern, we carefully examined whether visually-evoked potentials (VEPs) could be present in the S1 recordings, particularly under the light condition. However, this experiment did not involve any cue-guided visual stimulation, such as flashing lights or visual cues aligned with the LFP recordings. Without such external visual stimuli, it is unlikely that VEPs would be reliably evoked in S1. Therefore, we believe the stronger texture separation observed in the dark condition is not due to visual interference, but rather reflects a genuine sensory reorganization in response to the absence of visual input.

      (4) Behavioural controls are limited to gross gait parameters; more detailed analyses of locomotor behavior and additional metrics (e.g., pupil size or locomotor variance) would robustly rule out potential arousal or motor confounds.

      Thank you for your insightful comment regarding behavioral controls. In response, we have added locomotor speed traces aligned with corresponding LFPs (Supplementary Figure 1) to demonstrate that locomotion remained consistent across trials, irrespective of environmental condition (light vs. dark). Additionally, we report locomotor speed variance over 10-minute blocks to confirm no significant motor changes affecting neural recordings. These analyses indicate that LFP differences are unlikely due to locomotor confounds.

      While measuring pupil size could be useful for assessing arousal, the camera resolution in our study was insufficient for reliable measurements. We have noted this limitation in the Discussion and recommend that future studies with high-resolution eye-tracking explore arousal's role in sensory processing in S1.

      (5) The consistent ordering of trials (10 minutes of light then 10 minutes of dark) could introduce confounds such as fatigue or satiation (and also related arousal state), which should be controlled by analyzing sessions with reversed condition ordering.

      Thank you for highlighting the potential confounds due to trial ordering. To address this, we reversed the condition order (dark before light) in a subset of sessions from six rats and reanalyzed the data (Supplementary Figure 3). The results showed a non-significant increase in separability in the dark condition, suggesting that the enhanced separability in the dark condition is not due to trial-order effects such as fatigue or satiation. While order effects may contribute to trial-to-trial variability, the consistent pattern of enhanced separability in the dark further supports the interpretation that visual deprivation directly influences the reorganization of tactile representations in S1.

      (6) The focus on forelimb-aligned LFP analyses raises the possibility that hindlimb-aligned data might yield different conclusions, suggesting alignment effects might bias the results.

      Thank you for your insightful comment on the potential bias of forelimb-aligned LFP analyses. We acknowledge that the choice of alignment event can influence the results and appreciate the suggestion to consider hindlimb-aligned data. However, our experimental design specifically focused on forelimb S1. The forelimb region of S1 was oversampled in our array, and as expected, we observed larger responses there, consistent with the known somatotopic organization of S1.

      While hindlimb-aligned data could provide additional insights, they are not directly relevant to the primary question of how forelimb S1 codes tactile information under visual deprivation. We do not believe the forelimb alignment introduces a bias, as it matches the sensory task being investigated. However, we recognize the value of exploring alternative alignments and have now included a discussion in the Methods section regarding the rationale for our design choices.

      (7) The authors' dismissal of amplitude-based metrics as ineffective is inadequately substantiated. A clearer demonstration (e.g., event-related waveforms averaged by conditions, presented both spatially and temporally) would support this claim.

      Thank you for your constructive comment. In response, we have added a more detailed analysis of event-related waveforms, averaged across conditions (light vs. dark, smooth vs. rough textures), and presented them spatially and temporally aligned to forelimb contact (Figure 4A). These waveforms did not show clear, distinct features that could differentiate conditions, which highlights the limitations of traditional amplitude-based metrics in detecting subtle neural activity changes related to visual deprivation.

      We further performed channel-wise correlation analyses (Figure 4B), revealing stronger texture correlations in the light condition, indicating that averaged waveforms do not capture the nuanced differences in neural dynamics. Additionally, time-frequency spectrograms and channel–channel correlation matrices (Figures 4C and 4D) did not show distinct condition differences, reinforcing the limitations of amplitude-based metrics.

      These findings, along with the superior performance of machine learning-based decoding methods (e.g., CNN), support our claim that amplitude-based approaches are insufficient for fully capturing the complexity of the neural data.

      (8) Wording ambiguity regarding "attribution score" versus "activation amplitude" (Figure 5) complicates the interpretation of key findings. This distinction must be clarified for proper assessment of the results.

      Thank you for pointing out the ambiguity between "attribution score" and "activation amplitude." To address this, we have revised the manuscript to use "attribution score" only.

      (9) Generalization across animals remains unaddressed. The current within-subject decoding setup limits conclusions regarding shared neural representations across individuals. Adopting cross-validation strategies and exploring between-animal analyses would add significant value to the manuscript.

      Thank you for highlighting the importance of generalization across animals. While our study focused on within-subject decoding, we acknowledge that this limits conclusions about shared neural representations across individuals. We expect that inter-animal generalization would be challenging, as models trained on data from a single rat may not perform well on data from others due to differences in electrode placement, brain anatomy, and neural representations. We recognize the value of cross-validation strategies and between-animal analyses and will consider them in future work to address this limitation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would strongly recommend that the authors refine their introduction to be more concise. Many concepts and study aims are repeated many times and, therefore, present as highly redundant text. The introduction may be half the length and still contain the important concepts to set up the justification for the study. I would also suggest refining to be less about sensory deprivation (e.g., with blindness) and more in relation to context, as the acute nature of the study allows one to conclude more about the latter than the former.

      Thank you for your feedback on the introduction. We have revised the section to reduce redundancy and present the key concepts more concisely. We also streamlined the study aims and, as you suggested, placed more emphasis on context, in keeping with the acute nature of the study, than on sensory deprivation. This revision better aligns the introduction with the main focus of the research, improves clarity, and, we believe, provides a more direct justification for the study.

      (2) I am not sure if Figures 1-3 are meant to be in grey-scale for some reason (perhaps to represent light and dark), but I would encourage the authors to examine if this is necessary, as the use of color generally helps one more easily follow Figures.

      Thank you for this suggestion. Upon review, we agree that the use of color would enhance the clarity and readability of our figures. We have revised the figures, including the newly added supplementary figures, to incorporate color.

      (3) Figure 5, Figure legend title - check wording.

      Thank you for pointing this out. The title has been adjusted for consistency with the other figure legends.

      Reviewer #2 (Recommendations for the authors):

      (1) Analyses that would strengthen the main claims (major):

      (a) Identify the features exploited by the CNN.

      (i) Provide grand-average LFP waveforms for each texture × lighting condition (fore- and hind-limb channels shown separately, spatially arranged as in Figure 3C) and try to relate them to the decoding strategy learned by the CNN.

      Thank you for your helpful suggestion. We have calculated the grand-average LFP waveforms for each texture × lighting condition and included them in Figure 4A, with fore- and hind-limb channels shown separately and spatially arranged as in Figure 3C. Upon visual inspection, the mean waveforms did not reveal clear, distinct features. To further investigate, we computed the channel-wise correlation between different textures under both dark and light conditions. By subtracting the correlation coefficients for the dark environment from those in the light, we observed that the correlation between textures was lower in the dark environment (Figure 4B). This suggests that LFPs are more distinct between textures in the dark, supporting the CNN model's output. However, this also indicates that the CNN has captured more complex, nuanced information, as it is able to discriminate between LFPs on a single-trial basis, rather than relying on mean traces.

      To assess how the correlation between average LFP waveforms varied across channels, we also calculated the channel-channel correlation matrix for all 32 channels in each condition. While we found stronger correlations within each S1 subregion, we did not observe clear differences in the correlation matrices between light and dark conditions, nor between different textures (Figure 4C).
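      As an illustration only, the two correlation analyses described above might be sketched as follows in Python with NumPy. The array layout (trials x channels x samples, aligned to forelimb strike), the 0/1 texture coding, and all function names are assumptions made for this sketch rather than the authors' code.

```python
import numpy as np

def texture_correlation_per_channel(lfp, texture):
    """Pearson r between the smooth- and rough-texture mean waveforms on each channel.

    lfp: (n_trials, n_channels, n_samples) segments aligned to forelimb strike.
    texture: (n_trials,) labels; 0 = smooth, 1 = rough (hypothetical coding).
    """
    mean_smooth = lfp[texture == 0].mean(axis=0)  # (n_channels, n_samples)
    mean_rough = lfp[texture == 1].mean(axis=0)
    return np.array([np.corrcoef(mean_smooth[ch], mean_rough[ch])[0, 1]
                     for ch in range(lfp.shape[1])])

def light_minus_dark(lfp_light, tex_light, lfp_dark, tex_dark):
    """Per-channel texture correlation in light minus that in dark.

    Positive values mean the two textures evoke more similar mean waveforms in light,
    i.e. more distinct ones in darkness.
    """
    return (texture_correlation_per_channel(lfp_light, tex_light)
            - texture_correlation_per_channel(lfp_dark, tex_dark))

def channel_channel_matrix(lfp):
    """Correlation matrix between the trial-averaged waveforms of all channels."""
    # np.corrcoef treats each row (channel) as a variable -> (n_channels, n_channels)
    return np.corrcoef(lfp.mean(axis=0))
```

      With 32 channels, channel_channel_matrix returns a 32 x 32 matrix per condition, which can then be compared between light and dark or between textures.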

      (ii) Add channel-wise and time-frequency maps (e.g., wavelet or spectrograms) for each texture × lighting condition and try to relate them to the decoding strategy learned by the CNN.

      Thank you for the valuable suggestion. We calculated wavelet features for each LFP segment and averaged them across trials to assess differences in LFP between light and dark conditions, as well as across textures (Figure 4D). However, no distinct differences were observed in the spectral map. To investigate further, we computed the differences in spectral maps for LFPs in light and dark trials. We then calculated the difference in attribution scores derived from the integrated gradient map (Supplementary Figure 4A). Subsequently, we calculated the correlation coefficients between the differences in integrated gradients and the differences in power across each frequency band in the spectral map (Supplementary Figures 4B and 4C). A negative correlation was found in the 50-60 Hz range, while a positive correlation was observed in the 80-90 Hz range. These findings suggest that frequency-specific patterns of LFP activity in different conditions may be linked to the texture representations captured by the CNN model. We have included a discussion of these findings in [lines 463-468].
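      The band-wise comparison between attribution differences and spectral differences could be sketched as below. This assumes that the integrated-gradient difference has been brought onto the time grid of the spectral map and that correlations are computed across time bins within each band. The authors' exact binning and band edges may differ, and the function name is hypothetical.

```python
import numpy as np

def band_correlations(delta_power, delta_ig, freqs, band_edges):
    """Correlate an attribution difference with power differences, band by band.

    delta_power: (n_freqs, n_times) spectral-map difference, light minus dark.
    delta_ig: (n_times,) integrated-gradient attribution difference, light minus dark
        (assumed to share the spectral map's time grid).
    freqs: (n_freqs,) frequency of each row of delta_power, in Hz.
    band_edges: list of (low, high) tuples in Hz, e.g. [(50, 60), (80, 90)].
    Returns one Pearson r per band.
    """
    freqs = np.asarray(freqs)
    out = []
    for low, high in band_edges:
        rows = (freqs >= low) & (freqs < high)
        band_change = delta_power[rows].mean(axis=0)  # mean power change within the band
        out.append(np.corrcoef(band_change, delta_ig)[0, 1])
    return np.array(out)
```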

      (b) Quantify the "enhanced separability in darkness" more rigorously.

      (i) Report cluster-distances (e.g. Euclidean) for each individual animal.

      We thank the reviewer for this helpful comment. When calculating the silhouette score, we used Euclidean distance as the distance metric. For each data point, the silhouette score is defined as the difference between the average distance to points in the nearest other cluster and the average distance to points within its own cluster, normalized by the larger of the two values. Thus, the silhouette score inherently reflects relative cluster distances, both within and between clusters, for each individual animal. Because we report and statistically analyze silhouette scores (Figure 6C), these values already quantify and compare the Euclidean cluster distances across conditions at the animal level. For clarity, we have now added a definition of the silhouette score to the Methods section of the main text [lines 269-278]. We also included the calculated silhouette scores in Supplementary Table 1.
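      For reference, the definition above can be written as a minimal NumPy sketch of the silhouette coefficient with Euclidean distance. Library routines such as sklearn.metrics.silhouette_score compute the same quantity. The function below is an illustrative re-implementation, not the authors' code.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient with Euclidean distance.

    For point i: a(i) = mean distance to the other members of its own cluster,
    b(i) = smallest mean distance to the members of any other cluster,
    s(i) = (b(i) - a(i)) / max(a(i), b(i)). Requires at least two clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters get s(i) = 0 by convention
        a = D[i, own].sum() / (n_own - 1)  # own-cluster mean, excluding the zero self-distance
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = (b - a) / denom if denom > 0 else 0.0
    return float(s.mean())
```

      Scores near 1 indicate compact, well-separated clusters (for example, texture clusters in the network embedding), and negative scores indicate points closer to another cluster than to their own.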

      (ii) Run a permutation or bootstrap test (shuffling darkness/light labels within animals) to obtain an empirical null distribution for cluster separability in the network embedding space.

      We thank the reviewer for this important suggestion. In response, we implemented a permutation test to assess the robustness of our cluster separability results. Specifically, we shuffled the darkness/light labels within each animal and recalculated silhouette scores across 1000 resamples to generate an empirical null distribution. The observed separability between light and dark conditions consistently exceeded the 95th percentile of the null distribution (Supplementary Figure 3). This confirms that the enhanced cluster separability in darkness was not attributable to random fluctuations in labeling but instead reflected a systematic reorganization of neural representations.
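      The within-animal label-shuffling procedure could be sketched as follows. The statistic is passed in as a function, for example a hypothetical function that returns the dark-minus-light difference in silhouette score. The function names, the default seed, and the add-one p-value convention are assumptions of this sketch.

```python
import numpy as np

def within_animal_permutation_null(stat_fn, condition, animal, n_perm=1000, seed=0):
    """Empirical null for stat_fn by shuffling light/dark labels within each animal.

    stat_fn: callable mapping a condition-label array to a scalar statistic.
    condition: (n_trials,) light/dark labels.
    animal: (n_trials,) animal ID per trial; each animal keeps its own label counts.
    """
    rng = np.random.default_rng(seed)
    condition = np.asarray(condition)
    animal = np.asarray(animal)
    groups = [np.flatnonzero(animal == a) for a in np.unique(animal)]
    null = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = condition.copy()
        for idx in groups:
            # permute labels among this animal's trials only
            shuffled[idx] = rng.permutation(condition[idx])
        null[k] = stat_fn(shuffled)
    return null

def exceeds_null(observed, null, q=95):
    """True if the observed statistic lies above the q-th percentile of the null."""
    return observed > np.percentile(null, q)

def permutation_p_value(observed, null):
    """One-sided p-value with the common add-one correction."""
    return (1 + np.sum(null >= observed)) / (1 + len(null))
```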

      (c) Control for possible visually-evoked potentials (VEPs).

      (i) Search the LFPs recorded in light for stereotyped VEP components and/or comment on this possible confound (i.e., VEPs in S1?).

      Thank you for raising this point. Although it would be interesting to determine whether VEPs are present in rat S1, this experiment did not involve cue-guided visual stimulation. Additionally, there was no environmental visual cue that could serve as an external trigger for aligning the LFPs for VEP analysis in S1. Furthermore, because even the somatosensory evoked potential was not clearly visible in the S1 LFP without averaging aligned trials, it is unlikely that VEPs could be detected in single trials.

      (d) Address behavioral and arousal confounds.

      (i) Provide example locomotor-speed traces (aligned with corresponding LFPs) and report locomotor-speed variance across the 10-min blocks.

      Thank you for your comment. A speedometer was installed for the recordings of the last two rats. We now provide example speed traces and the speed variance across blocks in Supplementary Figure 1. The traces show that locomotor speed was stable within each trial.
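      A sketch of how per-block speed variance might be computed from a speedometer trace. The sampling rate, the units, and the choice to drop a trailing partial block are assumptions made for illustration.

```python
import numpy as np

def blockwise_speed_variance(speed, fs, block_minutes=10.0):
    """Sample variance of locomotor speed within consecutive fixed-length blocks.

    speed: 1-D speed trace (e.g. cm/s) sampled at fs Hz (hypothetical units and rate).
    Returns one variance per complete block; a trailing partial block is dropped.
    """
    speed = np.asarray(speed, dtype=float)
    n = int(round(block_minutes * 60 * fs))  # samples per block
    n_blocks = len(speed) // n
    blocks = speed[: n_blocks * n].reshape(n_blocks, n)
    return blocks.var(axis=1, ddof=1)
```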

      (ii) If available from the camera recordings, include pupil diameter as a proxy for arousal; otherwise, discuss explicitly how arousal changes might affect S1 LFPs.

      Thank you for this suggestion. We strongly agree that pupil diameter measurements should be incorporated into future studies. However, because our camera did not have sufficient resolution to capture pupil diameter, we have addressed this limitation in the Discussion section [lines 525-537].

      (e) Address order effects (and motivation/satiety confounds)

      (i) Present at least a subset of sessions in which the dark block precedes the light block; re-analyze the silhouette score/discriminability with block order as a factor.

      Thank you for this helpful suggestion. We conducted additional analyses using sessions from 6 rats in which the dark block preceded the light block (Supplementary Figure 5A). Using the same model architecture, we calculated the silhouette score for each rat (Supplementary Figure 5B). When the order was reversed (dark preceding light), the enhanced discriminability in darkness was no longer significant: although scores tended to be higher in the dark condition, texture discriminability did not differ significantly between conditions.

      If block order alone accounted for the increase in discriminability, reversing the order would be expected to yield higher silhouette scores in the light condition. Our findings suggest that order-related factors (e.g., thirst or motivation, as you proposed) are not the sole contributors. Furthermore, previous studies in human participants have shown that brief blindfolding can produce lingering increases in tactile sensitivity, indicating a lasting effect of visual deprivation. Thus, the absence of significant differences in texture representation when the dark condition preceded the light condition may reflect such lasting effects. We have added this discussion to the manuscript [lines 441-452].

      (ii) Discuss explicitly the potential confounding effect of motivational state/thirst.

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we now explicitly address the potential confounding role of motivational state and thirst in shaping our results. Because animals were water-restricted to maintain task engagement, it is possible that increasing thirst or fluctuating motivation over the course of a session could alter arousal or attentional state, thereby influencing neural separability. However, when the block order was reversed (dark condition preceding light), silhouette scores did not show a significant increase in the second (light) block. Thus, while we acknowledge that motivational state may contribute to trial-to-trial variability, the systematic increase in separability during darkness cannot be fully explained by thirst or motivational confounds. This addition has been incorporated into the Discussion section [lines 441-452].

      (f) Alignment control and the role of forelimb S1.

      (i) Repeat the decoding analysis with LFPs aligned to hind-limb strike; report whether the fore-limb dominance persists.

      Thank you for your thoughtful suggestion. We appreciate the opportunity to clarify. Our study was designed to ask a different question: how the absence of visual input reorganizes tactile encoding for the body part that actually initiates texture contact in our paradigm (the forepaw). Accordingly, all analyses were aligned to forelimb strike and our array intentionally oversampled S1-forelimb relative to S1-hindlimb (18 vs. 14 electrodes; Fig. 1F–G), yielding clear topographic forelimb-locked event-related responses (Fig. 3B–D) and forelimb-channel dominance in the decoding explainability analyses (Fig. 5D–E). Repeating the full decoding locked to hind-limb strike would test a different hypothesis and would be difficult to interpret for three reasons:

      Design/measurement alignment. Our kinematic detection was built to identify forelimb foot strikes. Extending the detector to the hindlimb would require new model training and validation and would introduce uncertainty in the exact contact timing relative to the LFP segments we analyze.

      Sampling asymmetry. The array and cortical magnification are not balanced across subregions (18 forelimb vs. 14 hindlimb electrodes; Fig. 1G), so a hind-limb–aligned comparison would be confounded by unequal coverage and signal-to-noise across S1 subdivisions rather than reflecting true “dominance.”

      Scope of the claim. We do not claim that the forelimb is globally more informative about texture; we show the intuitive and topographically specific result that “forelimb S1 codes textures touching the forelimb,” and that these representations become more separable in darkness (silhouette increase; Fig. 5C). A hind-limb–locked re-analysis would likely reveal hindlimb contributions when the hindpaw is the alignment event — but that would not change the central conclusion about darkness enhancing tactile representational separability.

      To address the underlying concern about generality without introducing the above confounds, we have clarified these design choices and limitations in the revised Methods [lines 194-197].
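      For illustration, alignment to forelimb strike amounts to cutting fixed windows of the continuous LFP around each detected strike. In the sketch below, the window lengths and the function name are hypothetical. Events that fall too close to either end of the recording are skipped.

```python
import numpy as np

def epoch_around_events(lfp, event_samples, fs, pre_s, post_s):
    """Cut fixed-length LFP segments aligned to event times (e.g. forelimb strikes).

    lfp: (n_channels, n_samples) continuous recording sampled at fs Hz.
    event_samples: sample indices of the alignment events.
    pre_s, post_s: window before/after each event, in seconds (hypothetical values).
    Returns (n_kept_events, n_channels, n_window_samples).
    """
    pre = int(round(pre_s * fs))
    post = int(round(post_s * fs))
    segments = [lfp[:, e - pre: e + post]
                for e in event_samples
                if e - pre >= 0 and e + post <= lfp.shape[1]]  # skip truncated windows
    if not segments:
        return np.empty((0, lfp.shape[0], pre + post))
    return np.stack(segments)
```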

      (g) Amplitude-based baseline.

      (i) Show that a simple linear discriminant or logistic-regression model on peak amplitudes (and/or other simple features like trough width/slope) cannot reach the CNN's accuracy. This kind of "baseline" analysis could also be useful to pinpoint the discriminative features learned by the CNN.

      Thank you for your insightful suggestion. We agree that a baseline comparison with a simpler model could help highlight the advantage of using a CNN. However, in our dataset, individual LFP traces do not exhibit clear peaks or other well-defined features (e.g., peak amplitude, width, or energy), which makes it difficult to extract the hand-crafted features that a linear discriminant or logistic-regression model would require.

      To address this, we performed principal component analysis (PCA) on the raw LFP traces to reduce their dimensionality and applied a support vector machine (SVM) classifier to the reduced features, in line with the approach used for the CNN models (Supplementary Figure 2A). The results of this analysis demonstrate that the SVM model struggles to discriminate effectively between conditions, further reinforcing the necessity of the CNN model. The CNN’s ability to automatically learn complex features from the raw LFP data appears to be a crucial factor in achieving superior classification performance (Supplementary Figure 2B).
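      A self-contained sketch of a baseline of this kind (PCA followed by a linear classifier) is shown below. To avoid external dependencies, it uses a plain linear SVM trained by subgradient descent on the hinge loss. The authors' number of components, kernel, solver, and regularization are not stated here, so this stand-in may differ from their implementation (for example, scikit-learn's SVC). Trials are assumed to be flattened to (n_trials, n_channels x n_samples), with labels in {-1, +1}.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on training trials only; returns the mean and the top principal axes."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:n_components]

def pca_transform(X, mu, components):
    """Project trials onto the principal axes learned from the training set."""
    return (X - mu) @ components.T

def linear_svm_fit(Z, y, lam=1e-2, lr=0.1, n_iter=500):
    """Linear SVM (L2-regularised hinge loss) by full-batch subgradient descent."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(n_iter):
        active = y * (Z @ w + b) < 1  # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * Z[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def linear_svm_predict(Z, w, b):
    return np.where(Z @ w + b >= 0, 1, -1)
```

      The PCA is fitted on the training split only, so that no information from held-out trials leaks into the reduced features.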

      (h) Cross-validation and inter-animal generalization.

      (i) Consider replacing the single 80/20 split with k-fold cross-validation within animals.

      Thank you for this suggestion. Instead of using a single 80/20 split, we performed 5-fold cross-validation within each rat. The silhouette scores were averaged across the five folds for each animal, and Figure 6C was updated accordingly. A paired t-test still showed a significant difference in silhouette scores between the light and dark conditions.
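      The per-animal cross-validation and the paired comparison could be sketched as follows. Model training and embedding are omitted. The fold assignment, the seed, and the function names are assumptions. A library routine such as scipy.stats.ttest_rel would also return the p-value.

```python
import numpy as np

def kfold_indices(n_trials, k=5, seed=0):
    """Yield (train_idx, test_idx) for k shuffled, non-overlapping folds of one animal's trials."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_trials), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def paired_t(x, y):
    """Paired t statistic and degrees of freedom, e.g. per-animal dark vs light means."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = len(d)
    return d.mean() / (d.std(ddof=1) / np.sqrt(n)), n - 1

# Hypothetical workflow for each animal: for train_idx, test_idx in kfold_indices(n):
#   train the decoder on train_idx, embed the test_idx trials, and compute the silhouette
#   score separately for light and dark test trials; then average each over the 5 folds.
# Across animals: t, df = paired_t(dark_means, light_means)
```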

      (ii) Comment on inter-animal generalization.

      Thank you for this valuable feedback. Although we did not explicitly test inter-animal generalization, it is unlikely that a model trained on data from one rat would perform equally well when classifying data recorded from another animal. This limitation arises from two main factors. First, despite careful efforts to implant electrodes in the same brain region and cortical layer across experiments, it is impossible to align all 32 electrodes to identical coordinates. Consequently, the recorded LFPs are obtained from slightly different locations, which may reflect distinct neural processing. Second, even within the same species, individual animals differ in brain size and neural circuit organization. Thus, even if electrodes could be placed at identical anatomical locations, inter-individual variability in brain structure would still lead to differences in the recorded signals. Because deep learning models are often sensitive to small perturbations in their input data, we believe that robust inter-animal generalization is unlikely without fine-tuning the model on data from the target animal. This point has been added to the Discussion [lines 494-507].

      (2) Writing, figure and terminology improvements (minor):

      (a) Figure 5F-G axis label. Decide on either "attribution score" or "activation amplitude" and use that term consistently in panels, legend, and text (currently, I believe it could be confused with raw signal amplitude).

      We have unified the terminology to "attribution score" and applied this consistently across the panels, legend, and text.

      (b) Throughout the manuscript, use "population-level activity" or "average population dynamics" when discussing LFPs (I believe it is more correct to reserve "population code" for multiple single-unit datasets).

      We agree with the reviewer’s point and now consistently use the term "population dynamics" when describing LFP information throughout the manuscript.

      (c) Lines 219-221 state down-sampling to 2 kHz, whereas line 289 mentions 10 kHz. Reconcile these numbers.

      We apologize for the confusion and thank the reviewer for thoroughly reading the manuscript. Our original sampling rate was 30 kHz, and all analyses were performed on data resampled to 10 kHz. The reference to 2 kHz was an error, and we have corrected it.

      (d) Specify the tail of each statistical test mentioned in the manuscript and any multiple-comparison correction used.

      We have specified the tail of each statistical test and any multiple-comparison corrections used in the "Data Analysis" section of the Methods.

      (e) Line 244: "variables (He et al., 2015)" → "variables (He et al., 2015)".

      We have corrected this formatting issue and revised it to "variables (He et al., 2015)".

      (f) Line 253: "one-dimentional" → "one-dimensional".

      We have corrected the spelling error and revised it to "one-dimensional".

      (3) Data and code sharing:

      (a) Consider depositing data and code for the analysis in public open repositories.

      Thank you for your suggestion. We have set up a public GitHub repository to share the code. Because the full dataset is large (~400 GB), we have uploaded a smaller example dataset for running the analysis.

    1. eLife Assessment

      The authors test the hypothesis that gonadal steroid signaling influences the transcriptional development of specific neurons in the mPOA during adolescence, and that such adolescent development of the mPOA is necessary for mating behaviors. The valuable findings are supported by convincing evidence. This work contributes new insight into hormone-sensitive transcriptional profiles within genetically defined neuron clusters in the mPOA during adolescence and will be of interest to systems and molecular neuroscientists and those interested in development, sex differences, and/or hormonal regulation.

    2. Reviewer #2 (Public review):

      Summary:

      An abundant literature documents molecular changes in the rodent hypothalamus that occur during the transition from prepubertal to mature reproductive physiology. Equally well documented is the role of sex steroids and their receptors during this important period of reproductive development, as well as the importance of GABAergic and glutamatergic neurons. The medial preoptic area (MPOA) is known to play a central role in the expression of sexually dimorphic reproductive function, and previously reported sexually dimorphic patterns of gene expression are consistent with this role. The present manuscript extends this knowledge base and reports the results of a detailed evaluation of transcriptional dynamics in the MPOA during the adolescent transition to maturity, with a particular focus on the role of the estrogen receptor gene (Esr1). Both single-cell RNA sequencing (scRNAseq) and multiplex in situ hybridization methods were employed, and the results were subjected to detailed computational analyses to demonstrate that the transcriptomic structure of MPOA neurons displays both sex- and cell-type-specific expression profiles. In addition, both hormonal and genetic manipulations of Esr1 signaling during puberty altered the transcriptional profiles of MPOA neurons, and these changes aligned with maturation of hormone-dependent reproductive function. The authors provide this evidence to illustrate Esr1-dependent control of gene regulatory networks required for normal expression of reproductive behaviors during the transition from adolescence to adulthood. The results presented in this manuscript are extensive and represent the most comprehensive evaluation of transcriptomic changes during reproductive maturation to date. The methods appear strong and the results provide a rich data set that will support a good deal of future analysis.

      Strengths:

      (1) The major strength of this manuscript is the extensive set of images and graphs that illustrate molecular changes that occur in MPOA neurons during adolescence, although additional spatial detail as to locations of the source neurons would be welcome in order to place the changes in the proper circuitry context.

      (2) Targeting Esr1 deletion to MPOA GABA neurons is a good choice, given how these cells have been implicated in sexual differentiation of reproductive behavior previously, and the lack of comparable responses in glutamatergic neurons is convincing. The AAV-frtFlex-Cre virus created by the investigators is a most useful tool for such studies. Profiling distinct transcriptomic trajectories in GABA and glutamatergic neurons during reproductive maturation is impressive and leads to some of the best supported conclusions in this paper.

      (3) Cellular and molecular resolution of the transcriptomics data appears excellent; however, because the source tissue for the scRNAseq analysis was obtained by bulk dissection of the MPOA, anatomical resolution is limited. This problem is addressed to some extent by careful comparison of scRNAseq results with previously published spatial transcriptomics data. The HM-HCR-FISH analysis clearly documents spatially restricted changes in gene expression, but it is hard to discern where these changes occur based on the images presented or the descriptions included in the Results. The anatomical schematic included in Figure 4 suggests that the investigators are not familiar with the components of the MPOA (see Allen Mouse Brain Atlas).

      Weaknesses:

      (1) A major conceptual flaw is that the authors do not distinguish between genetically determined sex differences in patterns of gene expression and differences caused by the fact that MPOA neurons are exposed to different endocrine environments in adolescent males and females, which can cause different transcriptional trajectories independent of genetic sex. This issue does not render their results invalid, but their terminology should address the issue in the discussion and "limitations" section. At the very least the endocrine status of "intact females" should be included.

      (2) A major technical flaw is that the MPOA is treated as a functionally distinct brain region (block dissections) with uniform distribution of cell types (FISH data are not illustrated or reported with sufficient spatial detail). Thus, an enormous amount of molecular data is provided that cannot be mapped to distinct neural circuits, thereby limiting the neurobiological impact. This is also a weakness of the FISH data, which is presented with only small regions illustrated without anatomical detail. In fact, some images are compared that appear to illustrate different MPOA structures, although it is impossible to be certain of this due to the lack of morphological landmarks. The analysis of how Esr1 orchestrates regulatory gene networks is impressive and interesting, but the fact that many of the observed transcriptional events occur in neural circuits that do not overlap confounds interpretation.

      (3) The locations of the AAV injections should be characterized because deleting Esr1 in multiple distinct parts of the MPOA will likely confound interpretation. This is especially problematic given the limited number of mice used for parts of the RNAscope analysis.

      (4) Although the focus of these experiments on adolescence is welcome, neither the Introduction nor the Discussion do a good job of placing these studies in the context of what is already known about brain maturation during puberty. It is true that this is very much a results-focused manuscript, but the scholarship can be improved. Simply stating that your results are consistent with previous reports places an undue burden on the reader to go figure out what is new.

      (5) Throughout the manuscript, the authors use obscure abbreviations, which often makes their text cumbersome to read. This is certainly justified in certain instances where complex names of analytical methods are used repeatedly, but the authors are encouraged to try to simplify their use of non-standard abbreviations.

      Comments on revisions:

      The authors have considered issues raised during the initial review. Although there do not appear to be significant changes to analyses, figures or conclusions, the authors have added important revisions listing limitations in study design and methodology that impact interpretation.

    3. Reviewer #3 (Public review):

      The paper identifies effects of gonadal hormones within hormone-responsive GABAergic neurons in the MPOA. Although it is not surprising that hormones have effects on neurons that express hormone receptors, the current paper adds insights with higher cellular and spatial resolution than previous work and focuses on the adolescent period. The paper also identifies a major role for Esr1-dependent mechanisms in behavior, using an intersectional genetic strategy to ablate Esr1 in GABAergic or glutamatergic neurons in the MPOA.

      The authors have thoughtfully addressed the reviews, in particular by focusing quantitative analyses on Vgat+Esr1+ clusters and adding important technical and conceptual considerations in the limitations section.

      I have one remaining minor concern. I appreciate that the text now defines "transcriptional maturation". However, the term seems inappropriate when describing the "minimal transcriptional changes" in Vgat+ hormone R^Low clusters, as it implies that they are transcriptionally immature. Do the authors mean to imply that transcriptional maturation is observed in Vgat+Esr1+ clusters but not in Vgat+ hormone R^Low clusters? The authors also use the term "hormone-dependent transcriptional dynamics", which I think is more appropriate. For example, hormone-dependent transcriptional dynamics are observed in Vgat+Esr1+ clusters but not in Vgat+ hormone R^Low clusters.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public review:

      Reviewer #1 (Public review):

      Weaknesses:

      Two minor comments

      (1) Fig 4 (hormone treatment): In this experiment, testosterone is given to males, yet in Sup Fig 6 it is argued that Esr1 is more influential in driving transcriptional changes compared to AR. Does DHT treatment have the same outcome as testosterone? Or, does estrogen treatment in males have the same outcome as testosterone?

      We agree that the inability to distinguish AR activation by testosterone from Esr1 activation by testosterone-derived estrogen is a limitation of our study. We have added this point to the “Limitations of the study” section.

      Although HM-HCR experiments showed bidirectional control of transcriptional progression during adolescence, it is unclear whether the facilitation in males by testosterone supplementation is mediated by AR, Esr1, or both, because testosterone is likely converted to estrogen in the brain. Future studies administering dihydrotestosterone (DHT) or estrogen to males may address this issue.

      (2) Fig 3i: There appears to be an age-dependent transcriptional change in male Vgat HR-low cells. Can the authors comment on age-dependent (hormone-independent) transcriptional changes in males versus females?

      We agree that it is important to distinguish hormone-dependent changes from age-dependent changes. We have added pair-wise differential expression (DE) results for the Vgat+ hormone R^Low population to the main text. Consistent with the trajectory analysis, the number of age-dependent genes was smaller than the number of hormone-associated genes.

      “Pair-wise DEG analysis consistently showed a larger number of DEGs between P35 and P23 in Vgat+Esr1+ neurons (male: 146 genes; female: 162 genes) than in Vgat+ hormone R^Low neurons (male: 26 genes; female: 1 gene).”

      Reviewer #2 (Public review):

      Weaknesses:

      (1) A major conceptual flaw is that the authors do not distinguish between genetically determined sex differences in patterns of gene expression and differences caused by the fact that MPOA neurons are exposed to different endocrine environments in adolescent males and females, which can cause different transcriptional trajectories independent of genetic sex. This issue does not render their results invalid, but their terminology should address the issue in the discussion and "limitations" section. At the very least the endocrine status of "intact females" should be included.

      We agree that it would be ideal to analyze perinatal and pubertal dynamics within the same study in order to distinguish these two processes. We have added this point to the “Limitations” section.

      “2. Although we have identified hormone/Esr1-dependent transcriptional trajectories during adolescence, their relationship and interplay with genetically determined perinatal events, which occur earlier and are robust, remain unclear. Some sex differences during adolescence might be an extension of perinatally established sex differences, while others might be unique to adolescence.”

      (2) A major technical flaw is that the MPOA is treated as a functionally distinct brain region (block dissections) with uniform distribution of cell types (FISH data are not illustrated or reported with sufficient spatial detail). Thus, an enormous amount of molecular data is provided that cannot be mapped to distinct neural circuits, thereby limiting the neurobiological impact. This is also a weakness of the FISH data, which is presented with only small regions illustrated without anatomical detail. In fact, some images are compared that appear to illustrate different MPOA structures, although it is impossible to be certain of this due to the lack of morphological landmarks. The analysis of how Esr1 orchestrates regulatory gene networks is impressive and interesting, but the fact that many of the observed transcriptional events occur in neural circuits that do not overlap confounds interpretation.

      We agree that, although the MPOA was defined consistently across samples based on a brain atlas, its boundaries are less distinct than those of other regions (e.g., the hippocampus or VMH). To minimize contamination from adjacent areas, we restricted quantitative analyses mostly to the Vgat+ Esr1+ population, which is densely located within the MPOA but not in immediately adjacent areas, with the exception of the posterior BNST, which is readily distinguishable. We added a clarification to the Methods and a technical limitation to the Discussion, as shown below.

      Method

      “To distinguish the MPOA from adjacent brain regions, quantitative analysis was restricted to Vgat+ Esr1+ neurons and excluded the posterior BNST.”

      Discussion

      “3. While we observed a robust effect of Esr1 KO in the scRNAseq experiment, which was further validated by the FISH experiment, it is possible that additional heterogeneous Vgat+ Esr1+ populations in the MPOA were differentially targeted in each virally injected sample. To mitigate this, we pooled 3-4 mice for each sample in the scRNAseq experiment; in the HCR-FISH experiment, in addition to confirming recombinase RNA expression within the MPOA, we included only samples with robust Esr1 deletion in the MPOA. Notably, owing to technical challenges, Esr1 deletion was detected more robustly than the weak recombinase RNA signal (data not shown).”

      (3) The locations of the AAV injections should be characterized because deleting Esr1 in multiple distinct parts of the MPOA will likely confound interpretation. This is especially problematic given the limited number of mice used for parts of the RNAscope analysis.

      We agree that, as with comment (2), this is an important matter. For the HCR experiment, we included only animals with recombinase RNA (Cre or Flp) expression within the MPOA. Although recombinase expression was sufficient to determine qualitatively whether the injection hit or missed the MPOA, the signal was weak and it was challenging to determine the extent of viral spread. Thus, we also used successful Esr1 deletion as an additional inclusion criterion for the AAV-Cre-YFP group. We have added the inclusion criteria to the Methods and a technical consideration to the Discussion.

      Method

      “For HCR2, AAV was injected unilaterally, and samples were included only when successful targeting of the MPOA with AAV-Cre-YFP (detection of recombinase RNA within the MPOA) and deletion of Esr1 were both confirmed.”

      Discussion

      “3. While we observed a robust effect of Esr1 KO in the scRNAseq experiment, which was further validated by the FISH experiment, it is possible that additional heterogeneous Vgat+ Esr1+ populations in the MPOA were differentially targeted in each virally injected sample. To mitigate this, we pooled 3-4 mice for each sample in the scRNAseq experiment; in the HCR-FISH experiment, in addition to confirming recombinase RNA expression within the MPOA, we included only samples with robust Esr1 deletion in the MPOA. Notably, owing to technical challenges, Esr1 deletion was detected more robustly than the weak recombinase RNA signal (data not shown).”

      (4) Although the focus of these experiments on adolescence is welcome, neither the Introduction nor the Discussion does a good job of placing these studies in the context of what is already known about brain maturation during puberty. It is true that this is very much a results-focused manuscript, but the scholarship can be improved. Simply stating that your results are consistent with previous reports places an undue burden on the reader to go figure out what is new.

      We agree that placing our study in the context of existing scholarship will clarify the novelty and impact of this study for the community. We have updated the Introduction to cite a review of puberty-associated genomic studies in the brain, all of which are bulk (brain-region level), as well as the first scRNAseq study of puberty, conducted in the human testis.

      “Despite the well-established role of these hormones in shaping behavior, our understanding of the molecular mechanisms underlying their influence on brain development during adolescence is still limited to the brain-region (bulk) level[8] in humans and model organisms, and adolescent transcriptional dynamics at single-cell resolution in the brain remain poorly understood (but see a pioneering study in the human testis[9]).”

      (5) Throughout the manuscript the authors utilize obscure abbreviations, which often makes reading their text overly cumbersome. This is certainly justified in certain instances where complex names of analytical methods are used repeatedly, but the authors are encouraged to try and simplify their use of non-standard abbreviations.

      We agree that it is helpful for readers to have all abbreviations in a single location. We added an “Abbreviations” section as a reference for readers.

      Medial preoptic area (MPOA)

      Single-cell RNA sequencing (scRNAseq)

      Estrogen receptor 1 (Esr1)

      GABAergic neurons (Vgat+)

      Glutamatergic neurons (Vglut2+)

      Hybridization chain reaction fluorescent in situ hybridization (HCR-FISH)

      Gonadectomized (GDX)

      Partition-based graph abstraction (PAGA)

      Hormone-associated differentially expressed genes (HA-DEGs)

      Multiplexed error-robust fluorescence in situ hybridization (MERFISH)

      Differential gene expression (DE)

      Differentially expressed genes (DEGs)

      Support vector machine (SVM)

      Manifold Enhancement of Latent Dimensions (MELD)

      Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE)

      Androgen receptor (AR)

      Single-cell regulatory network inference and clustering (SCENIC)

      Reviewer #3 (Public review):

      We thank the reviewer for the constructive comments, which helped improve our manuscript.

      Weaknesses:

      We already know that Esr1 is important within GABAergic but not glutamatergic neurons for mating behavior. However, there is not enough data to support the claim that disrupting Esr1 in glutamatergic MPOA neurons "had no observable effect." The MPOA is involved in many behaviors and physiologies that were not investigated. More assays would be required to report "no observable effect."

      The small number of cells included in the transcriptional studies is a general concern, as noted by the authors. This is a particular concern for conclusions related to the role of adolescence in glutamatergic MPOA neurons. The paper reports 24,627 neurons across all treatment groups, which include 3 time points, 2 sexes, and GDX conditions. It seems likely that not much was detected in the glutamatergic neurons because of insufficient power.

      Esr1 knockout is initiated in adolescence, not restricted to adolescence. Do we know that the effects on mating behavior are due to what is happening in adolescence vs. the function of Esr1 in adults? Are the effects different if Esr1 is knocked out in mature adults? This comparison would be important to demonstrate that adolescence is a critical time window for Esr1 function.

      We agree that (1) the relatively mild effects observed in glutamatergic neurons may be partly due to the scale of the study, and (2) because Esr1 deletion is permanent once induced, it is challenging to distinguish adolescent from adult transcriptional dynamics using existing viral strategies.

      We added a discussion of these points to the “Limitations” section.

      “4. While we observed robust transcriptional progression in Vgat+ Esr1+ neurons during adolescence, we observed milder alterations in Vglut2+ neurons. Although the scale of our study is comparable to or exceeds that of prior scRNAseq studies of the MPOA[22,29], future larger studies may be more sensitive in detecting adolescent transcriptional dynamics in Vglut2+ neurons.”

      “5. Although we demonstrated that adolescent transcriptional changes occur as early as P35, and that either hormonal deprivation or Esr1 KO prior to adolescence prevented this transcriptional progression (leaving an arrested transcriptional state even in adulthood), the viral incubation time and the permanent deletion of Esr1 after viral injection make it challenging to disambiguate the roles of Esr1 during adolescence and adulthood. Future studies injecting the virus in adulthood may provide additional insights into the similarities and differences between transcriptional changes during puberty and the transcriptional states maintained in adulthood.”

    1. eLife Assessment

      Using size- and age-matched individuals of the clownfish, Amphiprion percula, this study examines how growth, feeding, and agonistic behavior result in socially dominant or subordinate states. The authors complement this work with whole-body transcriptomics and find significant variation in genes and gene co-expression modules related to growth and satiety-related pathways, as well as ossification-related genes. They provide solid evidence that emerging dominants grow more, eat more, and behave more aggressively than subordinate or solitary individuals; these phenotypic differences are accompanied by distinct gene expression profiles, including variation in growth- and satiety-related pathways. The work is valuable in advancing our understanding of how the social environment regulates phenotypic change; however, claims regarding the mechanistic role of gene expression are only partially supported by the current analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting and well-written manuscript on a fascinating question in a "charismatic" model system.

      Strengths:

      (1) The Introduction is concise, though it might be helpful to the non-specialist reader to learn a bit more about what is known about the social control of somatic growth across diverse species (including humans), which would help to make this work more generally interesting.

      (2) The experiment is well-designed.

      (3) The data collected are comprehensive.

      (4) The complementary analysis of both feeding and aggression/submission data with and without known social roles is a neat idea and compelling!

      Weaknesses:

      (1) I was surprised that the HPA/stress axis was not considered here at all. Wouldn't we expect that subordinates have increased stress axis activation, which in turn could inhibit their growth and aggressive behavior?

      (2) To what extent are growth, food intake, agonistic behavior, and/or gene expression patterns coordinated across P1 vs P2 pairs? The lack of such an analysis seems like a missed opportunity.

      (3) What was the rationale for using whole bodies for the transcriptome analysis? Given the hypotheses, the forebrain or hypothalamus and certain other organ systems (e.g., liver, gonads, skin, etc.) would have been obvious candidate tissues here. I realize that cost is always a consideration, but maybe a focus on the fore-/midbrain could have been prioritized.

      (4) Given the preceding point, why was a fold-change threshold used for assessing DEGs (supplementary Figure 3)? There is no biological justification to ever use a fold-change threshold, especially in bulk RNA-seq analysis. This is particularly true here, where whole bodies were used for RNA-seq analysis, which is a bit unusual. Relatively small cell populations (such as hypothalamic neurons that regulate growth or food intake) may show substantial gene expression variation across social types, yet will be masked by the masses of other cells in the whole body sample. However, gene expression may still vary significantly, albeit the fold-difference may be small. I therefore suggest a reanalysis that omits any fold-change threshold.

      (5) Why is the analysis of color (hue, saturation) buried in the supplementary materials? Based on the hypotheses that motivated the study, color seems just as relevant as food intake, growth, and agonistic behavior, so even if the results are negative, they should be presented in the main paper.

      (6) The Discussion is sometimes difficult to follow. The authors may want to consider including a conceptual graphic that integrates the different aspects of growth and satiety regulation, etc., into a work-in-progress model of sorts, which would also facilitate clearer hypotheses for future research.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors test growth, behavior, and gene expression in pairs of clownfish as they establish social dominance hierarchies, examining patterns of gene expression in these pairs after dominance has been established. The authors show solid evidence that emerging dominant clownfish show increased growth, aggression, and food consumption compared to their submissive or solitary counterparts, eventually adopting distinct gene expression profiles.

      Major Comments:

      (1) The Introduction is comprehensive, but it could be condensed. Likewise, the discussion could be condensed. There is considerable redundancy between the methods, the results, and the legend in Figure 1. The authors should consolidate and remove the redundancy.

      (2) For Figure 3, the authors are showing PC2 and PC3; why is PC1 not shown? There is so much overlap between the three groups in PC2 vs PC3; it seems unlikely that researchers could conclusively identify any individual as belonging to a group based on the expression profile. The ovals shown do not capture all the points within each of the groups, and particularly the grey S oval seems misaligned with the datapoints shown.

      (3) The authors indicate that the 15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling. Does this mean that each of the P1 and P2 were pairs with each other? Have the authors tried examining the gene expression patterns in a paired manner? E.g., for the pairs that showed the greatest size differences, do they also show the greatest differences in gene expression? Do the P1s show the most extreme differences from P2s that also show the most extreme P2 differences? Perhaps lines on Figure 3A connecting datapoints from the P1 and P2 pairs would be informative.

      (4) For the specific target pathways that are up- and downregulated in the different backgrounds, I recommend that the authors include boxplots (or heatmaps) showing the actual expression values for these targets. Figure 6 shows a heatmap for appetite-related genes, and it would be great to see a similar graph for the metabolism and glycolysis genes; it would also be informative to see similar graphs for hormonal and sexual maturation pathways as well.

      (5) Particularly given that there is a relatively small number of genes enriched in the different rank conditions, I did not understand the need to do the WGCNA module analysis. I thought that an analysis of GO terms across the dataset would have been more meaningful than the GO term analysis shown in Figure 4, which considers only genes assigned to the "brown WGCNA module". This should be simplified or clarified.

      (6) The authors say that they have identified coordinated changes in behaviors and the "underlying gene expression, leading to the emergence" of social roles. This is a little bit misleading, since the gene expression analysis occurred well after the behavioral and phenotypic differences emerged. Presumably, the hormonal and genetic shifts that actually caused the behavioral and phenotypic difference occurred during the weeks during which the experiment was underway, and earlier capture of the transcriptome would presumably reveal different patterns, and ones that would be considered more causative. The authors acknowledge this in 434-435, but it could be emphasized further.

      (7) The authors have measured a number of differences between the different dominance classes of fish. All these differences were measured relative to the other classes, but in my view, the Solitary group was the closest to a baseline control. So I'm not sure that it is fair to say that "P2 and S individuals showed consistent downregulation of these genes and pathways" (line 401). I encourage the authors to emphasize the differences in gene expression from the "perspective" of the P1 individuals compared to the baseline of P2 and S individuals. Line 474 says that "P2 fish showed significant upregulation" of a number of pathways. It should be very clear what that is compared to (compared to P1, presumably?)

      (8) Along the same lines, the authors say in line 514 that subordinates and solitaries strategically downregulate their growth. I'm not convinced that this is the case: I would consider this growth trajectory to be the default and the baseline. I would interpret that under certain social conditions, a P1 dominant pattern of growth, behavior, and gene expression is allowed to emerge.

    4. Reviewer #3 (Public review):

      Summary:

      The authors tested the hypothesis that interactions among size- and age-matched rivals will lead to the emergence of social roles, accompanied by divergence in four aspects of individual phenotypes: growth, feeding behavior, fighting behaviors, and gene expression in clownfish.

      Strengths:

      The data on growth, feeding rate, and fighting behaviors support the authors' claims.

      Weaknesses:

      Gene analysis conducted in this study is not sufficient to clarify how the relevant genes actually regulate growth and behavior.

      The information obtained from whole-body gene expression analysis is very limited. The expression of various genes is associated with the regulation of fighting behaviors, food intake, growth, and metabolism, and these genes are regulated differently across tissues, even within a single individual. Gene expression analysis should be performed separately for each tissue.

      Clownfish undergo sex change depending on social status and body size, as the authors mention in the manuscript. The expression of numerous genes is affected by sex change. It is unclear how this issue was addressed.

    5. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting and well-written manuscript on a fascinating question in a "charismatic" model system.

      Strengths:

      (1) The Introduction is concise, though it might be helpful to the non-specialist reader to learn a bit more about what is known about the social control of somatic growth across diverse species (including humans), which would help to make this work more generally interesting.

      (2) The experiment is well-designed.

      (3) The data collected are comprehensive.

      (4) The complementary analysis of both feeding and aggression/submission data with and without known social roles is a neat idea and compelling!

      Thank you for the positive feedback!

      Here, we investigate phenotypic plasticity associated with the adoption of social roles in the clown anemonefish, with strategic growth being just one aspect of that plasticity. Strategic growth, also known as social control of growth, is a fascinating form of adaptive phenotypic plasticity, whereby individuals modify their growth and size in response to fine-scale changes in social conditions (Buston & Clutton-Brock, 2022). In cooperative breeding systems with high reproductive skew, particularly fishes and mammals (possibly including humans), individuals have been shown to i) increase growth/size on the acquisition of dominant status (Dengler-Crish & Catania, 2007; Johnston et al., 2021; Thorley et al., 2018; Van Schaik & Van Hooff, 1996; Walker & McCormick, 2009), ii) increase growth/size when paired with size matched reproductive rivals (Huchard et al., 2016; Reed et al., 2019; this study), and iii) decrease growth/size to avoid conflict (Buston, 2003; Heg et al., 2004; Wong et al., 2007). While strategic growth is fascinating and clearly occurring in this study, we show coordinated changes of multiple aspects of the phenotype as fish adopt social roles. Therefore, we deliberately framed the Introduction broadly to avoid biasing the reader toward viewing growth as the sole or main driver.

      Weaknesses:

      (1) I was surprised that the HPA/stress axis was not considered here at all. Wouldn't we expect that subordinates have increased stress axis activation, which in turn could inhibit their growth and aggressive behavior?

      We also expected to see the HPA/stress axis activated in subordinates, which is why we carried out a targeted exploration of genes known to play a role in this axis. We did not find any genes that were significantly differentially expressed. We believe that there could be two explanations for this. First, from a methodological perspective, it could be due to our use of a whole-body RNA-seq approach, which may have masked this signal. Alternatively, the stress axis might play a more complex role than just acting as a simple on/off switch for reduced growth. Its activation may peak when competition over size is at its highest (during week one) or, conversely, it may peak later and help maintain reduced growth once hierarchies are firmly established (particularly after the dominant individual reaches its maximum size). To understand the role of the stress axis, future studies should observe how its activation varies over time. We acknowledge that the absence of a stress-axis signal and its potential explanations were not clearly discussed in the original manuscript; we will address this issue in the revised version.

      (2) To what extent are growth, food intake, agonistic behavior, and/or gene expression patterns coordinated across P1 vs P2 pairs? The lack of such an analysis seems like a missed opportunity.

      We had a similar thought. Specifically, we were interested in testing the hypothesis that the final size ratio of pairs, which is indicative of the amount of conflict remaining, would predict gene expression. We examined gene expression within pairs to test for coordinated changes and repeated the analysis, accounting for the pair size ratio. In both cases, we found no clear or consistent pattern within pairs. We will consider including these figures in the Supplementary Materials document.

      (3) What was the rationale for using whole bodies for the transcriptome analysis? Given the hypotheses, the forebrain or hypothalamus and certain other organ systems (e.g., liver, gonads, skin, etc.) would have been obvious candidate tissues here. I realize that cost is always a consideration, but maybe a focus on the fore-/midbrain could have been prioritized.

      We decided to use whole-body samples for this initial transcriptomic analysis to capture a broad view of gene-expression differences while keeping sequencing costs and sample requirements manageable. We agree with the reviewer that future work should explore specific tissues sampled from individuals at multiple time points to disentangle transcriptomic differences across tissue types.

      (4) Given the preceding point, why was a fold-change threshold used for assessing DEGs (supplementary Figure 3)? There is no biological justification to ever use a fold-change threshold, especially in bulk RNA-seq analysis. This is particularly true here, where whole bodies were used for RNA-seq analysis, which is a bit unusual. Relatively small cell populations (such as hypothalamic neurons that regulate growth or food intake) may show substantial gene expression variation across social types, yet will be masked by the masses of other cells in the whole body sample. However, gene expression may still vary significantly, albeit the fold-difference may be small. I therefore suggest a reanalysis that omits any fold-change threshold.

      We thank the reviewer for this important point, and agree that an arbitrary fold-change cutoff is inappropriate and unnecessary here. Note that this fold-change cutoff was used only in this single figure; all other analyses used p-values from the entire dataset. We will remove the fold-change cutoff and correct Supplementary Figure 3 and any corresponding text.
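
      A minimal sketch of the reviewer's point, using hypothetical gene names and made-up values (not data from this study): a post-hoc |log2 fold change| cutoff can discard a gene that is significant by adjusted p-value alone. The `DEResult` record and `significant` helper are illustrative assumptions, not part of DESeq2 or the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class DEResult:
    gene: str       # hypothetical gene identifier
    log2_fc: float  # estimated log2 fold change between two groups
    padj: float     # multiple-testing-adjusted p-value


def significant(results, alpha=0.05, min_abs_lfc=0.0):
    """Genes with padj < alpha and |log2 fold change| >= min_abs_lfc.

    min_abs_lfc=0.0 applies no fold-change filter, so significance rests on
    the adjusted p-value alone.
    """
    return [r.gene for r in results
            if r.padj < alpha and abs(r.log2_fc) >= min_abs_lfc]


# Hypothetical results: geneB is significant but shifted only modestly, as
# could happen when a small responsive cell population is diluted in a
# whole-body sample.
results = [
    DEResult("geneA", 2.3, 0.001),
    DEResult("geneB", 0.4, 0.004),
    DEResult("geneC", 1.8, 0.200),
]

print(significant(results))                   # ['geneA', 'geneB']
print(significant(results, min_abs_lfc=1.0))  # ['geneA']
```

      Dropping the cutoff (the default `min_abs_lfc=0.0` here) keeps modest but consistent shifts such as geneB in the gene list.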

      (5) Why is the analysis of color (hue, saturation) buried in the supplementary materials? Based on the hypotheses that motivated the study, color seems just as relevant as food intake, growth, and agonistic behavior, so even if the results are negative, they should be presented in the main paper.

      We agree that color can be an important social signal, so we included color measurements in our experimental design. However, after careful consideration of the color results, we decided that our experimental timing and husbandry changes introduced multiple confounding factors, preventing us from drawing confident conclusions. Specifically, our fish were ≈1 month old at the transfer from larval to experimental tanks and had already begun to deepen their orange hue, before our experiment. (In the wild, they would settle at two weeks of age, prior to the deepening of the orange hue). Once individuals attain a certain hue, it seems that color development can be halted, but not reversed. The transfer also involved changes in lighting, tank background, and diet, factors known to strongly affect coloration. Our results show a uniform shift in orange hue and saturation across social groups, suggesting that these confounding factors might have dominated changes in hue.

      For transparency, we report the color data in the Supplementary Materials, but we caution against drawing any strong conclusions. In the revised manuscript, we will recommend that future work include a targeted experiment to robustly test for the effect of the adoption of social roles on coloration or the effect of coloration on the adoption of social roles.

      (6) The Discussion is sometimes difficult to follow. The authors may want to consider including a conceptual graphic that integrates the different aspects of growth and satiety regulation, etc., into a work-in-progress model of sorts, which would also facilitate clearer hypotheses for future research.

      Thank you for flagging that parts of the Discussion are a bit difficult to follow. In the revised manuscript, we will work to improve readability of the Discussion. We also appreciate the suggestion of including a conceptual schematic. We will consider whether adding such a graphic will add value to this manuscript or future manuscripts.

      Reviewer #2 (Public review):

      In this manuscript, the authors test growth, behavior, and gene expression in pairs of clownfish as they establish social dominance hierarchies, examining patterns of gene expression in these pairs after dominance has been established. The authors show solid evidence that emerging dominant clownfish show increased growth, aggression, and food consumption compared to their submissive or solitary counterparts, eventually adopting distinct gene expression profiles.

      Major Comments:

      (1) The Introduction is comprehensive, but it could be condensed. Likewise, the discussion could be condensed. There is considerable redundancy between the methods, the results, and the legend in Figure 1. The authors should consolidate and remove the redundancy.

      Thank you for flagging that parts of the manuscript could be condensed; we will work on this as we revise the manuscript.

      (2) For Figure 3, the authors are showing PC2 and PC3; why is PC1 not shown? There is so much overlap between the three groups in PC2 vs PC3; it seems unlikely that researchers could conclusively identify any individual as belonging to a group based on the expression profile. The ovals shown do not capture all the points within each of the groups, and particularly the grey S oval seems misaligned with the datapoints shown.

      We understand the concern raised by the reviewer about the overlap among points in the PCA. We have explored PC1 to PC3 and found that PC2 and PC3 showed the clearest, statistically significant clustering by social position, while PC1 did not capture any variation due to social position. We have also explored whether other factors, such as genetic relatedness, tank effects, and total read count per sample, might be masking differences, and found that none of these factors explained sample clustering. The ellipses shown around the points were not intended to capture all points; rather, each shows the 95% region of the multivariate t-distribution estimated for that social group. We will make sure this is clearly explained in the figure legend and Methods section. In addition, in the revised version, we will show PC1 vs PC2 and PC1 vs PC3 in the Supplementary Materials for transparency.
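
      A minimal sketch of why a 95% ellipse is not expected to enclose every point: by construction, roughly 5% of points drawn from the fitted distribution fall outside it. For simplicity, this sketch assumes a normal-theory ellipse (squared Mahalanobis distance compared with the chi-square quantile for 2 degrees of freedom, -2 ln(1 - level)) rather than the multivariate t ellipse described above, and uses simulated, hypothetical PC scores rather than the study's data. The `ellipse_coverage` function is illustrative only.

```python
import math
import random


def ellipse_coverage(points, level=0.95):
    """Fraction of 2-D points inside a normal-theory ellipse at the given level.

    The ellipse is the set of x with (x - m)^T S^-1 (x - m) <= q, where m and S
    are the sample mean and covariance and q = -2 ln(1 - level) is the
    chi-square quantile with 2 degrees of freedom.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    q = -2.0 * math.log(1.0 - level)
    inside = 0
    for x, y in points:
        dx, dy = x - mx, y - my
        # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 <= q:
            inside += 1
    return inside / n


# Simulated, hypothetical PC2/PC3 scores for one group (not data from this study).
random.seed(1)
scores = [(random.gauss(0.0, 2.0), random.gauss(0.0, 1.0)) for _ in range(2000)]
coverage = ellipse_coverage(scores)
print(f"{coverage:.3f} of points fall inside the 95% ellipse")
```

      The returned fraction should be close to, but below, 1, so some points of each group lying outside its ellipse is the expected behavior rather than a misalignment.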

      (3) The authors indicate that the 15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling. Does this mean that each of the P1 and P2 were pairs with each other? Have the authors tried examining the gene expression patterns in a paired manner? E.g., for the pairs that showed the greatest size differences, do they also show the greatest differences in gene expression? Do the P1s show the most extreme differences from P2s that also show the most extreme P2 differences? Perhaps lines on Figure 3A connecting datapoints from the P1 and P2 pairs would be informative.

      Yes, “15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling” refers to pairs of P1 and P2, we will make sure this is clearly stated in the revised Methods. Yes, we have explored gene expression data considering the size difference between pairs, and found that it showed no clear differences in gene expression patterns (see earlier response to Reviewer #1). We will consider including these figures in the Supplementary Materials document, as well as adding a version of Figure 3A that clearly shows information on pairs, as suggested by the reviewer.

      (4) For the specific target pathways that are up- and downregulated in the different backgrounds, I recommend that the authors include boxplots (or heatmaps) showing the actual expression values for these targets. Figure 6 shows a heatmap for appetite-related genes, and it would be great to see a similar graph for the metabolism and glycolytic genes; it would also be informative to see similar graphs for hormonal and sexual maturation pathways as well.

      We have explored genes across a broad set of metabolic pathways (glycolysis, TCA cycle, lactic fermentation, PDH complex, cholesterol biosynthesis, fatty-acid synthesis, and beta-oxidation) and show all metabolic genes that showed significant differential expression between P1, P2, and S in Figure 6. Overall, very few metabolism-associated genes were significantly differentially expressed, which is why we decided to combine appetite-regulation and metabolism-associated genes into a single figure (Figure 6). In the revised version, we will ensure that Figure 6 clearly shows the gene sets associated with appetite and metabolism.

      We also examined hormonal pathways (glucocorticoid and thyroid signaling), but did not find genes in these pathways that were significantly differentially expressed. Finally, we would like to clarify that our samples consist of two-month-old juvenile individuals that are sexually immature —under ideal conditions, clown anemonefish can mature in one to two years, but they can also remain sexually immature for a decade or more (Buston & García, 2007) — which is why we did not observe distinct molecular signatures of sexual maturation. We recognize that the sentence at line 520 may be misleading, as we did not identify any gene expression signature that we could confidently associate with signs of sexual maturation. We will make sure that these are clearly stated in the revised version of the manuscript.

      (5) Particularly given that there is a relatively small number of genes enriched in the different rank conditions, I did not understand the need to do the WGCNA module analysis. I thought that an analysis of GO terms across the dataset would have been more meaningful than the GO term analysis shown in Figure 4, which considers only genes assigned to the "brown WGCNA module". This should be simplified or clarified.

      To clarify, GO enrichment analysis does not establish correlations with traits; it only describes which functions or pathways are over-represented in a given gene set. That is why we began by using WGCNA to define gene sets (modules) that are correlated with phenotypes. Our primary rationale for WGCNA was to identify modules of co-expressed genes that show a significant statistical correlation with the phenotypes of interest (social role: P1, P2, S; growth; and food intake). Pairwise differential expression analysis (Figure 3B) identified a few hundred significantly differentially expressed genes, but those tests treat genes independently and cannot link coordinated changes in co-expressed genes to the phenotypes of interest. Because WGCNA is blind to traits, it first identifies groups of co-expressed genes, which can then be related to phenotypes to help resolve gene expression patterns.

      We therefore ran WGCNA on the rlog-transformed dataset to identify modules of co-expressed genes that show a significant correlation with the phenotypes of interest. For every module that showed such a correlation, we performed GO enrichment and carefully evaluated the resulting GO enrichment trees (see Supplementary Figs. 4–5). The brown module was highlighted in the main text because it was one of the modules significantly correlated with growth, and its associated GO enrichment showed clear growth-related signals that were not identified in the pairwise differential expression analysis.

      (6) The authors say that they have identified coordinated changes in behaviors and the "underlying gene expression, leading to the emergence" of social roles. This is a little bit misleading, since the gene expression analysis occurred well after the behavioral and phenotypic differences emerged. Presumably, the hormonal and genetic shifts that actually caused the behavioral and phenotypic differences occurred during the weeks in which the experiment was underway, and earlier capture of the transcriptome would presumably reveal different patterns, ones that would be considered more causative. The authors acknowledge this in lines 434-435, but it could be emphasized further.

      We appreciate the reviewer raising this point. In the updated version of the manuscript, we will revise wording to convey that food intake, agonistic behavior, size and growth, and gene expression are all changing continuously, in response to each other and in response to social feedback. An underappreciated aspect of this system (and likely many other systems) is that phenotype (including transcriptome) influences the outcome of social interactions, and the outcome of social interactions influences the phenotype (including the transcriptome). Earlier capture of the transcriptome would reveal different levels of gene expression, reflecting the state of the system at that moment in time.

      (7) The authors have measured a number of differences between the different dominance classes of fish. All these differences were measured relative to the other classes, but in my view, the Solitary group was the closest to a baseline control. So I'm not sure that it is fair to say that "P2 and S individuals showed consistent downregulation of these genes and pathways" (line 401). I encourage the authors to emphasize the differences in gene expression from the "perspective" of the P1 individuals compared to the baseline of P2 and S individuals. Line 474 says that "P2 fish showed significant upregulation" of a number of pathways. It should be very clear what that is compared to (compared to P1, presumably?).

      We agree with the reviewer that solitary individuals are the most intuitive baseline. Indeed, the experimental design included solitary fish because we expected they would serve as a useful control. Without social restraint, we anticipated they would show unrestricted growth, feeding, behavior, and associated gene‑expression patterns, similar to dominants.

      We initially ran analyses using solitaries as the baseline, but after examining the results, which showed subordinate‑like characteristics for the solitary individuals, we concluded that solitary individuals are not an ecologically appropriate control for this context. Removing juveniles from a social context and housing them in isolation may be stressful and can affect physiology and behavior in ways that do not reflect a natural baseline. From a life‑history standpoint, solitary living is not the typical state for A. percula.

      For these reasons, we reanalysed the dataset using the dominant (P1) as the reference to enable more ecologically meaningful comparisons (this choice was somewhat arbitrary; subordinates could also have been used as the reference). Given that gene expression is relative, we interpret results from both the dominant (P1) and subordinate (P2) perspectives in the Discussion to provide a complete view. We will clarify wording throughout the manuscript to make it clear that all comparisons are relative (e.g., revising line 474).

      (8) Along the same lines, the authors say in line 514 that subordinates and solitaries strategically downregulate their growth. I'm not convinced that this is the case: I would consider this growth trajectory to be the default and the baseline. I would interpret that under certain social conditions, a P1 dominant pattern of growth, behavior, and gene expression is allowed to emerge.

      We respectfully disagree with the idea that a single baseline or reference growth trajectory exists for any individual of this species. Growth of individuals is entirely dependent on social context: neither fast nor slow growth represents an inherent baseline. When two size‑matched juveniles meet and compete to establish dominance, accelerated growth is the expected trajectory. By contrast, juveniles joining an existing hierarchy are expected to exhibit reduced growth, which minimizes conflict and facilitates their social integration. Unlike species whose growth trajectories are not socially mediated, clown anemonefish do not have a context‑independent growth rate; rather, individuals constantly readjust their growth according to their immediate social environment.

      Therefore, growth trajectories must be considered from the perspective of all group members, because they emerge from interactions among individuals rather than reflecting an intrinsic baseline. In this study, we were interested in the establishment of dominance hierarchy and how individuals adjust their phenotypes during this process. By experimentally pairing size‑matched rivals, both individuals are initially expected to pursue the dominant trajectory, and thus neither individual represents a default state. Instead, the outcome reflects a social decision, after which both individuals reinforce their emerging social roles through coordinated changes.

      Reviewer #3 (Public review):

      Summary:

      The authors tested the hypothesis that interactions among size- and age-matched rivals will lead to the emergence of social roles, accompanied by divergence in four aspects of individual phenotypes: growth, feeding behavior, fighting behaviors, and gene expression in clownfish.

      Strengths:

      The data on growth, feeding rate, and fighting behaviors support the authors' claims.

      Thank you for the positive feedback!

      Weaknesses:

      Gene analysis conducted in this study is not sufficient to clarify how the relevant genes actually regulate growth and behavior.

      The information obtained from whole-body gene expression analysis is very limited. The expression of many genes is associated with the regulation of fighting behaviors, food intake, growth, and metabolism, and these genes are regulated differently across tissues, even within a single individual. Gene expression analysis should be performed separately for each tissue.

      We understand the reviewer’s concern about whole‑body transcriptomes and agree that tissue‑specific sampling would provide greater resolution of the mechanisms linking gene expression to growth, agonistic behaviors, and food intake. For this initial study, however, we deliberately chose whole‑body samples to capture a broad, unbiased view of gene expression differences while keeping sequencing costs and sample requirements manageable. We explicitly acknowledge the resulting interpretational limits in the Discussion (lines 464; 529–533), and suggest in the last paragraph that the patterns reported here should be used to build on in future studies exploring targeted, tissue‑specific hypotheses.

      Clownfish undergo sex change depending on social status and body size, as the authors mention in the manuscript. Numerous gene expressions are affected by sex change. It is unclear how this issue was addressed.

      We thank the reviewer for raising this point. Sex change and sexual maturation can indeed drive major transcriptional shifts in clown anemonefish, but our experiment did not encompass such a life‑history transition. All individuals in this experiment were juveniles (≈1 month old at the start, ≈2 months old at the end) and were sexually immature at these ages. Clown anemonefish reach sexual maturity at around one to two years of age under ideal conditions, can delay sexual maturation for years under normal conditions (Buston & García, 2007), and sex change in the genus Amphiprion is known to take more than ~5 months (Moyer & Nakazono, 1978). Accordingly, individuals in this study were not sexually mature, and sex change was not biologically plausible over the five-week experimental period. We recognize that the sentence at line 520 may be misleading, as we did not identify any gene expression signature that we could confidently associate with signs of sexual maturation. We will make sure that it is clearly stated in the revised version that the fish in this study were sexually immature.

      References:

      Buston, P. (2003). Forcible eviction and prevention of recruitment in the clown anemonefish. Behavioral Ecology, 14(4), 576–582. https://doi.org/10.1093/beheco/arg036

      Buston, P. M., & García, M. B. (2007). An extraordinary life span estimate for the clown anemonefish Amphiprion percula. Journal of Fish Biology, 70(6), 1710–1719. https://doi.org/10.1111/j.1095-8649.2007.01445.x

      Buston, P., & Clutton-Brock, T. (2022). Strategic growth in social vertebrates. Trends in Ecology & Evolution, 37(8), 694–705. https://doi.org/10.1016/j.tree.2022.03.010

      Dengler-Crish, C. M., & Catania, K. C. (2007). Phenotypic plasticity in female naked mole-rats after removal from reproductive suppression. Journal of Experimental Biology.

      Heg, D., Bender, N., & Hamilton, I. (2004). Strategic growth decisions in helper cichlids. Proceedings of the Royal Society of London. Series B: Biological Sciences, 271(suppl_6). https://doi.org/10.1098/rsbl.2004.0232

      Huchard, E., English, S., Bell, M. B. V., Thavarajah, N., & Clutton-Brock, T. (2016). Competitive growth in a cooperative mammal. Nature, 533(7604), 532–534. https://doi.org/10.1038/nature17986

      Johnston, R. A., Vullioud, P., Thorley, J., Kirveslahti, H., Shen, L., Mukherjee, S., Karner, C. M., Clutton-Brock, T., & Tung, J. (2021). Morphological and genomic shifts in mole-rat ‘queens’ increase fecundity but reduce skeletal integrity. eLife, 10, e65760. https://doi.org/10.7554/eLife.65760

      Moyer, J. T., & Nakazono, A. (1978). Protandrous Hermaphroditism in Six Species of the Anemonefish Genus Amphiprion in Japan (No. 2). The Ichthyological Society of Japan. https://doi.org/10.11369/jji1950.25.101

      Reed, C., Branconi, R., Majoris, J., Johnson, C., & Buston, P. (2019). Competitive growth in a social fish. Biology Letters, 15(2), 20180737. https://doi.org/10.1098/rsbl.2018.0737

      Thorley, J., Katlein, N., Goddard, K., Zöttl, M., & Clutton-Brock, T. (2018). Reproduction triggers adaptive increases in body size in female mole-rats. Proceedings of the Royal Society B: Biological Sciences, 285(1880), 20180897. https://doi.org/10.1098/rspb.2018.0897

      van Schaik, C. P., & van Hooff, J. A. R. A. M. (1996). Toward an understanding of the orangutan’s social system. In L. F. Marchant, T. Nishida, & W. C. McGrew (Eds.), Great Ape Societies (pp. 3–15). Cambridge University Press. https://doi.org/10.1017/CBO9780511752414.003

      Walker, S. P. W., & McCormick, M. I. (2009). Sexual selection explains sex-specific growth plasticity and positive allometry for sexual size dimorphism in a reef fish. Proceedings of the Royal Society B: Biological Sciences, 276(1671), 3335–3343. https://doi.org/10.1098/rspb.2009.0767

      Wong, M. Y. L., Buston, P. M., Munday, P. L., & Jones, G. P. (2007). The threat of punishment enforces peaceful cooperation and stabilizes queues in a coral-reef fish. Proceedings of the Royal Society B: Biological Sciences, 274(1613), 1093–1099. https://doi.org/10.1098/rspb.2006.0284

    1. eLife Assessment

      In this important study, Bready et al. investigate how a highly conserved long-range enhancer mediates neural-specific SOX2 regulation during neural differentiation using human neural stem cells. This study has broad appeal to developmental neuroscience; however, the data remain incomplete given the need for homozygous enhancer knockouts and biological replicates in the scRNAseq assays.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors examine how a developmentally regulated cis-regulatory element controls SOX2 expression during neural differentiation of human stem cells. The results suggest that this highly conserved long-range enhancer mediates neural-specific SOX2 regulation and offer insight into the role of promoter-enhancer contacts in this process. Although the findings are interesting, several limitations need to be addressed.

      Strengths:

      A central question in developmental biology is how genes are regulated in a context-dependent manner. SOX2, a major pluripotency factor, is expressed in diverse tissues during development, and therefore understanding the mechanisms that control its spatiotemporal expression is critical. This study addresses this important question by examining the functional relevance of a neural-specific, developmentally regulated SOX2 enhancer and its associated promoter-enhancer contacts in driving gene expression during human neural development. Using multiple model systems and techniques, the authors test the requirement of this enhancer by analyzing SOX2 expression in mutant lines, providing evidence for its role in this process.

      Weaknesses:

      A key limitation of the study is the absence of data from homozygous SOX2 enhancer deletion, which leaves the analysis incomplete and tempers the conclusions that can be drawn. Furthermore, the suitability of teratomas as a model system is questionable, given their limited capacity to recapitulate the spatial patterning, regional specification, and organized developmental processes characteristic of the human forebrain. Finally, the manuscript remains largely descriptive with little mechanistic insight.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use a combination of genomics, genome conformation assays, and CRISPR-mediated deletion to study the transcriptional regulation of the SOX2 gene in human neural stem cells (hNSCs).

      Strengths:

      The authors show that two distal elements, located ~550 kb downstream of the SOX2 gene, are important for SOX2 transcription in hNSCs. They investigate the deletion of these elements both in established hNSCs and in hNSCs generated by differentiation of human pluripotent stem cells, suggesting these elements are important in both the establishment and maintenance of SOX2 expression in hNSCs.

      Weaknesses:

      Homologous elements have been studied in the mouse genome and have conserved function in mouse NSCs, yet these findings are not mentioned. Inclusion of biological replicates for the scRNA-seq and replicate CRISPR-deleted clones would strengthen the study.

    4. Author Response:

      eLife Assessment

      In this important study, Bready et al. investigate how a highly conserved long-range enhancer mediates neural-specific SOX2 regulation during neural differentiation using human neural stem cells. This study has broad appeal to developmental neuroscience; however, the data remain incomplete given the need for homozygous enhancer knockouts and biological replicates in the scRNAseq assays.

      We thank the expert reviewers and eLife editors Drs. Eade and White for complimenting our work and deeming it an “important study” of “broad appeal to developmental neuroscience”. We also acknowledge some of the limitations of our work, including the lack of a homozygous deletion of the enhancer element. As we detail below, we tried tirelessly to identify human embryonic stem cell (hESC) clones with homozygous deletions but were unable to. As we speculate in the Discussion, this failure may reflect a biological property of the enhancer element (possibly an essentiality manifested even in hESCs), or a technical limitation related to the large size (2.7 kb) of the genomic element targeted for deletion. We also clarify that every scRNAseq assay included cells from multiple teratomas.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors examine how a developmentally regulated cis-regulatory element controls SOX2 expression during neural differentiation of human stem cells. The results suggest that this highly conserved long-range enhancer mediates neural-specific SOX2 regulation and offer insight into the role of promoter-enhancer contacts in this process. Although the findings are interesting, several limitations need to be addressed.

      Strengths:

      A central question in developmental biology is how genes are regulated in a context-dependent manner. SOX2, a major pluripotency factor, is expressed in diverse tissues during development, and therefore understanding the mechanisms that control its spatiotemporal expression is critical. This study addresses this important question by examining the functional relevance of a neural-specific, developmentally regulated SOX2 enhancer and its associated promoter-enhancer contacts in driving gene expression during human neural development. Using multiple model systems and techniques, the authors test the requirement of this enhancer by analyzing SOX2 expression in mutant lines, providing evidence for its role in this process.

      We thank the reviewer for highlighting the significance of our work in the field of developmental biology.

      Weaknesses:

      A key limitation of the study is the absence of data from homozygous SOX2 enhancer deletion, which leaves the analysis incomplete and tempers the conclusions that can be drawn. Furthermore, the suitability of teratomas as a model system is questionable, given their limited capacity to recapitulate the spatial patterning, regional specification, and organized developmental processes characteristic of the human forebrain. Finally, the manuscript remains largely descriptive with little mechanistic insight.

      We appreciate the reviewer’s disappointment with the lack of data from a homozygous SOX2 enhancer deletion. We too felt disappointed when we started genotyping our hESC clones. In fact, we spent a year screening multiple hESC clones for a homozygous deletion but were unable to find one. We performed several assays to better characterize the heterozygous clones, including Sanger sequencing, whole-genome sequencing (WGS) and fluorescent in situ hybridization (FISH). All assays pointed to a hemizygous deletion. We do not understand the reasons for the absence of clones with a homozygous deletion. One possibility is that homozygous deletion of the enhancer is selected against in hESCs, thus preventing the growth of colonies. Another possibility is the technical challenge of achieving a large deletion (2.7 kb) in hESCs. We also entertained the possibility that the enhancer was excised from the genome but retained as extrachromosomal DNA (ecDNA), thus producing the hemizygous genotype. However, several assays, such as FISH and PCR diagnostics, argued against this possibility.

      The teratoma assay was chosen as an in vivo metric of spontaneous differentiation of hESCs into the three germ layers, because our overarching hypothesis was that perturbing the enhancer element and the 3D chromatin loop regulating SOX2 transcription would impair specification of neuroectodermal precursors. We believe that teratomas allow pluripotent cells to reveal any predilection toward particular germ layers in an unbiased fashion. Importantly, we did not rely solely on teratomas to assess the effects of our genomic perturbations on specification of neuroectoderm, but also used cerebral organoids as an orthogonal approach focused on the tissue of interest, the central nervous system.

      Our work not only describes an important mechanism for regulation of SOX2 transcription in the transition from pluripotency to neuroectodermal specification, but also provides mechanistic insight into whether the developmentally co-regulated activation of the enhancer and formation of the 3D chromatin loop depend on each other. Our findings indicate that the two processes occur independently of each other, as evidenced by the fact that the enhancer can be uncoupled from chromatin folding, as occurs when the adjacent CTCF motif is deleted. This finding raises the possibility that enhancer activation occurs through yet-to-be-determined transcriptional events, and that establishment of the local 3D chromatin architecture helps fine-tune the enhancer's influence within the Topologically Associating Domain (TAD) of interest.

      We are further pursuing mechanisms that regulate activation of the enhancer within neuroectodermal lineages and may explain its actions on genomic elements other than the SOX2 locus within the relevant TAD. We are also investigating reasons explaining why hemizygous enhancer deletion produces stronger phenotypes than deletion of the CTCF motif that helps stabilize the 3D chromatin loop.

      Reviewer #2 (Public review):

      Summary:

      The authors use a combination of genomics, genome conformation assays, and CRISPR-mediated deletion to study the transcriptional regulation of the SOX2 gene in human neural stem cells (hNSCs).

      Strengths:

      The authors show that two distal elements, located ~550 kb downstream of the SOX2 gene, are important for SOX2 transcription in hNSCs. They investigate the deletion of these elements both in established hNSCs and in hNSCs generated by differentiation of human pluripotent stem cells, suggesting these elements are important in both the establishment and maintenance of SOX2 expression in hNSCs.

      We thank the reviewer for appreciating the importance of this regulatory mechanism in the establishment and maintenance of SOX2 expression in the human neural lineage.

      Weaknesses:

      Homologous elements have been studied in the mouse genome and have conserved function in mouse NSCs, yet these findings are not mentioned. Inclusion of biological replicates for the scRNA-seq and replicate CRISPR-deleted clones would strengthen the study.

      We appreciate the recommendation of the reviewer to better acknowledge prior work in mouse neural development. We will ensure full acknowledgment of these studies in the revised manuscript.

      We also appreciate the suggestion for biological replicates in our scRNA-seq assays. We clarify that each scRNA-seq arose from combining multiple teratomas from each experimental group, thus ensuring that findings reflect reproducible biology rather than isolated findings from single teratomas. This clarification will be emphasized in the revised manuscript.

      Finally, we absolutely agree with the reviewer that more CRISPR-deleted clones would have strengthened the study. Unfortunately, we realized that characterization of each clone takes multiple years and addition of more clones would have made the study too lengthy.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of short-term plasticity mechanisms by providing evidence for release-independent low-frequency synaptic depression that reflects a redistribution of vesicles within the readily releasable pool, via a reduction in docking site occupancy due to vesicle undocking. The evidence supporting this model is convincing, with rigorous electrophysiological and computational analysis. The work will be of broad interest to cellular neuroscientists and synaptic physiologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the mechanisms of low-frequency synaptic depression at cerebellar parallel fiber to interneuron synapses using unitary recordings that allow direct quantification of synaptic vesicle release. They show that sparse stimulation can induce robust synaptic depression even in the absence of substantial vesicle consumption, and that this depressed state is rapidly reversed when stimulation frequency is increased. To account for these observations, the authors propose a model in which low-frequency depression reflects a redistribution of vesicles within the readily releasable pool, in particular, a reduction in docking site occupancy due to vesicle undocking.

      Strengths:

      I found the experimental work to be of high quality throughout. The use of simple synapse recordings to count individual vesicle release events is particularly powerful in this context and allows questions to be addressed that are difficult to approach with more conventional approaches. The demonstration that low-frequency depression can occur independently of prior vesicle release, together with the rapid recovery observed during high-frequency stimulation, places strong constraints on possible underlying mechanisms and represents a clear strength of the study.

      The modeling framework is clearly laid out and helps organize a broad set of observations across stimulation frequencies. Several of the experimental tests appear well-motivated by the model, including the recovery train experiments, the analysis of failures, and the use of doublet stimulation. Taken together, the data provide a coherent phenomenological description of low-frequency depression and its relationship to vesicle availability within the readily releasable pool.

      Weaknesses:

      While the experimental results are strong, the manuscript would benefit from rebalancing the strength of the mechanistic conclusions drawn from the modeling in light of its limitations. The framework is clearly useful and provides a coherent interpretation of the data, but it is not uniquely constrained by the experimental observations, and alternative models or interpretations could plausibly account for the findings. The use of different model regimes concatenated across time, with substantially different parameter values, highlights the abstract nature of the approach. For these reasons, the model seems best presented as one plausible explanatory framework rather than a definitive biological mechanism. Clarifying the distinction between data-driven observations and model-based inferences would help readers assess which conclusions are strongly supported and which remain more speculative.

      The interpretation of the Ca²⁺-related experiments would benefit from more cautious wording. The absence of detectable changes in presynaptic Ca²⁺ signals does not exclude more localized or subtle Ca²⁺-dependent mechanisms, and conclusions regarding Ca²⁺ independence should therefore be framed accordingly. In addition, while low-frequency depression is still observed at reduced extracellular Ca²⁺, these experiments appear less diagnostic of the specific model-derived mechanism emphasized elsewhere in the manuscript - namely, a selective reduction in docking-site occupancy - and should be discussed with appropriate qualification in the text.

      Major points:

      (1) Clarify and qualify mechanistic claims derived from the model.

      Throughout the manuscript, changes in model parameters are at times described as if they directly reflected underlying physiological mechanisms. As a result, the conceptual distinction between experimentally observed phenomena, model-derived variables, and biological interpretation is not always clear. Several conclusions in the Results and Discussion are phrased as mechanistic statements, although they rest on assumptions intrinsic to the modeling framework. The authors should systematically review the text and explicitly distinguish between (i) experimentally observed changes in synaptic responses and (ii) inferences about vesicle docking states or transitions within the model.

      In particular, statements implying that vesicle undocking is the mechanism underlying low-frequency depression should be rephrased to reflect that this is an interpretation within the proposed framework rather than a uniquely demonstrated biological process. For example, statements such as "Low-frequency depression is caused by synaptic vesicle undocking" should be replaced with formulations such as "Within the framework of our model, low-frequency depression is accounted for by a redistribution of synaptic vesicles away from docking sites" or "Our results are consistent with a model in which changes in vesicle docking-state occupancy contribute to low-frequency depression."

      A particularly problematic example is the statement that "these experiments further confirm that LFD only involves a decrease in δ, without accompanying changes in ρ or IP size." Here, an experimentally defined phenomenon (LFD) is directly equated with changes in model-derived variables. Such statements should be revised to make clear that δ, ρ, and IP size are inferred quantities within the model, and that the experimental data are interpreted through this framework rather than directly confirming changes in these parameters. Similarly, over-generalizing statements such as "Undocking therefore represents the key mechanism controlling short-term depression across stimulation frequencies" should be softened to reflect that this conclusion emerges from the model rather than from direct experimental evidence.

      (2) Address the biological interpretation of time-dependent model regimes.

      The model relies on distinct parameter regimes applied at different time points, with some transitions effectively suppressed in certain regimes. While this approach captures the data well, its biological interpretation remains unclear. The authors should either (i) expand the discussion to outline plausible biological processes that could give rise to such regime changes (for example, calcium-dependent modulation of transition rates or activity-dependent changes in vesicle state stability), or (ii) more explicitly frame this aspect of the model as a descriptive abstraction rather than a mechanistic proposal. This further underscores the need to clearly separate the descriptive role of the model from claims about underlying biological mechanisms.

      (3) Reframe conclusions drawn from calcium-related experiments.

      The calcium imaging data demonstrate no detectable changes in the measured presynaptic calcium signals under the tested conditions, but they do not rule out that calcium signals contribute in ways undetectable by the assay. Conclusions should therefore be revised to reflect this limitation, avoiding statements that exclude a role for calcium-dependent mechanisms. Wording such as "we did not detect evidence for..." would be more appropriate than conclusions implying the absence of an effect.

      Similarly, while low-frequency depression is still observed at reduced extracellular calcium (1.5 mM Ca²⁺), the specific mechanistic signature emphasized elsewhere in the manuscript - namely a selectively reduced first response during a high-frequency recovery train - is no longer apparent. These experiments should therefore be discussed as consistent with the proposed framework, but not as providing independent support for a selective reduction in docking-site occupancy. Explicitly acknowledging this limitation would improve clarity and avoid over-interpreting these data.

      (4) Soften interpretations based on non-significant comparisons.

      In several places, comparisons that do not reach statistical significance are used to argue for equivalence between conditions (for example, comparisons involving failure versus non-failure trials or different LFD conditions). These conclusions should be revised to emphasize the limits of statistical power and framed as a lack of evidence for a difference rather than evidence of independence.

    3. Reviewer #2 (Public review):

      Summary:

      Silva and co-workers exploit their previously established methods of analyzing release events at single parallel fiber to molecular layer interneuron synapses. They observed synaptic depression at low transmission frequencies (< 5 Hz), which rapidly recovers during high-frequency transmission. Analysis of the time course of low-frequency depression revealed an initial rapid and a slow linearly increasing time course. Strikingly, the initial depression occurred even in the absence of preceding release, arguing against vesicle depletion as the underlying mechanism.

      Strengths:

      The main strength of the study is the careful demonstration of an interesting synaptic phenomenon challenging the classical vesicle-centered interpretation of synaptic depression.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      The finding of release-independent synaptic depression is important and would have widespread implications. Therefore, some more analyses to increase the confidence in these findings could be performed.

      My concern is whether rundown could explain the findings. If the rate of failures in s1 increases and at the same time the amplitude decreases during the experiments, an apparent depression in s2 could arise. Supplementary Figure 5A addresses rundown, but the figure is not easy to understand, and, as far as I understand, it does not address the question of whether the release-independent depression could be caused by rundown. To address this, the analysis of Figure 5 could be repeated by investigating the failure rate and amplitude separately or by analyzing the first and second halves of the recordings separately.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript builds on the observation that, at some synapses, low-frequency stimulation causes synaptic depression, which can be reversed by subsequent high-frequency stimulation. Such low-frequency depression (LFD) cannot be easily explained by the depletion of a single vesicle pool. Here, Silva and colleagues propose a model of activity-dependent vesicle trafficking to explain LFD at synapses between cerebellar granule cells and molecular layer interneurons.

      Strengths:

      Overall, LFD is interesting and worthy of examination, and the authors provide new experimental results that are of the high quality expected from this group.

      Weaknesses:

      The study proposes a novel model of vesicle trafficking that is not explained by known biological mechanisms, and the manuscript does not adequately compare or discuss alternative models.

      I have several concerns about how the authors interpret the data. First, the manuscript's primary conceptual advance is the idea that LFD involves vesicle undocking, rather than depletion. However, most experiments were performed under conditions that promote vesicle depletion (3 mM extracellular Ca²⁺). When experiments were repeated in physiological Ca²⁺, there appeared to be little or no LFD (stats are not provided). Second, the RS/DS/DU/undocking model, though not outside the realm of possibility, is not readily explained by known mechanisms and is only loosely supported by experimental findings. Third, when simulating LFD, the authors do not compare alternative models and use inappropriate language to imply that a model fit represents the truth (e.g., "the finding of identical experimental and simulated values confirms that the undocking mechanism accounts for LFD"). Finally, the model is presented in an overly complicated manner. The sheer number of terms and amount of nomenclature make the manuscript confusing and difficult to read. Overall, the manuscript would benefit from added experiments and more statistics, a better justification and evaluation of the model, and more nuanced language.

      Major concerns:

      (1) Most experiments were performed under conditions that exacerbate depletion

      In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal. As mentioned above, the authors only report significant LFD in elevated extracellular Ca²⁺. In a small number of experiments performed in more physiological Ca²⁺ (1.5 mM), there is no depression after a single stimulus, and it is not clear that there was statistically significant depression during a low-frequency train. Several studies cited in support of LFD share this problem:

      • Abrahamsson et al., (2007) recorded from Schaffer collaterals in 4 mM Ca, 3-4X physiological Ca²⁺.

      • Doussau et al., (2010) recorded from Aplysia synapses in 3X Ca compared to seawater.

      • Rudolph et al., (2011) is cited as an example of LFD. However, this study performed experiments at high release probability cerebellar climbing fibers, and reported depression that increased monotonically with stimulation frequency, so it does not resemble the phenomenon studied in this paper. Lin et al., (2022) also largely describe monotonic depression at the calyx.

      The authors note that their results differ from those of Atluri and Regehr, but do not mention that a possible reason for the difference is the increased release probability in their experiments.

      The authors should provide statistics for the data obtained in 1.5 mM Ca, and discuss why LFD is increased in conditions that also elevate vesicle release probability.

      (2) Lack of biological mechanisms supporting the model

      The model is presented without compelling biological support. The evidence in support of vesicle undocking comes from experiments by the Watanabe lab, which showed fewer-than-expected docked vesicles under EM when cultured synapses were stimulated immediately prior to high-pressure freezing. Kusick et al. were careful to note that these vesicles may have been lost to fusion.

      The putative undocking Kusick describes is immediate (< 5 ms after stimulation), and was not shown to be Ca²⁺-sensitive. This manuscript describes "calcium-dependent undocking" that proceeds from 10 ms to 200 ms. Multiple studies from the Watanabe lab show that a single stimulus lowers the number of docked vesicles, and subsequently, there is a transient redocking of vesicles that can be blocked by EGTA or Syt7 knockout.

      I also question the rationale for the authors' model that 2 vesicles are coupled in series to a single release site. Previous papers from this lab cited EM studies of frog and mouse neuromuscular junctions that showed filamentous connections between vesicles (do these synapses show LFD?). Here, the authors primarily cite their previous models to support their arguments. I encourage them to continue searching for ultrastructural evidence for 2-vesicle docking units and to cite such studies.

      (3) Comparison to other vesicle models

      The authors use overly assertive language to suggest that the model proves a mechanism: "Altogether, these results indicate that the slow phase of LFD ... reflects a δ decrease without significant changes in pr, in ρ or in IP size". Simulating data does not conclusively "indicate" the underlying mechanism, but the authors could state that their data can be "explained by a model where...".

      However, LFD does not require activity-dependent undocking. Instead, the phenomenon has been explained by high release probability, paired with an activity-dependent increase in either docking or release probability (Chiu and Carter, 2024; Doussau et al., 2017). Does the new model do a better job of replicating some facet of the data? If multiple models can explain the same data, how can we determine which model is correct? The "Alternative Presynaptic Depression Mechanisms" section should be expanded to discuss these issues.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the mechanisms of low-frequency synaptic depression at cerebellar parallel fiber to interneuron synapses using unitary recordings that allow direct quantification of synaptic vesicle release. They show that sparse stimulation can induce robust synaptic depression even in the absence of substantial vesicle consumption, and that this depressed state is rapidly reversed when stimulation frequency is increased. To account for these observations, the authors propose a model in which low-frequency depression reflects a redistribution of vesicles within the readily releasable pool, in particular, a reduction in docking site occupancy due to vesicle undocking.

      Strengths:

      I found the experimental work to be of high quality throughout. The use of simple synapse recordings to count individual vesicle release events is particularly powerful in this context and allows questions to be addressed that are difficult to approach with more conventional approaches. The demonstration that low-frequency depression can occur independently of prior vesicle release, together with the rapid recovery observed during high-frequency stimulation, places strong constraints on possible underlying mechanisms and represents a clear strength of the study.

      The modelling framework is clearly laid out and helps organize a broad set of observations across stimulation frequencies. Several of the experimental tests appear well-motivated by the model, including the recovery train experiments, the analysis of failures, and the use of doublet stimulation. Taken together, the data provide a coherent phenomenological description of low-frequency depression and its relationship to vesicle availability within the readily releasable pool.

      We thank the Reviewer for his positive assessment of our work.

      Weaknesses:

      While the experimental results are strong, the manuscript would benefit from rebalancing the strength of the mechanistic conclusions drawn from the modelling in light of its limitations. The framework is clearly useful and provides a coherent interpretation of the data, but it is not uniquely constrained by the experimental observations, and alternative models or interpretations could plausibly account for the findings. The use of different model regimes concatenated across time, with substantially different parameter values, highlights the abstract nature of the approach. For these reasons, the model seems best presented as one plausible explanatory framework rather than a definitive biological mechanism. Clarifying the distinction between data-driven observations and model-based inferences would help readers assess which conclusions are strongly supported and which remain more speculative.

      The interpretation of the Ca²⁺-related experiments would benefit from more cautious wording. The absence of detectable changes in presynaptic Ca²⁺ signals does not exclude more localized or subtle Ca²⁺-dependent mechanisms, and conclusions regarding Ca²⁺ independence should therefore be framed accordingly. In addition, while low-frequency depression is still observed at reduced extracellular Ca²⁺, these experiments appear less diagnostic of the specific model-derived mechanism emphasized elsewhere in the manuscript - namely, a selective reduction in docking-site occupancy - and should be discussed with appropriate qualification in the text.

      Concerning Ca²⁺ signals, the Reviewer is right. While we found no change in Ca²⁺ signalling apart from a slow Ca²⁺ accumulation during long trains at 1 Hz, the possibility of an undetected change cannot be excluded. We have added a word of caution to this effect on p. 11. Concerning the 1.5 mM Ca²⁺ experiments, the Reviewer presumably alludes to the first point of the recovery train (yellow) in Supplementary Fig. 2C. This is also the last point (s11) of the slow train at 0.5 Hz, because no delay at all was interposed between the slow train and the recovery train. We have now included one more experiment (bringing the total to n = 6), and we have corrected Fig. S2C accordingly. In the new version, the depression measured for s4-s10 vs. s1 during the 0.5 Hz trains is 0.69 ± 0.05 (p = 0.00058, paired one-tailed t-test). The ratio of the recovery-train s1 to control s1 is 0.83 ± 0.08 (p = 0.028, paired one-tailed t-test).
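      The two statistics quoted in this reply (a depression ratio and a paired one-tailed t-test) can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that each cell contributes one control s1 value and one mean of s4-s10, that the reported ratio is the mean of per-cell ratios, and that the one-tailed test asks whether the late responses are smaller than s1. The helper names and the six-cell values in the usage lines are hypothetical, and the t-distribution CDF is obtained by numerical integration with the Python standard library only, so that SciPy is not assumed.

```python
import math
from statistics import mean, stdev


def student_t_cdf(t, df, steps=20000):
    """Student's t CDF, from Simpson's-rule integration of the pdf over [0, |t|]."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps is even, as Simpson's rule requires
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    # The distribution is symmetric about 0, so half the mass lies on each side.
    return 0.5 + area if t >= 0 else 0.5 - area


def lfd_paired_test(s1_control, s_late):
    """Return (mean per-cell ratio, t statistic, lower-tailed paired p-value).

    s1_control: hypothetical per-cell control s1 values.
    s_late: hypothetical per-cell means of s4-s10 during the slow train.
    """
    if len(s1_control) != len(s_late) or len(s1_control) < 2:
        raise ValueError("need at least two paired cells")
    ratios = [late / first for first, late in zip(s1_control, s_late)]
    diffs = [late - first for first, late in zip(s1_control, s_late)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return mean(ratios), t, student_t_cdf(t, n - 1)


# Hypothetical values for six cells (not data from the study).
ratio, t_stat, p = lfd_paired_test(
    [1.0, 1.2, 0.8, 1.1, 0.9, 1.0],
    [0.7, 0.8, 0.6, 0.7, 0.65, 0.7],
)
print(f"ratio = {ratio:.2f}, t = {t_stat:.2f}, one-tailed p = {p:.2g}")
```

      Where SciPy is available, `scipy.stats.ttest_rel(s_late, s1_control, alternative="less")` should return the same one-tailed p-value; the hand-written CDF is only there to keep the sketch dependency-free.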

      Major points:

      (1) Clarify and qualify mechanistic claims derived from the model.

      Throughout the manuscript, changes in model parameters are at times described as if they directly reflected underlying physiological mechanisms. As a result, the conceptual distinction between experimentally observed phenomena, model-derived variables, and biological interpretation is not always clear. Several conclusions in the Results and Discussion are phrased as mechanistic statements, although they rest on assumptions intrinsic to the modelling framework. The authors should systematically review the text and explicitly distinguish between (i) experimentally observed changes in synaptic responses and (ii) inferences about vesicle docking states or transitions within the model.

      In particular, statements implying that vesicle undocking is the mechanism underlying low-frequency depression should be rephrased to reflect that this is an interpretation within the proposed framework rather than a uniquely demonstrated biological process. For example, statements such as "Low-frequency depression is caused by synaptic vesicle undocking" should be replaced with formulations such as "Within the framework of our model, low-frequency depression is accounted for by a redistribution of synaptic vesicles away from docking sites" or "Our results are consistent with a model in which changes in vesicle docking-state occupancy contribute to low-frequency depression."

      A particularly problematic example is the statement that "these experiments further confirm that LFD only involves a decrease in δ, without accompanying changes in ρ or IP size." Here, an experimentally defined phenomenon (LFD) is directly equated with changes in model-derived variables. Such statements should be revised to make clear that δ, ρ, and IP size are inferred quantities within the model, and that the experimental data are interpreted through this framework rather than directly confirming changes in these parameters. Similarly, overgeneralizing statements such as "Undocking therefore represents the key mechanism controlling short-term depression across stimulation frequencies" should be softened to reflect that this conclusion emerges from the model rather than from direct experimental evidence.

      As suggested, we clarify the distinction in the revised version between experimental data and modelling, and we refrain from making definitive statements on underlying cellular mechanisms.

      (2) Address the biological interpretation of time-dependent model regimes.

      The model relies on distinct parameter regimes applied at different time points, with some transitions effectively suppressed in certain regimes. While this approach captures the data well, its biological interpretation remains unclear. The authors should either (i) expand the discussion to outline plausible biological processes that could give rise to such regime changes (for example, calcium-dependent modulation of transition rates or activity-dependent changes in vesicle state stability), or (ii) more explicitly frame this aspect of the model as a descriptive abstraction rather than a mechanistic proposal. This further underscores the need to clearly separate the descriptive role of the model from claims about underlying biological mechanisms.

      We thank the Reviewer for drawing our attention to this important point. Below 10 ms, rate constants are largely determined by the large-amplitude, fast-decaying Ca²⁺ signal occurring near voltage-dependent Ca²⁺ channels (‘Ca²⁺ nanodomain’). After 10 ms, the rate constants depend on the low-amplitude, slowly decaying Ca²⁺ signals averaged over the entire varicosity (‘volume-averaged Ca²⁺’). We explain this better in the revised version (Materials and Methods, p. 21).
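      The time-dependent regimes described in this reply can be pictured as a rate constant whose Ca²⁺ input switches source 10 ms after the action potential. The sketch below is purely illustrative and is not the model of the manuscript. Only the 10 ms boundary comes from the text; the two Ca²⁺ time courses, their amplitudes and decay constants, the Hill-type dependence and all parameter values (`k_max`, `kd_um`, `hill`) are hypothetical placeholders chosen to make the switch visible.

```python
import math

SWITCH_MS = 10.0  # boundary between the two regimes, as stated in the reply


def nanodomain_ca_um(t_ms):
    """Hypothetical large, fast-decaying Ca2+ signal near the channels (µM)."""
    return 50.0 * math.exp(-t_ms / 2.0)


def volume_averaged_ca_um(t_ms):
    """Hypothetical small, slowly decaying Ca2+ signal averaged over the varicosity (µM)."""
    return 0.05 + 0.5 * math.exp(-t_ms / 40.0)


def ca_source(t_ms):
    """Name of the Ca2+ signal assumed to set the rate constants at time t_ms."""
    return "nanodomain" if t_ms < SWITCH_MS else "volume-averaged"


def rate_constant(t_ms, k_max=1.0, kd_um=10.0, hill=2):
    """Hypothetical Hill-type rate constant (arbitrary units) fed by the applicable signal."""
    if ca_source(t_ms) == "nanodomain":
        ca = nanodomain_ca_um(t_ms)
    else:
        ca = volume_averaged_ca_um(t_ms)
    return k_max * ca**hill / (ca**hill + kd_um**hill)


for t in (1.0, 5.0, 20.0, 200.0):
    print(f"t = {t:5.1f} ms  source = {ca_source(t):15s}  k = {rate_constant(t):.4f}")
```

      With these placeholder values the rate constant jumps slightly at 10 ms, which illustrates why such a switch is best read as a descriptive device; a more mechanistic treatment would derive the rates from one continuous Ca²⁺ time course.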

      (3) Reframe conclusions drawn from calcium-related experiments.

      The calcium imaging data demonstrate no detectable changes in the measured presynaptic calcium signals under the tested conditions, but they do not rule out that calcium signals contribute in ways undetectable by the assay. Conclusions should therefore be revised to reflect this limitation, avoiding statements that exclude a role for calcium-dependent mechanisms. Wording such as "we did not detect evidence for..." would be more appropriate than conclusions implying the absence of an effect.

      Similarly, while low-frequency depression is still observed at reduced extracellular calcium (1.5 mM Ca²⁺), the specific mechanistic signature emphasized elsewhere in the manuscript - namely a selectively reduced first response during a high-frequency recovery train - is no longer apparent. These experiments should therefore be discussed as consistent with the proposed framework, but not as providing independent support for a selective reduction in docking-site occupancy. Explicitly acknowledging this limitation would improve clarity and avoid overinterpreting these data.

      This has been discussed above (‘weaknesses’).

      (4) Soften interpretations based on non-significant comparisons.

      In several places, comparisons that do not reach statistical significance are used to argue for equivalence between conditions (for example, comparisons involving failure versus non-failure trials or different LFD conditions). These conclusions should be revised to emphasize the limits of statistical power and framed as a lack of evidence for a difference rather than evidence of independence.

      We have addressed this point in the revised version.

      Reviewer #2 (Public review):

      Summary:

      Silva and co-workers exploit their previously established methods of analyzing release events at single parallel fiber to molecular layer interneuron synapses. They observed synaptic depression at low transmission frequencies (< 5 Hz), which rapidly recovers during high-frequency transmission. Analysis of the time course of low-frequency depression revealed an initial rapid and a slow linearly increasing time course. Strikingly, the initial depression occurred even in the absence of preceding release, arguing against vesicle depletion as the underlying mechanism.

      Strengths:

      The main strength of the study is the careful demonstration of an interesting synaptic phenomenon challenging the classical vesicle-centered interpretation of synaptic depression.

      We thank the Reviewer for his positive assessment of our work.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      The finding of release-independent synaptic depression is important and would have widespread implications. Therefore, some more analyses to increase the confidence in these findings could be performed.

      My concern is whether rundown could explain the findings. If the rate of failures in s1 increases and at the same time the amplitude decreases during the experiments, an apparent depression in s2 could arise. Supplementary Figure 5A addresses rundown, but the figure is not easy to understand, and, as far as I understand, it does not address the question of whether the release-independent depression could be caused by rundown. To address this, the analysis of Figure 5 could be repeated by investigating the failure rate and amplitude separately or by analyzing the first and second halves of the recordings separately.

      The Reviewer makes a very important point that had escaped our attention. If the responses were declining over the course of an experiment, a high proportion of failures near the end of the recordings would be associated with a weak response to the second AP. This could distort the relation between initial failures and the amount of LFD, perhaps to the point of indicating LFD after failures when there was none. As suggested by the Reviewer, we tested this possibility by examining the stability of the synaptic responses during experiments. We found a mean s1 value of 0.87 ± 0.13 for the first half of the experiments used in Fig. 5, and of 1.10 ± 0.17 for the second half (p > 0.05, n = 10). This analysis shows that there was no rundown during these experiments. We show in Author response image 1 a plot of s1 as a function of train number. These plots do not suggest any artefactual correlation between failures, mean s1, and rundown.

      Author response image 1.

      Plot of s1 as a function of train number for the experiments of Fig. 5. At the request of Reviewer 2, this figure illustrates the evolution of s1 values as a function of train number for the experiments used to produce Figure 5. In each experiment, about 20 s1 values were obtained at two ISIs (either 10 ms and 500 ms, or 800 ms and 1600 ms). The figure shows two examples of s1 values as a function of train number (these values fluctuate widely between 0 and 3), and the average across cells and ISI values. There is no indication of rundown of s1 values as a function of train number.
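      The split-half check described in this response, together with a simple linear trend, can be sketched for a single recording as follows. This is a hypothetical helper, not the analysis used for Fig. 5. It assumes that s1 values are stored in train order, that the middle train of an odd-length recording goes to the second half, and that a least-squares slope of s1 against train number is an adequate summary of any drift.

```python
from statistics import mean


def rundown_summary(s1_by_train):
    """Return (first-half mean, second-half mean, slope of s1 per train).

    s1_by_train: s1 values (numbers of released vesicles) in train order.
    A negative slope together with a lower second-half mean would suggest rundown.
    """
    n = len(s1_by_train)
    if n < 2:
        raise ValueError("need at least two trains")
    mid = n // 2
    first_half = mean(s1_by_train[:mid])
    second_half = mean(s1_by_train[mid:])
    # Ordinary least-squares slope of s1 against train index 0..n-1.
    x_bar = (n - 1) / 2
    y_bar = mean(s1_by_train)
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    sxy = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(s1_by_train))
    return first_half, second_half, sxy / sxx


# Hypothetical recording with a steady decline in s1 (not data from the study).
print(rundown_summary([3, 3, 2, 2, 1, 1]))
```

      Because individual s1 values fluctuate widely between 0 and 3, per-cell summaries of this kind would normally be pooled across cells before any statistical comparison between halves.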

      Reviewer #3 (Public review):

      Summary:

      The manuscript builds on the observation that, at some synapses, low-frequency stimulation causes synaptic depression, which can be reversed by subsequent high-frequency stimulation. Such low-frequency depression (LFD) cannot be easily explained by the depletion of a single vesicle pool. Here, Silva and colleagues propose a model of activity-dependent vesicle trafficking to explain LFD at synapses between cerebellar granule cells and molecular layer interneurons.

      Strengths:

      Overall, LFD is interesting and worthy of examination, and the authors provide new experimental results that are of the high quality expected from this group.

      Weaknesses:

      The study proposes a novel model of vesicle trafficking that is not explained by known biological mechanisms, and the manuscript does not adequately compare or discuss alternative models.

      I have several concerns about how the authors interpret the data. First, the manuscript's primary conceptual advance is the idea that LFD involves vesicle undocking, rather than depletion. However, most experiments were performed under conditions that promote vesicle depletion (3 mM extracellular Ca²⁺). When experiments were repeated in physiological Ca²⁺, there appeared to be little or no LFD (stats are not provided). Second, the RS/DS/DU/undocking model, though not outside the realm of possibility, is not readily explained by known mechanisms and is only loosely supported by experimental findings. Third, when simulating LFD, the authors do not compare alternative models and use inappropriate language to imply that a model fit represents the truth (e.g., "the finding of identical experimental and simulated values confirms that the undocking mechanism accounts for LFD"). Finally, the model is presented in an overly complicated manner. The sheer number of terms and amount of nomenclature make the manuscript confusing and difficult to read. Overall, the manuscript would benefit from added experiments and more statistics, a better justification and evaluation of the model, and more nuanced language.

      We respectfully disagree with these sweeping criticisms, as described in more detail below.

      Major concerns:

      (1) Most experiments were performed under conditions that exacerbate depletion

      In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal. As mentioned above, the authors only report significant LFD in elevated extracellular Ca²⁺. In a small number of experiments performed in more physiological Ca²⁺ (1.5 mM), there is no depression after a single stimulus, and it is not clear that there was statistically significant depression during a low-frequency train. Several studies cited in support of LFD share this problem:

      - Abrahamsson et al., (2007) recorded from Schaffer collaterals in 4 mM Ca, 3-4X physiological Ca²⁺.

      - Doussau et al., (2010) recorded from Aplysia synapses in 3X Ca compared to seawater.

      - Rudolph et al., (2011) is cited as an example of LFD. However, this study performed experiments at high release probability cerebellar climbing fibers, and reported depression that increased monotonically with stimulation frequency, so it does not resemble the phenomenon studied in this paper. Lin et al., (2022) also largely describe monotonic depression at the calyx.

      The Reviewer suggests that LFD may only occur under non-physiological conditions, if the release probability has been increased by artificially elevating the extracellular Ca²⁺. The implication is that LFD is at best a curiosity with little or no significance for brain signalling. We disagree with this point of view for several reasons.

      Concerning the statement ‘In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal’: This is the purpose of the analysis shown in Fig. 5.

      The statement ‘the authors only report significant LFD in elevated extracellular Ca²⁺’ is inaccurate. Fig. S2C shows a clear LFD in 1.5 mM Ca²⁺, as acknowledged by Reviewer 1 (‘low-frequency depression is still observed at reduced extracellular Ca²⁺’). However, we failed to provide a p-value for the depression in the initial version of the paper (p = 0.004, n = 5, with this data set; paired t-test, one-tailed). In the revised version, we document the 1.5 mM results more extensively, including the results of an additional experiment and an explicit statistical analysis of the data (p = 0.00058, n = 6; paired t-test, one-tailed).

      Concerning the statement ‘there is no depression after a single stimulus’: We find that the onset kinetics of LFD is slower in 1.5 mM Ca²⁺ than in 3 mM Ca²⁺ (1.8 ISI in 1.5 mM Ca²⁺, Fig. S2C; 0.51 ISI in 3 mM Ca²⁺, Fig. 2C). This explains why the PPR is not significantly < 1 in 1.5 mM Ca²⁺, without implying any weakening of the extent of LFD at steady state.
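      The argument about onset kinetics can be made concrete with a short worked example. Assuming, for illustration only, that LFD develops as a single exponential and that the values of 0.51 ISI (3 mM Ca²⁺) and 1.8 ISI (1.5 mM Ca²⁺) are the time constants of that exponential in units of the inter-stimulus interval, the fraction of steady-state LFD present at the second stimulus is 1 - exp(-1/τ). The function name below is hypothetical.

```python
import math


def onset_fraction(n_isi, tau_isi):
    """Fraction of steady-state LFD reached after n_isi intervals (single-exponential onset assumed)."""
    return 1.0 - math.exp(-n_isi / tau_isi)


for label, tau in (("3 mM Ca", 0.51), ("1.5 mM Ca", 1.8)):
    print(label, round(onset_fraction(1, tau), 2))
# prints:
# 3 mM Ca 0.86
# 1.5 mM Ca 0.43
```

      Under these assumptions, about 86% of the steady-state depression would already be present at the second stimulus in 3 mM Ca²⁺, but only about 43% in 1.5 mM Ca²⁺. A paired-pulse ratio close to 1 at the lower concentration is therefore compatible with a steady-state LFD of similar size, especially if facilitation partly offsets the early depression.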

      As explained in the manuscript (p. 5), in a previous work we developed a method to relate changes in SV pools, within the RS/DS model, to specific modifications of s1, s2 and s5-s8 during test 100 Hz trains (Tran et al., 2022). This method was developed in 3 mM Ca²⁺, and for this reason, we performed most experiments for the present work in 3 mM Ca²⁺.

      Chiu and Carter (2024) demonstrated LFD in neocortical synapses; they performed their study in 1.2 mM Ca²⁺, not in elevated Ca²⁺.

      Rudolph et al. (2011) showed low-frequency depression not only in elevated external Ca²⁺, but also in 0.5 mM Ca²⁺. While Rudolph et al. (2011) did not make an explicit link between their observations and LFD, there is no reason to doubt that these observations are an example of LFD. They showed a biphasic depression when switching the stimulation frequency from 0.05 Hz to 2 Hz. In one of the founding papers of LFD, Doussau et al. (2010) describe a biphasic depression when switching the stimulation frequency from 0.025 Hz to 1 Hz; the first figures of the two papers (Rudolph et al., 2011; Doussau et al., 2010) are strikingly similar.

      Lin et al. (2022) would probably not agree with the statement that the depression at the calyx is ‘largely monotonic’, as they stress the finding of quasi-constant depression between 5 and 50 Hz.

      The authors note that their results differ from those of Atluri and Regehr, but do not mention that a possible reason for the difference is the increased release probability in their experiments.

      In fact, we clearly listed the difference in external Ca²⁺ as a likely source of the discrepancy by saying ‘This discrepancy presumably stems from differences in experimental conditions (room temperature, stimulation of multiple presynaptic PFs and 2 mM external Ca²⁺ concentration in the previous work, vs. near-physiological temperature, single presynaptic stimulation and 3 mM external Ca²⁺ here)’.

      The authors should provide statistics for the data obtained in 1.5 mM Ca, and discuss why LFD is increased in conditions that also elevate vesicle release probability.

      See our comments above: the revised version includes the requested statistics. On p. 6 of the manuscript, we do provide an explanation for the apparent lack of LFD at 1.5 mM Ca²⁺ and 2 Hz, namely a superimposition of LFD with facilitation. At 1.5 mM Ca²⁺ and 0.5 Hz, our LFD values are not weaker than at 3 mM Ca²⁺ and 0.5 Hz or 1 Hz.

      Altogether, it is correct that many LFD experiments have been carried out in high release probability synapses, and/or under conditions of elevated Ca²⁺. However, the reasons underlying these choices are diverse (in our case, to build on the SV pool analysis previously developed in Tran et al. 2022 in 3 mM Ca²⁺) and do not imply a limitation to the phenomenon. LFD is present under physiological conditions at low-to-moderate release probability synapses (as shown in our work), and altogether, there is no reason to dismiss LFD as nonphysiological.

      (2) Lack of biological mechanisms supporting the model

      The model is presented without compelling biological support. The evidence in support of vesicle undocking comes from experiments by the Watanabe lab, which showed fewer-than-expected docked vesicles under EM when cultured synapses were stimulated immediately prior to high-pressure freezing. Kusick et al. were careful to note that these vesicles may have been lost to fusion.

      The Watanabe lab showed an SV deficit at docking sites at times ranging from about 100 ms to several seconds (Kusick et al., 2020, their Fig. 5E). This corresponds to the ISI values where we see paired-pulse depression. In their Summary, Kusick et al. raise the possibility of SV fusion as an alternative to undocking at the 100 ms time point. However, the same issue had previously been considered in Miki et al., 2018 with other techniques (their Fig. 2d), where it was shown that the SV deficit seen in paired-pulse experiments could not be explained by fusion. This leaves undocking as the most likely explanation, at least in our preparation. We have added a new paragraph on p. 14 to clarify this point.

      The putative undocking Kusick describes is immediate (< 5 ms after stimulation), and it was not shown to be Ca<sup>2+</sup> sensitive. This manuscript describes "calcium-dependent undocking" that proceeds from 10 to 200 ms. Multiple studies from the Watanabe lab show that a single stimulus lowers the number of docked vesicles and that there is subsequently a transient redocking of vesicles that can be blocked by EGTA or Syt7 knockout.

      This is not an accurate description of the Kusick results or of our results. In the Kusick paper, the SV deficit seen at <5 ms after stimulation is attributed to exocytosis, not to undocking. Clearly, it is Ca<sup>2+</sup> dependent. Our manuscript describes potential calcium-dependent undocking not during the 10 to 150 ms window, in which our undocking rate is assumed to be calcium-independent, but starting at 150 ms and lasting a few hundred ms thereafter.

      I also question the rationale for the authors' model that 2 vesicles are coupled in series to a single release site. Previous papers from this lab cited EM studies from frog and mouse neuromuscular junctions that showed filamentous connections between vesicles (do these synapses show LFD?). Here, the authors primarily cite their previous models to support their arguments. I encourage them to continue searching for ultrastructural evidence for 2-vesicle docking units and to cite such studies.

      It is important to remember that our sequential two-step model was not based on EM data, but on a series of functional data, including variance-mean analysis of summed SV release numbers, covariance analysis among subsequent SV release numbers, analysis of release latencies as a function of stimulus number during an AP train, and analysis of SV release numbers under conditions of very high release probability. We note that the phenomenon of Ca<sup>2+</sup>-dependent docking that we proposed based on these observations is consistent with flash-and-freeze and zap-and-freeze results from several laboratories. Potential filamentous connections between SVs and the AZ plasma membrane, at distances of several tens of nm, have been seen not only at frog and mouse neuromuscular junctions but also at brain synapses (e.g., Siksou et al., Journal of Neuroscience 2007; Cole et al., Journal of Neuroscience 2016; Fernandez-Busnadiego et al., Journal of Cell Biology 2010; 2013).

      (3) Comparison to other vesicle models

      The authors use overly assertive language to suggest that the model proves a mechanism. "Altogether, these results indicate that the slow phase of LFD ... reflects a δ decrease without significant changes in pr, in ρ or in IP size". Simulating data does not conclusively "indicate" the underlying mechanism, but the authors could state their data can be "explained by a model where..".

      Please see our response above to a similar point by Reviewer 1.

      However, LFD does not require activity-dependent undocking. Instead, the phenomenon has been explained by high-release probability, paired with an activity-dependent increase in either docking or release probability (Chiu and Carter, 2024; Doussau et al., 2017). Does the new model do a better job of replicating some facet of the data? If multiple models can explain the same data, how can we determine which model is correct? The "Alternative Presynaptic Depression Mechanisms" should be expanded to discuss these issues.

      We could not find statements in the Chiu and Carter paper or in the Doussau et al. paper explaining LFD ‘by high-release probability, paired with an activity-dependent increase in either docking or release probability’. As far as we can see, Chiu and Carter do not propose any specific mechanism for LFD, beyond saying that depression and facilitation must be separate. Doussau et al. (their Fig. 6) clearly frame their interpretation in a sequential two-step model. As in the preceding Miki et al. paper (which they cite extensively), they assume a rapid (a few ms), Ca<sup>2+</sup>-dependent transition between their ‘reluctant pool’ and their ‘fully-releasable pool’, respectively homologous to RS and DS. Thus, the Doussau et al. interpretation is close to that presented in our present work, even though significant differences exist. An important difference is that Doussau et al. did not use simple synapses, so they did not have access to key synaptic parameters such as the number of docking sites or the release probability per docking site. Consequently, the model in Doussau et al. does not have the same level of detail as ours. The revised version better explains the differences and similarities between the model of Doussau et al. and the one presented in our work (new paragraph on p. 14).

    1. eLife Assessment

      Mechanical transduction channels of sensory hair cells possess lipid scramblase activity. Membrane lipid disruption resulting from mechanical transduction is thought to be restored by flippase activities. This fundamental study provides compelling evidence that ATP8B1, a P4-ATPase flippase, and its subunit TMEM30B are key in mediating this restorative function in outer hair cells of the mammalian cochlea.

    2. Reviewer #1 (Public review):

      Sensory hair cells of the inner ear convert mechanical sound vibrations into electrical signals through mechano-electrical transduction (MET), a process critically dependent on the specialized organization and lipid composition of their plasma membrane. Although the protein components of the MET complex are relatively well characterized, the role of the lipid environment remains poorly understood and often overlooked. Recent discoveries that core MET proteins TMC1 and TMC2 function as lipid scramblases, disrupting membrane lipid asymmetry, expose a significant gap in our understanding of how lipid homeostasis is regulated in hair cells and how membrane dynamics influence MET function.

      In this study, the authors address this gap by identifying the P4-ATPase ATP8B1 and its chaperone TMEM30B as essential regulators of membrane lipid asymmetry in outer hair cells. They generated HA-tagged knock-in mice to precisely localize ATP8B1 and TMEM30B within outer hair cells, demonstrating their enrichment in stereocilia, and they convincingly show that loss of these proteins causes phosphatidylserine externalization, hair cell degeneration, and hearing loss in mouse models, phenocopying defects observed in TMC1 mutant mice with constitutive scrambling activity. While these findings establish lipid flippase pathways as critical for hair cell survival and auditory function, they also raise important questions about the precise mechanisms linking lipid asymmetry disruption to MET dysfunction and hair cell pathology.

      Overall, the data convincingly support the conclusion that ATP8B1-TMEM30B flippase activity is required to maintain stereocilia lipid asymmetry and auditory function. The study substantially advances understanding of how lipid homeostasis intersects with MET. However, several points require clarification to ensure that localization claims and mechanistic interpretations are fully supported by the presented data.

      Revisions considered essential by this reviewer are:

      (1) Figure 1D. The authors should clarify how the qPCR data were normalized and specify the reference (housekeeping) genes used. This information is necessary to evaluate the robustness and comparability of the gene expression data.

      (2) Figure 1F. The lack of F-actin staining at the hair cell base raises the possibility that the permeabilization conditions may have limited antibody access to certain membrane regions. This is especially important given that the authors used a gentle permeabilization agent such as saponin to preserve membrane integrity, yet conclude that ATP8B1 and TMEM30B are localized "almost exclusively to OHC bundles and the apical membrane, with minimal staining in the remaining plasma membrane" (line 128). Including co-labeling with a plasma membrane marker or more comprehensive F-actin visualization of lateral and basal regions would help ensure that the restricted localization is biological rather than technical. In the absence of such controls, the localization claim may be somewhat overstated and should be tempered accordingly.

      (3) Figure 7B. Although quantification of ATP8B1-HA intensity at the bundle appears similar between WT and Cib2 KO samples, the representative image suggests that some bundles lack detectable labeling. To better capture phenotype variability, it would be helpful to include an additional quantification showing the fraction or number of bundles with detectable ATP8B1-HA signal in Cib2 KO mice.

      (4) Lines 346-349. The manuscript suggests that IHCs lack stereocilia-enriched P4-ATPases. However, this conclusion is not directly supported by the presented data. The authors should either provide supporting localization or expression data for other P4-ATPases or soften the statement to indicate that no stereocilia-enriched P4-ATPases were detected under the conditions examined.

      Recommendations:

      (5) The authors convincingly demonstrate that TMEM30B loss results in ATP8B1 mislocalization. While not essential to the central conclusions, examining TMEM30B localization in ATP8B1 KO hair cells would clarify whether this interdependence is reciprocal, as described for other P4-ATPase-CDC50 complexes.

      (6) Lines 359-374. The discussion of Annexin V labeling is careful and balanced. This paragraph would benefit from referencing other studies that showed minimal Annexin V labeling in healthy P6 organ of Corti, reinforcing that robust PS externalization in the present study is pathological rather than developmental.

      (7) Lines 392-399. The proposed feedback model linking MET activity and ATP8B1-TMEM30B localization is compelling. The discussion could be strengthened by noting that in TMC1/2 double knockout hair cells, PS externalization is not observed, consistent with the idea that flippase activity becomes critical specifically when scrambling occurs. The mislocalization observed in Cib2 KO hair cells further supports the coupling between TMC-mediated scrambling and flippase-mediated membrane restoration.

    3. Reviewer #2 (Public review):

      Summary:

      Prior work identified TMEM30B (knockout mice) as well as ATP8B1 (human genetics and mouse model), ATP8A2 (knockout mice), and ATP11A (human genetics) as relevant for hearing. The authors also reasoned that, given the recent discovery of the dual function of TMC1 and TMC2 as mechanotransduction channels of the inner ear and as lipid scramblases, a counterpart flippase should be present in the sensory hair-cell stereocilia bundle where mechanotransduction happens. They used CRISPR/Cas to modify the endogenous mouse genes and add an HA tag at the N-terminus of the ATP8B1, ATP8A1, ATP8A2, and ATP11A proteins. Their experiments with these mice unambiguously localized ATP8B1 at the base of outer hair cell stereocilia bundles. Knockout of ATP8B1 results in loss of outer hair cells, deficient auditory function (ABR), and degeneration of outer hair cell stereocilia bundles. Similarly, hair cells from genetically modified mice with endogenous HA-tagged TMEM30B show localization of this protein to outer hair cell stereocilia bundles. TMEM30B knockout mice phenocopy the ATP8B1 knockout model. Interestingly, the authors show that annexin V staining precedes hair cell loss in ATP8B1 and TMEM30B knockout mice and that proper localization of these proteins is lost in mice that lack CIB2, a protein essential for hair cell mechanotransduction.

      Strengths:

      (1) Use of knock-in HA-tagged proteins, rather than antibody staining, to unambiguously localize ATP8B1 and TMEM30B.

      (2) Systematic characterization of auditory function (ABR), hair cell loss, and hair-cell stereocilia bundle morphology.

      (3) Advances our understanding of the role played by lipid homeostasis in auditory function.

      (4) Reports on mouse models that will be helpful to further understand the mechanistic role played by ATP8B1 and TMEM30B in normal hearing and hereditary deafness.

      Weaknesses:

      (1) Are the HA tags causing any functional issues? Function and localization of tagged proteins can sometimes be compromised. It would be good to know, for each knock-in model (TMEM30B, ATP8B1, ATP8A1, ATP8A2, and ATP11A), whether the HA-tagged protein is causing any issues with the mice and particularly with hearing (ABRs). Are these mice normal? Can they hear? These data are missing.

      (2) Following on the point above, is it possible that ATP8B1-HA is well localized, but localization for the other three flippases (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA) is compromised by the tag? Is this potential mislocalization causing any functional phenotypes? (ABRs of point 1). I find it surprising that there are flippases only in outer hair cells, and only formed by ATP8B1. A possible explanation is that the tag is interfering with trafficking. If so, there should be a phenotype (ABRs), although this might be masked by redundancy among these flippases or caused by systemic issues (admittedly difficult to sort out). Given that this manuscript will likely become foundational, and that there is evidence that at least two of the other flippases are involved in hearing loss, it would be good to provide more information about the mice and HA-tagged proteins in the other knock-ins (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA). Depending on the data available for the knock-ins, the authors may want to discuss these scenarios and soften the statement indicating that inner-hair cells may lack flippase activity altogether.

      (3) Expression of ATP8B1 at P0 (Figure 1D), when there should not be protein in outer hair cells yet, seems high. Does this mean that other cells in the cochlea also express ATP8B1? Is this a concern?

      (4) Fluorescence scales in Figures 6B and 6D and Figures 7B and 7D are very different. So are the values for WT. One would expect that the WT would be similar in all cases (at least within the same compartments), given that the methods section indicates that "All images were collected using identical acquisition parameters, including zoom and laser power, across genotypes". If WT shows such variability, how can we compare?

    4. Author Response:

      Summary of Planned Revisions:

      We will clarify the qPCR methodology and interpretation to address potential misunderstandings.

      We will assess hearing in the generated HA-tagged mouse lines and, where appropriate, include a properly powered ABR analysis in the revised manuscript.

      We will address concerns regarding the z-stack in Figure 1F.

      We will include additional quantification for Figure 7B to strengthen the analysis.

      We will revise the relevant statement to read: “No IHC stereocilia-enriched P4-ATPases were detected under the conditions examined.”

      While we appreciate the suggestion to examine TMEM30B localization on the ATP8B1 KO background, this is not feasible within a reasonable timeframe; we will clarify this limitation in the manuscript.

      We will incorporate relevant prior work (e.g., George and Ricci, 2026) demonstrating minimal Annexin V labeling prior to P6 and lack of PS externalization in TMC1/2 double knockout models.

      We will clarify that hearing thresholds for TMEM30B-HA and ATP8B1-HA lines will be addressed in this study, while additional HA-tagged flippase lines (ATP8A1, ATP8A2, ATP11A) are part of ongoing work to be reported separately.

      We will soften statements regarding HA-tag insertion and clarify that, to our knowledge, localization and function are not disrupted, while acknowledging this as a potential limitation.

      We will revise the Methods section to clarify differences in fluorescence measurements across experiments.

      In addition to the experiments in response to the reviewers' suggestions, we will add the following data, which we generated while the paper was in review:

      Distortion product otoacoustic emissions (DPOAEs) of the Atp8b1 KO and Tmem30b KO mice. Consistent with impaired OHC function, their DPOAE thresholds were elevated.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Figure 1D.

      The authors should clarify how the qPCR data were normalized and specify the reference (housekeeping) genes used. This information is necessary to evaluate the robustness and comparability of the gene expression data.

      We thank the reviewer for this comment. qPCR data were normalized to GAPDH as the reference (housekeeping) gene. We will clarify this in the Methods section to ensure transparency and reproducibility.
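
      For readers who want to see what normalization to a reference gene typically involves, a minimal sketch follows. It assumes the common 2^-ΔΔCt (Livak) calculation with roughly 100% amplification efficiency for both genes; the response does not state which calculation was used, and the function name and all Ct values below are hypothetical.

```python
import numpy as np

def fold_change_ddct(ct_target, ct_ref, ct_target_calib, ct_ref_calib):
    """Relative expression of a target gene by the 2^-ddCt method.

    ct_target, ct_ref: Ct values of the target and reference gene (e.g. GAPDH)
    in the sample(s) of interest. ct_target_calib, ct_ref_calib: Ct values in
    a calibrator group (e.g. an earlier time point). Assumes ~100%
    amplification efficiency for both genes.
    """
    # dCt per sample: target Ct minus reference Ct
    d_ct_sample = np.asarray(ct_target, dtype=float) - np.asarray(ct_ref, dtype=float)
    # mean dCt of the calibrator group
    d_ct_calib = np.mean(np.asarray(ct_target_calib, dtype=float)
                         - np.asarray(ct_ref_calib, dtype=float))
    # ddCt, converted to a fold change relative to the calibrator
    return 2.0 ** (-(d_ct_sample - d_ct_calib))

# Hypothetical example: dCt of 6 in the sample vs. 8 in the calibrator
fold = fold_change_ddct(24.0, 18.0, [26.0], [18.0])  # 4-fold higher
```

      Because each sample's target Ct is referenced to its own GAPDH Ct, differences in input RNA amount cancel out, which is the point of using a housekeeping gene.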

      (2) Figure 1F.

      The lack of F-actin staining at the hair cell base raises the possibility that the permeabilization conditions may have limited antibody access to certain membrane regions. This is especially important given that the authors used a gentle permeabilization agent such as saponin to preserve membrane integrity, yet conclude that ATP8B1 and TMEM30B are localized "almost exclusively to OHC bundles and the apical membrane, with minimal staining in the remaining plasma membrane" (line 128). Including co-labeling with a plasma membrane marker or more comprehensive F-actin visualization of lateral and basal regions would help ensure that the restricted localization is biological rather than technical. In the absence of such controls, the localization claim may be somewhat overstated and should be tempered accordingly.

      We appreciate this important point. The image shown represents a single z-slice from a larger stack, and the hair cell body lies outside the plane of this section. To clarify this, we will revise the figure presentation. Specifically, we can provide the full z-stack (already available via OSF) and/or replace the image with a resliced whole-mount view to better visualize the full cellular context.

      Regarding the possibility that the lack of staining in the hair cell’s plasma membrane might be due to insufficient antibody penetration, we routinely perform Prestin staining (Prestin is located in the OHC plasma membrane) after saponin-mediated permeabilization and have never experienced antibody accessibility issues. Nevertheless, we will perform co-labeling for Prestin and include it in the revised submission.

      (3) Figure 7B.

      Although quantification of ATP8B1-HA intensity at the bundle appears similar between WT and Cib2 KO samples, the representative image suggests that some bundles lack detectable labeling. To better capture phenotype variability, it would be helpful to include an additional quantification showing the fraction or number of bundles with detectable ATP8B1-HA signal in Cib2 KO mice.

      We thank the reviewer for this suggestion. To better capture variability, we will include an additional quantification measuring the fraction of hair cell bundles with detectable ATP8B1-HA and TMEM30B-HA signal per field of view. This analysis will complement the existing intensity-based quantification.
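
      One way such a detection fraction could be computed is sketched below, assuming per-bundle mean intensities and background intensities have already been measured for each field of view. The detection rule (background mean plus k sample standard deviations, with k = 3) and the function name are hypothetical choices for illustration, not criteria stated by the authors.

```python
import numpy as np

def fraction_detected(bundle_intensities, background, k=3.0):
    """Fraction of bundles in one field of view with detectable signal.

    A bundle counts as detected if its mean intensity exceeds the background
    mean plus k sample standard deviations (an illustrative criterion).
    """
    bundles = np.asarray(bundle_intensities, dtype=float)
    bg = np.asarray(background, dtype=float)
    threshold = bg.mean() + k * bg.std(ddof=1)
    return float(np.mean(bundles > threshold))

# Hypothetical (bundle intensities, background intensities) per field of view
fields = [([5, 20, 30, 14], [10, 12, 11, 9, 13]),
          ([25, 28, 31], [10, 12, 11, 9, 13])]
per_field = [fraction_detected(b, bg) for b, bg in fields]  # [0.5, 1.0]
```

      Reporting one fraction per field of view, rather than pooling bundles, keeps the unit of analysis explicit when comparing WT and Cib2 KO samples.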

      (4) Lines 346-349.

      The manuscript suggests that IHCs lack stereocilia-enriched P4-ATPases. However, this conclusion is not directly supported by the presented data. The authors should either provide supporting localization or expression data for other P4-ATPases or soften the statement to indicate that no stereocilia-enriched P4-ATPases were detected under the conditions examined.

      We agree with the reviewer and will revise this statement to read: “No IHC stereocilia-enriched P4-ATPases were detected under the conditions examined.”

      Recommendations:

      (5) The authors convincingly demonstrate that TMEM30B loss results in ATP8B1 mislocalization. While not essential to the central conclusions, examining TMEM30B localization in ATP8B1 KO hair cells would clarify whether this interdependence is reciprocal, as described for other P4-ATPase-CDC50 complexes.

      We appreciate this insightful suggestion. However, performing this experiment would require generating a compound mouse line (crossing TMEM30B-HA into the ATP8B1 knockout background), which is not feasible within the revision timeframe. Additionally, the lack of a robust commercial antibody for TMEM30B further complicates this approach. We will note this as a future direction in the revised manuscript.

      (6) Lines 359-374.

      The discussion of Annexin V labeling is careful and balanced. This paragraph would benefit from referencing other studies that showed minimal Annexin V labeling in healthy P6 organ of Corti, reinforcing that robust PS externalization in the present study is pathological rather than developmental.

      We thank the reviewer for this suggestion and will incorporate relevant prior work, including George and Ricci (2026), which demonstrates minimal Annexin V labeling prior to P6, and further supports our interpretation.

      (7) Lines 392-399.

      The proposed feedback model linking MET activity and ATP8B1-TMEM30B localization is compelling. The discussion could be strengthened by noting that in TMC1/2 double knockout hair cells, PS externalization is not observed, consistent with the idea that flippase activity becomes critical specifically when scrambling occurs. The mislocalization observed in Cib2 KO hair cells further supports the coupling between TMC-mediated scrambling and flippase-mediated membrane restoration.

      We agree and will expand the discussion to include that TMC1/2 double knockout hair cells do not exhibit phosphatidylserine externalization, supporting the idea that flippase activity becomes critical in the context of scrambling.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Are the HA tags causing any functional issues? Function and localization of tagged proteins can sometimes be compromised. It would be good to know, for each knock-in model (TMEM30B, ATP8B1, ATP8A1, ATP8A2, and ATP11A), whether the HA-tagged protein is causing any issues with the mice and particularly with hearing (ABRs). Are these mice normal? Can they hear? These data are missing.

      We thank the reviewer for raising this important point. In this study, we will focus on TMEM30B-HA and ATP8B1-HA mouse lines, while additional HA-tagged flippase lines (ATP8A1, ATP8A2, ATP11A) are part of ongoing work to be reported separately.

      Both TMEM30B-HA and ATP8B1-HA mice are viable and exhibit normal breeding and aging. Preliminary (pilot) ABR measurements indicate wild-type–like hearing thresholds. We agree that this is important and will attempt to raise sufficient numbers of mice within the revision timeframe for a properly powered ABR analysis in the revised manuscript.

      (2) Following on the point above, is it possible that ATP8B1-HA is well localized, but localization for the other three flippases (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA) is compromised by the tag? Is this potential mislocalization causing any functional phenotypes? (ABRs of point 1). I find it surprising that there are flippases only in outer hair cells and only formed by ATP8B1. A possible explanation is that the tag is interfering with trafficking. If so, there should be a phenotype (ABRs), although this might be masked by redundancy among these flippases or caused by systemic issues (admittedly difficult to sort out). Given that this manuscript will likely become foundational, and that there is evidence that at least two of the other flippases are involved in hearing loss, it would be good to provide more information about the mice and HA-tagged proteins in the other knock-ins (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA). Depending on the data available for the knock-ins, the authors may want to discuss these scenarios and soften the statement indicating that inner-hair cells may lack flippase activity altogether.

      We appreciate this concern. To our knowledge, the HA tag does not appear to disrupt localization or function of the tagged proteins. However, we agree that this cannot be fully excluded. We will therefore soften our conclusions about IHC flippases and clarify that additional flippases (ATP8A1, ATP8A2, ATP11A) are under investigation and will be described in a separate study.

      (3) Expression of ATP8B1 at P0 (Figure 1D), when there should not be protein in outer hair cells yet, seems high. Does this mean that other cells in the cochlea also express ATP8B1? Is this a concern?

      We thank the reviewer for this observation. We interpret the elevated signal at P0 as reflecting transcription preceding detectable protein expression. While expression in other cochlear cell types is possible, we have not observed detectable ATP8B1 localization outside hair cells using the HA-tagged model. We will clarify this point in the manuscript.

      (4) Fluorescence scales in Figures 6B and 6D and Figures 7B and 7D are very different. So are the values for WT. One would expect that the WT would be similar in all cases (at least within the same compartments), given that the methods section indicates that "All images were collected using identical acquisition parameters, including zoom and laser power, across genotypes". If WT shows such variability, how can we compare?

      We appreciate the need for clarification. Identical acquisition parameters were maintained within each experiment used for direct comparison (e.g., within a given panel). However, different panels (e.g., Figures 6B vs. 6D) were acquired on different days using different imaging settings.

      We will revise the Methods section to explicitly state this and clarify that comparisons are intended only within panels, not across experiments.

    1. eLife Assessment

      This important study examines the stability and compensatory plasticity of retinotopic maps in patients with congenital achromatopsia. It provides convincing evidence for a stable mapping of the visual field in V1, alongside changes in the readout from V1 into V3, which shows altered receptive field location and size. This paper will be of interest to scientists studying the visual system, brain plasticity, and development.

    2. Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher retinotopic areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study by Molz et al., but I believe that, given anatomical variability, the larger n in this study, and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3 and V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than in normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower-resolution V1 signal in individuals with achromatopsia.

      This is a carefully conducted study (in terms of both data collection and analysis) that represents an impressive amount of work.

      *Effects of eye-movements

      The authors have carried out the eye-movement analyses I asked for. Unfortunately, they couldn't calibrate the eye tracker in 4 individuals (it's impressive they managed in 10). I think this means that 4 of 13 individuals (since a different participant was excluded for head motion) weren't included in the correlation analyses. Limiting the correlation analysis to individuals with better fixation has obvious issues. I'd recommend redoing (or additionally including) the statistics using non-parametric measures, classifying these 4 as having a fixation instability of 3 (i.e., greater instability than the participant with the worst fixation who was successfully calibrated).
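
      The suggested analysis can be sketched as follows: uncalibrated participants (coded here as NaN) receive a fixation-instability value above every measured score, and a Spearman correlation, i.e. a Pearson correlation on average ranks, is computed so that only their ordering, not the arbitrary value 3, affects the result. The function names and example values are hypothetical; a p-value could then be obtained, for example, with scipy.stats.spearmanr on the same filled-in scores.

```python
import numpy as np

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    a = np.asarray(values, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_with_ceiling(instability, outcome, ceiling=3.0):
    """Spearman rho after setting missing (NaN) instability scores to `ceiling`."""
    x = np.asarray(instability, dtype=float)
    if np.nanmax(x) >= ceiling:
        raise ValueError("ceiling must exceed every measured instability score")
    x = np.where(np.isnan(x), ceiling, x)  # uncalibrated -> worst instability
    return float(np.corrcoef(average_ranks(x), average_ranks(outcome))[0, 1])

# Hypothetical data: two participants could not be calibrated
rho = spearman_with_ceiling([0.5, 1.0, 1.5, np.nan, np.nan], [1, 2, 3, 4, 5])
```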

      *Interpreting pRFs

      The paper would be strengthened by a little more explicit clarity about what pRFs represent and how that affects the interpretation of the findings as plasticity vs. non-plasticity (I know the authors are aware of this, but I think it would be helpful for readers who are less experienced with pRFs). In the introduction, it would be helpful to point out that pRFs represent the collective response of a large population of neurons, and as a result pRF estimates can vary depending on which population of neurons the stimulus drives.

      For example, imagine for the sake of argument that rods only project to V1 neurons with larger receptive fields. If one measured pRFs in a control observer under photopic vs. scotopic conditions, one would see smaller pRFs in the photopic conditions. This wouldn't represent 'plasticity'; it would represent the fact that the firing neurons contributing to the pRF signal are a slightly different population because of a change in the stimulus content. This is of course exactly what you see in 2C. And indeed, the authors make this same point: "In the non-selective condition, the smaller pRFs in controls are in line with the higher spatial resolution of the cone system, which is not active in the achromat group." But this point would be clearer if more of the conceptual underpinnings were made explicit in the introduction (or at this point in the paper).

      Shifts in which population of neurons drives your pRFs can explain many of the more puzzling results in the paper without detracting from your main conclusions. For example, in 2D, I don't think it's differences in S/N driving your results (pRFs are, at least in theory, meant to be robust to S/N). If smaller RFs 'drop out' under low luminance and these smaller RFs also tend to be more central, then one would expect the control results of 1D. And I think a similar argument might even be made for the smaller difference in the rod monochromats.

      It would be possible to make the point of Figure 4B more simply if Figure 4B were replaced by additional panels in Figure 2 showing V3 pRF size/eccentricity distributions. That would show that you don't see the same expansion in pRF sizes in V3 in a way that is just as clear and closer to the data.

      *Interpreting cRFs

      Similarly, I think the paper would be improved with more clarity about the underlying signal in CF modeling. Once again, I appreciate that the authors are familiar with this, but it will help the reader in interpretation (and I do believe thinking carefully about this may alter your interpretations). CF receptive fields 'find' the region in V1 that best predicts the V3 signal in a given voxel. In resting state, this likely represents a combination of:

      (1) visually driven signal - correlations that may or may not reflect connectivity but represent the fact that regions that represent the same region of visual space will be active at the same time.

      (2) global bilaterally symmetrical signal consisting of enhanced correlations between iso-eccentric regions (Raemaekers et al., 2014), which may arise from vasculature that symmetrically stems from the posterior cerebral artery (Tong et al., 2013; Tong and Frederick, 2014).

      (3) intrinsic neural fluctuations that are more strongly correlated between connected neurons. These are likely quite weak compared to the other contributions.

      If you ignore (2), which is not likely to differ between rod monochromats and controls, and model (1) and (3), you might well see shifts in CFs towards the boundary of the scotoma: essentially, the CF's location will be biased towards the region of V1 that has stronger correlations, which is the region that receives a visual signal.

      I do find convincing the argument that you don't see the same shift in controls in the rod-selective condition. So I think the results of 4A are fine. But a little more clarity about 'what's under the hood' in CF modeling would be nice.

      *Interpreting the relationship between pRFs and cRFs

      So there's something here that confuses me. We are all agreed that V3 pRF sizes are similar across RM and control. V1 pRFs are larger in RM. It feels intuitive that smaller CFs would compensate, but I can't make sense of it when I think it through. Each pRF represents a combination of receptive field location scatter and bandwidth. You want to argue that eccentricity mapping looks pretty normal, so there's no reason to assume increased RF scatter, and I can believe that (though I do think this assumption should be discussed explicitly).

      So far I think we agree.

      But let's think about what drives a CF during visual stimulation. Specifically, let's think about 'the pRF of the CF' (the region of visual space represented by the cluster of voxels in the CF). If pRFs for individual voxels in V1 are big, then the pRF for the CF is also going to be large. But we know that pRFs for V3 are normal size. So the V3 CF will 'find' a smaller number of voxels in V1, in order to try to find the 'correctly sized' CF pRF. Note that this explanation is very similar to yours, but it doesn't require ANY 'intrinsic' connectivity. It's really just assuming the whole thing is driven by the visual signal, with the CF size determined by the ratio of the pRF sizes in V3 vs. V1.
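      One way to make this intuition concrete is a toy model, which is an illustrative assumption rather than the analysis used in the paper: treat a V1 pRF as an isotropic Gaussian of width σ1 (degrees), assume the CF pools V1 with a Gaussian whose footprint, projected into the visual field under locally constant cortical magnification, has width σc, and use the fact that convolving two Gaussians adds their variances, so the V3 pRF width is σ3 = √(σ1² + σc²). Holding σ3 at a control-like value, a wider V1 pRF then requires a narrower CF, with no intrinsic-connectivity term anywhere. The function names and numbers are hypothetical.

```python
import math


def v3_prf_width(sigma_v1: float, sigma_cf: float) -> float:
    """V3 pRF width when a Gaussian V1 pRF is pooled by a Gaussian CF.

    Toy assumption: both are isotropic Gaussians in visual-field degrees,
    so their convolution is Gaussian and the variances add.
    """
    return math.sqrt(sigma_v1 ** 2 + sigma_cf ** 2)


def required_cf_width(sigma_v3: float, sigma_v1: float) -> float:
    """CF width (visual-field degrees) needed to reach a target V3 pRF width."""
    if sigma_v1 >= sigma_v3:
        # A V1 pRF at least as wide as the target leaves nothing to pool.
        return 0.0
    return math.sqrt(sigma_v3 ** 2 - sigma_v1 ** 2)


# Hypothetical values: the same V3 target, with narrower vs wider V1 pRFs.
target_v3 = 5.0
for label, sigma_v1 in [("narrower V1 pRF", 3.0), ("wider V1 pRF", 4.0)]:
    print(label, round(required_cf_width(target_v3, sigma_v1), 3))
```

      In this sketch the CF width follows from the V1 and V3 pRF widths alone, which is the sense in which a purely visually driven account can produce smaller CFs.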

      One possible solution would be to regress out the visual stimulus and redo this analysis based on the residuals.
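      A minimal sketch of this suggestion, assuming a stimulus-driven prediction is available for each voxel (for example, the time course predicted by its fitted pRF) and using hypothetical function names: each time series is regressed on an intercept plus that prediction with ordinary least squares, and correlations (or CF fits) are then computed on the residuals. The toy data are constructed so that two voxels share a stimulus component plus a smaller shared fluctuation that is orthogonal to the stimulus; after residualizing, only that fluctuation remains.

```python
from statistics import fmean


def residualize(signal, predicted):
    """Remove the best-fitting linear copy of `predicted` (plus an intercept) from `signal`.

    Ordinary least squares with a single regressor:
        slope = cov(signal, predicted) / var(predicted)
    """
    if len(signal) != len(predicted):
        raise ValueError("signal and predicted must have the same length")
    m_s, m_p = fmean(signal), fmean(predicted)
    cov = sum((s - m_s) * (p - m_p) for s, p in zip(signal, predicted))
    var = sum((p - m_p) ** 2 for p in predicted)
    slope = cov / var if var > 0 else 0.0
    intercept = m_s - slope * m_p
    return [s - (intercept + slope * p) for s, p in zip(signal, predicted)]


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy data (hypothetical): on/off stimulus prediction plus a shared fluctuation
# whose mean is zero within both the "off" and the "on" time points.
stim = [0, 1, 0, 1, 0, 1, 0, 1]
shared = [0.2, -0.1, 0.0, 0.1, -0.2, 0.1, 0.0, -0.1]
v1_voxel = [2 * s + f for s, f in zip(stim, shared)]
v3_voxel = [3 * s + f + 1 for s, f in zip(stim, shared)]

resid_v1 = residualize(v1_voxel, stim)
resid_v3 = residualize(v3_voxel, stim)
print("raw r:", round(pearson(v1_voxel, v3_voxel), 3))
print("residual r:", round(pearson(resid_v1, resid_v3), 3))
```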

    3. Reviewer #3 (Public review):

      Summary:

      This study addresses a long-standing question in visual neuroscience concerning how the human visual system balances stability and plasticity when sensory input is altered from early in life. Using achromatopsia as a model of lifelong cone deprivation, the authors examine whether early visual cortex undergoes retinotopic reorganization to compensate for the absence of foveal cone input, or whether canonical retinotopic organization is largely preserved. By combining fMRI-based population receptive field (pRF) mapping with connective field (CF) modelling, the authors characterize changes across multiple hierarchical stages of visual processing.

      The main findings indicate that primary visual cortex (V1) shows no systematic remapping of the foveal projection zone, whereas extrastriate cortex, particularly V3, exhibits altered patterns of sampling from V1. The authors interpret these results as evidence for hierarchical adaptation, whereby downstream readout mechanisms adjust to make more efficient use of degraded rod-mediated input while preserving early-stage retinotopic organization.

      Strengths:

      A major strength of this work is the use of silent substitution to generate rod-selective stimuli. This approach enables a principled comparison between achromats and typically sighted controls by isolating rod-driven responses in both groups. In doing so, the study overcomes a key limitation of prior work, where differences in cortical organization could often be confounded by differences in photoreceptor class rather than reflecting neural reorganization per se. The inclusion of a rod-driven baseline in controls provides an important reference for distinguishing long-term adaptation from transient or stimulus-driven effects.

      Another notable strength is the integration of CF modelling alongside conventional pRF mapping. While pRF analyses alone suggest enlarged receptive fields in V1, consistent with reduced spatial resolution, the CF analysis offers a more mechanistic account by revealing changes in how V3 samples information from the V1 surface. This multi-level modelling approach moves beyond descriptive accounts of cortical map structure and provides a framework for interpreting how downstream areas may adjust their integration strategies under conditions of altered input.

      Weaknesses:

      Although the study is methodologically strong, the central claims regarding stability and compensatory plasticity require clearer conceptual framing and stronger empirical support. Stability is primarily defined as the absence of large-scale retinotopic remapping in V1, yet the presence of significantly enlarged V1 pRFs indicates substantial tuning-level plasticity at the input stage; distinguishing topographic stability from functional reorganization would therefore strengthen the interpretation. Moreover, the proposed compensatory mechanism raises a signal-processing concern, as reduced downstream sampling (smaller CFs in V3) cannot restore spatial information lost due to coarse upstream representations, and may instead limit integration. The mechanistic link between altered CF properties and normalization of extrastriate pRFs is not directly tested, as group differences are not shown to covary across individuals or visual field locations. Finally, the interpretation of these changes as compensatory implies functional benefit, yet no behavioral or performance measures are provided to establish that the observed reorganization preserves or enhances visual function, leaving open whether these effects reflect adaptive optimization or passive downstream consequences of altered input.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates the work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher visual areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study by Molz et al., but I believe that, given anatomical variability (and the very large n in this study) and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3 and V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than in normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      Strengths:

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work. I have a number of methodological comments but I hope they will be considered as constructive engagement - this work is highly technical with a large number of factors to consider.

      Weaknesses:

      (1) Effects of eye-movements

      I have some concerns with how the effects of eye-movements are being examined. There are two main reasons the authors give for excluding eye-movements as a factor in their results. Both explanations have limitations.

      (a) The first is that R2 values are similar across groups in the foveal confluence. This is fine as far as it goes, but R2 values are going to be low in that region. So this shows that eye movements don't affect coverage (the number of voxels that generate a reliable pRF), but doesn't show that eye movements aren't impacting their other measures.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: (1) as expected, gaze was less stable in achromats than in controls; (2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masked signals in this region; and (3) gaze instability was not correlated with pRF size and eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). These data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired data reliable enough to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow fixation to arrive on the target. We then (a) removed blinks, (b) filtered out time points with eye-movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space.

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data were filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
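      The calibration and instability measures described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes gaze samples arrive as (x, y) tuples with blinks and velocity and position outliers already removed, fits the affine map from the five per-target medians by least squares, and computes instability as the population SD of x within consecutive non-overlapping 1-second windows, averaged over windows. The window handling and all function names are assumptions, since the text does not specify how windows were combined.

```python
from statistics import fmean, median, pstdev


def median_fixation(samples, sample_rate_hz, settle_s=0.5):
    """Median (x, y) for one calibration target after dropping the first `settle_s` seconds."""
    kept = samples[int(settle_s * sample_rate_hz):]
    return (median(x for x, _ in kept), median(y for _, y in kept))


def solve3(a, b):
    """Solve a 3x3 linear system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(recorded, targets):
    """Least-squares affine map screen = a * gx + b * gy + c, fitted separately for x and y."""
    design = [(gx, gy, 1.0) for gx, gy in recorded]
    dtd = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    coefs = []
    for axis in (0, 1):
        dtt = [sum(r[i] * t[axis] for r, t in zip(design, targets)) for i in range(3)]
        coefs.append(solve3(dtd, dtt))
    return coefs


def apply_affine(coefs, point):
    gx, gy = point
    return tuple(a * gx + b * gy + c for a, b, c in coefs)


def fixation_instability(x_positions, sample_rate_hz):
    """Mean within-window SD of x over consecutive 1-second windows (assumed definition)."""
    n = int(sample_rate_hz)
    windows = [x_positions[i:i + n] for i in range(0, len(x_positions) - n + 1, n)]
    return fmean(pstdev(w) for w in windows)


# Synthetic check: recorded gaze is a known affine distortion of the 5 targets (deg).
targets = [(0.0, 0.0), (0.0, 5.0), (0.0, -5.0), (-5.0, 0.0), (5.0, 0.0)]
recorded = [(0.5 * x + 1.0, 0.5 * y - 2.0) for x, y in targets]
coefs = fit_affine(recorded, targets)
print(apply_affine(coefs, recorded[1]))  # should be close to the target (0.0, 5.0)
```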

      We report the resulting new fixation data results as follows:

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees of freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective: spearman-r<sub>(8)</sub> = 0.07, p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, who typically have less severe nystagmus (Kohl et al., 1993), two recent studies also found an absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
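      The fractional degrees of freedom reported above (for example t<sub>(9.08)</sub>) are what Welch's unequal-variance t-test produces through the Welch-Satterthwaite approximation. The standard-library sketch below illustrates the textbook formula; it is not the authors' analysis code and does not compute p-values.

```python
from statistics import fmean, variance


def welch_t_test(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Uses sample variances (n - 1 denominator); needs at least two
    observations per group and non-zero variance in at least one group.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (fmean(a) - fmean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Equal sizes and equal variances: df reduces to the pooled n_a + n_b - 2.
t_stat, dof = welch_t_test([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(round(t_stat, 3), round(dof, 3))
```

      Welch's degrees of freedom always lie between min(n_a, n_b) - 1 and n_a + n_b - 2; with 10 achromats, a value just above 9 is what one would expect when the achromat group's variance dominates the standard error.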

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 combine information across larger areas of visual space than in typically sighted controls. This could reflect true differences in neural tuning, or could be driven by larger eye movements. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective: spearman-r<sub>(8)</sub> = 0.37, p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub> = 0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2:

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      (b) The authors don't see a clear relationship between coverage and fixation stability. This seems to rest on a few ad hoc examples. What happens if one plots mean fixation deviation against coverage, setting the individuals who could not be calibrated to the highest calibrated fixation deviation? Does a relationship then emerge?
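      The check suggested here can be sketched as below, assuming one fixation-instability value per participant with `None` for those who could not be calibrated. Missing values are set to the largest observed instability, as proposed, which creates ties, so ranks are averaged before computing Spearman's rho as the Pearson correlation of ranks. The data values and function names are hypothetical, and no p-value is computed.

```python
from statistics import fmean


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def impute_uncalibrated(instability):
    """Replace missing (None) instability values with the largest observed value."""
    worst = max(v for v in instability if v is not None)
    return [worst if v is None else v for v in instability]


# Hypothetical example data (not from the study).
instability = [0.4, 0.9, None, 0.6, None, 1.2]
coverage = [0.70, 0.55, 0.40, 0.65, 0.50, 0.45]
print(round(spearman_rho(impute_uncalibrated(instability), coverage), 3))
```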

      In any case, I wouldn't expect coverage to be particularly susceptible to eye-movements. If a voxel in the cortex entirely projects to the scotoma then it should be robustly silent. The effects of eye-movements will be to distort the size and eccentricity estimates of voxels that are not entirely silent.

      There are many places in the paper where eye-movements might be playing an important role. 

      Examples include the larger pRF sizes observed in achromats. Are those related to fixation instability?

      We thank the reviewer for their comment. As detailed in our previous response, we have now extracted fixation instability data from additional patients and have expanded our discussion of its potential effects throughout the manuscript.

      Given that fixation instability is expected to increase pRF size by a fixed amount, that would explain why ratios are close to 1 in V3 (Figure 4).

      We agree with the reviewer’s point that the ratio change on its own is not strong evidence of compensation; this analysis was meant to complement the CF result. The plot in Figure 4 is intended to reconcile the connective field (CF) and pRF results. Its purpose is to illustrate that even though larger pRFs in achromats might seem counterintuitive alongside their smaller V3 CF sizes, the pRF data do not contradict the CF findings but are in fact consistent with them. We also agree that there are alternative explanations for the differences in pRF size, such as fixation stability, and we have now added this point to the text.

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account: that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      (2) Topography

      The claim of no change in topography is a little confusing given that you do see a change in eccentricity mapping in achromats. 

      Either this result is real, in which case there *is* a change in topography, albeit subtle, or it's an artifact. 

      Perhaps these results need a little bit of additional scrutiny. 

      One reason for concern is that you see different functions relating eccentricity to V1 segments depending on the stimulus. That almost certainly reflects biases in the modelling, not reorganization - the curves of Figure 2D are exactly what Binda et al. predict. 

      Another reason for concern is that I'm very surprised that you see so little effect of including/not including the scotoma - the differences seem more like what I'd expect from simply repeating the same code twice. (The quickest sanity check is just to increase the size of the estimated scotoma to be even bigger?).

      We thank the reviewer for their comment. We have double-checked our scotoma modelling, confirming its correct implementation. The results of the scotoma modelling are not identical to the full one, just similar (see below).

      Previous studies on “artificial scotomas” (such as the one reported by Binda et al.) have shown mixed results. While Binda and colleagues found that modelling artificial scotomas normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas. Moreover, it is unclear whether scotoma modelling is beneficial in clinical populations, as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors. A recent achromatopsia study (Anderson et al. 2024) also found no change in pRF estimates with scotoma modelling.

      In our scotoma analyses, we found meaningful differences only in the non-selective condition in controls, where cones in the rod-free zone are stimulated, which would be the main expected effect of this modelling exercise (see below). In all other conditions (rod-selective in controls, both conditions in achromats), where only rods are stimulated, we found no difference in coverage, eccentricity or pRF size when modelling the scotoma, likely because the foveal signal is weak or absent and did not contribute much to pRF estimates in the unmasked analyses.

      This means we cannot account for the eccentricity shift as an edge effect with this scotoma model, but we remain cautious about interpreting it as real. First, as we mention in the paper, in the non-selective condition, which has a higher signal-to-noise ratio, the eccentricity estimates in achromats match those of the control group's rod system. Second, it is still possible that the observed shift is an artefact of modelling that was not accounted for by the scotoma modelling approach.

      Our claim of "no change in topography" specifically referred to the absence of "filling-in" as measured by cortical coverage (the percentage of activated tissue regardless of fitted parameters). However, to avoid confusion given the eccentricity and pRF size results, we have now rephrased our claim.

      Abstract:

      “Cortical input stages (V1) exhibited high stability, with input-deprived cortex showing no retinotopic remapping and exhibiting structural hallmarks of deprivation.”

      Results (pRF eccentricity):

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub> = 0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      To better illustrate the effect of scotoma modelling, the following text has been added to Supplement 3:

      “Studies on artificial scotomas, where part of the visual field is masked, suggest that pRF estimates of eccentricity and size can be biased by fitting scotoma-edge artefacts, and that these can be mitigated by modelling the scotoma in the pRF fitting procedure (e.g., Binda et al. 2013).

      We therefore repeated the pRF modelling procedure with the rod scotoma modelled as a black oval mask (1.25°x0.9°) over the stimulus aperture model. As expected, a visible difference between the two models is only apparent in the non-selective condition in controls, where the cones in the rod-free zone are stimulated. In all the other conditions (rod-selective in controls, and both stimulation conditions in achromats) only the rods are stimulated; the masked stimulus therefore still matches the retinal activation, and no major differences can be observed. Performing the same statistical tests applied to the full model in the main text yields equivalent results: coverage in the rod-selective condition is equivalent across groups (t(47) = 0.78, p=0.43, BF10=0.31), and controls show higher coverage than achromats in the non-selective stimulation condition (Mann-Whitney U(52)=141, p<0.01; a non-parametric test was used because variances were unequal).

      This consistency in pRF properties when modelling the rod scotoma is in line with previous results from scotoma modelling: while Binda and colleagues found that this normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas, and as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors, it is unclear how artificial scotoma findings generalise to clinical populations. Our results are in line with a recent achromatopsia study (Anderson et al. 2024), which also found no change in pRF estimates with scotoma modelling.”
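      The masking step described in this supplement text can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the stimulus aperture is taken to be a binary grid sampled in degrees of visual angle, the oval is centred on fixation, and 1.25°x0.9° is interpreted as the full horizontal and vertical extent of the mask (the text does not say whether these are diameters or radii, nor the orientation). The function name and grid resolution are hypothetical.

```python
def mask_scotoma(aperture, xs, ys, width_deg=1.25, height_deg=0.9):
    """Zero out aperture pixels that fall inside an elliptical scotoma at fixation.

    aperture: list of rows (one per y coordinate) of 0/1 stimulus values for one frame.
    xs, ys:   visual-field coordinates (deg) of the columns and rows.
    width_deg/height_deg are treated as full axis lengths (an assumption).
    """
    a, b = width_deg / 2, height_deg / 2  # semi-axes
    masked = []
    for row, y in zip(aperture, ys):
        masked.append([
            0 if (x / a) ** 2 + (y / b) ** 2 <= 1.0 else value
            for value, x in zip(row, xs)
        ])
    return masked


# Hypothetical 0.25-deg grid spanning -1 to 1 deg, with the whole grid stimulated.
step = 0.25
xs = [step * i for i in range(-4, 5)]
ys = list(xs)
full_aperture = [[1] * len(xs) for _ in ys]
masked = mask_scotoma(full_aperture, xs, ys)
for row in masked:
    print(row)
```

      In a full pRF fit, the same mask would be applied to every frame of the aperture model before predicted time courses are computed.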

      I'd also look at voxels that pass an R2>0.2 threshold for both the non-selective and selective stimulus. Are the pRF sizes the same for both stimuli? Are the eccentricity estimates? If not, that's another clear warning sign.

      Comparable results were obtained when using higher R2 thresholds. These results are now included in Supplement 6.

      (3) Connective field modelling

      Let's imagine a voxel on the edge of the scotoma. It will tend to have a connective field that borders the scotoma, and will be reduced in size (since it will likely exclude the cortical region of V1 that is solely driven by resting state activity). This predicts your rod monochromat data. The interesting question is why this doesn't happen for controls. One possibility is that there is top-down 'predictive' activity that smooths out the border of the scotoma (there's some hint of that in the data), e.g., Masuda and Wandell.

      One thing that concerns me is that the smaller connective fields don't make sense intuitively. When there is a visual stimulus, connective fields are predominantly driven by the visual signal. In achromats, there is a large swath of cortex (between 1 and 2.5 degrees) which shows relatively flat tuning as regards eccentricity. The curves for controls are much steeper (see Figure 2B). This predicts that visually driven connective fields should be larger for achromats. So, what's going on?

      The reviewer raises interesting points about the interpretation of our connective field results. The possibility of differential top-down modulation between controls and achromats is intriguing; however, it is not supported by the data. If top-down modulation were activating foveal V1 in controls, we should not see a drop in the number of significant vertices sampling from the fovea in the rod-selective condition compared to the non-selective condition, yet we do see quite a large drop in that area in the rod-selective condition. Therefore, we do not currently think there is a strong basis to assume that our data could be explained by achromats lacking top-down predictive activity in the scotoma area that is present in controls.

      Regarding the concern about smaller CFs seeming counterintuitive given the flat eccentricity tuning in achromats' V1: we believe there is not a straightforward prediction from pRF properties to CF sizes. The relationship between V1 pRF characteristics and V3 CF sampling is complex and not well-established in the literature, and the two can be decoupled to some degree. For instance, in our data, controls show flat V1 pRF sizes in the rod-selective condition (similar to achromats), yet their V3 CF sizes maintain the typical eccentricity-dependent increase seen in the non-selective condition. This suggests that CF size patterns don't simply mirror V1 pRF properties or responses to visual stimuli.

      Importantly, CF modelling fundamentally differs from pRF analysis in how it might be affected by scotomas. Unlike pRF analysis, where a scotoma creates a "silent" region in visual space, in CF modelling the deprived cortex remains physically present and continues generating neural signals (albeit not visually driven ones). If V3-V1 connectivity were anatomically fixed, V3 would continue sampling from deprived V1 regions even if they do not produce visually driven signals. A change in this sampling pattern, as we see in our data, is therefore evidence for plasticity.

      Our data support this interpretation. First, in achromats, the CF size pattern observed cannot be easily explained by scotoma-edge artefacts. V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The effect is only significant further away from the scotoma (4°-6°).

      Second, to assess how the presence of a scotoma affects CF measures, we can compare the two conditions in the controls, since the rod-selective condition has a scotoma present and the non-selective condition does not. For this purpose, we performed an additional analysis, quantifying on a vertex-by-vertex level the differences in CF fitted parameters between the two stimulation conditions across V1. See results below. In achromats there are no systematic shifts between the stimulation conditions, as expected since both are rod-driven. In controls, this analysis reveals only subtle shifts (~0.45° in the rod-selective condition). CF size also changed slightly, although this change was not significantly different from that observed in achromats. These shifts are much smaller than the CF size and eccentricity differences between controls and achromats, so we consider it unlikely that our findings are driven by scotoma artefacts.

      Author response image 1.

      Results (CF size):

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.

      To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account: that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      The beta parameter is not described (and I believe it can alter connective field sizes).

      In Author response image 2, we plot the beta parameter of the pRF modelling in V1 with no R² filtering; error bars are 95% CIs:

      Author response image 2.

      The reviewer did not specify how beta might alter connective field sizes. We assume the reviewer meant that, as in pRF mapping, a gradient in activity from deprived to non-deprived cortex could artefactually produce CF model fits with smaller CF sizes. To test this, we calculated the slope of beta values between 0° and 3° in each participant in the rod-selective condition, as this range includes the scotoma and the area at its edge. We then used the slope as a covariate in an ANCOVA when comparing CF sizes across groups in each sampled V1 segment. Accounting for the V1 beta slope did not change the reported results. This analysis still shows smaller CF sizes in V3 in the rod-selective condition between 4° and 6° eccentricity, and these differences remain significant (p<0.001 for 4°-5° and p<0.05 for 5°-6° when comparing achromats vs controls).
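      The covariate analysis above can be illustrated with a minimal sketch. It assumes per-vertex arrays of eccentricity and pRF beta for one participant, and it stands in for the ANCOVA with an ordinary least-squares fit of CF size on an intercept, a group indicator and the covariate; the function names are hypothetical, not the authors' code. Only the covariate-adjusted group difference is returned; a full ANCOVA would also test that coefficient (for example with a statistics package).

```python
import numpy as np

def beta_slope(ecc, beta, lo=0.0, hi=3.0):
    """Least-squares slope of pRF beta against eccentricity within [lo, hi] degrees."""
    ecc = np.asarray(ecc, dtype=float)
    beta = np.asarray(beta, dtype=float)
    in_range = (ecc >= lo) & (ecc <= hi)
    slope, _intercept = np.polyfit(ecc[in_range], beta[in_range], deg=1)
    return slope

def adjusted_group_difference(cf_size, is_achromat, covariate):
    """ANCOVA-style OLS fit of cf_size ~ 1 + group + covariate.

    Returns the group coefficient: the achromat-minus-control difference in
    CF size after adjusting for the covariate (e.g. each participant's beta slope).
    """
    y = np.asarray(cf_size, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                         # intercept
        np.asarray(is_achromat, dtype=float),    # 1 = achromat, 0 = control
        np.asarray(covariate, dtype=float),      # e.g. beta slope
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]
```

      In practice this would be run separately for each sampled V1 segment, with one CF size summary and one beta slope per participant.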

      Similarly, it's possible to get very small connective fields, but there wasn't a minimum size described in the thresholding.

      CF sizes were estimated with a grid search over the candidate values [0.5, 1, 2, 3, 4, 5, 7, 10], so the minimum possible size is 0.5. Filtering out the smallest connective field sizes does not change the results:
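      To make the grid search concrete, here is a minimal sketch of a connective field fit for a single V3 vertex. It assumes the common CF formulation in which the predicted V3 time series is a Gaussian-weighted sum of V1 time series, weighted by cortical distance from a candidate centre vertex, and the best centre and size are those maximising the correlation with the measured signal. The function name, the use of correlation as the fit criterion and the distance matrix input are assumptions for illustration, not the authors' pipeline; only the size grid comes from the response.

```python
import numpy as np

CF_GRID = (0.5, 1, 2, 3, 4, 5, 7, 10)  # candidate CF sizes listed in the response

def fit_cf(target, v1_ts, dist, sizes=CF_GRID):
    """Grid-search connective field fit for one V3 vertex.

    target : (T,) V3 time series
    v1_ts  : (N, T) time series of N V1 vertices
    dist   : (N, N) cortical distances between V1 vertices (same units as sizes)
    Returns (best_centre_index, best_size, best_r).
    """
    best = (None, None, -np.inf)
    for c in range(v1_ts.shape[0]):
        for s in sizes:
            w = np.exp(-dist[c] ** 2 / (2.0 * s ** 2))  # Gaussian weights centred on vertex c
            pred = w @ v1_ts                            # weighted sum of V1 signals
            r = np.corrcoef(pred, target)[0, 1]         # goodness of fit
            if r > best[2]:
                best = (c, s, r)
    return best
```

      Because sizes come from a fixed grid, no fitted CF can be smaller than the first grid value, which is why the minimum size is 0.5.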

      Author response image 3.

      I might be missing something obvious, but I'm just deeply confused as to how the visual maps and the connectome maps can provide contradictory results given that the connectome maps are predominantly determined by the visual signal. Some intuition would be helpful.

      We agree that this appears counterintuitive and have now added further clarification. The two models (pRF and CF) differ fundamentally in what they measure and in how they relate to visual processing. V1 pRF sizes reflect the relationship between neural activity and the visual stimulus (essentially, how much of the visual field drives a voxel's response), whereas V3 CF sizes reflect how V3 samples from the V1 cortical surface, that is, how many V1 voxels contribute to a V3 voxel's activity.

      The measures constrain each other, as a V3 voxel's pRF size is expected to match the pooling of its connected V1 inputs. But they can be decoupled: a V3 voxel could sample from a small area of V1 cortex (a small CF in mm) that nevertheless represents a large area of visual space if those V1 voxels have large pRFs. Figure 4B aims to show that the measures are consistent with one another even though they diverge in direction. In achromats, where V1 voxels have larger pRFs (coarser spatial resolution), V3 appears to compensate by sampling more selectively from V1 via smaller CFs. Theoretically, this should reduce the pRF size difference between controls and patients in V3, a prediction that our data support.
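      A worked example may help with the intuition. Under a simplifying assumption that is ours rather than the authors' model, V1 pRFs are isotropic Gaussians of width σ_V1 (degrees), a V3 voxel pools V1 through a Gaussian CF of width σ_CF (mm), and the V1 map is locally linear with cortical magnification M (mm per degree). Because convolving two Gaussians adds their variances, σ_V3 ≈ sqrt(σ_V1² + (σ_CF / M)²). All numbers below are made up purely for illustration.

```python
import math

def v3_prf_size(v1_prf_deg, cf_size_mm, mag_mm_per_deg):
    """Hypothetical Gaussian approximation of a V3 pRF size.

    The CF width is projected into visual space (mm -> deg) and its variance
    is added to the V1 pRF variance (Gaussian convolution).
    """
    cf_deg = cf_size_mm / mag_mm_per_deg
    return math.sqrt(v1_prf_deg ** 2 + cf_deg ** 2)

M = 2.0                                   # hypothetical magnification, same in both groups
control_v3 = v3_prf_size(1.0, 4.0, M)     # sqrt(1.0 + 4.0)  ~ 2.236 deg
achromat_v3 = v3_prf_size(1.5, 3.0, M)    # sqrt(2.25 + 2.25) ~ 2.121 deg

ratio_v1 = 1.5 / 1.0                      # 1.5: coarser V1 input in achromats
ratio_v3 = achromat_v3 / control_v3       # ~ 0.949: much closer to parity (1)
```

      In this toy case, larger V1 pRFs combined with a smaller CF yield V3 pRFs of similar size in the two groups, which is the pattern the ratio analysis tests for.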

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming that the pRF size difference was significantly reduced at the higher cortical stage. Together, these results provide converging evidence across the two models (pRF and CF) for hierarchical refinement as a possible compensatory mechanism, in which V3's altered connectivity helps to normalise the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account: that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Some analyses might also help provide the reader with insight. For example, doing analyses separately on V3 voxels that project entirely to scotoma regions, project entirely to stimulus-driven regions, and V3 voxels that project to 'mixed' regions.

      We agree that it is important to plot the connective field dynamics across the scotoma region.

      In Figure 4A, we split the V3 vertices by the V1 segment they sample from. V3 vertices sampling from the 0°-1° segment therefore sample mainly from the “scotoma” region, and the higher the eccentricity, the less of the scotoma a segment includes. The V3 vertices with significantly smaller CF sizes than controls are those sampling from mostly, if not entirely, stimulus-driven regions (4°-5° and 5°-6°). We are not sure how further binning the data by within, across and outside the scotoma would be more informative.

      However, in Author response image 4, we plot in more detail the distribution of CF sizes for V3 vertices sampling from a V1 segment clearly inside and one clearly outside the scotoma. The top panel shows the CF size distribution of V3 vertices that sample from the V1 0°-1° segment, where V1 is deprived of input due to the rod scotoma. In achromats, there is a clear drop in vertices with a very small (0.5) CF size. The bottom panel shows the distribution of V3 vertices that sample from the V1 4°-5° segment, which falls outside the scotoma and shows a significant difference in CF size across the groups. Here, achromats show fewer V3 vertices with larger CF sizes sampling from this V1 region, and more with smaller ones (note that this further addresses a previous concern that connective field differences across groups are driven solely by very small CFs).
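      The binning described above can be sketched as follows. This assumes each V3 vertex is summarised by the atlas eccentricity of the V1 location its CF samples from and by its fitted grid CF size; the 1° bin edges follow the segments in the text, while the function names are hypothetical. Running it separately for each group gives the per-segment distributions that are compared in the figure.

```python
from collections import Counter

CF_GRID = (0.5, 1, 2, 3, 4, 5, 7, 10)  # grid values listed in the response

def cf_size_distribution(cf_sizes):
    """Proportion of vertices at each grid CF size."""
    counts = Counter(cf_sizes)
    n = len(cf_sizes)
    return {size: counts.get(size, 0) / n for size in CF_GRID}

def distributions_by_segment(sampled_ecc, cf_sizes, edges=range(7)):
    """Group V3 vertices into 1-degree bins of sampled V1 (atlas) eccentricity,
    then return the CF size distribution within each non-empty bin."""
    edges = list(edges)
    result = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [s for e, s in zip(sampled_ecc, cf_sizes) if lo <= e < hi]
        if in_bin:
            result[(lo, hi)] = cf_size_distribution(in_bin)
    return result
```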

      Author response image 4.

      Following the reviewer’s comment, we have added the following statement to the results section discussing CF size:

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.”

      The finding that pRF sizes are larger in achromats by a constant factor as a function of eccentricity is what differences in eye-movements would predict. It would be worth examining the relationship between pRF sizes and fixation stability.

      We found no relationship between fixation stability and pRF size in V1, although, as we explain in response to an earlier point, this does not fully exclude the reviewer’s alternative explanation, which we now add to the discussion.

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account: that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Reviewer #2 (Public review):

      Summary:

      The authors inspect the stability and compensatory plasticity in the retinotopic mapping in patients with congenital achromatopsia. They report an increased cortical thickness in central V1 (eccentricities 0-2 deg) and the expansion of this effect to V2 (trend) and V3 in a cohort of, on average, adolescent age.

      In analyzing the receptive fields, they show that V1 had increased receptive field sizes in achromats, but there were no clear signs of reorganization filling in the rod-free area. In contrast, V3 showed an altered readout of V1 receptive fields. V3 of achromats oversampled the receptive fields bordering the rod-free zone, presumably to compensate and arrive at similar receptive fields as in the controls.

      These findings support a retention of peripheral-V1 connectivity, but a reorganization of later hierarchical stages of the visual system to compensate for the loss, highlighting a balance between stability and compensation in different stages of the visual hierarchy.

      Strengths:

      The experiment is carefully analyzed, and the data convey a clear and interesting message about the capacities of plasticity. 

      Weaknesses:

      The existence of unstable fixation and nystagmus in the patient group is alluded to, but not quantified or modeled out in the analyses. The authors may want to address this possible confound with a quantitative approach.

      We respond to this point in this reviewer’s “Recommendations for the authors” section below, where they describe these points in more detail.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I think the term rod monochromats should be included early in the paper since it's a more intuitive term to describe this population.

      We agree with the reviewer that the term “rod monochromats” is more intuitive, as it clarifies the retinal source of the disease, but we have chosen the term achromats for consistency with a wide literature of published work in this group, including our own and our close collaborators’. To clarify, we have now added this term at the first mention of achromats in the introduction:

      “Achromatopsia (also known as rod monochromacy) causes cone photoreceptors in the retina to be inactive from birth (Aboshiha et al., 2014).”

      (2) The paper essentially contains two definitions of 'eccentricity'. One (atlas/segments) comes from the Benson atlas and the other (functional) comes from pRF mapping. It would be good to make this terminological distinction clearer earlier in the paper. It would also be good to use more consistent terminology. I assume 'sampled atlas V1 eccentricity' in 3A is the same as 'V1 segment' in 1A?

      For consistency, we now refer to these as “V1 segment” and “sampled V1 segment” in the figures when describing the atlas-based definition, and as “eccentricity” for the measured, pRF-based eccentricity.

      (3) The 'stability vs. plasticity' framing in the introduction could be tightened slightly.

      We have made the following changes following the reviewer’s comment:

      “In the visual domain, the focal point of the debate on plasticity and stability has hinged on the extent to which retinal input deprivation can drive local reorganisation in early visual cortex, for example, for deprived tissue to take on inputs from spared retinal locations (Adams et al., 2007; Baker et al., 2005, 2008; Baseler et al., 2002, 2011; Calford et al., 2005; Dilks et al., 2009; Dumoulin & Knapen, 2018; Ferreira et al., 2016; Goesaert et al., 2014; Haak et al., 2015; Molz et al., 2023; Ritter et al., 2019; Schumacher et al., 2008). In reality, visual impairment is a more global phenomenon, affecting all levels of visual processing, with complex dynamics beyond constricted local retinocortical projection zones (Carvalho et al., 2019).”

      (4) Figure 1A, define the x axis as degrees.

      We have now added the ° sign to all the tick labels indicating Benson map eccentricity.

      (5) Figure 2B, is there room for pictures of the silent substitution/standard stimulus?

      We have now added images in Supplement 5 to avoid cluttering the main Figure 2B.

      (6) Figure 2

      Panel A has a slightly weird organization. The reader is supposed to compare the square symbols to each other, and the circles to each other, why not organize the figure so they are adjacent in the graph (i.e. non selective control, non-selective achromat, selective control, selective achromat)? That also helps the reader orient that in the non-selective conditions you have almost complete pRF coverage. 

      We have taken on the reviewer’s suggestion and changed the order.

      In the inset, maybe use empty symbols? That's the traditional way to say that the square/circle applies to both red and black.

      We prefer the current format.

      Figure 2C - the symbols change to circles? Why not keep the symbols of A?

      We have now changed the symbols of 2C&D.

      I'd put the non-selective maps above the selective maps?

      We appreciate the feedback but prefer to keep it as it is, as we feel the critical point is conveyed by the rod maps.

      (7) 'We propose a new hierarchical model of neural adaptation'. These ideas are hardly new. There are also other models, that would explain your data (cumulative plasticity) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5953572/

      We thank the reviewer for the reference. We have now cited it in our discussion and removed the word “new” from the mentioned sentence.

      “Therefore, there is theoretically broader scope for experience-dependent reweighting of inputs (Beyeler et al., 2017; Makin & Krakauer, 2023) and to optimise use of inputs that are still available, more reliable, or more relevant in the impaired system. Conversely, higher-order visual areas may appear more plastic simply because they integrate the cumulative effects of learning from multiple lower stages (Beyeler et al., 2017).”

      “We propose a hierarchical model of neural adaptation…” [deleted the word “new”]

      (8) Line 508. No image of the stimulus is contained in the paper

      Corrected

      (9) Line 620. I believe the Figure is 1B, not 1C.

      Corrected

      (10) Figure 4A. CF Size - add mm² to the axes.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      I am not an expert on pRF mapping, and as such, I am unsure how to relate to pRF mapping performed in patients with unstable fixation (not quantified, but referred to) and nystagmus, such as the achromatic population here. Since the majority of the results hinge on this analysis, I would appreciate more data about the differences between the groups. Supplement 2, which is meant to speak to this, shows only the data from 3 typical participants, and in itself is not evidence for "no correlation between stable fixation and enhanced foveal". Additionally, I'd appreciate a clear methods explanation of how the authors address these confounds; this is too important a concern to be left for the discussion section.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: (1) as expected, gaze was less stable in achromats than in controls; (2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masked signals in this region; and (3) gaze instability was not correlated with pRF size or eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is challenging because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a custom post-hoc calibration procedure (Tailor et al., 2021). The eye-tracker was first pre-calibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation, and above, below, left and right of fixation at 5° eccentricity). These data allowed us to subsequently map each patient's recorded gaze coordinates to their precise locations on the screen. In 10 of the 14 achromats, we acquired sufficiently reliable data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds of data at each fixation location to allow gaze to settle on the target. We then (a) removed blinks, (b) filtered out time points with eye-movement velocity outliers (±2 SD), and (c) filtered out any positions more than 3 SD to the left or right of the mean fixation location, or more than 1 SD above or below it. We took the median of the remaining gaze samples as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space.

      Quantifying fixation stability: after applying the post-hoc calibration transformation, data were filtered for blinks and extreme velocities (keeping samples within 2 SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
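      The calibration and stability steps quoted above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' code: blinks and filtered samples are assumed to be stored as NaN, time is in seconds, the affine map is fitted by least squares from the 5 median fixations to the 5 known target positions, and "SD across 1-second windows" is interpreted as the mean of within-window SDs. All function names are hypothetical.

```python
import numpy as np

def fixation_estimate(t, x, y, settle=0.5):
    """Robust gaze estimate for one calibration target (steps (a)-(c) above)."""
    t, x, y = (np.asarray(a, dtype=float) for a in (t, x, y))
    keep = (t - t[0] >= settle) & ~np.isnan(x) & ~np.isnan(y)  # settling period, blinks
    t, x, y = t[keep], x[keep], y[keep]
    speed = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)
    speed = np.append(speed, speed[-1])                         # one value per sample
    keep = np.abs(speed - speed.mean()) <= 2 * speed.std()      # velocity outliers (+/-2 SD)
    x, y = x[keep], y[keep]
    # position outliers: 3 SD horizontally, 1 SD vertically
    keep = (np.abs(x - x.mean()) <= 3 * x.std()) & (np.abs(y - y.mean()) <= y.std())
    return np.median(x[keep]), np.median(y[keep])

def fit_affine(recorded, targets):
    """Least-squares affine map from recorded gaze (n, 2) to screen targets (n, 2).

    Returns a (3, 2) matrix A such that [x, y, 1] @ A gives screen coordinates.
    """
    recorded = np.asarray(recorded, dtype=float)
    X = np.column_stack([recorded, np.ones(len(recorded))])
    A, *_ = np.linalg.lstsq(X, np.asarray(targets, dtype=float), rcond=None)
    return A

def apply_affine(A, gaze):
    """Remap recorded gaze samples (n, 2) into screen space."""
    gaze = np.asarray(gaze, dtype=float)
    return np.column_stack([gaze, np.ones(len(gaze))]) @ A

def fixation_instability(x, fs, window_s=1.0):
    """Mean within-window SD of calibrated horizontal gaze sampled at fs Hz.

    Filtered samples are NaN; any trailing partial window is dropped.
    """
    x = np.asarray(x, dtype=float)
    n = int(round(fs * window_s))
    n_win = len(x) // n
    windows = x[: n_win * n].reshape(n_win, n)
    return float(np.nanmean(np.nanstd(windows, axis=1)))

def participant_instability(runs, fs):
    """Average the per-run instability across repeated runs."""
    return float(np.mean([fixation_instability(r, fs) for r in runs]))
```

      Five target positions give ten equations for the six affine parameters, so the least-squares fit is overdetermined and tolerates some error in individual median estimates.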

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability than controls (rod-selective: t(9.08) = -3.19, p = 0.01; non-selective: t(9.41) = -4.88, p < 0.001; degrees of freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest that this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: Spearman r(8) = -0.36, p = 0.31; non-selective: Spearman r(8) = 0.07, p = 0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, who typically have less severe nystagmus (Kohl et al., 1993), two recent studies also found an absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
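      For readers unfamiliar with the reported statistics: the fractional degrees of freedom, e.g. t(9.08), come from the Welch–Satterthwaite correction for unequal variances, and r(8) is the conventional n − 2 degrees of freedom for a correlation across the 10 achromats with usable gaze data. The sketch below shows both quantities; it omits p-values and tie handling, which library routines such as `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.spearmanr` provide. The function names are hypothetical.

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite degrees of freedom for an unequal-variance t-test."""
    a, b = var1 / n1, var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def ranks(values):
    """1-based ranks; assumes no ties (a full implementation averages tied ranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_r(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

      With equal variances and equal group sizes the Welch correction reduces to the familiar n1 + n2 − 2; it drops towards the size of the smaller group as the variances diverge.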

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 combine information across larger areas of visual space than in typically sighted controls. This could reflect true differences in neural tuning, but could also be driven by larger eye movements. However, fixation instability in achromats did not significantly correlate with pRF size in our sample (rod-selective: Spearman r(8) = -0.41, p = 0.24; non-selective: Spearman r(8) = 0.37, p = 0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when we accounted for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remained unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: Spearman r(8) = 0.58, p = 0.08; non-selective: Spearman r(8) = 0.09, p = 0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Anderson et al., 2024; Molz et al., 2023). While we found no direct evidence that these are driven by confounding factors such as eye movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2:

      “As expected, achromats showed significantly higher fixation instability than controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size or eccentricity in achromats. Results of Spearman correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      The field connectivity analysis similarly seems to be used only on task data from the same design; if it were replicated with resting-state data, that would be a good way to show consistency that is independent of measures requiring fixation.

      We agree that resting-state data would be valuable; however, we did not collect such data in these individuals due to time limitations. Instead, we demonstrate the consistency and reliability of our results by replicating our findings across two different stimulation conditions (rod-selective and non-selective), which differ in luminance, contrast and signal amplitude in both groups and, for controls, also in the photoreceptors involved. The convergence of results across these distinct visual conditions strengthens our confidence in the reliability of the observed effects. Notably, CF estimates have also been shown to be robust to large eye movements, and therefore also to differences in fixation stability across groups (Tangtartharakul et al., 2023).

      The authors may want to contextualize their findings in relation to what reorganization exists in cases of late-onset loss of part of the visual field on one hand (stroke recovery), and in the case of complete blindness from early life on the other, as both speak to different levels of plasticity the visual system is capable of.

      We thank the reviewer for their comment and have added a new paragraph discussing this topic.

      Discussion:

      “Our findings on hierarchical adaptation have broader implications for other visual disorders, depending on their timing and nature. For instance, a central scotoma acquired in adulthood, as in macular degeneration, may not trigger the same V3 sampling shifts (Haak et al., 2016), suggesting a sensitive window for this form of plasticity, after which connective fields remain more stable. This also raises questions about congenital blindness, where the absence of any driving input could lead to weakening or repurposing of hierarchical connections (Saccone et al., 2024). Moreover, principles may differ between a deprived but structurally intact cortex, as in retinal dystrophies, and a physically damaged cortex, as in stroke. In the latter, more extensive reorganisation may be required to sample effectively from surviving, and potentially disparate, regions of V1. Perceptual training effects in stroke rehabilitation may reflect such dynamics (Cavanaugh et al., 2025; Elshout et al., 2021).”

      A more minor point: Can the authors clarify what the dark adaptation is used for, and provide the supplementary analysis showing that the duration difference for some of the participants didn't impact the results (stated but not shown).

      The dark adaptation period before the rod-selective condition allowed rod photoreceptors to recover from bleaching caused by prior mesopic light exposure, ensuring optimal rod sensitivity under scotopic conditions. To verify that our 15-minute adaptation period was sufficient, we tested 10 control participants with an extended 45-minute adaptation period. As we found no differences in the resulting rod maps between the standard and extended adaptation protocols, these participants were combined with the main control group for all analyses. Author response image 5 shows the plots for the two dark adaptation periods.

      Author response image 5.

    1. eLife Assessment

      This valuable study presents a hierarchical computational model that integrates locomotion, navigation, and learning in Drosophila larvae. The evidence supporting the model is convincing, as it qualitatively replicates empirical behavioral data. While some simplifications in neuromechanical representation and sensory-motor integration are limiting factors, the reported modular framework will be of interest for computational modeling of biological movement and adaptive behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The paper presents a three-layered hierarchical model for simulating Drosophila larva locomotion, navigation, and learning. The model consists of a basic locomotory layer that generates crawling and turning using a coupled-oscillator framework, incorporating intermittency in movement through alternating runs and pauses. The intermediate layer enables navigation by allowing larvae to actively sense and respond to odor gradients, facilitating chemotaxis. The adaptive learning layer integrates a spiking neural network model of the Mushroom Body, simulating associative learning where larvae modify their behavior based on past experiences. The model is validated through simulations of free exploration, chemotaxis, and odor preference learning, demonstrating close agreement with empirical behavioral data. This modular framework provides a valuable advance for modeling of larva behavior.

      Strengths:

      Every modeling paper requires certain assumptions and abstractions. The main strength of this paper lies in its modular and hierarchical approach to modeling behavior, making connections to influential theories of motor control in the brain. The authors also provide a convincing discussion of the experimental evidence supporting their layered behavioral architecture. This abstraction is valuable, offering researchers a useful conceptual framework and marking a significant step forward in the field. Connections to empirical larval movement are another major strength.

      Weaknesses:

      While the model represents a conceptual advance in the field, some of its assumptions and choices fall behind state-of-the-art approaches. One limitation is the paper's simplified representation of larval neuromechanics, in which the body is reduced to a two-segment structure with basic neural control. Another limitation is the absence of an explicit neuromuscular control system, which would better capture the role of segmental central pattern generators (CPGs) and neuronal circuits in regulating peristalsis and turning in Drosophila larvae. Many detailed neuromechanical models, as cited by the authors, have already been published. These abstractions overlook valuable experimental studies that detail segmental dynamics during crawling and the larval connectome.

      The strength of the model could also be its weakness. The model follows a subsumption architecture, where low-level behaviors operate autonomously while higher layers modulate them. However, this approach may underestimate the complexity of real neural circuits, which likely exhibit more intricate feedback mechanisms between sensory input and motor execution.
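      To make the subsumption idea concrete, here is a toy sketch in which the bottom layer runs on its own and the upper layer only rescales one of its parameters rather than issuing motor commands. It is not the authors' larvaworld implementation or API: class names, oscillator frequencies and the gain rule are arbitrary assumptions, and intermittency (runs and pauses) and the learning layer are omitted.

```python
import math

class Oscillator:
    """Phase oscillator producing amp * sin(phase)."""
    def __init__(self, freq_hz, amp):
        self.freq, self.amp, self.phase = freq_hz, amp, 0.0

    def step(self, dt):
        self.phase = (self.phase + 2 * math.pi * self.freq * dt) % (2 * math.pi)
        return self.amp * math.sin(self.phase)

class LocomotorLayer:
    """Bottom layer: crawling and turning run autonomously, with no input."""
    def __init__(self):
        self.crawl = Oscillator(freq_hz=1.5, amp=1.0)  # arbitrary placeholder values
        self.turn = Oscillator(freq_hz=0.3, amp=0.5)

    def step(self, dt):
        return self.crawl.step(dt), self.turn.step(dt)

class NavigationLayer:
    """Upper layer: only rescales turn amplitude from the sensed odor change."""
    def __init__(self, locomotor, gain=2.0):
        self.loc, self.gain = locomotor, gain
        self.base_turn_amp = locomotor.turn.amp

    def step(self, dt, odor_change):
        # Odor rising -> turn less; odor falling -> turn more (clamped at zero).
        self.loc.turn.amp = max(0.0, self.base_turn_amp * (1.0 - self.gain * odor_change))
        return self.loc.step(dt)
```

      The one-way dependency (the navigation layer holds a reference to the locomotor layer, never the reverse) is what makes this a strict subsumption design, and is also where the reviewer's concern applies: real circuits would add ascending feedback from the lower layer.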

    3. Reviewer #2 (Public review):

      The paper proposes a hierarchically layered approach to larval locomotion, chemotaxis and learning. The model consists of a basic locomotor layer with two coupled oscillators, one for crawls and one for turns. The intermediate layer modulates the frequency and amplitude of turns to enable chemotaxis. The higher layer integrates a spiking neural network model of the Mushroom Body to modify odor valence in response to experience, as during learning.

      The model is compared to experimental data with a good degree of agreement. This modular framework provides a valuable advance for modeling larva behavior.

      Strengths:

      A novel multilevel model that reflects current thinking on the neuronal organisation of motor control. The model is very useful for investigating the neuronal architecture of central pattern generators and higher-order motor control circuits that could be linked to larval connectome data.

      Weaknesses:

      All the limitations of the model are discussed and therefore the paper perfectly fits its purpose.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We are happy to read that this reviewer considers the proposed behavioral architecture ‘a significant step forward in the field’, and that she/he recognizes the strengths of our work in the modular and hierarchical approach that provides connections to influential theories of motor control in the brain, in the experimental evidence it is based on, and in the valuable abstractions that we have chosen for the larval behavioral modeling.

      The reviewer raises important points about the simplifications we have made, both conceptually and in the specific implementation of larval behaviors. Our main goal in this study is to introduce a conceptual framework that integrates agent-based modeling with systems neuroscience models in a modular fashion. To serve this purpose, we aimed for a minimal yet representative implementation at the motor layer of the architecture, calibrated to larval locomotion kinematics. This choice enables efficient simulation while allowing us to test top-down modulation and adaptive mechanisms in higher layers without the computational overhead of a full neuromechanical model. In addition to chemotaxis, we have recently used this simplified approach to model thermotaxis in larvae (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The reviewer notes the absence of explicit segmental neuromuscular control or central pattern generators (CPGs). We deliberately abstracted from these mechanisms, representing the larval body as two segments with basic kinematic control, to focus on reproducing overall locomotor patterns. This bisegmental simplification, which we illustrate in Supplemental Video “Bisegmental larva-body simplification”, retains the behavioral features relevant to our current aims. However, the modular structure of the framework means that more detailed neuromechanical models—incorporating CPG dynamics or connectome-derived circuit models—can be integrated in future work without altering the architecture as a whole.

      We fully agree that real neural circuits are more complex than a strict subsumption architecture implies. In the Drosophila larva, there is clear evidence for ascending sensory feedback from the motor periphery to premotor and higher brain circuits, as well as neuromodulatory influences. These add layers of complexity beyond the predominantly descending control in our present model. At the same time, both larval and adult connectome data show that across-level descending and ascending connections are sparse compared to the dense within-layer connectivity. We see value in casting our model as a hierarchical control system precisely to make the strengths and limitations of such an abstraction explicit. The revised manuscript will include further discussion of these points.

      In summary, our design choices reflect a trade-off: by limiting the biological detail in the lower layers, we gain computational efficiency and maintain a clear modular structure that can host models at different levels of abstraction. This ensures that the architecture remains both a tool for immediate behavioral simulation and a scaffold for integrating richer neural and biomechanical models as they become available.

      Reviewer #2 (Public review):

      We thank the reviewer for recognizing the novelty of our locomotory model, particularly the implementation of peristaltic strides based on our new analyses of empirical larval tracks, and for providing constructive feedback that will help us improve the manuscript.

      The reviewer highlights the need for clearer explanations of the chemotaxis and odor preference modules. We expand these sections in the revised manuscript with more explicit descriptions of model structure, parameterization, and calibration. As mentioned above, we have also prepared a separate preprint dedicated to the larvaworld Python package, which contains detailed implementation notes and hands-on tutorials that allow users to adapt or extend individual modules.

      Regarding the comparison to empirical behavior in chemotaxis, our present analysis is indeed primarily qualitative. However, we would like to emphasize that the temporal profile of odor concentration at the larval head in our simulations matches that measured in Gomez-Marin et al. (Nature Comm., 2011, DOI: https://doi.org/10.1038/ncomms1455) using only one additional free parameter, while all parameters of the basic locomotory model had been fitted to a separate exploration dataset before and were kept fixed in the chemotaxis experiments. In addition to the simulation of chemotaxis in the present paper, we recently used larvaworld in a practical model application to estimate a species-specific parameter of thermotaxis from experiments across different drosophilids (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The preference index in our simulations was computed using the same definition as in the established experimental group assay for larval memory retention, enabling a direct quantitative comparison between simulated and empirical results. Variability in the simulated outcomes arose naturally from inter-individual differences in body length and locomotory parameters, derived from real larval measurements, as well as from the random initial orientation of each individual in the arena. These factors contributed to variation in individual tracks and ultimately produced preference index values that closely matched those observed experimentally. In the revised manuscript, we also discuss handedness, as highlighted by the reviewer, as another meaningful expression of inter-individual variability in Drosophila larvae and insects more generally.
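      For readers unfamiliar with the group assay, a preference index of this kind is commonly computed as the number of larvae on the odor side minus the number on the opposite side, divided by the total number of larvae. The sketch below assumes that form and places the odor side at x > 0. Larvae close to the midline count only toward the total. The function name and the midline_width option are hypothetical, and the definition the study actually uses is the one given in the manuscript.

```python
from typing import Sequence


def preference_index(x_positions: Sequence[float], midline_width: float = 0.0) -> float:
    """PI = (n_odor - n_other) / n_total, with the odor side at x > 0 (assumed convention).

    Larvae within +/- midline_width / 2 of the midline contribute only to n_total.
    """
    n_total = len(x_positions)
    if n_total == 0:
        raise ValueError("no larvae to score")
    half = midline_width / 2.0
    n_odor = sum(1 for x in x_positions if x > half)
    n_other = sum(1 for x in x_positions if x < -half)
    return (n_odor - n_other) / n_total
```

      For a simulated group, x_positions would simply be the final x coordinates of the virtual larvae at the end of the test phase.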

      Finally, we acknowledge the reviewer’s concern about the scalability and broader applicability of the model. While the present paper focuses on three specific behavioral paradigms (exploration, chemotaxis, odor preference), the modular structure of the architecture is designed for flexibility: modules at any layer can be exchanged for more detailed or alternative implementations, and new sensory modalities or behaviors can be integrated without redesigning the system. The larvaworld package, associated codebase, and documentation are openly available to encourage adoption and adaptation by the larval research community.

      Reviewer #3 (Public review):

      This public review provides an excellent account of our central aim, to build an easily configurable, well-documented platform for organism-scale behavioral simulation, and we are happy to read that the reviewer considers this an excellent goal.

      We thank the reviewer for her/his account of our well-organized code using contemporary Python tooling. We are currently further improving code readability and code documentation, and we will release a new version of the larvaworld Python package. We further agree with the reviewer’s assessment that understanding the model calibration currently requires reading the appendix. For the revised manuscript, we therefore aim to improve our description of all calibration and modeling steps along the way. We will also make sure to improve the description of the experimental datasets used for calibration.

      We recognize that our description of the paper’s scientific contribution could be clearer. In revision, we will sharpen the Introduction and Discussion to highlight our main contributions:

      (1) Promoting a shift from isolated neural circuit modeling to integrated agent-based simulations in realistic environments.

      (2) Proposing the layered behavioral architecture, adopting the subsumption paradigm for modular integration.

      (3) Providing the larvaworld software as a ready-to-use, extensible modeling platform.

      (4) Implementing an empirically calibrated locomotory model and demonstrating its integration with navigation and learning modules in replicated behavioral paradigms.

      We agree with the reviewer that the next challenge is to integrate the empirically based behavioral simulations presented here with functional brain models capable of reproducing or predicting experimental findings at the level of cellular neurophysiology, including the effects of cell-type-specific manipulations such as gene knock-down or optogenetic activation/inhibition. However, based on our experience with systems-level modeling, we deliberately invested in behavioral simulation because functional models of the nervous system, including our own, often lack translation into simulated agent behavior. In many cases, model output is limited to one or more variables that can at best be interpreted as a behavioral bias, and most often represents an “average animal” that fails to capture inter-individual differences. By linking our spiking mushroom body model to behavioral simulations in a group of individual agents during memory retention tests (Figure 6C,D), we were able to achieve a first successful direct comparison between simulated and experimental behavior metrics, in this case the behavioral preference index reported in Jürgensen et al. (iScience, 2024, DOI: https://doi.org/10.1016/j.isci.2023.108640).

      Finally, we reiterate that the layered behavioral architecture is designed to promote a modular modeling paradigm. Our adoption of a subsumption architecture does not conflict with the concept of behavioral primitives; on the contrary, the notion that such primitives follow (semi-)autonomous motor programs and can be combined into more complex behaviors was the starting point for our implementation of the architecture in the fly larva. In our view, a genuinely contradictory paradigm for neural control of behavior would require a non-modular, strictly non-hierarchical organization of the nervous system and, by extension, of behavioral control.
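      The modular, top-down logic fits in a few lines. Every layer exposes the same interface, and the motor output has a sensible default on its own. A higher layer acts by retuning the layer below rather than by driving the motor output directly. All class and method names are hypothetical and are not the larvaworld API. The sketch shows layered modulation in the spirit of subsumption, not the full architecture.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class MotorCommand:
    """Motor-layer output: crawl amplitude and a bias on the turner (arbitrary units)."""
    crawl_amplitude: float = 1.0
    turn_bias: float = 0.0


class Layer(Protocol):
    def modulate(self, sensed: Dict[str, float], command: MotorCommand) -> MotorCommand: ...


@dataclass
class ReactiveOlfaction:
    """Reactive layer: biases turning by the sensed change in odor concentration."""
    odor_gain: float = 1.0

    def modulate(self, sensed: Dict[str, float], command: MotorCommand) -> MotorCommand:
        command.turn_bias += self.odor_gain * sensed.get("dC", 0.0)
        return command


@dataclass
class AdaptiveValence:
    """Adaptive layer: leaves the motor command alone and only retunes the reactive layer."""
    reactive: ReactiveOlfaction
    valence: float = 0.0

    def modulate(self, sensed: Dict[str, float], command: MotorCommand) -> MotorCommand:
        self.reactive.odor_gain = self.valence
        return command


@dataclass
class Agent:
    """Layers are applied top-down; dropping upper layers leaves the lower ones working."""
    layers: List[Layer] = field(default_factory=list)

    def act(self, sensed: Dict[str, float]) -> MotorCommand:
        command = MotorCommand()  # default motor output when no layer intervenes
        for layer in self.layers:
            command = layer.modulate(sensed, command)
        return command
```

      Exchanging ReactiveOlfaction for, say, a thermotaxis module, or AdaptiveValence for a more detailed learning model, only requires another object with a modulate method. This is the kind of module exchange described above.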

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      See public review for main points. To summarize, I find the conceptual framework of the paper very valuable and an important advance. However, in this age of data, I would have expected that the authors would make an effort to build more realistic models that could relate directly to neural data (including connectome and activity) and muscular dynamics at the segmental level.

      This point is addressed in detail in our public review response. In brief, we agree that a segmental neuromechanical model informed by connectome data would provide richer mechanistic insight. However, such an approach would greatly increase complexity and reduce accessibility. Our aim here is to present a coarse-grained, kinematic-level framework that is modular, extensible, and designed to accommodate models at different levels of abstraction. Importantly, extensions that incorporate realistic neuromechanics or connectome-derived circuits can be readily integrated, provided they conform to the modular principles of the proposed behavioral architecture.

      The authors do not cite figures in order of appearance, which makes it hard to read.

      This has been corrected. Figures are now cited in the correct order throughout the revised manuscript.

      I would explain the model in more detail in the main text. Currently, the model is introduced through Figure 1 in an abstract way. It is really hard to make the connection between this figure to the nuts-and-bolts of neuromechanics. And, I believe, for this paper, the details of the modeling matter and are not just technical points to be hidden in the appendix. The video (video 1) is not helpful.

      We have restructured the Model section to provide more detail directly in the main text, moving explanations that were previously confined to the Appendix. This includes explicit description of the locomotory oscillator model, the intermittency module, and their empirical calibration. At the same time, we retained mathematical and implementation details in Materials & Methods to keep the reading flow accessible. Additionally, we expanded the caption of Video 1 and clarified in the text what it illustrates, making the video more informative.

      Modeling choices lead to further weaknesses. While the model can replicate observed locomotory patterns, it does not fully explain the underlying neurobiological mechanisms that govern behavioral intermittency. For example, the crawl-bend interference mechanism, while capturing observed phase-dependent attenuation of turning, is implemented in a simplified, statistical manner rather than being derived from detailed neuromuscular dynamics. The intermittent locomotion model, which generates alternating runs and pauses, relies on log-normal distributed stridechains but does not explicitly model neural mechanisms responsible for switching between movement states.

      We agree with this point. A fully mechanistic implementation of crawl-bend interference would require a detailed segmental neuromechanical model, which we deliberately refrained from integrating in order to keep the current study tractable and focused on a coarse-grained, kinematic-level description. Likewise, the intermittency module is currently based on data-fitted distributions of stridechains and pause durations, without explicit modeling of the neural mechanisms responsible for switching between these states. To our knowledge, these mechanisms remain unresolved, though alternative approaches have been suggested, for example, an artificial neural network model of intermittency (Sakagiannis et al., 2020). To ensure this limitation is transparent to the reader, we now explicitly state it in a newly added “Limitations of the study” subsection in the Discussion.

      We also highlight that the behavioral architecture is designed to be extensible, so that future work may incorporate such mechanistic models when available, while preserving the modular framework.
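      Here is a minimal sketch of a data-fitted intermittency module of this kind. Run lengths are drawn as a log-normally distributed number of strides, and pauses as log-normally distributed durations, with the two alternating. The log-normal form for pauses, all default parameter values, and the function name are assumptions for illustration. They are not the fitted distributions from the manuscript.

```python
import random
from typing import List, Optional, Tuple


def sample_bouts(total_time: float, stride_freq: float = 1.4,
                 chain_mu: float = 1.5, chain_sigma: float = 0.8,
                 pause_mu: float = -0.5, pause_sigma: float = 0.7,
                 rng: Optional[random.Random] = None) -> List[Tuple[str, float]]:
    """Alternate runs and pauses until total_time (s) is filled.

    Run: a log-normally distributed number of strides (rounded, at least 1) at stride_freq (Hz).
    Pause: a log-normally distributed duration in seconds (assumed distribution).
    Defaults are placeholders, not fitted values.
    """
    rng = rng or random.Random()
    bouts: List[Tuple[str, float]] = []
    t, running = 0.0, True
    while t < total_time:
        if running:
            n_strides = max(1, round(rng.lognormvariate(chain_mu, chain_sigma)))
            label, duration = "run", n_strides / stride_freq
        else:
            label, duration = "pause", rng.lognormvariate(pause_mu, pause_sigma)
        remaining = total_time - t
        if duration >= remaining:  # truncate the last bout at the end of the simulation
            bouts.append((label, remaining))
            break
        bouts.append((label, duration))
        t += duration
        running = not running
    return bouts
```

      The switching itself is purely stochastic here. This mirrors the limitation discussed above: no neural mechanism decides when a run ends.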

      I am curious about why the authors chose to model the mushroom body with much more realism than other modules.

      We clarified that this choice was not due to a bias in modeling depth, but to demonstrate the modularity and flexibility of the architecture. The mushroom body (MB) model we integrated was developed in our previous work as a biologically realistic spiking neural network. By incorporating it into the current framework, we show that models of very different abstraction levels – from simple statistical oscillators to detailed spiking networks – can coexist and interact under the same architecture. This rationale is now explicitly stated in the Discussion.

      Reviewer #2 (Recommendations for the authors):

      The manuscript from Sakagiannis et al. proposes a novel model for locomotion and foraging in Drosophila. Their ambition is to make a unified model that will incorporate distinct layers of complexity to describe and predict the locomotor behaviour of a larva, during exploration, chemotaxis and even learning. The paper fails in doing so, starting with a rather interesting exploratory model and becoming less and less convincing as it progresses, with thinner (chemotaxis) and thinner (learning) experimental and theoretical support. The model for chemotaxis is extremely simplified compared to the work of other laboratories. The associative learning paradigm is taken from another paper from the same research group and is not sufficiently explained. In its current form, the paper is of very limited theoretical and practical value. The analysis is insufficient to judge the overall quality and scalability of the model. It is hard to know if the model could be adopted by others in the larval community more widely in other animals. Would it be flexible and robust enough to be used to model other behavioural conditions?

      We appreciate this critical perspective. Our aim is not to present a final, fully parameterized model of all larval behaviors, but to introduce a flexible, modular behavioral architecture that integrates models at different levels of abstraction and can be expanded by the community. To support adoption, we have revised the manuscript to highlight the availability of the framework as a Python package (larvaworld), supplemented with documentation, tutorials, and code examples. This makes it easier for other researchers to reuse, extend, and test the architecture under additional behavioral conditions. We also explicitly refer to modeling studies that have adopted the proposed framework and the locomotory model itself.

      Below, we address the reviewer’s points layer by layer.

      (1) Exploratory behaviour. The strongest part of the paper. The authors propose a new method to analyse locomotion. They take into consideration the instantaneous linear and angular velocity. They assume the existence of two oscillators, which is really interesting. They incorporate the distribution of pause durations and the number of strides. The incorporation of the strides is very exciting. They do not include handedness, which has already been studied and incorporated in a model for exploration that they seem to have missed (Wosniack et al 2022). Figure 4 shows the dispersion. At first glance, it is very obvious that the model larvae do not behave like the animal. The distance they move from the centre is wider (Figure 4A). What is measured in dispersion (Figure 4B)? Just the distance travelled during 40s? A better measure of the similarities or differences between the model and real larvae would be interesting, such as analysing the Mean Square Displacement. Would the model be good if compared to the long-term exploratory behaviour from Sims et al. 2020, that the author previously used?

      The authors should convince the readers that their model is better than, or at least as good as, the ones already available.

      We thank the reviewer for these constructive suggestions. In the revised manuscript we now reference and discuss handedness, citing Wosniack et al. (2022, eLife), and highlight its potential role as an additional axis of individual variability. We also clarified the distance metrics used in Figure 4: dispersal denotes the Euclidean distance from the origin at the end of the trajectory, while pathlength denotes the cumulative distance travelled. Since larvae typically encounter the arena boundary within the first 40 seconds of exploration, dispersal is shown only over this interval.

      With respect to the reviewer’s suggestion of using mean-squared displacement (MSD), we now explicitly describe the relation between dispersal and MSD. Dispersal is an individual-level displacement measure from which population-level metrics such as MSD can be directly derived.
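      To make the relation concrete, let d_i(t) be the dispersal of larva i, its distance from its own starting point at time t. The ensemble MSD at time t is then the mean of d_i(t)² over larvae. Below is a minimal NumPy sketch, assuming tracks are stored as arrays of (x, y) positions sampled at a common rate. The array layout and function names are assumptions.

```python
import numpy as np


def dispersal(track: np.ndarray) -> np.ndarray:
    """Euclidean distance from the starting point at each sample; track has shape (T, 2)."""
    return np.linalg.norm(track - track[0], axis=1)


def pathlength(track: np.ndarray) -> np.ndarray:
    """Cumulative distance travelled along the track, starting at 0."""
    steps = np.linalg.norm(np.diff(track, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(steps)])


def ensemble_msd(tracks: np.ndarray) -> np.ndarray:
    """Mean over larvae of squared dispersal; tracks has shape (N, T, 2)."""
    disp = np.linalg.norm(tracks - tracks[:, :1, :], axis=2)
    return (disp ** 2).mean(axis=0)
```

      This ensemble MSD, measured from each track's origin, differs from the time-averaged MSD over sliding lags that is often used for long recordings. In a bounded arena, both saturate once larvae reach the wall.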

      Regarding long-term exploration, we agree that extended trajectories—as reported by Sims et al. (2020) over timescales of up to one hour—constitute a valuable complementary regime. Our experimental dataset is limited to 3-minute recordings in a bounded Petri dish, which constrains the accessible timescales of dispersal analysis. We now explicitly note in the Results that comparison to long-horizon datasets such as Sims et al. (2020) represents an important future direction that will require larger or unbounded arenas.

      Together, these revisions strengthen the presentation of the exploration results and clarify how our model relates to established statistical measures of larval foraging behaviour.

      (2) Chemotaxis. The chemotaxis model is so briefly explained in the result section that it is hard to understand. A modulation of the frequency and amplitude of lateral oscillator as a function of the concentration? The authors cannot differentiate between weathervaning and turning in this model (at least I can't understand how). What happened with the distribution of pauses and the directions of turns in Figure 5? The authors do not use real behavioural data to contract their model. How do we know that the parameters they have used reflect the larval behaviour? For example: what is the success rate for larvae to reach the area of high concentration? How close do they get? What is the length of the tracks from start to a target area of high concentration? Where are the calibration data for chemotaxis? This information is critical to understand the model, it needs to be shown in the result section. The authors mention an 8.9uM peak concentration. Of what?

      The model is oversimplified in comparison with Davies et al. 2015 and it is not clear at all how it reflects the real chemotaxis, which is a rather complex behaviour.

      We thank the reviewer for these detailed comments. In the revised manuscript we substantially expanded the description of the chemotaxis model. We now provide an explicit mathematical formulation of how odor concentration modulates the lateral oscillator through the quantity A₀, which perturbs both the frequency and amplitude of bending according to the mechanism proposed by Wystrach et al. (2016). We additionally clarify that the motor layer (including the intermittency module and all parameters governing crawling, pausing, and turning) remains fully identical to the configuration calibrated on the exploration dataset; no refitting was performed for the chemotaxis condition.
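      The sketch below illustrates how a single sensory term can act on both properties of the lateral oscillator. It scales bending frequency and amplitude by 1 + A₀, with A₀ proportional to the rate of change of concentration. This functional form, its sign convention, and all names and defaults are assumptions, and the manuscript's formulation following Wystrach et al. (2016) may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class LateralOscillator:
    """Hypothetical bending oscillator whose frequency and amplitude are scaled by a sensory term A0."""
    base_freq: float = 0.6   # Hz, placeholder
    base_amp: float = 1.0    # arbitrary units, placeholder
    odor_gain: float = 1.0   # placeholder weighting of the odor signal
    phase: float = 0.0       # rad

    def a0(self, dC_dt: float) -> float:
        # Assumed sign convention: a falling concentration of an attractive odor (dC_dt < 0)
        # raises A0 (larger, faster bends); a rising concentration suppresses bending.
        return -self.odor_gain * dC_dt

    def step(self, dC_dt: float, dt: float) -> float:
        """Advance the oscillator by dt and return the current bending output."""
        scale = max(0.0, 1.0 + self.a0(dC_dt))  # keep frequency and amplitude non-negative
        self.phase = (self.phase + 2 * math.pi * self.base_freq * scale * dt) % (2 * math.pi)
        return self.base_amp * scale * math.sin(self.phase)
```

      Setting odor_gain to a negative value flips the response, which is one simple way an aversive odor could be represented under this sketch.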

      To address the reviewer’s question regarding the distinction between weathervaning and head casting, we now explain that both behaviours emerge naturally from the same coupled-oscillator structure via stride-phase–dependent crawl–bend interference. High-amplitude headcasts occur during pauses when crawl-induced attenuation is lifted, whereas low-amplitude weathervaning arises during runs when the interference is active.

      This unified mechanism eliminates the need for separate modules.
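      The interference mechanism can be pictured as a gain on the turner output. During runs the gain depends on crawl phase; during pauses it is lifted. In the sketch below, the attenuation factor and the phase window in which bending is released are placeholders rather than the empirically fitted values, and the function name is hypothetical.

```python
import math


def bend_gain(crawl_phase: float, is_running: bool,
              attenuation: float = 0.2,
              free_window: tuple = (math.pi, 1.5 * math.pi)) -> float:
    """Multiplicative gain on the turner output (placeholder parameters).

    During pauses there is no crawl-induced interference, so the gain is 1 (head casts).
    During runs, bending is released only inside free_window of the stride cycle and
    attenuated elsewhere (weathervaning).
    """
    if not is_running:
        return 1.0
    phi = crawl_phase % (2 * math.pi)
    lo, hi = free_window
    return 1.0 if lo <= phi < hi else attenuation
```

      Averaged over a stride cycle, the gain with these placeholder values is about 0.4 during runs and 1 during pauses. The same turner output therefore produces low-amplitude weathervaning in one state and full-amplitude head casts in the other.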

      The chemotaxis experiments were implemented to qualitatively replicate the behavioural patterns described in Gómez-Marín et al. (2011, Fig. 1A–1F), and we now include explicit figure references in the captions. Because the present implementation is a proof of concept rather than a quantitatively calibrated chemotaxis model, we do not report success rates, approach distances, or track-length statistics, as these depend strongly on odorscape geometry and calibration against quantitative single-animal datasets that were not available for the current work. This clarification has been added to the text and is stated explicitly again in the Limitations section.

      Finally, we now specify that the reported odor concentrations (e.g. 8.9 µM) follow the values used in Gómez-Marín et al. (2011), and we added the precise Gaussian function used to generate the odorscape in the Materials & Methods. Together, these revisions provide a clear and transparent account of the chemotaxis model and its scope.
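      As an illustration, consider a radially symmetric Gaussian odorscape with the 8.9 µM peak quoted above. Sampling it at successive head positions yields the kind of perceived-concentration time series compared with experiment in Figure 5E&F. The width sigma_mm and the source location are placeholders, and the function names are hypothetical. The Gaussian actually used is the one given in Materials & Methods.

```python
import math
from typing import Iterable, List, Tuple


def gaussian_odor(x: float, y: float, source: Tuple[float, float] = (0.0, 0.0),
                  peak_uM: float = 8.9, sigma_mm: float = 20.0) -> float:
    """Odor concentration (µM) at (x, y) mm for a single Gaussian source (sigma_mm is a placeholder)."""
    dx, dy = x - source[0], y - source[1]
    return peak_uM * math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_mm ** 2))


def concentration_at_head(head_positions: Iterable[Tuple[float, float]], **kwargs) -> List[float]:
    """Concentration time series perceived at the larval head along a track."""
    return [gaussian_odor(x, y, **kwargs) for x, y in head_positions]
```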

      (3) Associative learning paradigm. I assume that the authors intended to incorporate a bias in chemotaxis behaviour towards a particular odorant (CS) that would have been associated with a reward food (US). However the model works slightly differently, it is represented by an aversive and an appetitive gradient.

      Theoretically, this is already an assumption (unless there is evidence for it, that should be referenced). It would be more conservative to have one neutral side and one appetitive (attractive) side. Second, the use of a mushroom body model, (even though it has already been published) to decide on the valence adds a layer of complexity that seems unnecessary. The learning process is different from the output process. Finally, the model intends to show us a "realist simulation of Drosophila locomotion" and we do not know how the larvae reach the right side during the test. It would be useful to have some comparison of the larval and model behaviour towards the preferred side.

      In this last section, the objective of the research unweaves and falls short of its ambition.

      We thank the reviewer for these helpful comments. In the revised manuscript we clarified that our implementation follows the standard larval conditioning protocol in which a rewarded odor (CS+) is tested against a neutral odor, not against an aversive one. The previously contradictory phrasing has been corrected, and the text now consistently reflects the established experimental procedure.

      We further explain that the mushroom body (MB) model is included not in order to increase biological complexity in this section, but to demonstrate the flexibility of the proposed behavioral architecture: detailed circuit models and more abstract motor modules can coexist under the same framework. The MB model implements associative plasticity independently of any behavioral simulation, and its output, a scalar odor valence, is transformed linearly into an odor-gain parameter that modulates turning during the test phase. This separation between learning and behavioral output mirrors the logic of the biological system while keeping the overall architecture modular.
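      The separation between learning and behavioral output can be sketched as two independent functions. Here a simple Rescorla-Wagner-style delta rule stands in for the spiking MB model, which is not reproduced. The stand-in only illustrates a valence that grows and saturates over training trials. The linear map from valence to odor gain follows the text, but its slope and offset are placeholders, and both function names are hypothetical.

```python
def train_valence(n_trials: int, learning_rate: float = 0.3, reward: float = 1.0) -> float:
    """Stand-in for the spiking MB model: delta-rule valence after n paired CS+/reward trials."""
    valence = 0.0
    for _ in range(n_trials):
        valence += learning_rate * (reward - valence)  # prediction-error update
    return valence


def valence_to_gain(valence: float, slope: float = 2.0, offset: float = 0.0) -> float:
    """Linear map from learned valence to the odor-gain parameter used by the turner (placeholder coefficients)."""
    return slope * valence + offset
```

      Because the behavioral side only receives the resulting gain, the learning model could be swapped for a different one without touching the locomotion code.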

      Regarding the reviewer’s request for insight into “how larvae reach the right side,” we note that standard group assays used in larval olfactory learning provide only population-level preference indices rather than detailed individual trajectories. Our comparison to empirical data therefore relies on these established preference indices, which the model successfully reproduces across training trials, including the characteristic saturation reported in Jürgensen et al. (2024). We now state explicitly that although the behavioral simulation does generate full trajectories for each virtual larva, the lack of corresponding experimental single-animal tracks precludes a direct trajectory-level comparison. This clarification has been added to the revised text.

      Together, we believe that these revisions improve clarity and better situate the learning simulations within both the behavioral architecture framework and the constraints of available experimental data.

      Reviewer #3 (Recommendations for the authors):

      Figure 1a is very dense and I am struggling with the terms "reactive" and "basic" due to a general lack of clarity about the details of the model organization. For example, why do all of the sensory inputs point to turning proprioception? Why is proprioception two different things for turning and crawling? Why are some senses in light green while olfaction is in dark green? Why is feedback only from feeding, when crawling, head casting, and turning will change the sensory environment as well? Why is head casting not a behavioral module here? Why focus on following/being constrained by the "subsumption architecture paradigm" over a focus on the known literature and neuroanatomy?

      We thank the reviewer for this careful inspection of Figure 1. In the revised version we improved both the figure and its caption, as well as the corresponding description in the text.

      Specifically:

      - The “basic” layer has been renamed the “motor” layer for clarity, and the caption has been expanded to better describe each component.

      - The sensory inputs are now shown to target the motor layer as a whole, rather than just the proprioceptive component of turning.

      - Each motor module is conceptualized as a sensorimotor loop (green-red), which explains why proprioception appears in both crawling and turning.

      - The color coding has also been clarified: modules used in the current simulations are shown in darker shades, while others are faded.

      - Sensory perturbations caused by body locomotion – as in the case of crawling and turning – are not depicted in the figure as feedback between modules. We make this more explicit in the caption. The signal from feeding to the above layers is neuromodulatory – as indicated by the purple arrowhead.

      Finally, we explain that head casting and weathervaning are not modeled as separate modules, since both behaviors emerge from the coupled oscillator mechanism through crawl-bend interference. Our adherence to the subsumption architecture paradigm is motivated by its success in robotics and its conceptual alignment with hierarchical sensorimotor loops, but we have now made clearer that this is a simplifying framework rather than a rigid constraint.

      "Stimulus free conditions" (line 102) don't really exist. Substrate and temperature will always be present, light will have some intensity, etc. Does this really refer to fictive behaviors?

      We thank the reviewer for raising this point. In the revised manuscript we have removed the term “stimulus-free conditions” entirely to avoid the misleading implication that larvae experience no sensory input. We now explicitly describe these experiments as free exploration in the absence of navigation-guiding gradients, which accurately reflects the laboratory assay while avoiding any suggestion of fictive behavior. This terminology has been updated consistently throughout the text.

      The first results section is closer to an introduction than the intro itself is, owing to its focus on the context of the work the paper actually does rather than a broad review of larval behaviors that are not considered within this work.

      We believe the reviewer is referring to the “Model” section rather than the “Results.” The Model section is deliberately separated to outline the theoretical background of the behavioral architecture and to make explicit the general modeling assumptions, which explains why it cites previous work in detail. By contrast, the Introduction is intended as a brief overview of the broader larval behavioral repertoire, since the larva serves here as the case study for our framework. Presenting this repertoire is important because it defines the behaviors that populate the different layers of the architecture, even if only a subset of them is implemented in the simulations presented in this study.

      While the model components are described in the modeling section, no question is actually discussed. What is the goal of this model?

      This broader question is addressed in our response to the public reviews.

      "Crawler" and "turner" are inconsistently described. They are described as "modules" in Figure 1, but they seem more like behavioral primitives.

      The specific terms "crawler" and "turner" refer to the computational modules, but, as the reviewer correctly points out, these generate the respective “crawling” and “turning” behavioral primitives. This has been made explicit in the Materials & Methods.

      Do larva-larva interactions matter here?

      In the revised manuscript we now state explicitly that larva–larva interactions are not included in the present simulations, as each virtual larva is modeled independently in accordance with the single-animal datasets used for calibration. We also point the reader to the Limitations section, where we note that although social interactions lie outside the scope of this study, the larvaworld software package already supports tactile sensing and collision handling, enabling such interactions to be incorporated in future work.

      The description of the locomotor system, with coupled oscillators between crawling frequency and bending is very empirical. Is this because of the 2-segment model effectively limiting peristalsis to a single segment? What are the limits of this approach?

      The stride-phase–dependent modulation of bending amplitude was identified through kinematic analysis of full 12-segment larval datasets and is therefore independent of our later decision to implement a two-segment simplification. This means that the empirical relationship we describe should hold for any multisegment model, regardless of the reduced representation used in the present implementation. More generally, we performed our detailed empirical analyses with the goal of uncovering statistical relations, which in turn were used to build our data-driven coupled oscillator model, in combination with the stochastic elements of stridechain and pause durations.

      Line 190: The paper starts discussing experimental larva tracks. These experiments need to be described.

      The reviewer probably refers to the dataset analysed in this study. This is a public dataset as described in the Dataset Description section in Materials & Methods, along with a description of the experiment per se.

      The purpose of Figure 2 is not entirely clear. Several panels are not referenced in the text (F,G,H) and all panels are referenced extremely out of order. Figure 3 is similarly hard to follow for the same reasons of being referenced out of order. In fact, this section is largely duplicated by the "Model calibration" appendix, which I find to be much more clearly written and with more directly relevant figure panels.

      In the revised manuscript, all panels of Figures 2 and 3 are now cited in the correct order, and their roles in the narrative have been clarified. Figure 2 is explicitly presented as a summary of the empirical kinematic analyses that motivate the structure of the locomotory model, while Figure 3 illustrates the corresponding model components. To avoid redundancy with the “Model calibration” appendix, we streamlined the main text and replaced duplicated descriptions with cross-references to the appendix, which contains the full methodological detail.

      The data describe larvae behaving with a range of parameters, presumably both as individuals and across time. However, the models described seem to employ a population of larvae that shares a common best-fit parameter and the equations presented in the methods are all ordinary differential equations without noise or stochasticity. Where is the inter-individual variation coming from?

      The reviewer is correct to point out the importance of variability. Our approach is agent-based, and we model populations of non-identical individuals rather than replicates of a single average larva. The simulated larvae retain variability across several parameters, capturing the combined range observed in the data. This was described in the original manuscript, and to avoid possible misunderstandings, we have now expanded the “Inter-individual variability” section in the Materials & Methods and, where appropriate, clarified this point elsewhere in the text.
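      One way to obtain non-identical agents that still respect the joint spread of the data is to fit a mean and covariance to per-larva parameter estimates. Each virtual larva is then drawn from the resulting multivariate normal. This sketch rests on that assumption and is not necessarily the sampling scheme used in the manuscript. The parameter names in PARAMS are hypothetical.

```python
import numpy as np

# Hypothetical parameter names; the parameters sampled in the manuscript may differ.
PARAMS = ("body_length_mm", "stride_freq_hz", "stride_dst_bl")


def fit_population(per_larva: np.ndarray):
    """Mean vector and covariance of per-larva estimates, shape (n_larvae, n_params)."""
    return per_larva.mean(axis=0), np.cov(per_larva, rowvar=False)


def sample_agents(mean: np.ndarray, cov: np.ndarray, n: int, rng: np.random.Generator):
    """Draw n non-identical agents; the multivariate normal preserves parameter correlations."""
    samples = rng.multivariate_normal(mean, cov, size=n)
    samples = np.clip(samples, 1e-6, None)  # physical parameters must stay positive
    return [dict(zip(PARAMS, map(float, row))) for row in samples]
```

      With per_larva holding empirical estimates, sample_agents(*fit_population(per_larva), 20, np.random.default_rng(0)) returns 20 parameter dictionaries, one per virtual larva.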

      The absolute orientation of trajectories in Figure 4A is not meaningful in your model. I suspect it would be more informative to show aligned trajectories in order to assess the behavioral similarity visually. Also, the biological experiment needs to be described here. Time spent crawling does not seem to be a great fit, although the peaks are fairly well aligned. Do you have thoughts on why this is?

      In Figure 4A, which is intended as a visual comparison between experimental and simulated trajectories, the experimental tracks were translated so that all starting points coincide at the center of the arena. As the reviewer notes, they were not rotated to a common axis, since our subsequent analysis focuses on spatial dispersal rather than directional alignment. The description of the experimental dataset has been clarified in the revised text.
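      As an illustration of the alignment described above, this sketch translates each (x, y) track so that it starts at the origin, without rotating it, and then computes dispersal as the distance from the starting point in every frame. The function names are hypothetical, and this distance-based dispersal metric is an assumption rather than the exact metric used in the study.

```python
import numpy as np


def translate_to_origin(track):
    """Shift an (n_frames, 2) xy track so its first point is (0, 0); no rotation."""
    track = np.asarray(track, dtype=float)
    return track - track[0]


def dispersal(track):
    """Euclidean distance from the starting point at every frame."""
    return np.linalg.norm(translate_to_origin(track), axis=1)


tracks = [np.array([[2.0, 3.0], [5.0, 7.0]]), np.array([[-1.0, 0.0], [-1.0, 2.0]])]
aligned = [translate_to_origin(t) for t in tracks]
final = [float(dispersal(t)[-1]) for t in tracks]  # [5.0, 2.0]
```

      Because only a translation is applied, each track keeps its original heading, which is why absolute orientation carries no meaning in such a plot while dispersal stays directly comparable across experimental and simulated larvae.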

      The reviewer is also correct that the distribution of time spent crawling is narrower in the simulations than in the experimental data. This reflects the fact that in the present study only three crawling-related parameters were sampled to generate inter-individual variability, and time spent crawling was not among them. We deliberately chose to assess how well the model reproduces distributions for behavioral metrics that were not explicitly fitted or parameterized. This point has now been made explicit in the revised manuscript.

      How did you assess the agreement of chemotaxis results with Gomez-Martin et al? It would be useful for the comparison to be made explicit within this paper, as well. How were the chemotaxis parameters fit?

      The agreement between experimental and simulated chemotaxis was assessed only qualitatively, as we did not perform quantitative locomotor analyses on chemotaxis datasets. For these simulations we used the same motor layer, including all its modules, as calibrated in the free-exploration condition (Fig. 4). The only additional adjustment was a single weighting parameter that translates the appetitive or aversive valence of odor sources into modulatory input for the bending module. This parameter was tuned manually using a visual criterion of performance, to ensure that both attractive and aversive chemotaxis were observable. We now make explicit in the text that for more complex simulations we retain the calibration obtained in simpler conditions and build upon it, rather than re-optimizing the model. Moreover, we now cite the exact figure numbers in Gomez-Martin et al., so that readers can also directly compare the perceived-concentration metrics shown in our Figure 5E&F, where experimental and simulated data show very good correspondence.

      Similarly, what are the key parameters for the mushroom body model and how did you fit their relationship to behavior? Was there actually feedback between the behavior of the larva and the training or was the SNN only used to generate the odor gain constant?

      The reviewer is correct to highlight this point. In the present study the mushroom body model was simulated independently to generate the odor-specific behavioral bias. This output was then translated into an odor gain constant, which served as input for the subsequent behavioral simulations of odor preference. There was no closed-loop interaction between the larval behavior and the training of the spiking network in this version. Establishing such a closed-loop connection is part of our future goals.

      It is unclear where feeding (as introduced in Figure 1) entered into the work presented, if at all.

      The reviewer is correct that the feeding module does not play a role in the present study. It was included in the behavioral architecture for completeness and because it is already implemented in the larvaworld package (see Sakagiannis et al., 2024). We have clarified this in the revised text.

      "During pauses, the input to the crawler module I_c = 0 and therefore forward..." The equations presented for the crawler module do not contain I_c.

      The inconsistency regarding the crawler module input has also been corrected. The equations now explicitly include the tonic input parameter, making them consistent with the descriptive text and our model implementation.
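      The role of a tonic input can be illustrated with a toy phase oscillator. In this hypothetical sketch, the input I_c scales both the oscillator's phase velocity and the resulting forward speed, so that setting I_c = 0 during a pause freezes the oscillator and produces no forward displacement. These equations are an assumption made purely for illustration; they are not the crawler equations of the manuscript or of larvaworld, and the frequency and amplitude values are placeholders.

```python
import math


def simulate_crawler(tonic_input, dt=0.01, freq_hz=1.4, amp=1.0):
    """Integrate a toy phase oscillator (Euler steps); return total forward displacement.

    tonic_input: one I_c value per time step (0 during pauses, 1 during runs).
    """
    phase, distance = 0.0, 0.0
    for i_c in tonic_input:
        # The oscillator only advances when it receives tonic drive.
        phase += 2.0 * math.pi * freq_hz * i_c * dt
        # Forward speed is non-negative and gated by the same input.
        speed = amp * i_c * 0.5 * (1.0 - math.cos(phase))
        distance += speed * dt
    return distance


run = simulate_crawler([1.0] * 100)   # one second of crawling
pause = simulate_crawler([0.0] * 100)  # one second of pausing
```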

      Larvae do more than crawl forward: they can also hunch up, head cast with their head in the air, dig, crawl backward, roll, and perform other behaviors. Because the individual modules in this framework have been defined as coupled oscillators, how would you decide to implement such aspects? At what point does the oscillator approach break down? In this model, how does the larva decide whether to bend left or right, and how is that affected by the environment or internal state? Can a larva bend in the same direction twice in a row?

      The intermittent coupled-oscillator model presented here does not attempt to cover the full larval repertoire, such as hunching, digging, backward crawling, or rolling. Nor does it explicitly implement handedness as a directional bias. Nevertheless, the framework already allows for sequences of repeated turns: from a stationary position a larva can execute successive bends of varying amplitude, which may occur in the same direction, mimicking repeated head casts to one side.

      Extending the model to include additional locomotor primitives would require the development of new modules, which could expand the basic locomotor layer either alongside or in place of the current lateral oscillator module. As noted in the manuscript, the modules implemented here are not intended as definitive but as placeholders that demonstrate how the architecture can integrate more elaborate models in the future. In this context, future directions include introducing handedness as part of inter-individual variability and enriching the behavioral repertoire with additional modules to capture the broader range of larval actions.

      I was not able to install `larvaworld` through pip in a fresh environment on OS X 15 with various Python versions between 3.8 and 3.12. I ran into a range of issues, from `tables` (which is understandable) to problems installing the old NumPy version on Python 3.12, where `setuptools` is no longer included. The packaging should be made more robust, or the working environment could be better defined. For example, the version pinning of dependencies seems much stricter than I would expect for a user-focused Python library, particularly with out-of-date versions of core tools like NumPy.

      We thank the reviewer for going to great lengths to test the implementation and for pointing out these issues. We have recently updated the package (version 2.0.1, November 2025) to improve installation robustness: we relaxed unnecessary dependency pinning and provided an environment specification to facilitate reproducibility. The revised manuscript directs users to the updated installation instructions.

      Automated testing for Python versions 3.10 and 3.11 on macOS, Windows, and Ubuntu is already implemented. Unfortunately, we have not yet tested the package on OS X 15. Please report any issues on the larvaworld GitHub page: https://github.com/nawrotlab/larvaworld.

    1. eLife Assessment

      This important study combines behavioural psychophysics with image-computable modelling to test whether face recognition relies on view-selective or view-tolerant mechanisms. Although the diagnostic orientation content of faces varies with viewpoint (more horizontal for frontal views, more vertical for profiles), human recognition remains predominantly tuned to horizontal information, consistent with the predictions of a view-tolerant model. The evidence for view-tolerant tuning to horizontal orientations is compelling, although questions remain about the plausibility of the computations implemented in the view-tolerant model and how they map onto mechanisms of everyday face recognition.

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes, and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I'll start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. I sort of understand the reasoning that this enforces tolerance of viewpoint variability, but I'm not clear on whether or not this is a version of face familiarity and recognition that the authors think has an analog in human visual processing.

      I do think that this model is interesting in terms of the differential tuning it exhibits, but don't find it easy to align with any theoretical perspective on face recognition. Specifically, do the authors think there is a stage of face processing in which tolerance as they've operationalized it in the model is extant? What I'm looking for is a concrete description of the circumstances that the authors are saying lead to this kind of model potentially being a meaningful analog of face recognition. For example, is the idea that one may become familiar with a face in some very limited set of viewpoints and then be presented with that face in other views?

      Alternatively, the authors may prefer to say that they simply thought this was a nice exercise in identifying a different model and that it may not be a meaningful proxy for face recognition. I think that's fine, to be clear! I just still don't see anything in the text that convinces me of the ecological validity of this version of view-tolerance.

    3. Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 deg) but other viewpoints had biases that were slightly off horizontal (e.g. right profile: 80 deg, left profile: 100 deg). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

      Comments on revisions:

      I am happy with the response and changes to the comments in my review. The key findings from this study are: (1) that there is bias toward the use of horizontal information across all viewpoints for face recognition in humans using an old-new recognition task. (2) In contrast, the optimal information for matching faces varies as a function of viewpoint. The view-selective model shows horizontal information is dominant for frontal views and vertical information is dominant for profile views.

      The data from the view-tolerant model is less easy to interpret as it doesn't fit with any theoretically plausible model of face recognition. It might be a useful model for a face matching task in which participants had to match unfamiliar faces across viewpoints. This might be a possible extension of the current work.

      Nonetheless, I still think this is an interesting contribution to the literature.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes, and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function, favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I will start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare the target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints, and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. Again, this is sort of interesting and the very different behavior of the model is neat to discuss, but it doesn't seem easy to align with any theoretical perspective on face recognition. My thinking here is that it might be useful to consider an additional alternate model that doesn't specifically exclude the best-matching viewpoint, but perhaps condenses appearance across views into something like a prototype. I could even see an argument for something like the yaw-averages presented earlier in the manuscript as the basis for such a model, but this might be too much of a stretch. Overall, what I'd like to see is some kind of alternate model that incorporates the existence of the best-match viewpoint somehow, but without the explicit exemplar structure of the view-specific model.

      The design of the view-tolerant model aligned with the requirements of tolerant recognition and revealed the stimulus information that enables identity to be abstracted away from variations in face appearance. However, it did not incorporate the notion that such ability may depend on a prototype or summary representation of face identity built up through varied encounters (Burton, Jenkins and Schweinberger 2011, Jenkins, White et al. 2011, Mike Burton 2013, Burton, Kramer et al. 2016, Menon, Kemp and White 2018).

      We agree with the Reviewer that the average of the different views of a face is a good proxy of its central tendency (i.e., stable identity properties; Figure 1). We thus followed their suggestion and included an additional model observer that compared specific views to full-spectrum view-averaged identities. The examination of the orientation tuning profile of this so-called view-average model observer confirmed the crucial contribution of horizontal identity cues to view-invariant recognition as the horizontal range best predicted the average summary of full-spectrum face appearances across views. This additional model observer is now presented in the Discussion and Supplementary files 2 and 3.
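      The three matching rules discussed in this exchange can be contrasted in a few lines. This is a simplified sketch under stated assumptions: images are plain 2-D arrays compared by pixelwise Pearson correlation; the view-selective observer only sees the stored view at the target's yaw; the view-tolerant observer takes the best correlation among an identity's views at yaws other than the target's; and the view-average observer compares the target to each identity's mean image across all stored yaws (whether the authors' average excluded the target yaw is not stated here). Orientation filtering and noise are omitted, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def pixel_corr(a, b):
    """Pearson correlation between two images, computed pixelwise."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


def identify(target, target_yaw, library, mode):
    """Return the identity in `library` whose stored image(s) best match `target`.

    library: dict identity -> dict yaw -> 2-D image array.
    mode: 'selective', 'tolerant', or 'average' (see lead-in).
    """
    scores = {}
    for identity, views in library.items():
        if mode == "selective":
            candidates = [views[target_yaw]]
        elif mode == "tolerant":
            candidates = [img for yaw, img in views.items() if yaw != target_yaw]
        elif mode == "average":
            candidates = [np.mean(list(views.values()), axis=0)]
        else:
            raise ValueError(f"unknown mode: {mode}")
        scores[identity] = max(pixel_corr(target, c) for c in candidates)
    return max(scores, key=scores.get)


# Synthetic library: each identity = shared base pattern + view-specific variation.
rng = np.random.default_rng(1)
yaws = (-90, -45, 0, 45, 90)
base = {k: rng.normal(size=(16, 16)) for k in ("A", "B", "C")}
library = {k: {y: b + 0.5 * rng.normal(size=(16, 16)) for y in yaws}
           for k, b in base.items()}
target = library["B"][45]
```

      In this toy setting the view-selective observer finds an exact pixel match, whereas the view-tolerant observer must rely only on what the identity's other views share with the target, which is the property the reviewers question and the view-average observer was added to address.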

      Besides this larger issue, I would also like to see some more details about the nature of the cross-correlation that is the basis for this model comparison. I mostly think I get what is happening, but I think the authors could expand more on the nature of their noise model to make more explicit what is happening before these cross-correlations are taken. I infer that there is a noise-addition step to get them off the ceiling, but I felt that I had to read between the lines a bit to determine this.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’
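      The noise-addition step quoted above can be sketched as follows, assuming that RMS contrast is the standard deviation of pixel values and that the ten correlations per target-probe pairing are simply averaged (how iterations were aggregated is not specified in the response). With a face RMS contrast of .01 and a noise RMS contrast of .08, the amplitude SNR is .01/.08 = .125, so even an identical target and probe are expected to correlate at only about .125² / (1 + .125²) ≈ .015.

```python
import numpy as np

FACE_RMS = 0.01   # face RMS contrast from the Methods excerpt
NOISE_RMS = 0.08  # noise RMS contrast; SNR = 0.01 / 0.08 = 0.125
N_ITER = 10       # noise iterations per target-probe pairing


def set_rms(img, rms):
    """Rescale an image to zero mean and the requested RMS contrast (std)."""
    img = img - img.mean()
    return img * (rms / img.std())


def noisy_correlation(target, probe, rng):
    """Average target-probe correlation over N_ITER independent noise draws."""
    t = set_rms(target, FACE_RMS)
    p = set_rms(probe, FACE_RMS)
    corrs = []
    for _ in range(N_ITER):
        # Target and probe each receive a distinct noise pattern.
        t_noisy = t + set_rms(rng.normal(size=t.shape), NOISE_RMS)
        p_noisy = p + set_rms(rng.normal(size=p.shape), NOISE_RMS)
        corrs.append(np.corrcoef(t_noisy.ravel(), p_noisy.ravel())[0, 1])
    return float(np.mean(corrs))


rng = np.random.default_rng(0)
face = rng.normal(size=(128, 128))
c = noisy_correlation(face, face, rng)  # expected to be around 0.015, not 1.0
```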

      We also added a supplementary file illustrating the d’ distributions of each model and of human observers: ‘Sensitivity d’ of the view-tolerant model was much lower than view-selective model and human sensitivity (Supplementary File 2), even without noise. The view-tolerant model therefore processed fully visible stimuli (SNR of 1). This decreased sensitivity in the view-tolerant compared to the view-selective model is expected, as none of the probes exactly matched the target at the pixel level due to viewpoint differences. In contrast to humans who rely on internally stored representations to match identity across views, the model observer lacks such internal representations and entirely relies on (less efficient) pixelwise comparisons.’
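      For readers less familiar with the sensitivity measure, the following sketch computes d' for an old/new task from trial counts, using the standard signal-detection definition d' = z(H) - z(FA), where H is the hit rate and FA the false-alarm rate. The log-linear correction for rates of 0 or 1 is an assumption; the response does not state which correction, if any, was applied.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Old/new sensitivity d' = z(H) - z(FA), with a log-linear correction."""
    # Adding 0.5 to each cell avoids infinite z-scores at rates of 0 or 1.
    h = (hits + 0.5) / (hits + misses + 1.0)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(h) - z(fa)


chance = d_prime(10, 10, 10, 10)  # equal hit and false-alarm rates
good = d_prime(40, 10, 10, 40)    # mostly correct old/new decisions
```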

      Another thing that I think is worth considering and commenting on is the stimuli themselves and the extent to which this may limit the outcomes of their behavioral task. The use of the 3D laser-scanned faces has some obvious advantages, but also (I think) removes the possibility for pigmentation to contribute to recognition, removes the contribution of varying illumination and expression to appearance variability, and perhaps presents observers with more homogeneous faces than one typically has to worry about. I don't think these negate the current results, but I'd like the authors to expand on their discussion of these factors, particularly pigmentation. Naively, surface color and texture seem like they could offer diagnostic cues to identity that don't rely so critically on horizontal orientations, so removing these may mean that horizontal bias is particularly evident when face shape is the critical cue for recognition.

      Our stimuli were originally designed by Troje and Bulthoff (1996). These are 3D laser scans of white individuals aged between 20 and 40 years, posing with a neutral expression. Different views of the faces were shot under a fixed illumination. Ears and a small portion of the neck were visible while the hair region was removed. All face images had a normalized skin color, and we further converted them to grayscale.

      While we agree that this stimulus set offers a restricted range of within- and between-identity variations compared to what is experienced in natural settings, we believe that the present findings generalize to more ecological viewing conditions. Indeed, past evidence showed that the recognition of face pictures shot under largely variable pose, age, expression, illumination, hair style is tuned to the horizontal range of the face stimulus (Dakin and Watt 2009, Dumont, Roux-Sibilon and Goffaux 2024). In other words, our finding that view-tolerant identity recognition is mainly driven by horizontal face information would likely replicate with the use of a more ecological stimulus set.

      Moreover, the skin color normalization and grayscale conversion, while limiting the range of face variability, did not eliminate the contribution of surface pigmentation in our study. It is thus unlikely that our findings exclusively reflect the orientation dependence of face shape processing. Pigmentation refers to all surface reflectance properties (Russell, Sinha et al. 2006), of which hue (color) is only one. The grayscaled 3D laser scanned faces used here contained natural variations in crucial surface cues such as skin albedo (i.e., how light or dark the surface appears) and texture (i.e., spatial variation in how light is reflected); they have actually been used to disentangle the role of shape and surface cues to identity recognition (e.g., Troje and Bulthoff 1996, Vuong, Peissig et al. 2005, Russell, Sinha et al. 2006, Russell, Biederman et al. 2007, Jiang, Dricot et al. 2009). In addition, a past study of ours demonstrated that the diagnosticity of the horizontal range of face information is not restricted to face shape cues; the specialized processing of face shape and surface both selectively rely on horizontal information (Dumont, Roux-Sibilon and Goffaux 2024).

      For these reasons, the present findings are unlikely to be fully determined by shape processing, and we expect them to generalize to more ecological stimulus sets. We discuss these aspects in the revised manuscript.

      Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’

      We also added a supplementary file illustrating the d’ distributions of each model and of human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      All stimuli were matched for luminance and contrast. It is crucial to normalize image energy across orientations, as natural image energy is disproportionately distributed across orientations (e.g., Hansen, Essock et al. 2003). Images of faces cropped from their background, as used here, contain most of their energy in the horizontal range (Keil 2008, Keil 2009, Goffaux and Greenwood 2016). If not normalized after orientation filtering, such an uneven distribution of energy would boost recognition performance in the horizontal range across views. Normalization was performed across our experimental conditions merely to prevent energy differences from explaining the influence of viewpoint on the orientation tuning profile.

      We were not aware of any systematic natural variations of energy across face views. To address this, we measured face average energy (i.e., RMS contrast) in the original stimulus set, i.e., before the application of any image processing or manipulation. Background pixels were excluded from these image analyses. Across yaws, we found energy to range between .11 and .14 on a 0 to 1 grayscale. This is moderate compared to the range of energy variations we measured across identities (from .08 to .18). This suggests that variations in energy across viewpoints are moderate compared to variations related to identity. It is unclear whether these observations are specific to our stimulus set or whether they are generalizable to faces we encounter in everyday life. They, however, indicate that RMS contrast did not substantially vary across views in the present study and suggest that RMS normalization is unlikely to have affected the influence of viewpoint on recognition performance.

      In the revised methods section, we explicitly motivate energy normalization: ‘Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Goffaux, 2019; Goffaux & Greenwood, 2016; Keil, 2009). Across yaws, we found face energy to range between .11 and .14 on a 0 to 1 grayscale, which is moderate compared to the range of face energy variations we measured across identities (from .08 to .18). To prevent energy from explaining our results, in all images, the luminance and RMS contrast of the face pixels were fixed to 0.55 and 0.15, respectively, and background pixels were uniformly set to 0.55. The percentage of clipped pixel values (below 0 or above 1) per image did not exceed 3%.’.
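      The normalization described in the Methods excerpt can be sketched as below, assuming that RMS contrast is the standard deviation of face-pixel values on the 0 to 1 grayscale (consistent with the .11 to .14 range reported above) and that the clipping percentage is computed over face pixels only. The function names and the binary face mask are hypothetical.

```python
import numpy as np

FACE_MEAN = 0.55   # target luminance of face pixels
FACE_RMS = 0.15    # target RMS contrast (std) of face pixels
BACKGROUND = 0.55  # uniform background value


def rms_contrast(img, face_mask):
    """RMS contrast of face pixels only (background excluded), on a 0-1 scale."""
    return float(img[face_mask].std())


def normalize_face(img, face_mask):
    """Fix face-pixel mean and RMS, set the background, report % clipped face pixels."""
    out = np.empty_like(img, dtype=float)
    face = img[face_mask].astype(float)
    face = (face - face.mean()) / face.std() * FACE_RMS + FACE_MEAN
    # Count values outside the displayable 0-1 range before clipping them.
    clipped_pct = 100.0 * np.mean((face < 0.0) | (face > 1.0))
    out[face_mask] = np.clip(face, 0.0, 1.0)
    out[~face_mask] = BACKGROUND
    return out, float(clipped_pct)


rng = np.random.default_rng(0)
img = rng.uniform(0.2, 0.6, size=(64, 64))
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
out, pct = normalize_face(img, mask)
```

      Clipping slightly alters the achieved mean and RMS contrast, which is why reporting the clipped percentage (here, at most 3% per image) is useful.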

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 degrees), but other viewpoints had biases that were slightly off horizontal (e.g., right profile: 80 degrees, left profile: 100 degrees). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

      Indeed, the human performance data indicate that while identity recognition remains tuned to horizontal information, the location of the tuning peak varies somewhat across viewpoints. We primarily focused on the first aspect because of its direct relevance to our research objective, but we also discussed the second: with yaw rotation, certain non-horizontal morphological features, such as the jaw line or the nose bridge, may increasingly contribute to identity recognition, whereas at frontal or near-frontal views, features are mostly horizontally oriented (e.g., Keil 2008, Keil 2009). In the revised Discussion, we directly relate the modest fluctuations of peak location to yaw differences in the appearance of face features.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Based on a discussion with the reviewers, we integrated the recommendations and reached a consensus on the eLife assessment. To move from a "solid" to a "compelling/convincing" strength-of-evidence rating, please address the reviewers' comments. Key points are to clarify and test the plausibility of the models (e.g., effects of different noise-addition steps, inclusion/exclusion of specific orientation channels in the view-dependent comparison, and alternative decision criteria), and to address or discuss the limitations of the stimulus set in capturing recognition under more naturalistic scenarios, for example, including texture cues.

      Reviewer #1 (Recommendations for the authors):

      I generally found the paper to be very well-written, so I have only a few minor comments here.

      (1) I didn't really follow why the estimation of the Gaussian functions described in the text was preferred over a simpler ML framework. Do these approaches differ that much? I see references to prior studies in which these were applied, so I can certainly go check these out, but I could see value in adding just a bit of text to briefly make the case that this is important.

      Employing a simpler linear framework, i.e., a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) × 7 (viewpoint) design that is difficult to analyze. The interaction term would almost certainly reach significance, but its interpretation would be limited. We would either have to rely on numerous local comparisons, which are not particularly informative for our research objectives (e.g., knowing whether d’ differs significantly between two adjacent orientations at a given viewpoint is of little relevance), or use a polynomial contrast approach (testing the linear, quadratic, … up to the 7th-order trends), which would also be difficult to interpret. For such complex, approximately Gaussian-shaped data, the highest-order polynomial trend would likely provide the best fit, but without offering meaningful insight.

      In contrast, a nonlinear approach appears more appropriate. The Gaussian model we used allows us to characterize the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation (or bandwidth) and base amplitude. These parameters are not merely statistical parameters. Rather, they are directly interpretable in cognitive/functional terms. The peak location corresponds to the orientation at which the Gaussian curve is centred, i.e. the preferred orientation band for identity recognition. The standard deviation represents the width of the curve, reflecting the strength or selectivity of the tuning. The base amplitude is the height of the Gaussian curve base, indicating the minimum level of sensitivity, typically found near vertical orientation. Finally, the peak amplitude refers to the height of the Gaussian curve relative to its baseline, that is, it captures the advantage of horizontal over vertical orientations.
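      To make the four parameters concrete, here is a minimal sketch of fitting a Gaussian tuning curve, d’(θ) = base + amp · exp(−(θ − μ)² / (2σ²)), to one observer's d’ values. It is an illustration under stated assumptions, not the authors' fitting procedure: eight orientations evenly spaced from 0 to 157.5 degrees (90 = horizontal), no circular wrap-around of orientation, and unconstrained amplitude. Because the model is linear in base and amp once μ and σ are fixed, the sketch grid-searches μ and σ and solves base and amp exactly by simple regression. The "true" parameters in the usage lines are made up, not study data.

```python
import math

# Hypothetical orientation sampling: 0, 22.5, ..., 157.5 deg (90 = horizontal)
ORIENTATIONS = [i * 22.5 for i in range(8)]

def gaussian(theta, base, amp, mu, sigma):
    """Tuning curve: base amplitude plus a Gaussian bump centred on the peak location."""
    return base + amp * math.exp(-((theta - mu) ** 2) / (2 * sigma ** 2))

def fit_tuning(thetas, dprimes, mus=None, sigmas=None):
    """Least-squares fit of the 4-parameter Gaussian (no wrap-around, amp unconstrained)."""
    if mus is None:
        mus = [float(m) for m in range(0, 181)]      # peak location grid, 1-deg steps
    if sigmas is None:
        sigmas = [float(s) for s in range(5, 101)]   # SD grid, 5 to 100 deg
    n = len(thetas)
    y_bar = sum(dprimes) / n
    best = None
    for mu in mus:
        for sigma in sigmas:
            g = [math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) for t in thetas]
            g_bar = sum(g) / n
            var_g = sum((gi - g_bar) ** 2 for gi in g)
            if var_g == 0:
                continue
            # base and amp enter linearly: solve them by ordinary regression on g
            amp = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, dprimes)) / var_g
            base = y_bar - amp * g_bar
            sse = sum((yi - base - amp * gi) ** 2 for gi, yi in zip(g, dprimes))
            if best is None or sse < best[0]:
                best = (sse, base, amp, mu, sigma)
    _, base, amp, mu, sigma = best
    return {"base": base, "amp": amp, "peak": mu, "sd": sigma}

# Usage: recover made-up parameters from noiseless synthetic d' values
true = dict(base=0.5, amp=1.5, mu=90.0, sigma=30.0)
data = [gaussian(t, **true) for t in ORIENTATIONS]
fit = fit_tuning(ORIENTATIONS, data)
```

Each returned value maps onto one of the interpretable quantities described above: `peak` is the preferred orientation, `sd` the tuning width, `base` the minimum sensitivity, and `amp` the advantage of the preferred over the least effective orientations. A production fit would more likely use a gradient-based optimizer with bounds and per-participant noise.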

      Moreover, the use of a nonlinear, Gaussian model is motivated by past work showing that a Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin and Watt 2009, Goffaux and Greenwood 2016). Orientation selectivity at early stages of visual processing has also been modelled using Gaussian functions (or Differences of Gaussians; Ringach, Hawken and Shapley 2003).

      We revised the data analysis section to include a justification for our use of a Gaussian model: ‘Therefore, fitting the human sensitivity data using a simple Gaussian model seemed most appropriate, as it allows characterizing the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation and base amplitude, which are directly interpretable in cognitive/functional terms. Moreover, the use of a nonlinear, Gaussian model is motivated by past work showing that a Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin & Watt, 2009; Goffaux & Greenwood, 2016). Simpler frameworks, such as a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) × 7 (viewpoint) design that is difficult to analyze and interpret.’

      (2) When reporting the luminance and contrast of your stimuli, please make clear what these units and measures are. This was a case where I had to take a second to assure myself that I knew what the values meant.

      We clarified that the luminance and contrast values reported in the manuscript are on a grey scale ranging from 0 to 1.

      (3) In your Procedure section, I think describing the familiarization task right away would help the text flow more clearly. At present, you began talking about the old/new task, and I was immediately wondering how familiarization worked!

      The procedure section now starts with the description of the familiarization task.

      (4) p. 3 - "Culminates" doesn't seem like the right word here.

      We agree and rephrased this way: ‘The tolerance of face identity recognition is stronger for familiar than unfamiliar faces’.

      (5) p. 5 - I think "with the multiple" shouldn't have "the".

      Indeed, we removed the “the”.

      Reviewer #2 (Recommendations for the authors):

      I enjoyed reading the manuscript, but thought the Introduction was a bit long. I wasn't sure about the relevance of the section on temporal contiguity; I think it would have been more relevant if temporal contiguity had been manipulated in the design. So, I wonder if this section might be shortened or removed to focus on the key questions. On the other hand, I found the overview of the view-selective and view-tolerant model observers to be a bit brief. There is plenty of detail here, but I found it difficult to break down what was done when I first read it. It might be good to provide an overview in the Discussion too.

      While past research on the contribution of temporal contiguity to face identity recognition brings interesting insights into the nature of the visual experience leading to view-tolerant performance, we agree with the Reviewer that this aspect is not directly at stake here. We reduced the review of this literature in the Introduction. We clarified the description of the model observers as suggested by the reviewer and made sure to provide an overview of the model observers in the Discussion as well.

      References.

      Burton, A. M., R. Jenkins and S. R. Schweinberger (2011). "Mental representations of familiar faces." Br J Psychol 102(4): 943-958.

      Burton, A. M., R. S. Kramer, K. L. Ritchie and R. Jenkins (2016). "Identity From Variation: Representations of Faces Derived From Multiple Instances." Cogn Sci 40(1): 202-223.

      Dakin, S. C. and R. J. Watt (2009). "Biological 'bar codes' in human faces." J Vis 9(4): 2 1-10.

      Dumont, H., A. Roux-Sibilon and V. Goffaux (2024). "Horizontal face information is the main gateway to the shape and surface cues to familiar face identity." PLoS One 19(10): e0311225.

      Goffaux, V. and J. A. Greenwood (2016). "The orientation selectivity of face identification." Sci Rep 6: 34204.

      Hansen, B. C., E. A. Essock, Y. Zheng and J. K. DeFord (2003). "Perceptual anisotropies in visual processing and their relation to natural image statistics." Network 14(3): 501-526.

      Jenkins, R., D. White, X. Van Montfort and A. Mike Burton (2011). "Variability in photos of the same face." Cognition 121(3): 313-323.

      Jiang, F., L. Dricot, V. Blanz, R. Goebel and B. Rossion (2009). "Neural correlates of shape and surface reflectance information in individual faces." Neuroscience 163(4): 1078-1091.

      Keil, M. S. (2008). "Does face image statistics predict a preferred spatial frequency for human face processing?" Proc Biol Sci 275(1647): 2095-2100.

      Keil, M. S. (2009). "'I look in your eyes, honey': internal face features induce spatial frequency preference for human face processing." PLoS Comput Biol 5(3): e1000329.

      Menon, N., R. I. Kemp and D. White (2018). "More than a sum of parts: robust face recognition by integrating variation." R Soc Open Sci 5(5): 172381.

      Mike Burton, A. (2013). "Why has research in face recognition progressed so slowly? The importance of variability." Q J Exp Psychol (Hove) 66(8): 1467-1485.

      Ringach, D. L., M. J. Hawken and R. Shapley (2003). "Dynamics of orientation tuning in macaque V1: the role of global and tuned suppression." J Neurophysiol 90(1): 342-352.

      Russell, R., I. Biederman, M. Nederhouser and P. Sinha (2007). "The utility of surface reflectance for the recognition of upright and inverted faces." Vision Res 47(2): 157-165.

      Russell, R., P. Sinha, I. Biederman and M. Nederhouser (2006). "Is pigmentation important for face recognition? Evidence from contrast negation." Perception 35(6): 749-759.

      Troje, N. F. and H. H. Bulthoff (1996). "Face recognition under varying poses: the role of texture and shape." Vision Res 36(12): 1761-1771.

      Vuong, Q. C., J. J. Peissig, M. C. Harrison and M. J. Tarr (2005). "The role of surface pigmentation for recognition revealed by contrast reversal in faces and Greebles." Vision Res 45(10): 1213-1223.

    1. eLife Assessment

      This Review Article provides a compendium of advice for MD-PhD students to consider when deciding which, if any, clinical field they will select for residency training. It is grounded in published data and effectively considers factors including the potential for clinical disciplines to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

    2. Reviewer #1 (Public review):

      Summary:

      This brief piece by Swartz and colleagues outlines the complexities surrounding the choice of clinical specialty for physician-scientists. It is, in general, clear and well-written, and it will be useful to research-oriented medical students choosing a path and to the mentors who are guiding them.

      Strengths:

      The writing is clear. The points made are not profound, but they are important and will be of use to the intended audience.

      Weaknesses:

      I have only minor suggestions for improvement. There are some areas of redundancy where the article could be tightened up by consolidating.

    3. Reviewer #2 (Public review):

      Summary:

      This article is a useful compendium of advice for MD/PhD students (and research-focused MD students) to consider when it is time to decide on a clinical field for residency training. The authors are a distinguished group of physician-scientists and program directors who are drawing on published data and their own experience as mentors to provide advice and resources to students about to make what can be a career-defining choice. It makes an effective argument for considering important differences between clinical fields in their ability to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

      Strengths:

      (1) A lot has been written about physician-scientists as an endangered species. Given the important role that physician-scientists can play if they engage in research that is informed by experience in patient care, not nearly enough has been written about the choices that students make during training that can keep them on track or throw them off.

      (2) The article provides not only general advice, but specific information in the 2 tables that can help trainees to weigh their priorities and consider their options.

      (3) Among the best advice is to weigh clinical demands, maintenance of procedural skills, recognition of the impact of research time on salary, and the impact of high salaries on the tension between research effort and clinical effort in clinical departments, which is where most physician-scientists in academia are employed.

      Areas for potential improvement:

      (1) Some of the most useful pieces of advice are scattered through the text when they might be more impactful if focused. For example, what are the 4 or 5 most essential factors that someone in an MD/PhD or an MD program should weigh when they are deciding between clinical disciplines? There are also published data on the experience of past graduates in achieving a research-focused career in each clinical discipline. How should that data be applied by trainees? What are the factors that should be weighed in deciding where to work as a research-focused physician once training has been completed?

      (2) Some clinical fields at academic institutions have proved to be much more hospitable to careers as research-focused physicians than others. Published data highlight the challenges. I believe the authors have tried very hard to present a balanced perspective, but in the process, they have, I believe, missed an opportunity to guide trainees and make them aware of what they should look for to avoid making a decision that may prove incompatible with their long-term goals.

      (3) An issue that hasn't been raised: Where will the jobs be for physician-scientists who have an MD ± PhD and want to do research and discovery? How many openings will there be for physician-scientists in academia 5-10 years from now? In industry? How are recent events in Washington affecting the continuation of those jobs? Unfortunately, I am not aware of labor statistics for physician-scientists, but perhaps the authors can find them.

      (4) Additional questions that can be raised and addressed in the article: Should one of the "smart choices" in the article's title be where you do the residency, and not just which residency you do? How important is it to be at a successful, research-intensive medical center/university, both during and after residency and fellowship training? If being in an institution where there are numerous very successful physician-scientists and scientists improves the likelihood of being able to sustain a physician-scientist career, how should graduating students improve their chances of being at one of those institutions?

      (5) In every clinical discipline, there are departments that value physician-scientists more than other departments and invest accordingly. What advice would the authors give to help graduating students identify those departments?

    4. Author response:

      Thank you for the valuable feedback. We are updating the manuscript to incorporate the reviewers' terrific suggestions. Specifically, we have:

      • Reduced redundancy and streamlined overlapping sections (especially around research alignment, protected time, and clinical demands)

      • Made the core decision-making framework more explicit and easier to extract (in a new Table 1, with clearer synthesis in the text)

      • Strengthened the emphasis on institutional/program context as a key determinant of success—arguably as important as specialty choice

      • Added more actionable guidance for trainees on how to evaluate departments (e.g., NIH Reporter, T32 presence, R01 density, K→R track record)

      • Included a slightly more explicit statement acknowledging that while all specialties can support physician-scientist careers, the structural ease varies and may require different levels of negotiation/support

      We did not address the broader workforce/job market question, since it feels outside the scope.

    1. eLife Assessment

      This valuable paper provides convincing evidence that humans can navigate better through maps whose local transitions were learned in an intermixed order than maps whose local transitions were learned in neighboring groups. The authors put forward a potential mechanism in which the grouped learning resulted in mental fragmentation, though evidence for this mechanism is incomplete. The work will be of interest to researchers studying cognitive maps and curriculum learning.

    2. Reviewer #1 (Public review):

      This paper investigates how different learning curricula influence the way that humans piece together directly experienced transitions into a broader cognitive map. When adjacent learning trials were grouped within rows or columns of the map, subsequent navigation through the map was weaker than when adjacent learning trials came from disjoint spaces in the map. The authors speculate that the grouped curriculum resulted in mental fragmentation that made navigation across space more difficult later on.

      This is an interesting paradigm that contributes useful new findings in the domain of map learning to the growing literature on curriculum learning. The evidence for a difference between conditions is highly compelling, but, as the authors are very transparent in acknowledging in the Discussion, the evidence for their proposed mechanism - mental fragmentation under grouped learning - is somewhat weak. The study thus presents an intriguing empirical result but not an ironclad mechanistic account.

      An alternative (by their account, "less interesting") explanation is that grouped learning was easier because trials in close succession had overlapping elements, so participants were not trying as hard or were less engaged. There is a literature on spaced (as opposed to massed) learning being better for subsequent memory because it increases retrieval effort. It seems very plausible that this could be going on here, and the control experiment reported in the supplement would not help to rule this out. This literature deserves some discussion.

      The Introduction focuses entirely on literature showing advantages in grouped over intermixed learning, setting that up as the most well-motivated expectation from the literature. Upon finding the opposite, the Discussion then mentions that interleaving has been found to be useful in "applied domains", but then returns to how surprising this is in light of recent findings in the category learning literature. But there is a substantial earlier literature on interleaved vs blocked curricula in category learning, very often finding advantages for interleaving. See, e.g., Carvalho & Goldstone, 2015, for a review. There is also a paper showing interleaving advantages in associative inference, Zhou et al., 2023, JEP:G, which is very relevant to several of the discussion section paragraphs. Thus, the treatment of the prior curriculum learning literature is currently sparse.

    3. Reviewer #2 (Public review):

      I think this paper is an excellent and timely contribution. It clearly shows that learning overlapping relationships in a disjoint training schedule (where the overlaps are not encountered close together in time) appears to aid the formation of an integrated associative memory structure (a cognitive map) and supports generalisation. I believe the methods are sound and the results are clear. I only have a couple of methodological questions that may not warrant any changes to the paper (or only very minor changes/additions):

      (1) The mixed effects models did not include random slopes for the within-subject factors ("spatial manipulation" and "block"), and so the corresponding fixed effect inferences may be unsafe. Having said that, it is likely that including these slopes may not be warranted given their contribution to the model's fit. I recommend that the authors check this.

      (2) The mixed effects models for accuracy appear to model average performance across trials rather than using a generalised linear model with a (e.g.) logit link function and the binomial distribution to characterise performance. I think this is a little sub-optimal, as the latter is often more sensitive. Nonetheless, it is not in any way wrong; the results are clear enough as is, and there may be a good reason to avoid a non-linear link function, which can alter the interpretation of effects close to the ceiling and floor.
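      The ceiling/floor point in (2) can be made concrete with a small worked sketch. It fits a binomial GLM with a logit link to trial-level 0/1 outcomes by Newton-Raphson (equivalent to IRLS for this model) and compares the result with a difference in mean accuracy. It is a simplification for illustration only: fixed effects only, no random intercepts or slopes (a full analysis would use a GLMM, e.g. `glmer` with `family = binomial` in the R package lme4), and the accuracies below are hypothetical, not data from the study.

```python
import math

def fit_logit(x, y, n_iter=50):
    """Binomial GLM, logit(p) = b0 + b1*x, fitted by Newton-Raphson on 0/1 outcomes."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)          # binomial variance, i.e. the IRLS weight
            g0 += yi - p               # score for the intercept
            g1 += (yi - p) * xi        # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 information system for the update
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def trials(n_correct, n_total, cond):
    """Expand a count of correct trials into trial-level (condition, outcome) pairs."""
    return [(cond, 1)] * n_correct + [(cond, 0)] * (n_total - n_correct)

# Hypothetical accuracies out of 20 trials: 90% vs 95% (near ceiling), 50% vs 55% (mid-range)
results = {}
for label, (a, b) in {"ceiling": (18, 19), "mid": (10, 11)}.items():
    data = trials(a, 20, 0) + trials(b, 20, 1)
    x = [d[0] for d in data]
    y = [d[1] for d in data]
    linear_diff = (b - a) / 20           # difference in mean accuracy: 0.05 in both cases
    _, log_odds_diff = fit_logit(x, y)   # condition effect on the logit scale
    results[label] = (linear_diff, log_odds_diff)
```

Both pairs differ by the same 5 percentage points on the accuracy scale, yet the logit-scale effect near ceiling (log(19/9), about 0.75) is several times larger than in the mid-range (log(11/9), about 0.20). This is why the choice of link function can change how effects close to ceiling or floor are read.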

      I think the introduction and/or discussion would benefit from contrasting their results with Berens & Bird (2022, PLOS Comp Bio). In this paper, it is shown that blocking the training of discriminations in a linear hierarchy (what we call progressive training) substantially benefited transitive inference performance. This seems at odds with the authors' finding that "participants struggle to integrate information across rows and columns, i.e. across groups of transitions that were trained separately in time".

      I would really like to know what the authors think about this discrepancy (or, indeed, whether they think there is one at all). Is it possibly because "progressive" learning is some combination of "grouping", "blocking" and "chaining" (where there is a structured overlap between adjacently trained relationships)? Or is it something else, e.g., that there is a fundamental difference between learning associations and discriminations (personally, I lean towards this explanation)?

      Relevant to this, the authors note that their "findings do contradict recent reports from the category learning literature, where blocking seems to help learning and generalisation (Dekker et al., 2022; Flesch et al., 2018; Noh et al., 2016). It may be that where the goal is not to learn a complex knowledge structure - like a map - but simply to compress exemplars by mapping them onto a smaller number of labels - the benefits of blocking emerge." However, the benefit of progressive (blocked) training in my own work was observed in a task that required learning a complex/relational structure in the form of a transitive hierarchy, which theoretical accounts suggest depends on learning map-like representations (Whittington et al., 2020).

    4. Reviewer #3 (Public review):

      Summary:

      This study examines how training regimes influence the formation of cognitive maps. Participants learned two relational maps over three days through pairwise transitions: one map was trained with grouped sequences that followed rows or columns, while the other was trained with disjoint transitions sampled randomly across the map. In addition, the study manipulated the temporal spacing of training blocks (blocked vs. semi-blocked) and tested whether the results generalized across two map geometries (a 5×5 grid and a 4×4 torus).

      Furthermore, they ran a follow-up experiment (or condition) testing rows and columns shuffled in the grouped condition.

      While grouped training produced better performance during learning, the authors report that disjoint training led to superior performance at test on tasks probing the global map knowledge.

      Summarising experimental design:

      (1) Map geometry (between-subjects): 5×5 grid vs 4×4 torus

      (2) Training block schedule (between-subjects): Blocked vs Semi-blocked

      (3) Training regime/transition sampling (within-subject): Grouped or Disjoint (Day 1 and Day 2)
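      The factorial structure summarised above can be written out explicitly. The sketch below is illustrative only: the labels are hypothetical, and it assumes, as Reviewer #3 reads the Methods, that the order of the two training regimes across Day 1 and Day 2 was counterbalanced between participants. It shows that the "four experiments" correspond to the four between-subjects groups of a single 2 × 2 design, each of which experiences both regimes, and how regime order becomes an additional between-subjects factor when order effects are of interest.

```python
from itertools import product

GEOMETRIES = ["5x5 grid", "4x4 torus"]   # between-subjects
SCHEDULES = ["blocked", "semi-blocked"]  # between-subjects
# Assumption: which regime comes on Day 1 vs Day 2 is counterbalanced across participants
REGIME_ORDERS = [("grouped", "disjoint"), ("disjoint", "grouped")]

groups = list(product(GEOMETRIES, SCHEDULES))                # the four "experiments"
cells = list(product(GEOMETRIES, SCHEDULES, REGIME_ORDERS))  # groups split by regime order

def assign_participant(pid):
    """Hypothetical round-robin assignment of participant number pid to a design cell."""
    return cells[pid % len(cells)]
```

With regime order recorded per cell, the question raised in weakness (1) becomes a test of whether Day 2 performance depends on which regime was experienced on Day 1.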

      Strengths:

      The study addresses a clear and timely theoretical question about how the training regime affects the formation of cognitive maps. A further strength is the well-controlled experimental design, allowing the authors to test their hypotheses in a systematic and informative way.

      Weaknesses:

      (1) If I understood correctly, participants learned one map on the first day and the other on the second day, with the training regime (grouped vs. disjoint) counterbalanced across maps. This raises the possibility that experience with one training regime on day one could influence performance on the second day. For example, it would be interesting to examine whether participants who experienced the disjoint regime first showed any differences when learning the grouped regime on the following day. While it may be difficult to fully disentangle such transfer effects from the main training regime effects, it would be informative to test whether performance on the second day depends on the regime experienced on the first day (e.g., whether prior exposure to the disjoint regime predicts performance on the subsequent grouped training, but not vice versa).

      (2) The authors mention a control experiment. Did the participants in the control experiment complete only the training phase or also the testing tasks used in the main experiment? If testing was included, it would be informative to report whether performance at test was comparable to that observed in the main experiment. Given that this condition appears to involve blocked transitions while moving across both rows and columns, I would expect performance to fall somewhere between the grouped and disjoint conditions.

      (3) Participants' performance did not differ between conditions in the map reconstruction task, suggesting that participants in both the grouped and disjoint regimes were ultimately able to form a cognitive map. Was this task always administered last during the testing session? I wonder whether the explicit request of the reconstruction task could have influenced participants' awareness of the map structure.

      (4) The manuscript describes the study as consisting of four experiments (two groups per map shape, differing in the blocked versus semi-blocked schedule). However, based on the design described in the Methods, this appears more accurately characterized as a single experiment with two between factors: map geometry (grid vs. torus) and blocking schedule (blocked vs. semi-blocked) manipulated between participants, and training regime (grouped vs. disjoint) manipulated within participants.

      (5) It is not entirely clear to me from the Results section whether performance at test differed between the two map geometries (grid and torus), or whether the reported effects of training regime were consistent across them.

    1. eLife Assessment

      The authors combined human assembloids, fetal brain tissue, bulk and single-cell RNA sequencing, and live imaging to understand the molecular mechanisms affected by hypoxia during cortical development. The findings are very important to the neurodevelopmental field. They reveal new insights into how migration of cortical interneurons can be affected in hypoxic conditions, and provide exciting models to probe broad neurodevelopmental processes in health and disease. The evidence is compelling. The data and analyses are very rigorous and go beyond the state-of-the-art.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms by which hypoxia reduces cortical interneuron migration. They use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration after 24 hours of hypoxia. Bulk and scRNA-seq show up-regulation of adrenomedullin (ADM) and its receptor RAMP2, confirmed at the protein level. Adding ADM to the culture medium after hypoxia rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype and that blocking RAMP2 also interferes with the rescue. The authors should also be applauded for using 4 different cell lines and for using human fetal cortex slices as an independent method to explore the migration of DLXi1/2GFP-labelled iPSC-derived interneurons in this substrate with and without ADM addition (after confirming that ADM is also up-regulated in this system). Finally, the authors demonstrate that PKA-CREB signalling mediates the effect of ADM addition and also leads to up-regulation of GABA receptors. Taken together, this is a very carefully done study on an important subject: how hypoxia affects cortical interneuron migration. In my view, it would be of great interest to the readers of eLife.

      Strengths:

      Its strengths are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that we don't know which interneuron subtypes are most affected by hypoxia and which may be rescued in their migration by ADM.

      A further weakness is that the few genes confirmed to be regulated after hypoxia do not help determine which statistical cut-off can be considered reliable, given that the authors did not compare strongly regulated versus weakly regulated genes.

      Comments on revisions:

      Unfortunately, the authors did not address my suggestions. While they show example stainings of interneuron subtypes, they do not show whether calretinin+, calbindin+ or somatostatin+ interneurons are differentially affected by hypoxia or by the rescue with ADM. I still consider this an important piece of information to add.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia: cortical forebrain assembloids and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits, which can be rescued with exogenous ADM. The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate and establishing exciting models for future studies. The authors use a sufficient number of iPSC lines, including both XX and XY lines, so the analysis is robust. In addition, the RNA-seq data with re-oxygenation are a nice control to see which genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and of the involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript. I have a few comments and suggestions for the authors.

      Strengths/Weaknesses:

      (1) Can they comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and a potential involvement of other cells in the hypoxic response.

      (2) Can they comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors? Is this a potential autocrine loop, or does ADM come from cell types other than inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms could be at play? Since different cells express ADM, different mechanisms could be at play in ventral vs dorsal areas.

      (3) For data from Figure 6: while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working". Therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Fig 3) as an alternative to validate changes at the protein level (however, this might prove difficult). Further to this, is p-CREB activated at the protein level by WB?

      (4) Can the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (stages earlier than migration)? This might already be known but was not discussed.

      (6) In the Discussion section, it might be worth explaining to readers what functional consequences delayed/reduced migration of inhibitory neurons into the cortex might have for neural circuit development.

      Comments on revisions:

      The authors have addressed my comments thoroughly. I have no further comments or suggestions.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform (particularly the combination of assembloids and live imaging) will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Comments on revisions:

      The authors have fully addressed my concerns by incorporating the relevant discussion into the manuscript, especially regarding how well the migration observed in hSO-hCO assembloids reflects in vivo conditions. I have no further comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public review): 

      Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer for reviewing our manuscript and for the important comment about inflammation. Indeed, hypoxia has been shown to activate inflammatory response pathways. Various studies have found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Fig. 2D), and to the reviewer’s point, we identified enrichment of inflammatory signaling following the hypoxic exposure. Since hSOs at the time of analysis contain some astrocytes, we think these cells contribute to the observed pro-inflammatory changes; this also emphasizes the feasibility of capturing this response in organoids in vitro. This is important because ADM is known to have anti-inflammatory properties, which should be investigated in future studies focused on hypoxia-induced inflammation.

      In the manuscript, we included a few sentences in the discussion to address the lack of in-depth analyses of inflammation as a limitation of our study.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, and increased expression of RAMP2 receptors on neurons, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the current experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about possible different mechanisms in ventral versus dorsal areas, in the revision we plotted the cell-type expression of ADM and its receptors in hCOs and included these data in the figures (Fig. S3).

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer’s insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. Our study provides the first information about the cell-type-specific expression of ADM in baseline and hypoxic conditions, which is one of its key novelties.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB, PI3K/AKT, and ERK/MAPK pathways, all of which are also known to be critical regulators of neuronal development and survival. These previously published data, along with our CREB-targeted findings in hypoxic interneurons, suggest that ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy Syndrome, further providing evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (stages earlier than migration)? This might already be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something we focused on in this manuscript, given the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. We revised the manuscript to discuss previous studies showing that failure of interneurons to migrate and reach their designated targets within the appropriate developmental window leads to their elimination through apoptosis. Decreased numbers (or abnormal development) of interneurons are associated with neurodevelopmental impairments and abnormal functional connectivity in the brain.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we used the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, we used the single-cell RNA sequencing data and immunostainings to provide this information in the manuscript. As expected based on our previous reports, most cortical interneurons present in organoids express calretinin (CALB2), somatostatin (SST) or calbindin (CALB1). These data are now presented in Fig. S3.

      Separately, we used available scRNA-seq data from the developing human brain and showed that at ~20 PCW, the developing human brain has similar types of cortical interneurons. These data are now included in Fig. S5.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer’s thoughtful suggestion. In addition to the bulk RNA-seq analysis, we validated several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP and VEGFA (Fig. S1).

      We agree that an in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data in the manuscript and our specific target, RNA sequencing of hSOs was sufficient to capture essential changes such as ADM upregulation. An in-depth exploration of the differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and the opportunity to clarify. Since we manually separated the assembloids before the analyses, we processed the hSO and hCO samples separately, which is why they separate in the embedding. In the revision, we added data on the expression of ADM and its receptors in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing direct investigation of the migration patterns of hypoxic interneurons, we agree that it does not fully recapitulate the in vivo environment. While multiple aspects cannot be recapitulated in vitro at this time (e.g., cellular complexity, vasculature and immune response), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the relevance of our findings to in vivo conditions.

      We expanded our discussion to include more details and the need to validate these findings using in vivo models.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq experiments were performed during the acute stages of hypoxia/reoxygenation, and we think a maturation phenotype might be difficult to capture at this point; it would require analysis at later stages of in vitro assembloid maturation.

      Our speculation about a possible maturation defect is based on previous developmental biology studies showing that failure of interneurons to reach their final cortical location within a specified developmental window impairs their integration within the neuronal network, leading to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer countless hypoxic events over multiple months, we speculate that these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, directly demonstrating this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons. To our knowledge, this experimental design is not technically feasible at this time; thus, this hypothesis remains speculative and is included only in the discussion.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the effects of hypoxia on other interneuron developmental processes during cortical development. In the manuscript, we included text in the discussion about the likely effects of hypoxia on interneuron proliferation, maturation and circuit integration.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia).

      Thank you for pointing out this error; we corrected it in our revision.

    1. eLife Assessment

      This valuable study proposes a novel rapid-entry mechanism for Staphylococcus aureus, involving the rapid release of calcium from lysosomes. The paper's strength lies in its very interesting hypothesis. The methods used are solid and adequately support the conclusions.

    2. Reviewer #2 (Public review):

      [Editors' note: This version was assessed by the editors. The authors have addressed a point raised by Reviewer #2, who thought the authors compared cells grown in low-serum and high-serum conditions. This has been clarified in the latest version.]

      In the manuscript, Ruhling et al. propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase, and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      A key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype / conditional phenotype for genetic knock out is a major weakness.

      In the previous version, the authors performed experiments with ASM KO cells to provide genetic evidence of the role for ASM in S. aureus entry through lysosomal modulation.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      In the manuscript, Ruhling et al. propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase, and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      A key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype / conditional phenotype for genetic knock out is a major weakness.

      In the revised version, the authors perform experiments with ASM KO cells to provide genetic evidence of the role for ASM in S. aureus entry through lysosomal modulation. The key additional experiment is the phenotype of reduced bacterial uptake in low serum, but not in high serum conditions. The authors suggest this could be due to the SM from serum itself affecting the entry. While this explanation is plausible, prolonged exposure of cells to low serum is well documented to alter several cellular functions, particularly in the context of this manuscript, lysosomal positioning, exocytosis and Ca2+ signaling. A better control here could be WT cells grown in low serum.

      As the reviewer suggested, we cultured both WT control cells and ASM knock-outs under low-serum conditions before conducting the invasion assays. Hence, the detected effects on S. aureus invasion must be caused by the lack of functional ASM in the mutant.

      We apologize that this did not become evident from the manuscript’s text. We thus included a change in line 259 which now reads:

      “To test whether FBS confounded our invasion experiments, we cultivated WT as well as ASM K.O. cells in medium with reduced FBS concentration (1%) and determined the S. aureus invasion efficiency (Figure 2I).”

      If SM in serum can interfere, why do they see such pronounced phenotype on bacterial entry in WT cells upon chemical inhibition?

      We explain the differences between inhibitor-treated WT cells and ASM K.O.s by the severe accumulation of SM upon genetic ablation of ASM, which we demonstrated by HPLC-MS/MS measurements in Figure 2L. In cells cultured in 10% FBS, ASM K.O. resulted in approx. 4-fold higher levels of cellular SM C18:0 compared to WT cells, while amitriptyline treatment of WT cells had no effect and ARC39 treatment increased SM C18:0 levels only 2-fold. This likely reflects the different durations of SM accumulation in the cellular pools: SM accumulates over the long term upon complete absence of ASM (in the ASM K.O.), but only over a few hours upon treatment with the inhibitors.

      Under low-serum conditions, the severe SM C18:0 accumulation in the ASM K.O. was reduced (from 4-fold to 2-fold compared to WT cells; Figure 2M). Here, the WT cells used as reference were also cultured in the same manner as the ASM K.O. A similar pattern was observed for other SM species (Supp. Figure 3). This correlates with the S. aureus invasion phenotype in ASM K.O.s: under high-serum conditions (resulting in severe SM accumulation) we did not detect an invasion defect, while under low-serum conditions (resulting in only moderate SM accumulation) S. aureus invasion was reduced in the knock-outs compared to WT cells cultured in the same conditions.

      While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      Since the comments starting with the line above are identical to the reviewer's previous comments, we assume that these points of criticism still resonate with the Reviewer. However, we had previously agreed with the reviewer that we do not show the formation of ceramide-enriched platforms, and we had already changed the manuscript accordingly in the previous revision round (see also our comment below).

      The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      We continue to share the reviewer’s desire to discriminate between ASM-dependent and ASM-independent processes, but the simultaneous occurrence of multiple pathways of bacterial uptake, which happen rapidly, is currently the limiting factor and technological challenge in our laboratory. We hope that we or others will be able to address these limitations in the future, for instance with the technologies suggested by the reviewer.

      I acknowledge the authors' argument that different ASM inhibitors showing similar phenotypes across different assays points to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The authors' argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of the ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      We want to elaborate again here, since our revision experiments demonstrate the ASM-dependency of the rapid uptake under low-serum conditions (see also above). We were convinced that the genetic evidence of an S. aureus invasion phenotype in ASM K.O.s under these conditions would resolve the reviewer’s concern about the role of ASM during bacterial invasion. Our lipidomics data of ASM K.O.s cultured in 1% and 10% FBS (Figure 2M, Supp. Figure 3) and of inhibitor-treated WT cells (Figure 2L, Supp. Figure 3) show a correlation between SM accumulation and the invasion phenotype we observed.

      We agree with the reviewer, however, that it remains elusive why changes in the sphingolipidome increase ASM-independent S. aureus internalization by host cells. One explanation is a dysfunction of the lipid raft-associated protein caveolin-1 upon strong SM accumulation, which was previously shown to occur in ASM-deficient cells (1, 2). A lack of caveolin-1 results in strongly increased host cell entry of S. aureus in certain cell types (3, 4). In other cell types, such as A549 cells, S. aureus invades in an α-toxin- and caveolin-1-dependent fashion (5). It will be interesting to study to what extent such processes, as described by Goldmann and colleagues, depend on ASM. However, characterizing the mechanism behind these observations requires further experimentation and is beyond the scope of the current manuscript.

      As to the centrality of the pathway: we cannot and do not make any assumptions about the centrality of the pathway and its importance in vivo. As scientists, we were intrigued by our finding of an ASM-dependent uptake pathway for S. aureus, especially its speed. In other, as yet unidentified, host cell types or cell lines, such a pathway may pose a major entry point for pathogens. Alternatively, we may have identified an ASM-dependent mode of receptor uptake, with which the bacteria “piggyback” into the cells.

      The authors allude to lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret.

      We again want to add that we measured phagosomal escape of S. aureus in WT and ASM K.O. cells cultured in 1% FBS (low-serum conditions) and compared it to escape rates obtained with host cells cultured in 10% FBS. Again, we infected cells for 10 or 30 min and determined the escape rates 3 h p.i. However, the results were similar to the escape rates determined with 10% FBS (see Author response image 1). This was already addressed during the first revision of the manuscript. We found that escape rates of S. aureus were significantly decreased in the absence of ASM regardless of the FBS concentration in the medium.

      Author response image 1.

      We therefore think that prolonged absence of ASM has additional side effects. For instance, certain endocytic pathways could be up- or down-regulated to adapt for the absence of ASM or could be affected by other changes in the lipidome (that can be minimized but not completely prevented by culturing cells in 1% FBS). This could, for instance, affect maturation of S. aureus-containing phagosomes and hence phagosomal escape.

      As it is currently unclear to what extent the prolonged absence of ASM activity affects cellular processes, we think other experiments investigating the role of ASM-dependent invasion in phagosomal escape are more reliable. Most importantly, bacteria that enter host cells early during infection (and thus predominantly via the “rapid” ASM-dependent pathway) have lower phagosomal escape rates than bacteria that enter host cells later during infection (Figure 5, D and E). This is confirmed by higher escape rates upon blocking ASM-dependent invasion with Vacuolin-1 (Figure 4E) and three different ASM inhibitors (Figure 4C and D). We further demonstrate that sphingomyelin on the plasma membrane during invasion influences phagosomal escape, whereas sphingomyelin levels in the phagosomal membrane did not change phagosomal escape (Figure 5A and B). This is summarized in Figure 5F.

      Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape the phagosome in macrophages, could it provide a system to potentially decouple the uptake and intracellular trafficking effects of ASM (or its inhibitor treatment)?

      Knock-downs in our laboratory are based on the vector pLVTHM (6). Inducible knock-downs would require the introduction of an inducible Tet-On system, which the cells currently do not harbor.

      However, it should be noted that optimal gene knock-down requires inducing this system by supplementing the medium with doxycycline for 7 days. These several days of growth would allow the cells to adapt their lipid metabolism, reflecting the situation we encounter in the K.O.s.

      ASM-dependent uptake of S. aureus in macrophages has been demonstrated before (7). However, the course of infection in macrophages differs from that in non-professional phagocytes (8). For example, in macrophages S. aureus replicates within phagosomes, whereas in non-professional phagocytes it replicates in the host cytosol. Absence of ASM may therefore influence the intracellular infection of macrophages with S. aureus in a distinct manner.

      The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      We agree with the reviewer that we do not show generation of ceramide-enriched platforms (see also above). We thus already had changed Figure 6F in the revised manuscript to make clear that it remains elusive whether ceramide-enriched platforms are formed. We also had added a sentence to the discussion (line 615) to emphasize that the existence of these microdomains is still debated in lipid research.

      We think that the following observations support SM-dependent effects of ASM during S. aureus invasion:

      (i) Reduced invasion upon removing SM from the plasma membrane (Figure 2N, Supp. Figure 2M)

      (ii) Increased invasion in TPC1 and Syt7 K.O.s (Figure 2P) in the presence of exogenously added SMase.

      However, we agree with the reviewer that we do not directly demonstrate ASM-mediated SM cleavage during S. aureus invasion. Hence, we had added a sentence to the discussion that mentions a possible SM-independent role of ASM for invasion (line 556) that reads:

      “Since it remains elusive to which extent ASM processes SM on the plasma membrane during S. aureus invasion, one may speculate that ASM could also have functions other than SM metabolization during host cell entry of the pathogen. However, we did not detect a direct interaction between S. aureus and ASM in an S. aureus-host interactome screen (9).”

      The reviewer acknowledges the technical challenges in directly visualizing lysosomal Ca²⁺ using the methods outlined. Genetically encoded lysosomal Ca²⁺ sensors such as GCaMP3-ML1 might provide better ways to visualize this directly during inhibitor treatment or S. aureus infection.

      We again thank the reviewer for this suggestion. We had already included the following section in our Discussion (then: line 593): “Since fluorescent calcium reporters allow to monitor this process microscopically, future experiments may visualize this process in more detail and contribute to our understanding of the underlying signaling mechanisms.”

      References for the purpose of this response letter:

      (1) Rappaport, J., C. Garnacho, and S. Muro, Clathrin-mediated endocytosis is impaired in type AB Niemann-Pick disease model cells and can be restored by ICAM-1-mediated enzyme replacement. Mol Pharm, 2014. 11(8): p. 2887-95.

      (2) Rappaport, J., et al., Altered Clathrin-Independent Endocytosis in Type A Niemann-Pick Disease Cells and Rescue by ICAM-1-Targeted Enzyme Delivery. Mol Pharm, 2015. 12(5): p. 1366-76.

      (3) Hoffmann, C., et al., Caveolin limits membrane microdomain mobility and integrin-mediated uptake of fibronectin-binding pathogens. J Cell Sci, 2010. 123(Pt 24): p. 4280-91.

      (4) Tricou, L.-P., et al., Staphylococcus aureus can use an alternative pathway to be internalized by osteoblasts in absence of β1 integrins. Sci Rep, 2024. 14(1): p. 28643.

      (5) Goldmann, O., et al., Alpha-hemolysin promotes internalization of Staphylococcus aureus into human lung epithelial cells via caveolin-1- and cholesterol-rich lipid rafts. Cell Mol Life Sci, 2024. 81(1): p. 435.

      (6) Wiznerowicz, M. and D. Trono, Conditional suppression of cellular genes: lentivirus vector-mediated drug-inducible RNA interference. J Virol, 2003. 77(16): p. 8957-61.

      (7) Li, C., et al., Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal, 2018. 28(10): p. 916-934.

      (8) Moldovan, A. and M.J. Fraunholz, In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol, 2019. 21(3): p. e12997.

      (9) Rühling, M., et al., Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio, 2025. 0(0): p. e03654-24.

    1. eLife Assessment

      This important study shows that the Nora virus, a natural Drosophila pathogen that also persistently infects many laboratory fly stocks, infects intestinal stem cells (ISCs), leading to a shorter life span and increased sensitivity to intestinal infection with the bacterium Pseudomonas. The authors provide convincing data to support their conclusions. The paper provides new insights into virus-host interactions in the Drosophila gut and serves as a warning for scientists who use the fruit fly as a model to study gut physiology.

    2. Reviewer #1 (Public review):

      [Editors' note: The article has been improved and several points raised by the reviewers have now been addressed. The authors should ideally further improve the clarity of the figures and the description of the experimental methods. This is particularly important for an article discussing potential confounding factors.]

      Summary:

      This important article reveals that the Nora virus can colonize the intestinal cells of Drosophila melanogaster, where it persists with minimal immediate impact on its host. However, upon aging, infection, or exposure to toxicants, stem cell activation induces Nora virus proliferation, enabling it to colonize enterocytes. This colonization disrupts enterocyte function, leading to increased gut permeability and a significant reduction in lifespan. Results are convincing and hold significant import for the Drosophila community.

      Strengths:

      (1) Building on previous studies by Habayeb et al. (2009) and Hanson et al. (2023), this study highlights cryptic Nora virus infection as a crucial factor in aging and gut homeostasis in Drosophila melanogaster.

      (2) Consistent with the oral route of Nora virus transmission, the study demonstrates that the virus resides in intestinal stem cells, with its replication directly linked to stem cell proliferation. This process facilitates the colonization of enterocytes, ultimately disrupting intestinal function.

      (3) The study establishes a clear connection between stem cell proliferation and virus replication, suggesting that various factors - such as microbiota, aging, diet, and injury - can influence Nora virus dynamics and associated pathology.

      (4) The experimental design is robust, comparing infected flies with virus-cured controls to validate findings.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors report that Nora virus, a natural Drosophila pathogen that also persistently infects many laboratory fly stocks, infects intestinal stem cells (ISCs), leading to a shorter life span and increased sensitivity to intestinal infection with the bacterium Pseudomonas. Nora virus infection was associated with increased ISC proliferation and disrupted gut barrier function. Genetically, the authors show that increased ISC division in flies coinfected with Nora virus and Pseudomonas is driven by JAK-STAT signaling and apoptosis.

      Accordingly, blocking apoptosis and JAK-STAT signaling reduces viral load, suggesting that in this context the JAK-STAT pathway is proviral, in contrast to previous observations in systemically infected flies. This work adds to the findings of a recent paper showing that another persistent fruit fly virus, Drosophila A virus, also increases ISC proliferation and decreases gut barrier function. Intestinal viruses should therefore be considered confounders in studies of fly intestinal physiology.

      Strengths:

      Overall, the data are convincing and robust, starting with two wild-type fly stocks (Ore-R strain) that differ in their Nora virus infection status, followed by experiments in which cleared stocks are reinfected with a purified Nora virus stock preparation. The conclusions of the paper will be of interest to scientists working on insect physiology, virology, and immunology, but should also serve as a warning for scientists who use the fly as a model to study gut physiology.

    4. Reviewer #3 (Public review):

      Summary:

      Franchet et al. sought to characterize the impact of Nora virus on host lifespan and sensitivity to a variety of infectious or stressful treatments. Through careful and rigorous analyses, they provide evidence that Nora virus greatly impacts fly survival to infection, overall lifespan, and intestinal integrity. The experimental evidence, including proper isolation of the virus and reinoculation of the organism to satisfy Koch's postulates, is excellent. The additional work characterizing the pathology of the gut is valuable and meets the gold standard of the field, including data showing gut leakage, the presence of the virus in intestinal stem cells, and the importance of stem cell proliferation for virus replication and spread, using elegant genetic tools to block stem cell proliferation or enterocyte death.

      Strengths:

      The authors have been rigorous and careful. The initial finding is presented through the lens of two related strains differing in virus infection. From there, the authors characterized the virus and isolated a purified culture, which they used to reinoculate a cleared strain, thereby satisfying Koch's postulates. The authors have also probed the importance of diet under relevant conditions in many experiments. The additional work to characterize the pathology of the gut is compelling, using genetic tools to block or allow intestinal stem cell proliferation and enterocyte death through JAK-STAT and JNK signalling, alongside tracing of the virus with a Nora virus antibody. JAK-STAT and JNK have previously been described as regulators of these processes, making these tools appropriate and convincing. It is also interesting to see good evidence that the virus itself is damaging, rather than simply permitting coinfection by gut microbes (which does happen).

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The study does not explore or discuss how oral ingestion of Nora virus leads to the colonization of stem cells, which are located basally in the gut. This mechanism should be discussed.

      We have added a paragraph (the 4th) to the Discussion dealing with this issue, and we further discuss the consequences of RNAi potentially not being functional in progenitor cells in the paragraph on antiviral responses.

      (2) The authors fail to detect Dicer-GFP fusion protein expression in stem cells, a finding that could explain why the virus persists in these cells. Further investigation is needed to determine whether RNAi functions are effective in stem cells compared to enterocytes. For clarification, the authors could cross esg-Gal4 UAS-GFP and Myo-Gal4 UAS-GFP with UAS GFP-RNAi and/or express a Dicer-GFP construct under a stem cell-specific driver.

      Actually, it is well-known in the Drosophila literature on the intestinal epithelium that RNAi functions well in progenitor cells as the technique has been widely used to understand the control of stem cell division and differentiation in tens of articles. We provide here just a few examples: Jiang et al., Nat Commun (2025) https://doi.org/10.1038/s41467-024-55255-1; Zhai et al., PLoS Genetics (2017) https://doi.org/10.1371/journal.pgen.1006854; Wu et al., https://doi.org/10.1371/journal.pgen.1009649.

      (3) The presentation of experimental parameters (e.g., pathogen type, temperature, time points) should be improved in the results section and at the top of the figures to enhance clarity. Additionally, details regarding the mode of oral infection (continuous exposure vs. single feeding on a filter) should be specified. Given that fly stock flipping frequency influences microbiota load (as noted in Broderick et al.), this should be reported, especially for lifespan studies.

      P. aeruginosa oral infection was always performed by continuous exposure, as detailed in the Materials and Methods section. Nora infection was performed by exposure to the viral solution for 24 h, as also detailed there. The flipping frequency had likewise been reported in that section.

      (4) To confirm that enterocyte colonization requires stem cell proliferation and differentiation, the authors should analyze Nora virus localization in JAK-STAT-deficient flies infected with bacteria or exposed to toxicants. This would help determine whether the virus can infect enterocytes when stem cells are stimulated but enterocyte differentiation is blocked.

      We now provide these data (pictures and quantification) in Fig.7 G-H and discuss them in the main text.

      (5) The study does not discuss the spatial distribution of Nora virus infection along the gut. Specifically, it remains unclear whether viral colonization is higher in gut regions R2 and R3, which contain proliferative stem cells. Addressing this could provide valuable insights into the virus's infection dynamics.

      We have now specified that Nora virus was detected only in the posterior midgut; we are now also providing a schematic illustration in Fig. S5J.

      Recommendations for the authors:

      Major Suggestion

      See weaknesses section for key areas requiring improvement.

      Minor Suggestions

      (1) Line 79: Mention Nox in the text. Key references on Nox include Jones (2013), Iatsenko (2018), and Patel (2016).

      Done.

      (2) Line 92: The long list of publications is unnecessary and can be shortened.

      We are not sure that many investigators are aware of the scope of our studies on host-pathogen relationships, and we consider this an appropriate place for a reminder.

      (3) Line 196: Cite Choi et al. (Aging Cell, 2008; 7:318-334. doi: 10.1111/j.1474-9726.2008.00380.x) for the initial work on gut dysplasia during aging. However, note that dysbiosis in aging is demonstrated in Buchon et al. (2009, Genes and Development) and other studies.

      Done.

      (4) Line 265: It would be interesting to clarify whether the shortened lifespan of Nora-infected flies after a clean injury is dependent on the microbiota.

      The shortened lifespan of Nora-infected flies is not due to the injury, as demonstrated in Fig. S4F. Furthermore, the shortened lifespan is differentially affected by the microbiota according to nutritional conditions, as documented in Fig. 3D-E.

      (5) Line 285: Clarify what is meant by "polyubiquitin promoter": do the authors mean a ubiquitous Gal4 driver? Specify the Gal4 lines used in the Results section.

      Done. The construct is a direct fusion of the ubiquitin p63E promoter to the Dicer-fluorescent protein sequences as described in Girardi et al., Sci Rep, 2015.

      (6) Line 347: Indicate the references aligning with the most recent studies on this topic.

      Done.

      (7) Line 373 and elsewhere: Mention studies that have shown the influence of the microbiota on lifespan in relation to dietary richness.

      Done.

      (8) Line 588: Provide details on the method used for hemolymph collection.

      Done.

      (9) Line 964: Clarify the phrase "as previously shown"-where in this paper was it demonstrated?

      The legends have been rewritten and the phrase has been deleted.

      (10) Line 987: In "survival of non-infested with PA14," explicitly mention Nora to distinguish between different infections.

      Done.

      Figures & Experimental Details

      (11) Figures: Improve figure legends or add information at the top of figures, specifying:

      Number of flies used to monitor Nora virus titer.

      Temperature conditions.

      Age of flies used in experiments.

      Done.

      (12) Figure 2E: The lifespan of Nora-negative flies appears very short. Was this lifespan assay conducted at 29°C? What was the fly stock flipping rate?

      Correct, it was 29°C. As described in the Materials and Methods section, the flies were flipped every two days (29°C) to every four days (25°C).

      (13) Figure 4C: Improve labeling on the plate for better clarity.

      Done.

      (14) Figure 6C: The figure legend on the right is difficult to interpret. Clarify what "+" indicates and explicitly write out the genotype. Is NP identical to NPG4G80?

      Done. NP is the NP1 driver. We usually use it in a version that also includes a Gal80ts transgene, so that the gene of interest is expressed only at the adult stage.

      (15) Dissection Details: Clearly state which part of the gut was dissected (midgut, entire gut, ± Malpighian tubules). This should be specified in the Results section.

      Done: for RTqPCR analyses, guts were dissected without Malpighian tubules or crop.

      (16) Clean Injury: Provide more details in the results section regarding the injury site and needle size.

      Done.

      (17) Use "Abx" instead of "AntiB," as the former is more commonly recognized.

      Done.

      Reviewer #2 (Public review):

      The title does not seem to be fully supported by the data. While the authors convincingly show the increased sensitivity to Pseudomonas infection, effects on another tested bacterium, Serratia marcescens, were not significantly different between Nora-virus-infected and noninfected flies. Thus, effects of 'intestinal infection' seem to be too broad a claim.

      We agree with the reviewer and have accordingly modified the title, which now explicitly refers to P. aeruginosa.

      Also, whether the Nora virus increases sensitivity to oxidative stress is not so clear to me: the figure that supports this claim is the survival assay of Figure 5F. However, the difference in survival between control and paraquat-treated Nora (-) flies seems to be in the same order as between control and paraquat-treated Nora (+) flies. Rather, cause and effect seem to be the reverse: paraquat increases ISC proliferation, higher viral loads, and consequently shorter survival. I suggest rephrasing the title and conclusions accordingly.

      While we usually directly compare Nora (+) and Nora (-) flies under the same conditions, we note that the difference in survival between control and paraquat-treated Nora (-) flies is about 9 days based on LT50 values, whereas it is 8 days for Nora (+) flies. The difference is about two days when comparing Nora (+) to Nora (-) flies exposed to paraquat. Thus, Nora does contribute to an increased sensitivity to oxidative stress, likely through the process highlighted by the reviewer and also through its own detrimental action on the homeostasis of the intestinal epithelium and the associated disruption of its barrier function.

      Quantification of immunofluorescence microscopy is missing, rendering the images somewhat anecdotal. Quantification should be provided. It will then also be of interest to quantify the number of Nora (+) cells, and the Nora virus levels per infected cell (e.g. Figure 5H). Also, the claim that the Nora virus initially infects ISC and later (upon stress) infects enterocytes requires quantification.

      The missing quantifications of images have been added: Figs. S5E and 7H. We are not sure we understand the reviewer's comment on “Nora virus levels per infected cell”: the signal we see may correspond to aggregates of the virus and would be impossible to quantify reliably, e.g., in the right-most panel of Fig. 5H. Fig. 5I clearly shows that no Nora virus is detected in enterocytes of young 5-day-old flies in the absence of an infectious or xenobiotic challenge.

      Genetic support for the role of the JAK-STAT pathway in driving ISC proliferation and supporting Nora virus replication is convincing. It would also be of interest to analyze other pathways implicated in ISC proliferation (e.g. JNK, EGFR), especially given the observations of Nigg et al, showing an involvement of STING/NF-kB and EGFR pathway in driving intestinal phenotypes of Drosophila A virus-infected flies (doi: 10.1016/j.cub.2024.05.009).

      We agree with the reviewer that these would be interesting experiments to perform, especially in the light of one hypothesis that antiviral defenses may prevent the initial infection of enterocytes as discussed at length in our updated discussion on host antiviral defenses. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions. In this work, we used the interference with the JAK-STAT pathway as a second tool to block the division of ISCs.

      Figure 5E: An intriguing observation is that GFP:Dicer2 seems to be unstable in Nora virus-infected cells. Here, a GFP control driven by the same driver line would be required to confidently conclude that this is due to an effect on Dicer-2 specifically.

      Actually, this experiment was not performed using the Gal4-UAS system but with a direct fusion. We do know that GFP is stable when expressed in enterocytes, e.g., Lee et al., Cell Host & Microbe (2016) DOI: 10.1016/j.chom.2016.10.010.

      Legends are mostly conclusive, and essential information about the experimental setup is missing in the captions of multiple figures, making the interpretation of the data difficult. See my private recommendations for suggestions to improve the data presentation.

      Done.

      Recommendations for the authors:

      Suggestions for the presentation of the data:

      (1) I found the names Ore-R(SC) and Ore-R(SM) for noninfected vs infected Ore-R flies not very intuitive. I suggest renaming them into something that makes the infection status clear.

      These notations refer to two distinct sub-strains that may reflect different origins, with some likely genetic drift accounting for the distinct properties of the two sub-strains. As the Ore-R(SM) flies have different infection statuses (infected, cleaned, re-infected), we fear that renaming them would not clarify the matter. Of note, Ore-R(SC) flies are refractory to Nora virus infection (Fig. S1I).

      (2) Please define the number of flies analyzed for survival assays in the legends.

      Done.

      (3) The authors provide conclusions in most of the figure legends, without providing an explanation of the experiment that was done. Conclusions should be used sparingly, if at all, in legends. Also, relevant information is often missing in the legends (time points after infection, Figure 2E food source, etc.). I suggest the authors carefully double-check their legends and rephrase the conclusive legends with descriptive ones.

      Done. The figure legends have been rewritten.

      (4) Several of the legends indicate that 'data represent the mean of biological triplicates' however some panels do not represent triplicates (e.g. Figure 1C-E). Please correct.

      Done.

      (5) Legends: which multiple comparison test was used for ANOVA?

      Done. Tukey’s post-hoc test was used for direct comparisons.

      (6) Line 888: black arrows are not shown in the figure.

      Corrected.

      (7) Figure 1F: legend on the figure seems incorrect (all are labeled Nora (+)); likewise for Figure 2C (all labeled Nora (-)).

      Corrected.

      (8) Materials and methods: please describe how the Nora virus antibody was raised (and specify on line 271 what viral protein is recognized).

      Done. As the whole virus was used for immunization, we cannot state which specific viral proteins are detected by the antibody.

      (9) Please define what is presented in the box plots (mean, range, whiskers, individual data points).

      Done.

      (10) Figure 4 and associated text (line 221): a brief explanation of the Smurf assay would be useful.

      Done.

      (11) Figure 4C: I did not find the picture of the agar plate informative, as similar information is conveyed in Figure 4D. Also, the labelling cannot be clearly read.

      Figure 4D provides a quantification of panel C. The readability has been improved.

      (12) Figure 4C: It is suggested that Nora-positive, smurf-negative flies were analyzed, but from Figure 4B it seems that these do not exist. Please explain.

      The data in Fig. 4B do not represent absolute numbers but percentages. Thus, at most 50% of the flies were Smurf-positive at the time of the assay, the rest being Smurf-negative yet Nora-positive.

      (13) The abbreviations PA14 and Db11 are used in several figures. I would suggest defining the abbreviation in the legend to facilitate interpretation.

      Done.

      (14) Figure 5A/5G: the Nora virus RNA levels in this figure are dramatically lower than the levels in other figure panels. Please check/correct.

      Done. The reviewer is indeed correct: we had omitted to state that for these two panels the loads are relative and not absolute, as is the case in the other panels. In 5A, the load in whole flies was set to 1; in 5G, the load in untreated Nora-positive flies was set to 1.

      (15) Figure 6A: The total number of ApopTag-positive cells is reported. Was the same total number of cells analyzed? Please define.

      We have not counted all of the cells in each midgut but provide the number of ApopTag-positive cells per midgut. We thus assume that the overall number of midgut cells does not vary much from one midgut to another. Visual inspection of DAPI-stained nuclei did not reveal any obvious change in the density of enterocyte nuclei, as illustrated in Fig. S6 (we believe that others in the field make the same assumption when counting mitotic ISCs with PHH3 staining).

      (16) Figure 6C: I find the shades of blue difficult to distinguish and suggest using other colors.

      Done.

      (17) There seems to be a large mismatch between the percentage of Nora virus-positive cells in Figures 5C, 6H and the images of Figures 5G and 5H. Why?

      We think there might be a mistake in the figure numbers cited by the referee. We assume the point the referee was trying to raise is the difference in perceived Nora virus burden between Fig. 5H and Fig. 6G, a valid point. For Fig. 5H, we had measured the Nora virus load by RTqPCR (Fig. 5G, relative burden) but had not quantified the images. This is now done and shown in Fig. 5I. In Fig. 5H, young flies were used and hence no Nora virus was detected in ECs, as now quantified in Fig. 5I. For Fig. 6G, we had to use 30-day-old intestines to be able to observe Nora virus in the enterocytes of the controls. We have now included this important point in the main text and in the figure legends.

      (18) The Title of the legend in Figure 7 is not supported by the data as 'spread through the intestine' has not been analyzed. Please adjust.

      Done.

      (19) All figures in which ANOVA is used: I assume that anything not labeled with an asterisk was found to be non-significant? If so, this should be indicated in the manuscript.

      Actually, we have not highlighted obvious differences in order to maintain clarity (e.g., Fig. 1E, between uncured Ore-R(SM) and cured Ore-R(SC)). We have thus highlighted only the biologically relevant differences in the panels. The interested reader can refer to the primary data, which are accessible in a data repository.

      (20) Figure 7C: the authors may want to contrast their finding that Upd3 was not upregulated in Nora virus-infected flies (in the absence of PA14) with the findings of Kuyateh et al, who did report upregulation of Upd3 (https://doi.org/10.3390/v15091849).

      We thank the reviewer for pointing out this study we were unaware of. We would like to point out that this article is difficult to follow as it is not 100% clear in which of the analyzed studies the induction of upd3 was observed and which exact experimental conditions were followed, e.g., young or old flies, whole flies or gut… We have looked in more detail at ref. 133 of this article, which refers to an unpublished study from the Hultmark laboratory that is however available online: (https://www.diva-portal.org/smash/record.jsf?aq2=%5B%5B%5D%5D&c=15&af=%5B%5D&searchType=SIMPLE&sortOrder2=title_sort_asc&query=Nora+virus&language=en&pid=diva2%3A1045375&aq=%5B%5B%5D%5D&sf=all&aqe=%5B%5D&sortOrder=author_sort_asc&onlyFullText=false&noOfRows=50&dswid=4587).

      In that study, flies were “infected” with Nora virus by expressing a cDNA clone injected into embryos. The problem is that, for unknown reasons, the authors used Relish mutant flies. It is thus difficult to draw conclusions, as these flies are defective for the IMD and Sting pathways whereas our flies are wild-type. We were also interested to read that genes involved in midgut stem cell differentiation were expressed in flies harboring Nora virus, which is in keeping with the data of the present study. However, it is difficult to discuss this when we know little about the background of the studies analyzed by Kuyateh et al., inasmuch as our Discussion is already rather long.

      (21) Figure 7E: are the differences between control and Dome/Stat knockdown flies significantly different for Nora (+) flies (in the absence of Pseudomonas)? This is not clear from the data presentation.

      The answer to the question is yes: the JAK-STAT pathway also contributes to the maintenance of intestinal epithelium homeostasis in the absence of bacterial infection, that is, under presumably basal conditions. We have modified Fig. 7E to include more comparisons.

      Textual suggestions:

      (22) Line 25 strives > thrives

      Done.

      (23) Lines 150-152, etc., are not very informative. Also, some of the viruses analyzed are not "known contaminating viruses" but viruses used experimentally (VSV, IIV6, CrPV). I suggest adjusting the phrasing.

      Done.

      (24) Line 862: weaker fitness > lower fitness.

      Done.

      (25) Virology terms:

      (a) I suggest not using the term titer for qPCR readouts (which do not involve titration). Viral RNA level or viral RNA load would be more appropriate.

      Done.

      (b) I would propose rephrasing the Y-axis label of Figure 1C, E to Nora RNA load (same for other figures showing viral RNA).

      Done.

      (c) Infested: rather use the more accurate term infected.

      Done.

      (d) Contamination: rather use the term infection.

      We have modified some but not all occurrences of this word. We believe that it is important to use the word contamination when referring to enterocytes: the enterocytes are not infected by Nora; rather, differentiated infected ISCs become contaminated enterocytes. Infection refers to an active process whereas contamination refers to a state.

      (e) Proliferation: rather use the term replication.

      According to our US-English dictionary, proliferation refers to the “rapid reproduction of a cell, part, or organism”, which is the meaning we intend. Replication does not have this notion of speed of reproduction.

      (f) Drosophila should not be italicized in Drosophila A virus, following the ICTV convention that a "virus name should never be italicized, even when it includes the name of a host species or genus" https://ictv.global/faq/names.

      Done.

      (26) Line 873-975: please rephrase the legend of Figure 1F as the current one is not informative.

      Done.

      (27) Line 934: I suggest moving the justification of the chosen time point "= LT50 on the survival test in Fig. 2E" to the main text.

      Done.

      (28) Line 936: with drop > with a drop.

      No longer relevant.

      (29) Line 940-941: the grammar of the sentence does not seem to be correct as it suggests that SDS induces Diptericin expression.

      No longer relevant.

      (30) Line 952-953; line 980: please correct mismatch singular/plural (antibody have, inhibition do).

      Done.

      (31) Line 422: "It will be interesting to determine whether the absence of a Dcr2 fluorescent proteins fusions in progenitor cells that we report in this study rules out a role for the RNAi pathway in intestinal host defense against the Nora virus". It would be of interest to discuss this finding in the context that virus-derived Nora virus siRNAs can be easily detected and that the viruses encode an RNAi antagonist (doi: 10.1371/journal.ppat.1002872).

      Done. We have updated the Discussion and propose a model whereby RNAi would prevent primary infection of enterocytes and then virus replication in proliferating progenitor cells would allow the virus to effectively inhibit the RNAi machinery when the infected progenitor cells become enterocytes.

      (32) Line 159: Nora virus phenotypes differ between laboratories. I would be interested to read the authors' speculations on why this would be the case.

      Our work shows that the effects of Nora virus depend significantly on several parameters we have identified: nutrition quality, age, exposure to abiotic or biotic stresses, and fly genotypes with the existence of Nora-refractory strains. These parameters as well as potential differences between laboratories are actually discussed in the second paragraph of the Discussion.

      (32) Line 175: capitalization of ORE-R vs Ore-R at other places in the manuscript.

      Done.

      (33) Line 185-194: PA14 and Pseudomonas are used interchangeably. Perhaps it is clearer to stick to a single term for consistency.

      PA14 is one clinical strain used to study P. aeruginosa. There are many others such as PAO1, which is also widely used. We have decided to write P. aeruginosa PA14 the first time we are using it in each figure legend, and use only PA14 afterwards.

      Reviewer #3 (Public review):

      The claim that Dcr2 is not abundant in ISCs because the protein is not stable is logically consistent and reasonable. Perhaps I missed this, but the authors could additionally knock down or use somatic CRISPR to delete Dcr2 in ISCs to test whether a lack of Dcr2 underlies sensitivity. In this experiment, the expectation would be that depleting Dcr2 in ISCs genetically would make little difference to susceptibility overall compared to controls. This is not an essential experiment request.

      We agree with the reviewer that these would be interesting experiments to perform. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions dealing with the specific steps of RNA interference that may be missing in progenitor cells.

      Recommendations for the authors:

      (1) Line 206-207 and 214-216: the order of ideas presented here is unintuitive. In Lines 206-207, it is said that ABX treatment had no effect, which is counterintuitive to the nature of infection susceptibility. But this is resolved in Lines 214-216 when the reader realizes that S3G is fed on a sucrose solution, and so likely microbiota-depleted. Perhaps more could be said to clarify this in the main text, and/or swap the order of these observations so a casual reader is not confused about the nature and extent of the microbiota contributing to the sensitivity of Nora-infected flies.

      As suggested by the reviewer, we have clarified the text with respect to the food source and microbiota load; we emphasize that the microbiota plays a protective role in Nora-negative flies fed on sucrose solution even though the microbiota load is very low under these conditions. Of note, the microbiota is not depleted in sucrose-fed Nora-positive flies: we suspect that delaminating enterocytes may provide nutrients for the microbiota, either directly or, more likely, indirectly (via the peritrophic matrix).

      (2) Line 262-265: the text may be a bit exaggerated given only 3 pathogens tested, one of which was a fungal natural infection breaching the cuticle and largely bypassing the gut. This could be re-phrased.

      The important point is that Nora-positive flies die with an LT50 of about 10 days even in the absence of any other infection; this has nothing to do with the number of pathogens tested. Thus, any infection that causes death with kinetics in this range may be misinterpreted in the absence of a relevant uninjured or clean-injury control.

      (3) Line 379-382: I don't know if citing Schissel et al. is needed here. This paper's methods and data are highly problematic, as mentioned by the authors. This is not a highly cited paper, nor does it add value to the present discussion to cite it only to discredit it. Perhaps this can be left out and the field can move on quietly - naturally, this choice is the present authors', and this is just my view.

      We have actually cited this article at two other places and thus had not cited it “only to discredit it”. We have nevertheless removed the lines as suggested by the reviewer.

      (4) Line 404: perhaps clarify "Interestingly, mammalian stem cells..."

      Done.

      (5) Line 455: my understanding of digital PCR is that it is highly useful for detecting rare variants but not particularly better than qPCR for estimating loads/titres? This is not to say dPCR is worse, just that dPCR and primer-specific RT + qPCR are comparable if load/titre is desired. For instance, Qiagen actually recommends qPCR over dPCR specifically (and pretty much exclusively) for gene expression: https://www.qiagen.com/us/applications/digitalpcr/beginners/dpcr-vs-qpcr.

      (6) Perhaps Line 455 could drop the advocacy for digital PCR? I agree using dissected guts, or seemingly aged individuals per Figure 3B(?), is a valuable thing to point out. Maybe the aged individuals point could be added here? I guess the idea behind dissected guts is to have samples enriched in Nora virus.

      Cleaning Nora-positive strains is really difficult and we suspect that as long as there is one viral particle left, it may be sufficient to re-ignite the contamination of the strain. Our own experience with digital PCR on the expression of AMP-like molecules in the head of flies is that we found the approach to be more sensitive than classical RT-qPCR (Xu et al., EMBO Rep, 2023).

    1. eLife Assessment

      This valuable study identifies and characterizes probe binding errors in a widely used commercial platform for spatial transcriptomics, discovering that at least 21 out of 280 genes in a human breast cancer panel are not accurately detected. The authors provide convincing evidence for their findings through validation against multiple independent sequencing technologies and reference datasets, and they introduce a computational tool to help predict potential off-target probe binding. Given the broad adoption of this platform in biomedical research, this work provides an essential quality control resource that will improve data interpretation across numerous studies.

    2. Reviewer #2 (Public review):

      This paper describes an analysis of a commercially available panel for a spatial transcriptomic approach and introduces a computational tool to predict potential off-target binding sites for the type of probe used in the aforementioned panel. The performance of the prediction tool was validated by examining a dataset that profiled the same cancer tissue with multiple modalities. Finally, a detailed analysis of the potential pitfalls in a published study communicated by the company that commercialized the spatial transcriptomic platform in question is provided, along with best practice guidelines for future studies to follow.

      Strengths:

      - The manuscript is clearly written and easy to follow.
      - The authors provide clean, organized, and well-documented code in the associated GitHub repository.

      Comments on revision:

      My impressions from the first round of review haven't really changed. I don't think the software tool is well developed, and failing to incorporate thermodynamics or consider the impact of alignment settings is a major weakness.

      I do think the topical area is relevant. The inclusion of the Xenium/HuBMAP data modestly strengthens the manuscript relative to the original submission.

    3. Reviewer #3 (Public review):

      Summary:

      The authors present a new computational method (OPT) for predicting off-target probe binding in the commercial 10x Xenium spatial transcriptomics platform. They identified 28 genes in the 10x Xenium human breast cancer gene panel (280 genes) that are not accurately detected at the single-molecule level. They validated the predicted off-target binding using reference data from single-cell RNA-seq and 3'-sequencing-based Visium RNA-seq. This work provides a practical resource and will serve as a valuable reference for future data interpretation.

      Strengths:

      (1) Provides a toolbox for the community to identify off-target probes.

      (2) Validates the predictions using single-cell RNA-seq and sequencing-based Visium RNA-seq datasets.

      Comments on revision:

      The authors state that OPT is a new software tool and have posted example code on GitHub. However, the Jupyter notebook does not display any figures or workflows that would allow the process to be replicated. Please provide documentation and code that can reproduce the results/figures presented in the paper.

    4. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      We thank the editors and the reviewers for their constructive feedback in helping us strengthen this manuscript.

      During the revision process, new information was shared with us by the 10x Genomics team regarding the Xenium probe sequences evaluated in our original paper. Briefly, the Xenium probe sequences we evaluated represented an earlier iteration of the probes used to generate the data in Janesick et al. Further, we were made aware that the probe sequences used in Janesick et al. represented an earlier iteration of the commercially available Xenium v1 Human Breast Gene Expression Panel. We now elaborate further in a new Supplementary Note. We have therefore updated the paper throughout to reflect this new understanding, though we emphasize that our conclusions do not change. Rather, this newfound understanding provides stronger evidence of off-target probe binding with imperfect sequence matching, which we support with new supplementary figures.

      (1) Limited evaluation of tissues and gene panels

      “The results were only tested with one tissue (human breast). However, this is not a major weakness, as one can easily extrapolate that this should be the case for any other tissue.”

      “Does not apply the OPT method to the most widely used Xenium gene panels (e.g., pan-Human, pan-Mouse panels with ~5,000 genes each).”

      “The authors claim that OPT is a generalizable method for identifying off-target probes. To support this claim, they should provide similar predictions for the Xenium Pan-Human or Pan-Mouse gene panels, which are more widely used than the breast cancer panel.”

      “While I understand that conducting new experimental studies is likely beyond the authors' intended scope of the manuscript, the narrow reliance on Janesick et al. for all of the validation makes it difficult to assess the broad usability of OPT. In the absence of designing and then validating novel padlock probe designs with OPT, are there other publicly available datasets that authors could perform secondary analysis on using OPT?”

      Our primary focus on breast cancer was driven by data availability rather than tissue specificity. For this probe panel, matched Xenium, Visium, and scRNA-seq datasets are publicly available, enabling direct cross-platform comparisons of gene expression and allowing us to evaluate the impact of off-target probe binding in Xenium.

      OPT is tissue-agnostic and can be applied to any probe panel regardless of tissue type. To demonstrate this generalizability, we have now applied OPT on all publicly available 10x Genomics probe sets beyond the breast panel, including the Xenium pan-Human and pan-Mouse gene panels. The complete results of these analyses have been generated and are provided as a compressed zip file accompanying the revised manuscript.

      Beyond pre-designed panels, in this revision, we have now also applied OPT to custom Xenium gene panels from the Human BioMolecular Atlas Program (HUBMAP) and further demonstrate integration of HUBMAP RNA-seq data to evaluate the impact of potential predicted off-targets in a new section “Bulk RNA-seq reference atlases suggest off-target binding can variably impact results in Xenium custom probe panels.”

      Overall, in these newly evaluated panels, we identify many cases of off-target probe binding with non-negligible expression of off-target genes in the target tissue, underscoring that our findings are not specific to human breast tissue. Therefore, in the revision, we have broadened the title to “Evidence of off-target probe binding affecting 10x Genomics Xenium Gene Panels compromise accuracy of spatial transcriptomic profiling”

      (2) Limited quantifications

      “Lacks clarity on how the confidence level of off-target predictions is calculated.”

      “How can the confidence level of these off-target predictions be quantitatively assessed? Please provide benchmarks or validation metrics if available.”

      We thank the reviewer for raising this important point. To strengthen our claim that predicted off-targets can contribute to observed Xenium expression patterns, we incorporated a quantitative assessment in addition to the qualitative comparisons presented previously. Specifically, we leveraged Visium and scRNA-seq data to compare spot- and cluster-level expression of target genes alone versus expression aggregated with their predicted off-target genes. Across all examples shown, inclusion of predicted off-targets consistently resulted in stronger agreement with the Xenium results, as reflected by decreased RMSE and increased Pearson correlation relative to using the target gene alone.
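      The comparison described above (target gene alone versus target plus predicted off-targets, each scored against Xenium by RMSE and Pearson correlation) can be sketched in a few lines of Python. This is a hypothetical illustration, not the OPT code: the function names are invented here, the inputs are assumed to be per-spot (or per-cluster) expression vectors already aligned across platforms, and scaling each vector to a maximum of 1 is an assumed normalization chosen only so that counts from different platforms sit on a comparable scale.

```python
import numpy as np


def scale_to_max(values):
    """Scale a non-negative expression vector so its maximum is 1 (assumed normalization)."""
    v = np.asarray(values, dtype=float)
    peak = v.max()
    return v / peak if peak > 0 else v


def agreement(reference, xenium):
    """Return (RMSE, Pearson r) between two max-scaled expression vectors."""
    ref = scale_to_max(reference)
    xen = scale_to_max(xenium)
    rmse = float(np.sqrt(np.mean((ref - xen) ** 2)))
    r = float(np.corrcoef(ref, xen)[0, 1])
    return rmse, r


def compare_target_vs_aggregate(target_expr, offtarget_exprs, xenium_expr):
    """Score Xenium against the target gene alone and against the target summed
    with its predicted off-target genes (one row per off-target gene)."""
    target = np.asarray(target_expr, dtype=float)
    aggregated = target + np.asarray(offtarget_exprs, dtype=float).sum(axis=0)
    return {
        "target_only": agreement(target, xenium_expr),
        "with_offtargets": agreement(aggregated, xenium_expr),
    }


# Toy example: the Xenium signal looks like the target plus one off-target gene.
result = compare_target_vs_aggregate(
    target_expr=[0, 1, 0, 2, 0],
    offtarget_exprs=[[3, 0, 2, 0, 1]],
    xenium_expr=[3, 1, 2, 2, 1],
)
print(result)
```

      In this toy case the aggregated profile reproduces the Xenium vector exactly (RMSE of zero, correlation of one), whereas the target gene alone agrees poorly, which is the direction of change the response describes.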

      We emphasize, however, that OPT does not assign a formal confidence score to off-target predictions based on sequencing data alone. Importantly, identification of a potential off-target by OPT does not imply that it will necessarily affect Xenium results. As we’ve noted, if the off-target gene is not expressed, then it will not affect the observed gene expression magnitudes of the target gene. To help users assess whether predicted off-target genes will affect observed gene expression magnitudes of the target gene for a tissue of interest, we now provide a complementary analysis, including heat-map visualizations comparing the expression of target genes and their predicted off-targets in matched bulk RNA-seq or scRNA-seq datasets from the same tissue (Supplementary Figures 9, 10, 11). We hope this evaluation pipeline will help researchers evaluate whether predicted off-targets will appreciably affect results in their tissue of interest.

      (3) Under-developed and non-essential software

      “The manuscript section on the software tool feels underdeveloped.”

      “Once the 10X Genomics corrects their gene panels according to this finding, the tool (OPT) will not be useful for most people. Still, it can be used by those who want to design de novo probes from scratch.”

      “Since the authors claim that OPT is intended for community use, the paper should provide a clear, step-by-step user guide, such as Jupyter tutorial, ideally as supplementary material.”

      We agree with the reviewers that the description of the software tool itself is relatively concise. This is intentional, as the primary goal of this manuscript is not to introduce a standalone software framework, but rather to use the tool as a means to characterize and quantify off-target probe binding and its potential downstream impact on spatial gene expression analyses. Accordingly, our emphasis is placed on the biological and analytical insights enabled by this approach, rather than on extensive software tool details. To support potential users, we have now added software documentation to the GitHub repository, including an example Python notebook demonstrating how OPT can be applied to any probe panel: https://github.com/JEFworks-Lab/off-target-probe-tracker/blob/main/example.ipynb

      Likewise, the primary goal of this manuscript is not to suggest that a specific vendor’s probe panels are flawed, but rather to demonstrate that off-target probe binding is a general and underappreciated phenomenon that can occur in some probe-based spatial transcriptomics platforms to meaningfully impact downstream analyses and biological interpretation.

      OPT was developed as a framework to identify potential off-target probe interactions based on sequence homology. In practice, OPT can serve as a post hoc tool that allows researchers to assess whether predicted off-target interactions may exist in a given panel and to account for these possibilities when interpreting spatial expression patterns, even when panels have been developed by the many probe design methods now highlighted in the revised manuscript. Given the complexity of probe design and hybridization behavior, we believe that explicitly identifying and reporting potential off-targets remains valuable for downstream data interpretation, cross-platform comparisons, and reproducibility. Thus, OPT is intended to complement existing probe design strategies and vendor efforts, rather than replace them, by providing researchers with additional context to interpret their data more accurately.

      In our revision, we have therefore elaborated on this in the discussion, reiterated here for convenience: “Although we focus here on the 10x Genomics Xenium technology, we do not exclude the possibility that off-target binding may similarly affect other probe-based gene detection approaches from other commercial vendors. Any technology that relies on hybridization-based detection is inherently susceptible to off-target probe binding when sequence similarity exists. Further, hybridization-based detection often inherently involves a trade-off between sensitivity and specificity. Given these inherent technological limitations, we therefore emphasize the importance of transparency through sharing probe sequences. However, many companies do not release the probe sequences used in their assays, limiting the consumer’s ability to fully interpret their results as well as the community’s ability to effectively characterize and benchmark performance variation across platforms. Therefore, we strongly recommend that companies publish probe sequences for pre-designed panels and likewise that researchers using these technologies should obtain and publish probe sequences used in their studies to support transparent and reproducible science.”

      Recommendations for the authors:

      “The paper only describes evidence of the off-target effect based on perfect sequence homology, although the tool (OPT) provides an option to find additional "potential" off-targets that allow mismatches. It would be very nice if the authors could additionally provide at least one example of off-target binding with at least one mismatch.”

      We thank the reviewer for the opportunity to clarify this point. In addition to analyses based on perfect sequence homology, we examined predicted off-target binding when allowing mismatches at the terminal ends of probe sequences. This analysis is presented in the Results section titled “OPT results when allowing mismatches at the terminal ends of the probe sequences identifies additional off-target candidates.”

      In this revision, we allowed a 10bp padding on either end of the 40bp probe sequence, permitting imperfect sequence matching at the terminal regions. Under these conditions, OPT identified additional off-target candidates, including TUBB2B and ACTG2, which we highlight as representative examples (Supplementary Figures 7 and 8). We further demonstrate how these predicted off-target interactions impact gene expression concordance by comparing Xenium measurements with both Visium and scRNA-seq data, showing measurable changes in cross-platform agreement. Together, these results illustrate that allowing mismatches reveals biologically relevant off-target effects beyond those captured by perfect sequence homology alone.
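      One way to read the terminal-padding rule is that a 40 bp probe must match a target window exactly across its central 20 bp, while mismatches are tolerated within the 10 bp at each end. The Python sketch below implements that reading only; it is an assumption about the rule, not OPT's actual alignment procedure, and it ignores strand (reverse complements), insertions/deletions, and hybridization thermodynamics. The function names and sequences are hypothetical.

```python
def core_matches(probe, window, pad=10):
    """True if probe and an equal-length target window agree exactly outside
    the first and last `pad` bases (mismatches allowed only in the pads)."""
    if len(probe) != len(window):
        raise ValueError("probe and window must have equal length")
    if 2 * pad >= len(probe):
        raise ValueError("pads must leave a non-empty core")
    return probe[pad:len(probe) - pad] == window[pad:len(window) - pad]


def count_mismatches(probe, window):
    """Count position-wise mismatches; once the core matches, all of them lie in the pads."""
    return sum(a != b for a, b in zip(probe, window))


def scan_transcript(probe, transcript, pad=10):
    """Return (start, mismatches) for every window of `transcript` whose core
    matches `probe` exactly under the terminal-pad rule (same strand only)."""
    n = len(probe)
    hits = []
    for start in range(len(transcript) - n + 1):
        window = transcript[start:start + n]
        if core_matches(probe, window, pad):
            hits.append((start, count_mismatches(probe, window)))
    return hits


core = "CGTA" * 5                       # 20 bp central region
probe = "A" * 10 + core + "T" * 10      # 40 bp hypothetical probe
transcript = "GG" + "C" * 10 + core + "G" * 10 + "GG"
print(scan_transcript(probe, transcript))
```

      Here the only hit starts at position 2 with all 20 mismatches confined to the two pads; a single substitution inside the core removes the hit entirely, which is what distinguishes this rule from a plain mismatch budget.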

      “Clarifications and updates for Figure 2A-B

      Xenium offers a resolution of up to 200 nanometers with continuous readout, without pixel gaps. However, the figures shown in Figure 2A-B appear pixelated - why is this the case? Could the authors clarify this discrepancy and, if possible, provide the raw feature intensity data for Xenium in the supplementary materials?

      Additionally, there appear to be no visible gaps in the Visium graphs. Could the authors update the figure panels to represent the true spot locations for Visium, to more accurately reflect the underlying data structure?”

      We thank the reviewer for the opportunity to clarify these points. The goal of Figure 2A-B is to facilitate a direct visual comparison of gene expression patterns between the Visium and Xenium platforms. To enable this comparison, we aggregated the single-cell Xenium data into spatial patches matching the effective resolution of Visium spots (55x55µm). Similarly, Visium spots were rendered as patches to produce a more continuous visual representation. As a result of this aggregation and visualization choice, the Xenium expression plots appear pixelated despite Xenium’s native subcellular resolution (up to ~200 nm with continuous readout). We have clarified this processing and visualization step in the Methods to avoid confusion.
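      The aggregation of Xenium data into Visium-sized patches can be illustrated with a short Python sketch. It assumes cell (or transcript) coordinates in micrometres, optionally weighted by a per-cell expression value, and uses a square 55 µm grid anchored at the minimum coordinates; the anchoring and the function name are illustrative choices, not taken from the authors' Methods.

```python
import numpy as np


def bin_to_patches(x_um, y_um, weights=None, bin_size=55.0):
    """Sum per-cell (or per-transcript) values into square bin_size x bin_size
    patches; returns a 2-D array indexed [row (y), column (x)]."""
    x = np.asarray(x_um, dtype=float)
    y = np.asarray(y_um, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    # Grid anchored at the minimum coordinates (an illustrative choice).
    col = np.floor((x - x.min()) / bin_size).astype(int)
    row = np.floor((y - y.min()) / bin_size).astype(int)
    grid = np.zeros((row.max() + 1, col.max() + 1))
    np.add.at(grid, (row, col), w)  # unbuffered add accumulates repeated indices
    return grid


# Four cells: two fall in the first 55 um patch, one in each of the next two.
print(bin_to_patches([0, 10, 60, 120], [0, 50, 0, 0], weights=[1, 2, 3, 4]))
```

      Each patch then plays the role of one pixel in a rendered map, which is why the aggregated Xenium panels look blocky despite the platform's much finer native resolution.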

      With respect to the Visium expression plots, the lack of gaps is also a consequence of rendering each spot as a filled patch rather than plotting traditional Visium spots. This was done intentionally to maintain visual consistency with the aggregated Xenium data and to emphasize spatial concordance rather than the underlying sampling geometry. We have now explicitly stated this design choice to improve clarity.

      “I found the format of the manuscript to be at times confusing and perhaps a bit of an odd fit for a general interest journal. A significant portion of the manuscript is spent critiquing a specific publication, "High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis" published by Janesick et al. (of 10x Genomics, Inc.) in Nature Communications in 2023. This content would seem more appropriate as a Comment submitted to Nature Communications, potentially to be accompanied by a response from the authors of Janesick et al. at 10x.”

      I would like to address this important point as the corresponding author who takes primary responsibility for the unconventional decision to submit this manuscript to eLife as opposed to as a commentary suggested by the reviewer.

      Consistent with the reviewer, I did initially consider submitting this as a Matters Arising to Nature Communications. However, after consultation with other senior colleagues and co-authors, I decided to forgo this route on the basis that the information provided in a Matters Arising must be kept confidential. I was concerned that this would lead to long, drawn-out private exchanges. As we note in the manuscript, the Xenium platform's widespread use and high cost imposed a certain urgency that I believed warranted open and rapid dissemination.

      Therefore, we submitted to eLife with the hope that eLife’s unique continuous post-publication public peer review process will enable the rapid dissemination of these important financially-sensitive insights while permitting constructive criticisms from both industry and academic expert reviewers to be openly considered by all readers.

    1. eLife Assessment

      This important study developed a novel paradigm combined with EEG recordings to examine the neural mechanisms underlying temporal integration in perception and its modulation by prior history (i.e., the serial dependence effect). The results provide solid evidence that two key EEG features, namely the individual alpha frequency and the aperiodic slope, jointly and independently shape perceptual integration and its reliance on prior information. While additional control analyses would further strengthen the main conclusions, the findings will be of broad interest to researchers studying perception, decision-making, inter-individual differences, and brain rhythms.

    2. Reviewer #1 (Public review):

      Summary

      Alpha oscillations have previously been proposed to shape the temporal resolution of visual perception, with a higher alpha frequency providing finer resolution. This study goes further by investigating three additional processes that could jointly influence visual temporal perception: the aperiodic neural signal, the integration of recent perceptual experience (serial dependence), and subjective confidence. To address this question, the authors developed a novel task in which two Gabor patches oriented in opposite directions are presented in a continuous stream. This allows robust perceptual integration to be tested while avoiding bias from suboptimal perception. Behavioral analyses revealed an association between confidence and individual temporal integration thresholds, and demonstrated that serial dependence biases visual temporal integration as well as its associated confidence. EEG analyses first replicated previous findings showing that a faster individual alpha frequency (IAF) provides higher temporal resolution. Interestingly, the aperiodic neural signal was associated with both perceptual and temporal precision. Finally, the authors show that serial dependence is reduced in individuals with faster IAF and enhanced in participants exhibiting a stronger aperiodic component. Together, these findings highlight that visual temporal integration arises from an interplay between alpha oscillations, the aperiodic signal, serial dependence, and subjective confidence.

      Strengths:

      (1) The novel task proposed in the study represents a substantial improvement over the two-flash fusion task previously used to investigate the role of alpha oscillations in visual temporal perception.

      (2) Serial dependence has attracted increasing interest in vision research in recent years. Testing whether recent visual inputs also influence temporal resolution is, therefore, a valuable and timely approach. In this regard, the authors provide evidence for a serial dependence effect.

      (3) Although the functional role of brain oscillations has been extensively studied over the past decade, the role of the aperiodic neural signal has long been overlooked. This study revealed that the aperiodic component plays a role in perceptual precision and temporal resolution, thus providing evidence for an important role of the aperiodic neural signal.

      (4) The mediation analysis demonstrates that the aperiodic and oscillatory neural components act independently, providing important insights for future studies aimed at understanding their respective role.

      Weaknesses

      It would have been valuable to record EEG continuously during the experiment to investigate how spontaneous alpha oscillations and the aperiodic signal dynamically influence temporal integration, serial dependence, and confidence on a trial-by-trial basis.

      Appraisal

      The authors employed a novel and thoughtfully designed task, combined with appropriate analyses, to address their research question. Their results are convincing and provide strong support for their conclusions.

      Impact

      This study provides valuable insights into the role of the aperiodic neural signal in visual temporal integration. This is important because its contribution has likely been underestimated, and future research will likely uncover increasing evidence of its impact across multiple cognitive functions.

      It was also very interesting to observe how alpha oscillations are associated with serial dependence and confidence, extending beyond their well-known role in visual temporal resolution. This opens intriguing avenues for future research on the functional role of alpha oscillations.

    3. Reviewer #2 (Public review):

      Summary:

      This paper uses resting-state electroencephalography (EEG) to examine the electrophysiological underpinnings of the temporal integration window in perception and its modulation by priors (serial dependence), as measured through the perceptual fusion point of two continuously alternating stimuli. The study also includes a measure of perceptual confidence. Separating periodic from aperiodic EEG activity, the results show that the faster the individual alpha frequency at rest and the steeper the aperiodic slope (previously linked to higher sampling/lower noise), the lower the perceptual fusion point (corresponding to narrower integration windows), with independent contributions of the periodic and aperiodic activity to the integration window. The data also reveal that the point of fusion depends on prior history, and that the strength of this effect depends on individual alpha frequency and aperiodic slope: the lower the individual alpha frequency and the aperiodic slope, the stronger the serial dependence, with the two contributions again being independent. Higher alpha frequency also led to higher confidence. The results are interpreted to suggest that the speed of alpha oscillations and the aperiodic slope of the power spectrum (presumably reflecting the rate/fidelity of visual sampling and the level of background noise) jointly shape the perceptual measure under study: high rate/fidelity and low noise promote temporal precision in integration, while lower rate/fidelity and higher noise lead to a higher reliance on prior history. It is concluded that the interaction between these two EEG features shapes temporal integration and hence perceptual fusion.

      Strengths:

      The strength lies in the use of a continuous visual stream of two alternating stimuli whose timing shapes fusion or separation of the two stimulus percepts, avoiding some of the pitfalls of previous fusion probes through discrete (not continuous) stimulus pairs (missed detection of one stimulus of the pair may be misinterpreted as fusion). The results seem robust (based on n=83 participants), the results are interesting, and the interpretations are sound.

      Weaknesses:

      The main weakness lies in the reliance on resting state EEG for correlation with the behavioural measures. This captures trait-based relationships, but misses the brain activity dynamics within/across trials, which could be used for a direct readout of evidence accumulation to a decision, for capturing spontaneous fluctuations of the processes under study, etc. Also, in terms of resting state EEG, both eyes-closed (EC) and eyes-open (EO) data have been recorded, but their links to perceptual fusion point/confidence seem somewhat inconsistent across the results. This is a bit confusing. Are the EO and EC signals in any way related/correlated, and if not, what are they supposed to represent? Would an analysis of these EEG measures during task performance (e.g., in a pre-stimulus = baseline time window) provide more consistent results? These points could be resolved by additional analyses and/or more elaborate discussions.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors seek to explain what influences the temporal resolution of visual perception and its associated metacognitive monitoring, interindividual differences in such processes, and the neural mechanisms associated with these interindividual differences. More specifically, they investigated the factors influencing the perception of a rapid alternating stream of visual patterns as a single fused percept versus two segregated stimuli, and how these factors relate to stable features of ongoing brain activity. They introduce a novel sustained-stream temporal integration paradigm designed to address limitations of traditional two-flash tasks, and combine this with resting-state electroencephalography (EEG) to examine how individual alpha peak frequency and the aperiodic component of the power spectrum relate to temporal integration thresholds, perceptual history effects, and subjective confidence. Their overarching aim is to move beyond a purely oscillatory account of temporal sampling and to test whether periodic (alpha) and non-periodic (aperiodic) neural dynamics jointly shape perceptual decisions.

      Strengths:

      The study has several notable strengths. First, the experimental paradigm represents a thoughtful and innovative refinement of earlier approaches. By presenting alternating gratings within a continuous stream and varying the duration of each element rather than introducing discrete blank intervals, the authors mitigate well-known confounds of classical two-flash paradigms, particularly the possibility that "fusion" reports reflect missed detections rather than genuine temporal integration. The psychometric functions are well characterized, and the sample size is large for an individual-differences EEG study, with an a priori power analysis supporting the adequacy of the sample. Second, the use of spectral parameterization to separate oscillatory alpha peak frequency from the aperiodic component of the spectrum is methodologically rigorous and timely, as this distinction is increasingly recognized as important to avoid confounds in oscillatory activity estimation and the measurement of neural noise/excitatory-inhibitory balance (i.e., the aperiodic component of the power spectrum). The present work contributes to this emerging direction by relating both to behavioral indices within the same dataset. Third, the integration of perceptual thresholds, serial dependence, and subjective confidence within a unified framework provides a richer account of temporal perception than studies focusing on a single measure. In particular, the demonstration that resting alpha frequency predicts integration thresholds and that the aperiodic exponent relates to variability of the psychometric function is broadly consistent with the authors' central claims.
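      For readers less familiar with spectral parameterization, the core idea of separating the aperiodic exponent from the alpha peak can be illustrated with a deliberately simplified Python sketch: fit a straight line to log power versus log frequency outside an assumed alpha band (7 to 14 Hz here), take the negative slope as the aperiodic exponent, and take the individual alpha frequency as the peak of the residual within that band. This is a stand-in for dedicated tools such as specparam (formerly FOOOF), not the authors' pipeline, and the synthetic spectrum is invented for illustration.

```python
import numpy as np


def aperiodic_and_iaf(freqs, power, alpha_band=(7.0, 14.0)):
    """Simplified spectral split: fit log10(power) = b - chi * log10(f) outside
    the alpha band, then locate the alpha peak in the residual spectrum."""
    f = np.asarray(freqs, dtype=float)
    p = np.asarray(power, dtype=float)
    in_alpha = (f >= alpha_band[0]) & (f <= alpha_band[1])
    slope, intercept = np.polyfit(np.log10(f[~in_alpha]), np.log10(p[~in_alpha]), 1)
    exponent = -slope  # aperiodic exponent: a steeper spectrum gives a larger value
    residual = np.log10(p) - (intercept + slope * np.log10(f))
    iaf = f[in_alpha][np.argmax(residual[in_alpha])]
    return float(exponent), float(iaf)


# Synthetic spectrum: 1/f^1.5 background plus a Gaussian peak at 10 Hz.
freqs = np.arange(2.0, 40.5, 0.5)
power = 10.0 * freqs ** -1.5 + 2.0 * np.exp(-((freqs - 10.0) ** 2) / 2.0)
exponent, iaf = aperiodic_and_iaf(freqs, power)
print(round(exponent, 2), iaf)
```

      Because the exponent is estimated from the background and the alpha frequency from what remains above it, the two quantities can vary independently across individuals, which is what allows them to be related to behavior as separate predictors.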

      Weaknesses:

      (1) At the same time, several aspects of the interpretation require caution. One conceptual issue concerns the interpretation of the psychometric slope parameter as an index of "temporal precision." The manuscript consistently equates steeper slopes with higher perceptual precision or lower internal noise. However, the slope of a binary psychometric function does not uniquely index sensory temporal resolution. It reflects the steepness of the transition between response categories and can arise from multiple sources, including variability in sensory encoding, instability of decision criteria, lapse rates, or other decisional processes. Even in the literature cited by the authors, slope is often described more generally as reflecting perceptual variability or sensory and/or decision noise rather than a pure measure of perceptual precision. An abrupt transition from "fused" to "segregated" responses, therefore, does not necessarily imply finer temporal resolution at the sensory level; it may instead reflect more consistent categorization or reduced decisional variability. The present data convincingly demonstrate relationships between spectral measures and the steepness of behavioral transitions, but they do not by themselves establish that this steepness reflects perceptual temporal precision rather than broader sources of behavioral variability.

      (2) A related concern involves the causal language used to describe the relationship between neural measures and behavior. The EEG metrics are derived from resting-state recordings and therefore reflect stable, trait-like individual differences. Nonetheless, the Discussion sometimes adopts mechanistic phrasing suggesting that slower alpha rhythms or flatter spectra lead the brain to compensate by weighting prior information more heavily, or that neural noise is being "regulated." Such formulations imply within-task adaptive processes that are not directly measured. The results demonstrate robust between-participant associations, but further research is needed to establish whether individuals regulate neural noise or adjust prior weighting dynamically.

      (3) Another point that merits clarification concerns the control analyses. The authors appropriately use spectral parameterization to dissociate oscillatory alpha peak frequency from the aperiodic component in the main analyses; however, their subsequent control analyses examining other frequency bands appear to rely on conventional band-power measures. Because band power can be influenced by the aperiodic background, null effects in other bands are difficult to interpret without similarly accounting for aperiodic structure.

      (4) In addition, the temporal structure of the stimulus stream introduces an interpretational nuance. Varying the duration of each Gabor in a continuous alternation produces quasi-periodic stimulation rates, and several of the resulting stimulation rates fall within the alpha frequency range. Rhythmic visual stimulation at alpha-range frequencies is known to produce strong stimulus-locked responses and can interact with intrinsic alpha rhythms in a frequency-dependent manner (Keitel et al., 2019; Gulbinaite et al., 2017). Although the present study does not record EEG during task performance and therefore cannot directly assess stimulus-driven steady-state responses, this aspect of the design complicates a purely intrinsic sampling interpretation. The observed relationship between resting alpha frequency and integration thresholds may reflect intrinsic sampling speed, but it could also be influenced by how closely an individual's alpha rhythm aligns with alpha-range temporal structure in the stimulus.

      Conclusion:

      Despite these limitations, the study achieves many of its primary aims. The sustained-stream paradigm reliably elicits graded temporal integration behavior and robust serial dependence effects. Individual alpha frequency is convincingly associated with integration thresholds, and the aperiodic exponent relates to behavioral variability measures. These findings support the broader conclusion that temporal perception reflects an interaction between rhythmic neural dynamics and the background spectral structure of ongoing activity. The work is likely to have a meaningful impact on researchers studying perceptual timing, perceptual history, individual differences in brain rhythms, and the functional role of aperiodic neural activity.

      References:

      Keitel, C., Keitel, A., Benwell, C. S., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-driven brain rhythms within the alpha band: The attentional-modulation conundrum. Journal of Neuroscience, 39(16), 3119-3129.

      Gulbinaite, R., Van Viegen, T., Wieling, M., Cohen, M. X., & VanRullen, R. (2017). Individual alpha peak frequency predicts 10 Hz flicker effects on selective attention. Journal of Neuroscience, 37(42), 10173-10184.

    5. Author Response:

      (1) Clarification of the distinction between resting-state trait measures and ongoing neural dynamics

      All the Reviewers commented that this study provides a useful characterization of the relationship between trait-based resting-state neural dynamics and behavioral measures. At the same time, we agree that including ongoing EEG dynamics during task performance would have added important complementary information. In particular, task-related EEG would allow a more direct characterization of the relationship between ongoing neural activity and behavioral indices at the single trial level, thereby helping to clarify the role of ongoing neural dynamics in evidence accumulation and perceptual decision-making. It would also enable testing how pre-stimulus alpha oscillations and aperiodic activity dynamically influence temporal integration, serial dependence, and confidence on a trial-by-trial basis.

      However, we would like to emphasize that the primary aim of the present study was to investigate trait-level resting-state neural dynamics, which are known to be relatively stable and consistent within individuals, such as individual alpha frequency (e.g., Grandy et al., 2013; Wiesman & Wilson, 2019; Gray & Emmanouil, 2020) and aperiodic neural dynamics (Demuru & Fraschini, 2020; Pathania et al., 2022; Euler et al., 2024), and to examine whether these stable neural characteristics predict behavioral measures indexing temporal perception. Accordingly, the present study was designed to address how stable individual differences in resting-state neural dynamics shape temporal performance, rather than within-task neural fluctuations during the temporal task. We agree that combining resting-state and task-related EEG would be a valuable direction for future work, but this lies beyond the scope of the current dataset, as EEG was not recorded during task performance. Furthermore, we agree with the Reviewers that some of the wording in the Discussion can be clarified to emphasize the trait-level, rather than trial-level, nature of our EEG measures and of their potential interpretations.

      Additionally, we agree that the relationship between eyes-open (EO) and eyes-closed (EC) resting-state EEG, and their differential associations with behavior, warrants further discussion. In our data, EO resting-state activity emerged as a stronger predictor of behavioral performance than EC. Conceptually, EO and EC resting states should be considered not as interchangeable measures of the same underlying neural activity, but as related yet distinct brain states, with overlapping neural generators expressed under different state constraints. EC is typically associated with stronger posterior alpha activity and a more internally oriented mode, whereas EO reflects a more visually engaged and vigilant state that emphasizes exteroceptive processing and visual readiness, closer to the conditions under which perceptual judgments are formed. This difference in functional weighting may explain why brain–behavior associations were more evident in EO in the present study, consistent with the greater similarity between the EO condition and the task context. The distinction between these resting states has been emphasized in previous EEG and neuroimaging work showing differences in power, topography, and large-scale network organization (e.g., Marx et al., 2004). These state-related differences may also reflect physiological changes related to sensory processing (El Boustani et al., 2009) and arousal (Lendner et al., 2020). Accordingly, the present dissociation may arise because EO provides a resting-state measure that is more proximal to the sensory and excitability conditions engaged during task performance (for similar findings, see also Deodato & Melcher, 2024). However, we agree with the Reviewers that these state-related differences warrant further clarification.

      In the revised manuscript, we will (i) expand the Discussion to more clearly articulate the conceptual distinction between EO and EC and their expected links to perceptual and confidence measures, (ii) systematically describe EO–EC differences across all EEG measures analyzed, and (iii) quantify the relationship between EO and EC indices to directly assess the extent to which they share trait-like variance across individuals.

      In the revised manuscript, we will clarify these points by adjusting the text, strengthening the conceptual framing, and expanding the Discussion, including a more detailed outline of future research directions.

      (2) Functional interpretation of psychometric measures

      The Reviewers raised an important point regarding the interpretation of the psychometric parameters investigated in our study. In particular, we agree that the slope of a binary psychometric function does not provide a direct measure of sensory temporal resolution or perceptual sensitivity, and that our original wording may have overstated this interpretation. Rather, the slope reflects the steepness of the transition between response categories and indexes overall behavioral variability, which can arise from multiple sources, including variability in sensory encoding, decision criteria, and occasional response errors (e.g., Wichmann & Hill, 2001; Prins, 2012).

      We therefore agree that interpreting steeper slopes as necessarily reflecting “temporal precision” may be overly specific, and that there are other possible interpretations. In the revised manuscript, we will adopt more cautious terminology and describe the slope more generally as indexing behavioral variability in the transition between perceptual reports, which may reflect a combination of sensory and decisional factors. Importantly, our results demonstrate robust relationships between neural measures and the consistency or sharpness of perceptual categorization, rather than uniquely isolating sensory temporal resolution. While, in standard psychophysical frameworks, the slope is related to internal variability in the sensory representation, this relationship depends on model assumptions and does not uniquely isolate sensory precision (e.g., Prins, 2016). Following the reviewers’ suggestion, we will also refine our psychometric modeling by incorporating a lapse parameter. We agree with the Reviewer that accounting for occasional stimulus-independent errors (e.g., lapses) can improve parameter estimation and prevent biases in slope and threshold estimates when lapse rates are implicitly fixed to zero (Wichmann & Hill, 2001). In the revised manuscript, we will therefore (i) clarify the terminology used to describe psychometric parameters and (ii) report additional analyses including lapse rates.
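      To illustrate why fixing lapse rates to zero can bias slope estimates, the minimal Python sketch below assumes a logistic psychometric function of the Wichmann & Hill (2001) form, ψ(x) = γ + (1 − γ − λ)F(x; α, β), with hypothetical parameter values (threshold α = 50 ms, slope β = 0.2, lower and upper lapse rates γ = λ = 0.02). It generates noiseless response probabilities and then fits β with lapses fixed to zero by a simple least-squares grid search; real analyses would instead use maximum-likelihood fits to trial counts. The function names are illustrative and are not taken from the authors' analysis pipeline.

```python
import math


def logistic(x, alpha, beta):
    """Sigmoid F(x; alpha, beta): alpha is the threshold, beta the slope."""
    return 1.0 / (1.0 + math.exp(-beta * (x - alpha)))


def psi(x, alpha, beta, gamma, lam):
    """Psychometric function with lower (gamma) and upper (lam) lapse rates."""
    return gamma + (1.0 - gamma - lam) * logistic(x, alpha, beta)


def fit_slope_without_lapses(xs, ps, alpha, candidate_betas):
    """Least-squares grid search over beta, lapses fixed to 0, threshold held at alpha."""
    def sse(b):
        return sum((logistic(x, alpha, b) - p) ** 2 for x, p in zip(xs, ps))
    return min(candidate_betas, key=sse)


# Hypothetical durations (ms) and generating parameters (gamma == lam keeps the
# 50% point at alpha, so holding alpha fixed during the fit is harmless here).
xs = [float(d) for d in range(10, 101, 5)]
true_alpha, true_beta = 50.0, 0.2
ps = [psi(x, true_alpha, true_beta, gamma=0.02, lam=0.02) for x in xs]

candidate_betas = [i / 1000 for i in range(10, 401)]  # 0.010 ... 0.400
fitted_beta = fit_slope_without_lapses(xs, ps, true_alpha, candidate_betas)
print(f"generating beta = {true_beta}, beta fitted without lapses = {fitted_beta:.3f}")
```

Because a zero-lapse curve can only approach the 0.02 and 0.98 asymptotes by becoming shallower, the best-fitting β falls below the generating value, i.e., the slope is underestimated when lapses are ignored.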

      In addition, we agree that complementary modeling approaches could help disentangle perceptual and decisional contributions to the observed effects by providing access to latent parameters of perceptual decision-making. For example, within a signal detection framework, one could test whether EEG measures relate to perceptual sensitivity versus decision criterion, while sequential sampling models such as the diffusion model (e.g., Ratcliff & McKoon, 2008) could assess whether neural measures are associated with parameters such as drift rate, decision boundary, starting bias, or trial-to-trial variability. However, several characteristics of the present paradigm limit the direct applicability of these approaches. First, the task relies on a continuous manipulation of sensory evidence across stimulus durations (ISIs), and behavioral responses are summarized through psychometric functions rather than modeled at the single-trial level. As a result, the current framework does not provide direct access to the trial-by-trial latent decision variables required by these models. Second, reaction times were not collected, which constrains the application of sequential sampling models that rely on joint modeling of accuracy and response times. Finally, while the task involves categorical judgments (integration vs. segregation), it does not include explicit signal-absent or catch trials, which can help constrain sensitivity and criterion estimates within classical signal detection formulations. Despite these limitations, we agree that these approaches could still provide useful insights. In the revised manuscript, we will explore whether alternative modeling approaches (e.g., signal detection-based metrics or Bayesian psychometric modeling) can help further characterize the contributions of perceptual sensitivity, decision criterion, and response variability to our behavioral measures.

      While these analyses will necessarily remain exploratory given the structure of the current dataset, they may offer a first indication of whether the observed effects reflect perceptual or decisional dynamics. A more definitive dissociation, however, is beyond the scope of the present study and will be an important direction for future work.
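      For reference, the standard equal-variance Gaussian signal detection measures are d′ = z(H) − z(FA) and c = −[z(H) + z(FA)]/2, where z is the inverse of the standard normal CDF. The minimal Python sketch below computes them with the standard library. How hits and false alarms would be defined in this paradigm is an assumption: here, hypothetically, "segregated" reports at a long duration are treated as hits and at a short duration as false alarms, which is exactly the kind of choice that the absence of catch trials leaves open.

```python
from statistics import NormalDist


def sdt_measures(hit_rate, fa_rate):
    """Equal-variance Gaussian SDT: return (d', c) from hit and false-alarm rates.

    Rates must lie strictly between 0 and 1; rates of exactly 0 or 1 need a
    correction (e.g., a log-linear correction) before the z-transform.
    """
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -(z_hit + z_fa) / 2.0
    return d_prime, criterion


# Hypothetical rates of "segregated" reports: long duration treated as signal
# (hits), short duration treated as noise (false alarms).
d_prime, criterion = sdt_measures(0.80, 0.30)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

In this convention, a negative c indicates a bias toward the response designated as the signal report, independently of d′.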

      (3) Control analyses and robustness of EEG–behavior relationships

      The Reviewers raised interesting points regarding the interpretation of our control analyses and the potential influence of stimulus structure on the observed EEG–behavior relationships. We agree that these aspects require clarification and additional analyses to strengthen the robustness of our findings.

      First, regarding the control analyses across frequency bands, we acknowledge that while our main analyses appropriately dissociate oscillatory and aperiodic components using spectral parameterization, the control analyses were based on conventional band-power measures. As correctly noted by the reviewers, band-limited power estimates can be influenced by the aperiodic background, which complicates the interpretation of null effects in the other frequency bands. In the revised manuscript, we will address this issue by extending our spectral parameterization approach to these control analyses. Specifically, we will recompute band-specific measures after removing the aperiodic component, allowing a clearer comparison across frequency bands and a more robust assessment of the specificity of alpha-related effects. Preliminary analyses suggest that these updated results are likely to be consistent with our initial findings, thereby reinforcing the robustness of the reported effects.
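      The logic of removing the aperiodic component before computing band-specific measures can be sketched in simplified form: fit a straight line to the spectrum in log-log space (an aperiodic fit without a knee), then average the residual log power within each band. The Python sketch below uses a synthetic spectrum with an assumed exponent of 1.5 and a Gaussian peak at 10 Hz. It is not the authors' pipeline; dedicated spectral parameterization tools fit the aperiodic component more robustly and model peaks explicitly rather than fitting a single line through all frequencies.

```python
import math


def fit_aperiodic(freqs, powers):
    """OLS fit of log10(P) = offset - exponent * log10(f); returns (offset, exponent)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return offset, -slope


def flattened_band_power(freqs, powers, lo, hi):
    """Mean log10 power above the aperiodic fit for frequencies in [lo, hi] Hz."""
    offset, exponent = fit_aperiodic(freqs, powers)
    residuals = [
        math.log10(p) - (offset - exponent * math.log10(f))
        for f, p in zip(freqs, powers)
        if lo <= f <= hi
    ]
    return sum(residuals) / len(residuals)


def synthetic_spectrum(freqs, exponent=1.5, peak=0.5):
    """Hypothetical 1/f^exponent background plus a Gaussian peak at 10 Hz (log10 units)."""
    return [
        10 ** (1.0 - exponent * math.log10(f)
               + peak * math.exp(-((f - 10.0) ** 2) / (2 * 1.5 ** 2)))
        for f in freqs
    ]


freqs = [2 + 0.5 * i for i in range(77)]  # 2 ... 40 Hz
powers = synthetic_spectrum(freqs)
alpha_flat = flattened_band_power(freqs, powers, 8, 12)
beta_flat = flattened_band_power(freqs, powers, 15, 30)
print(f"flattened alpha = {alpha_flat:.3f}, flattened beta = {beta_flat:.3f}")
```

Conventional band power mixes the aperiodic background with any oscillatory peak, so two spectra with identical peaks but different exponents differ in raw band power; in this idealized case, the flattened measure is unchanged when only the exponent changes.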

      Another important point raised by the reviewers concerns the temporal structure of the stimulus stream. We agree that the continuous alternation of Gabor stimuli at varying durations introduces quasi-periodic stimulation rates that may induce entrainment of neural oscillations. Notably, some inter-stimulus intervals correspond to frequencies within the alpha range, which raises the possibility that the observed relationship between resting alpha frequency and integration thresholds may not solely reflect intrinsic sampling speed, but could also be influenced by the degree of alignment between an individual’s alpha rhythm and the temporal structure of the stimulus. As highlighted in prior work (e.g., Gulbinaite et al., 2017; Keitel et al., 2019; Gallina et al., 2023; Duecker et al., 2024), rhythmic stimulation in the alpha range can interact with intrinsic alpha oscillations and modulate both neural and perceptual processing. Although our study does not include EEG recordings during task performance and therefore cannot directly assess stimulus-locked responses or neural entrainment, we agree that this factor should be explicitly considered in the interpretation of our findings. To address this point, in the revised manuscript we will perform additional control analyses to assess the robustness of the observed relationships while accounting for potential rhythmic stimulation confounds. Specifically, we will explore whether the strength of behavioral effects and their relationship with EEG measures depend on the alignment between each participant’s individual alpha frequency and the effective stimulation rate induced by the stimulus presentation. In addition, we will test whether the association between resting-state alpha frequency and behavioral measures is disproportionately driven by stimulus durations corresponding to alpha-range temporal frequencies.

      These analyses will help determine whether the observed effects primarily reflect intrinsic sampling properties or are modulated by resonance-like interactions between endogenous rhythms and stimulus timing. We will also address all additional recommendations raised by the reviewers in the revised manuscript.
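      One hypothetical way to operationalize the planned alignment analysis is to convert each element duration into a stimulation rate and compute its distance from a participant's individual alpha frequency (IAF). The Python sketch below assumes that the effective rate is the element-onset rate, 1000/duration (ms), and that the alpha band spans 8–13 Hz; both are assumptions, since the pattern-repetition rate of a full A-B alternation would be half the onset rate. The durations and IAF are hypothetical values, not values from the study.

```python
def stimulation_rate_hz(duration_ms):
    """Element-onset rate (Hz) when each element lasts duration_ms (assumed effective rate)."""
    return 1000.0 / duration_ms


def in_alpha(freq_hz, lo=8.0, hi=13.0):
    """True if freq_hz falls within an assumed 8-13 Hz alpha band."""
    return lo <= freq_hz <= hi


def iaf_mismatch(duration_ms, iaf_hz):
    """Absolute distance (Hz) between the stimulation rate and an individual alpha frequency."""
    return abs(stimulation_rate_hz(duration_ms) - iaf_hz)


# Hypothetical element durations (ms) and a hypothetical participant IAF.
durations_ms = [20, 40, 60, 80, 100, 120, 140]
iaf_hz = 10.2
for d in durations_ms:
    rate = stimulation_rate_hz(d)
    print(f"{d:>4} ms -> {rate:5.1f} Hz, alpha-range: {in_alpha(rate)}, "
          f"|rate - IAF| = {iaf_mismatch(d, iaf_hz):.2f} Hz")
```

Under these assumptions, durations between roughly 77 and 125 ms map onto the alpha range, so an analysis could test whether the EEG–behavior association is carried disproportionately by those durations or scales with the rate–IAF distance.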

      References

      Demuru, M., & Fraschini, M. (2020). EEG fingerprinting: Subject-specific signature based on the aperiodic component of power spectrum. Computers in Biology and Medicine, 120, 103748.

      Deodato, M., & Melcher, D. (2024). Correlations between visual temporal resolution and individual alpha peak frequency: Evidence that internal and measurement noise drive null findings. Journal of Cognitive Neuroscience, 36(4), 590-601.

      Duecker, K., Doelling, K. B., Breska, A., Coffey, E. B., Sivarao, D. V., & Zoefel, B. (2024). Challenges and approaches in the study of neural entrainment. Journal of Neuroscience, 44(40).

      El Boustani, S., Marre, O., Béhuret, S., Baudot, P., Yger, P., Bal, T., ... & Frégnac, Y. (2009). Network-state modulation of power-law frequency-scaling in visual cortical neurons. PLoS Computational Biology, 5(9), e1000519.

      Euler, M. J., Vehar, J. V., Guevara, J. E., Geiger, A. R., Deboeck, P. R., & Lohse, K. R. (2024). Associations between the resting EEG aperiodic slope and broad domains of cognitive ability. Psychophysiology, 61(6), e14543.

      Gallina, J., Marsicano, G., Romei, V., & Bertini, C. (2023). Electrophysiological and Behavioral Effects of Alpha-Band Sensory Entrainment: Neural Mechanisms and Clinical Applications. Biomedicines, 11(5), 1399.

      Grandy, T. H., Werkle‐Bergner, M., Chicherio, C., Schmiedek, F., Lövdén, M., & Lindenberger, U. (2013). Peak individual alpha frequency qualifies as a stable neurophysiological trait marker in healthy younger and older adults. Psychophysiology, 50(6), 570-582.

      Gray, M. J., & Emmanouil, T. A. (2020). Individual alpha frequency increases during a task but is unchanged by alpha‐band flicker. Psychophysiology, 57(2), e13480.

      Gulbinaite, R., Van Viegen, T., Wieling, M., Cohen, M. X., & VanRullen, R. (2017). Individual alpha peak frequency predicts 10 Hz flicker effects on selective attention. Journal of Neuroscience, 37(42), 10173-10184.

      Keitel, C., Keitel, A., Benwell, C. S., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-driven brain rhythms within the alpha band: The attentional-modulation conundrum. Journal of Neuroscience, 39(16), 3119-3129.

      Lendner, J. D., Helfrich, R. F., Mander, B. A., Romundstad, L., Lin, J. J., Walker, M. P., ... & Knight, R. T. (2020). An electrophysiological marker of arousal level in humans. eLife, 9, e55092.

      Marx, E., Deutschländer, A., Stephan, T., Dieterich, M., Wiesmann, M., & Brandt, T. (2004). Eyes open and eyes closed as rest conditions: Impact on brain activation patterns. NeuroImage, 21(4), 1818-1824.

      Pathania, A., Euler, M. J., Clark, M., Cowan, R. L., Duff, K., & Lohse, K. R. (2022). Resting EEG spectral slopes are associated with age-related differences in information processing speed. Biological Psychology, 168, 108261.

      Prins, N. (2012). The psychometric function: The lapse rate revisited. Journal of Vision, 12(6), 25.

      Prins, N. (2016). Psychophysics: A practical introduction. Academic Press.

      Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873-922.

      Wichmann, F. A., & Hill, N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63(8), 1293-1313.

      Wiesman, A. I., & Wilson, T. W. (2019). Alpha frequency entrainment reduces the effect of visual distractors. Journal of Cognitive Neuroscience, 31(9), 1392-1403.

    1. eLife Assessment

      This important study presents convincing evidence that uncovers a novel signaling axis impacting the post-mating response in females of the brown planthopper. The findings open several avenues for testing the molecular and neurobiological mechanisms of mating behavior in insects, and in the revised version the authors provide further evidence supporting their conclusions.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhang and coauthors study the role of the neuropeptide corazonin in modulating the post-mating response of the brown planthopper, with further validation in Drosophila melanogaster. Using several different techniques, the authors orthogonally demonstrate the involvement of corazonin signalling in regulating the female post-mating response in these species.

      They first injected synthetic corazonin peptide into female brown planthoppers, showing altered mating receptivity in virgin females and a higher number of eggs laid after mating. The role of corazonin in controlling these post-mating traits has been further validated by knocking down the expression of the corazonin gene by RNA interference and through CRISPR-Cas9 mutagenesis of the gene. Further proof of the importance of corazonin signalling in regulating the female post-mating response has been achieved by knocking down the expression or mutagenizing the gene coding for the corazonin receptor.

      Similar results have been obtained in the fruit fly Drosophila melanogaster, suggesting that corazonin signalling is involved in controlling the female post-mating response in multiple insect species.

      Studies of the signalling pathways controlling the female post-mating response in insects other than Drosophila are scarce, and this limits the ability of biologists to draw conclusions about the evolution of the post-mating response in female insects. This is particularly relevant in the context of understanding how sexual conflict might work at the molecular and genetic levels, and how, ultimately, speciation might occur at this level. Furthermore, the study of the post-mating response could have practical implications, as it could lead to the development of control techniques, such as sterilization agents.

      The study, therefore, expands our knowledge of one of the signalling pathways that control the female post-mating response: the corazonin neuropeptide pathway. This pathway is involved in controlling the post-mating response in both Nilaparvata lugens (the brown planthopper) and Drosophila melanogaster, suggesting its involvement in multiple insect species.

      The study uses multiple molecular approaches to convincingly demonstrate that corazonin controls the female post-mating response. The data supporting the main claim of the manuscript are solid and convincing.

    3. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study presents convincing evidence that uncovers a novel signaling axis impacting the post-mating response in females of the brown planthopper. The findings open several avenues for testing the molecular and neurobiological mechanisms of mating behavior in insects, although broad concerns remain about the relevance of some claims.

      Thank you very much for your letter and the insightful, valuable comments from the reviewers on our manuscript. These suggestions have been instrumental in strengthening the quality and clarity of our work. We have carefully addressed each concern, performed additional experiments, revised the relevant sections thoroughly, and made extensive refinements to the Discussion to clarify future research directions. Below is our detailed point-by-point response.

      Public Reviews:

      Reviewer #1 (Public review):

      In this work, Zhang et al., through a series of well-designed experiments, present a comprehensive study exploring the roles of the neuropeptide Corazonin (CRZ) and its receptor in controlling the female post-mating response (PMR) in the brown planthopper (BPH) Nilaparvata lugens and Drosophila melanogaster. Using behavioural assays, micro-injections, gene knockdowns, CRISPR/Cas gene editing, and immunostaining, the authors show that both CRZ and CrzR play a vital role in the female post-mating response, with impaired expression of either leading to quicker female remating and reduced ovulation in BPH. Notably, the authors find that this signaling is entirely endogenous in BPH females, with immunostaining of male accessory glands (MAGs) showing no evidence of CRZ expression. Further, the authors demonstrate that while CRZ is not expressed in the MAGs, BPH males with Crz knocked out show transcriptional dysregulation of several seminal fluid proteins, and they functionally link this dysregulation to an impaired PMR in BPH. Relatedly, the authors also find that in CrzR mutants, injection of neither MAG extracts nor maccessin peptide triggered the PMR in BPH females. Finally, the authors extend this study to D. melanogaster, albeit on a more limited scale, and show that CRZ also plays a vital role in maintaining the PMR in D. melanogaster, with impaired CRZ signaling in females once again leading to quicker remating and reduced ovulation. The authors must be commended for their expansive set of complementary experiments. The manuscript is also generally well written. Given the seemingly conserved nature of CRZ, this work is a significant addition to the literature, opening several avenues for testing the molecular and neurobiological mechanisms by which CRZ triggers the PMR.

      However, I had some broad concerns about this manuscript. The authors provide clear evidence that CRZ signaling plays a major role in the PMR of D. melanogaster, but they provide no evidence that CRZ signaling is endogenous, as they did not check for CRZ expression in the MAGs of D. melanogaster males. Additionally, while the authors show that manipulating Crz in males leads to dysregulated seminal fluid expression and impaired PMR in BPH, they also find that CRZ injection in males in and of itself impairs the PMR in BPH. The authors do not really address what this seemingly contradictory result could mean. Although many of the figures report replicate numbers, the authors do not include replicate as an effect in their models, which they ideally should do. Finally, while the discussion is generally well written, it lacks a broader conclusion about the wider implications of this study and what future work building on it could look like.

      Thank you very much for your insightful and valuable comments on our manuscript. We have carefully addressed each of your concerns, revised the relevant sections thoroughly, and conducted additional experiments to further strengthen our conclusions. To better focus on the core finding of this study, the critical role of Crz/CrzR signaling in regulating the post-mating response (PMR) of female brown planthoppers (BPH), and to eliminate potential confusion associated with the male-related data, we have removed the experiments investigating CRZ function in males from the current version of the manuscript. These observations on male CRZ signaling will be explored in greater depth and presented as a standalone study in a separate manuscript in the future.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhang and coauthors study the role of the neuropeptide corazonin in modulating the post-mating response of the brown planthopper, with further validation in Drosophila melanogaster. Using several different techniques, the authors orthogonally demonstrate the involvement of corazonin signalling in regulating the female post-mating response in these species.

      They first injected synthetic corazonin peptide into female brown planthoppers, showing altered mating receptivity in virgin females and a higher number of eggs laid after mating. The role of corazonin in controlling these post-mating traits has been further validated by knocking down the expression of the corazonin gene by RNA interference and through CRISPR-Cas9 mutagenesis of the gene. Further proof of the importance of corazonin signalling in regulating the female post-mating response has been achieved by knocking down the expression or mutagenizing the gene coding for the corazonin receptor.

      Similar results have been obtained in the fruit fly Drosophila melanogaster, suggesting that corazonin signalling is involved in controlling the female post-mating response in multiple insect species.

      Notably, the authors also show that corazonin controls gene expression in the male accessory glands and that disruption of this pathway in males compromises their ability to elicit normal post-mating responses in their mates.

      Strengths:

      Studies of the signalling pathways controlling the female post-mating response in insects other than Drosophila are scarce, and this limits the ability of biologists to draw conclusions about the evolution of the post-mating response in female insects. This is particularly relevant in the context of understanding how sexual conflict might work at the molecular and genetic levels, and how, ultimately, speciation might occur at this level. Furthermore, the study of the post-mating response could have practical implications, as it could lead to the development of control techniques, such as sterilization agents.

      The study, therefore, expands our knowledge of one of the signalling pathways that control the female post-mating response: the corazonin neuropeptide pathway. This pathway is involved in controlling the post-mating response in both Nilaparvata lugens (the brown planthopper) and Drosophila melanogaster, suggesting its involvement in multiple insect species.

      The study uses multiple molecular approaches to convincingly demonstrate that corazonin controls the female post-mating response.

      Thank you very much for your valuable and insightful comments on our manuscript. We highly appreciate your recognition of the study’s value, including its focus on non-model insects, the evolutionary implications of corazonin signaling, and the rigorous use of multiple molecular techniques. We have carefully addressed your suggestions and revised the manuscript accordingly to enhance its clarity, accuracy, and depth. Below is our detailed response to your comments.

      Weaknesses:

      The data supporting the main claims of the manuscript are solid and convincing. The statistical analysis of some of the data might be improved, particularly by tailoring the analysis to the type of data that has been collected.

      Thank you for your valuable suggestion regarding statistical analysis. We fully agree that tailoring statistical methods to the specific type of data enhances the rigor and reliability of our findings.

      In response, we have comprehensively re-evaluated and revised the statistical analyses for all datasets in the manuscript:

      (1) For proportion-based data (e.g., female mating receptivity, re-mating rate), we replaced inappropriate tests (e.g., ANOVA) with chi-square tests for contingency tables, which are more suitable for comparing categorical variables.

      (2) For time-series data (e.g., receptivity at different time points post-injection), we adopted generalized linear models (GLMs) with a logit link, followed by pairwise contrasts, instead of hour-by-hour Mann-Whitney tests, to address concerns about multiple testing.

      (3) For continuous data (e.g., number of eggs laid, gene expression levels), we retained Student’s t-tests or one-way ANOVA after verifying normality, and used non-parametric tests (Mann-Whitney, Kruskal-Wallis) for non-normally distributed data.

      All revisions have been clearly described in the figure legends and Methods section, ensuring transparency and reproducibility. We believe these adjustments significantly improve the statistical robustness of our conclusions.

      In the case of the corazonin effect in females, all the data are coherent; in the case of CRISPR-Cas9-induced mutagenesis, the analysis of the behavioural trait in heterozygotes might have helped in understanding the haplosufficiency of the gene and would have further proved the authors' point.

      Thank you for this insightful suggestion. We fully agree that analyzing the behavioral traits of heterozygous mutants is crucial for understanding the haplosufficiency of the Crz and CrzR genes, and we regret overlooking this aspect in the initial submission.

      To address this gap, we have conducted additional behavioral assays using heterozygous Crz (+/ΔCrz) and CrzR (+/CrzR^M) mutant females.

      (1) Re-mating receptivity and oviposition: We found no significant differences in either re-mating rate or egg-laying output between +/ΔCrz females and wild-type females. By contrast, +/CrzR^M females exhibited re-mating and oviposition phenotypes comparable to those of homozygous CrzR mutants, with no significant differences detected between these two genotypes.

      (2) These results indicate that the Crz loss-of-function phenotype is recessive: a single functional copy of Crz is sufficient to sustain a normal post-mating response (PMR). By contrast, the CrzR loss-of-function phenotype is dominant: a single functional copy of CrzR is insufficient to maintain a normal PMR.

      This supports our core conclusion that CRZ signaling is critical for mediating the female PMR, as even a partial reduction in CrzR gene dosage impairs the response.

      The heterozygote data have been integrated into the revised manuscript, including updated figures (e.g., Figure 1J-K for Crz heterozygotes and Figure 3I-J for CrzR heterozygotes) and corresponding legends. We believe this addition strengthens the rigor of our genetic evidence and provides valuable insights into the gene dosage requirements for CRZ-mediated PMR regulation.

      Less consistency was achieved in males (Figure 5): the authors show that CRZ injection and crz RNAi or crz mutation have the same effect on male fitness. However, CRZ injection should activate the pathway, whereas crz RNAi and crz mutation should inhibit it, yet they have the same effect. A comment on this discrepancy would have improved the clarity of the manuscript, pointing to open questions and opening new scientific discussion.

      Thank you for highlighting this important discrepancy in the male-related CRZ signaling data. We fully acknowledge the inconsistency: CRZ injection (which was intended to activate the pathway) and Crz RNAi/mutagenesis (which was intended to inhibit the pathway) yielded similar effects on male fitness, and we regret not addressing this ambiguity in the initial submission.

      To resolve this confusion and refocus the current manuscript on its core objective, elucidating the role of endogenous CRZ/CrzR signaling in the female post-mating response (PMR), we have removed all experiments, analyses, and discussions related to male CRZ function. This decision ensures that the manuscript maintains a clear, cohesive narrative centered on female reproductive physiology, as recommended by both reviewers and the editorial team.

      Regarding the observed discrepancy in males, we recognize its scientific significance and plan to investigate it thoroughly in a standalone follow-up study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The manuscript would be significantly strengthened by an explanation of the seemingly contradictory results obtained in males, where both CRZ injection and Crz silencing produce the same results. Additionally, Crz expression data from the MAGs of D. melanogaster males are necessary to support your conclusion of endogenous signaling in this species. Several imprecisions and inconsistencies in the text and figures should also be corrected to improve quality and accuracy, and the abstract should be restructured and the discussion modified as recommended by the reviewers.

      Thank you for your comprehensive letter and valuable guidance. We have carefully addressed all the points raised by the editorial team and reviewers, and the revised manuscript now incorporates substantial improvements to clarity, accuracy, and scientific rigor. Below is our detailed response to your specific requests:

      Contradictory Male-Related Results

      We fully acknowledge the importance of addressing the contradictory findings in male CRZ signaling, where both CRZ injection and Crz silencing/mutagenesis yielded similar effects on male fitness. To resolve this ambiguity and maintain the manuscript’s focus on its core objective, elucidating endogenous CRZ/CrzR signaling in the female post-mating response (PMR), we have removed all male-related experiments, analyses, and discussions from the revised manuscript. This decision ensures that the current work remains cohesive and centered on female reproductive physiology, as recommended by the reviewers.

      We recognize the scientific significance of the male-specific discrepancy and plan to investigate it in a standalone follow-up study in the near future.

      Crz expression data in D. melanogaster Male Accessory Glands (MAGs)

      To support our conclusion of endogenous CRZ signaling in D. melanogaster females, we have supplemented the manuscript with additional experiments verifying the absence of CRZ in male MAGs:

      (1) RT-PCR Analysis: We detected no Crz mRNA in dissected male MAGs, whereas Crz expression was confirmed in the male head (positive control).

      (2) Immunohistochemistry and GAL4 system: Using the GAL4–UAS system (Crz-Gal4/UAS-mCD8-GFP) to label CRZ-producing neurons, combined with anti-CRZ antibody staining, we observed no CRZ-specific signal in male MAGs.

      These results demonstrate that D. melanogaster male MAGs neither synthesize nor contain CRZ peptide, confirming that CRZ acts as an endogenous female signaling factor (rather than a male-transferred seminal fluid component) in this species. The new data are included in Figure 5H-I and described in the Results and Methods sections.

      Correction of Imprecisions and Inconsistencies

      We have systematically revised the manuscript to address text and figure inaccuracies:

      Text Revisions: Corrected typos (e.g., Line 854), standardized species names (replacing “Drosophila” with “D. melanogaster” throughout), removed redundant or inappropriate sentences, and refined terminology (e.g., replacing “expression” with “localization” for protein detection).

      Figure Corrections: Fixed inconsistent Y-axis labels and numerical ranges (e.g., aligning percentages/probabilities with appropriate scales), resolved color scheme confusion, standardized oviposition-related labels to “Number of eggs per female within 3 days,” and added details on sample sizes and replicates to all figure legends.

      Statistical Improvements: Re-evaluated statistical analyses for proportion-based datasets (applying chi-square tests for contingency tables) and time-series data (using generalized linear models to address multiple testing), with revised methods clearly described in the text and figure legends.

      Abstract Restructuring and Discussion Modification

      Abstract: We have restructured the abstract to group results thematically (rather than sequentially) for improved readability. The revised abstract emphasizes the core findings: CRZ/CrzR signaling is critical for the female PMR in both N. lugens and D. melanogaster, acts endogenously in females, and is required for male seminal fluid factors to induce the PMR. Male-related content has been removed from the abstract because the corresponding experimental data have been removed from the rest of the paper.

      Discussion: We have modified the discussion to include the evolutionary conservation of CRZ-mediated female PMR, the molecular and neurobiological implications of CRZ/CrzR signaling, and future research directions (e.g., dissecting downstream pathways in the female reproductive tract and brain). We have also reduced tangential content and clarified how our findings advance understanding of female endogenous signaling in PMR regulation. A new section was added at the end, which discusses outstanding questions related to CRZ and the PMR in both insect species.

      To both the above-mentioned sections and the Introduction we also added new text to emphasize that CRZ is a paralog of the vertebrate peptide gonadotropin-releasing hormone (GnRH), a hormone known to regulate reproduction in vertebrates (including humans), thus suggesting conservation of an ancient role in reproduction.

      All revisions in the manuscript are highlighted in red for easy reference. We believe these changes significantly strengthen the study’s focus, clarity, and scientific impact. Thank you again for your time and consideration.

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract could benefit from some restructuring. Right now, it reads like a sequential reporting of the results, but clumping together results thematically would make it easier to read, in my opinion. Also, see above re: my concerns about no evidence for the signal being endogenous in D. melanogaster.

      Thank you for your constructive suggestions regarding the abstract and the evidence for endogenous CRZ signaling in D. melanogaster. We fully agree with your feedback and have addressed both points thoroughly in the revised manuscript:

      (1) Abstract Restructuring

      We have restructured the abstract to group results thematically, rather than sequentially, to enhance readability and highlight the core findings. The revised abstract now organizes key information into three cohesive sections:

      The context and significance of female post-mating response (PMR) regulation, emphasizing the gap in understanding endogenous female signaling pathways.

      The core findings across both study species (Nilaparvata lugens and D. melanogaster), including the critical role of CRZ/CrzR signaling in suppressing re-mating and promoting oviposition, and its requirement for male seminal fluid factors to induce a PMR.

      The conclusion regarding the evolutionary conservation of endogenous CRZ signaling in female PMR, reinforcing the study’s broader implications.

      We also added new text to emphasize that CRZ is a paralog of the vertebrate peptide gonadotropin-releasing hormone (GnRH), a hormone known to regulate reproduction in vertebrates (including humans), thus suggesting conservation of an ancient role in reproduction.

      This thematic structure eliminates the linear “result-by-result” narrative, making the abstract more concise and impactful while clearly communicating the study’s key contributions.

      (2) Evidence for Endogenous CRZ Signaling in female D. melanogaster

      To address your concern about the lack of evidence for endogenous signaling in female D. melanogaster, we have supplemented the manuscript with two sets of critical experiments confirming that CRZ is not derived from male accessory glands (MAGs) but acts endogenously in females:

      RT-PCR Analysis: We performed RT-PCR on dissected male MAGs, male heads (positive control), and female tissues. Results showed no detectable Crz mRNA in MAGs, confirming that males do not synthesize CRZ in this tissue.

      Immunohistochemical and Genetic Labeling: Using the GAL4–UAS system (Crz-Gal4/UAS-mCD8-GFP) to label Crz-expressing neurons, combined with anti-CRZ antibody labeling, we observed no Crz/CRZ signal in male MAGs. This confirms that MAGs neither produce nor sequester mature CRZ peptide.

      These findings demonstrate that CRZ signaling in D. melanogaster females is endogenous: because CRZ is neither produced nor stored in male MAGs, it cannot be transferred to females in seminal fluid during copulation. The new data are presented in Figure 5H-I and described in the Results section, with corresponding methods detailed in the Methods section.

      The revised abstract integrates this new evidence to explicitly state the endogenous nature of CRZ signaling in both BPH and D. melanogaster females, aligning with the thematic structure and addressing your concerns comprehensively. We believe these changes significantly improve the clarity and rigor of the abstract and the manuscript overall.

      (2) The authors use Drosophila as a broad placeholder throughout the manuscript, while in several places they are specifically referring to D. melanogaster. I would go through the manuscript and replace it with the appropriate Drosophila species name(s).

      Thank you for pointing out this important detail regarding species-specific terminology. We fully agree with your suggestion to ensure accuracy and consistency in referencing the Drosophila species studied.

      We have systematically reviewed the entire manuscript, including the abstract, introduction, results, discussion, methods, and figure legends, and revised all instances where the general term “Drosophila” was used. All references now explicitly specify “D. melanogaster” to accurately reflect the species utilized in our experiments.

      (3) For the figures, I think the number of replicates is a distracting addition to the plot. This is still useful information, but could instead be added in as a line/table, in my opinion.

      Thank you very much for your suggestion. We have added the information on the number of replicates and sample sizes to the corresponding figure legends, which we hope improves clarity and readability.

      (4) There are typos in the y-axis label of all of the oviposition figures. A better re-wording would be "Per female egg numbers within 3 days".

      Thank you very much for your suggestion. Following your recommendation, we have now standardized the Y-axis label for all oviposition-related figures to “Number of eggs per female within 3 days.”

      (5) In Figure 1B and Figure 1 - Supplement 3a, since the comparisons are solely between control vs treatment, I would not join means across treatments that I am not comparing.

      To address this, we have revised Figure 1B and Figure 1—Supplement 3a by removing the connecting lines between group means. The updated figures now display independent mean ± SEM values for each dose (Figure 1B) and time point (Figure 1—Supplement 3a), with significance markers only applied to the control vs. treatment comparisons we actually tested. This revision eliminates any implied relationships between non-comparative groups and ensures the data visualization aligns with our statistical approach. We appreciate the reviewer’s suggestion, which has improved the clarity of the data presentation.

      (6) The authors mention courtship rate in line 511, but judging from the methods, this is not the courtship rate! This is a measure of the number of males engaging in any form of courtship. Also, in Figure 5 Supplement 2A, it appears that under 1% of males are courting. This seems extremely low. Do the authors mean percentages? In that case, I would rescale the axis from 0 to 100 or relabel the y-axis.

      Thank you for your observation and valuable feedback on this terminology and figure presentation issue. We fully acknowledge the inaccuracies and have addressed them comprehensively:

      (1) Correction of "Courtship Rate" Terminology

      We agree that the term “courtship rate” in Line 511 was incorrect, as our measurement reflects the proportion of males engaging in any form of courtship (not a rate per unit time). However, since we have removed all male-related data (including this section and associated figures) from the revised manuscript to focus on the core finding of female post-mating response (PMR), this terminology error has been eliminated entirely.

      (2) Revision of Figure 5 Supplement 2A

      Consistent with the removal of all male-related experiments, Figure 5 and its supplementary materials (including Supplement 2A) have been excluded from the revised manuscript. This ensures the current work remains cohesive and centered on female PMR, while also resolving the Y-axis labeling ambiguity you identified.

      We appreciate your careful attention to these details, which has helped improve the accuracy and clarity of the manuscript.

      (7) It appears Figure 5A, 5D, and 5G are mislabeled? Aren't all rematings with wild-type males?

      Thank you for identifying this labeling inconsistency. You are absolutely correct: all re-mating assays in the original figures involved wild-type males, and the mislabeling was an oversight.

      However, we have removed Figure 5 (and its associated subpanels A, D, G) entirely from the revised manuscript, as part of our decision to exclude all male-related data.

      (8) I am not sure I understand why a 30-minute post-injection threshold was chosen and what this table means. Could the authors elaborate on the methodology here on how they quantified premature ejaculation?

      Thank you for your question regarding the 30-minute post-injection observation window and the methodology for quantifying premature ejaculation.

      Because we have removed all male-related data (including the corresponding table and the premature ejaculation analyses) from the revised manuscript to focus on our core findings, this analysis is no longer included in the manuscript.

      (9) Line 29 - "distensible" seems an odd choice of word here.

      We have revised Line 29 to remove “distensible”. The sentence now reads: “Peptide injection and knockdown of CRZ expression by RNAi or CRISPR/Cas9-mediated mutagenesis demonstrate that CRZ signaling suppresses mating receptivity”.

      (10) Line 57 - delete "a" from "a post-mating response" and "A PMR" because the authors are referring to a very specific suite of post-mating behaviours.

      We have revised Line 57 (and other relevant instances throughout the manuscript) to delete the article "a" from these phrases.

      (11) Line 352, delete a from "and in a significantly".

      We have revised Line 356 to remove the extraneous "a", correcting the phrase to "and in significantly".

      Reviewer #2 (Recommendations for the authors):

      This manuscript presents a study of the role of the neuropeptide corazonin in modulating the post-mating response of the brown planthopper, with further validation in Drosophila melanogaster. To obtain their results, the authors used several techniques, including dsRNA injection to induce RNA interference and CRISPR-Cas9-mediated site-specific mutagenesis. The experimental design is appropriate; the results are solid and support the conclusions of the manuscript. Overall, the merit of the manuscript is that it presents compelling evidence that the female post-mating response is mediated by corazonin, at least in the analysed species. Indeed, there are multiple reports in many insect species that male factors, particularly those secreted by the male accessory glands, induce a post-mating response in females, but the female pathways underlying this phenomenon are poorly understood.

      There are points the authors can consider to improve the manuscript quality.

      Thank you for your generous and insightful assessment of our manuscript. We deeply appreciate your recognition of the study’s strengths, including the appropriate experimental design, solid results, and meaningful contribution to understanding female endogenous pathways in post-mating response (PMR) regulation.

      We have carefully incorporated all your constructive suggestions (e.g., statistical analysis revisions, figure label standardization, text refinements) to further strengthen the manuscript’s rigor and clarity. By focusing on corazonin (CRZ)/corazonin receptor (CrzR) signaling in female brown planthoppers (Nilaparvata lugens) and validating these findings in Drosophila melanogaster, we aim to provide a conserved model for female endogenous PMR regulation across insect species.

      Thank you again for your thoughtful and supportive feedback, which has been instrumental in refining our work. We believe the revised manuscript now more effectively communicates the significance of CRZ-mediated female signaling in bridging the gap between male-derived cues and PMR execution.

      (1) Line 20: "optimal offspring". This is not a zoological parameter. One can use "optimal fitness".

      We have revised Line 20 to replace "optimal offspring" with "optimal fitness" as recommended.

      (2) Lines 36-40: I think that the main message of the manuscript is the involvement of the corazonin pathway in controlling the female post-mating response. The involvement of corazonin in male reproduction is also of note, but off-topic (in my opinion). Male corazonin is not transferred from males to females during mating, and the involvement of corazonin in controlling gene expression in the MAGs, while of note, is only loosely related to the effect of corazonin in the female. I am not suggesting removing these data from the paper; they are important. But I do not find them important enough to include in the abstract, also because they may confuse the reader at first. A similar statement can be made for the discussion (lines 728-745): making this the first piece of data discussed gives it centre stage, but it is not the main take-home message of the paper.

      Thank you for this suggestion. We fully agree that including male-related CRZ data in the abstract and leading the discussion with these results distracted from the primary focus and risked confounding readers. In fact, we also removed the entire section on the role of CRZ in males. We have addressed this issue comprehensively in the revised manuscript as follows:

      (1) Abstract Revision

      We have completely removed all content related to male CRZ function from the revised abstract. The updated abstract now exclusively emphasizes the core findings:

      The requirement of CRZ/CrzR signaling for mediating key female PMR traits (suppression of remating, promotion of oviposition) in both Nilaparvata lugens and Drosophila melanogaster;

      Experimental evidence confirming that CRZ acts as an endogenous female signaling factor (not a male-transferred molecule);

      The evolutionary conservation of CRZ-mediated female PMR regulation across the two insect species.

      We also added a comment on the evolutionary conservation of CRZ and GnRH signaling in reproduction.

      (2) Discussion Section Restructuring

      We have restructured the Discussion to prioritize the core message of female PMR regulation:

      Lead paragraph adjustment: Lines 728–745 (originally focusing on male CRZ and MAG gene expression) have been deleted.

      Revised opening focus: The Discussion now contains only a synthesis of our key findings on female CRZ signaling, including its molecular mechanisms, cross-species conservation, and implications for understanding endogenous female pathways downstream of male seminal fluid cues.

      We appreciate your suggestions for the narrative focus of the manuscript.

      (3) Line 49: "Reproductive behavior is critical for population sustenance and survival of the species": I find this intro a little teleological, evolutionarily speaking, and I am not sure that this concept has ever been demonstrated. I would skip it, simply saying "Reproductive behavior in insects is influenced...".

      Following your suggestion, we have revised Line 49 to streamline the introduction and avoid “teleological language”. The updated sentence now reads: "Reproductive behavior in insects is influenced by a complex interplay of neural, hormonal, and environmental factors."

      (4) Line 58: "A PMR has been documented across diverse insect taxa, including Drosophila melanogaster, Anopheles gambiae, Aedes aegypti, and the brown planthopper (BPH), Nilaparvata lugens". There are many other insect species for which PMR has been shown: crickets, fruit flies, grasshoppers, etc. Therefore, I would say "for example" to underline that it is not a complete list. Being an incomplete list, I suggest that the authors pay attention to the cited literature: the literature cited in the case of Anopheles gambiae demonstrates the synthesis of hormones in the MAGs, but it has nothing to do with PMR; there is nothing cited for Aedes aegypti, even if the authors named the species.

      Thank you for this constructive feedback on the framing of PMR studies across insect taxa and the accuracy of our cited literature. We fully agree with your suggestions and have addressed these issues comprehensively in the revised manuscript:

      (1) Revision of the Sentence Structure

      We have modified Line 58 to explicitly indicate that the listed species are examples rather than a complete inventory of insects with documented PMR. The revised sentence reads:

      "The PMR has been documented across diverse insect taxa, for example, Drosophila melanogaster, Anopheles gambiae, Aedes aegypti, crickets (Gryllodes sigillatus), grasshoppers (Dichromorpha viridis), and the brown planthopper (BPH), Nilaparvata lugens."

      (2) Correction of Literature Citations

      We have thoroughly reviewed the citations associated with the listed species to ensure they directly support the role of PMR:

      For Anopheles gambiae: We have replaced the previously cited study (focused on MAG hormone synthesis) with two relevant references that explicitly characterize PMR traits—including mating-induced oviposition stimulation and remating suppression—in this mosquito species.

      For Aedes aegypti: We have added two newly published studies that document key PMR phenotypes (e.g., post-mating refractoriness and altered feeding behavior) and their underlying molecular mechanisms in this species.

      For crickets (Gryllodes sigillatus): We added a newly published study that documents PMR phenotypes in Gryllodes sigillatus.

      We have also verified that the citations for D. melanogaster and N. lugens remain directly relevant to PMR regulation, with no adjustments needed.

      All revised citations are properly formatted and integrated into the text, with corresponding updates to the reference list.

      (5) Line 111-132: I find this redundant: it is a long summary of the methods and the results. I do not think it is needed here, but I think the authors should point to the main message of their data.

      Thank you for pointing out the redundancy of Lines 111–132. We fully agree that this section disrupted the flow of the Introduction.

      To address this, we have completely removed Lines 111–132 from the revised manuscript. In place of this redundant content, we have added a concise, focused paragraph that emphasizes the central hypothesis and key objective of our work: specifically, to identify the endogenous female signaling pathways that mediate the post-mating response (PMR) downstream of male-derived cues, and to validate the conserved role of corazonin (CRZ) signaling in this process across Nilaparvata lugens and Drosophila melanogaster.

      (6) Line 156: This sentence is not needed here.

      We have deleted the sentence in Line 156 from the revised manuscript.

      (7) Figure 1E, J supplementary 3A: The label of the Y axis is the percentage of the mating females (expected 0-100%), but the numbers show the fraction (0-1). On the contrary, in Figure 1 Supplement 4, the label says "probability of survival" and the probability goes from 0 to 1, while the number of the axis goes from 0 to 100 (percentage).

      Thank you very much for pointing out these inconsistencies. We have carefully reviewed all Y-axis labels and corresponding numerical ranges throughout the manuscript and corrected the mismatched axes.

      (8) Figure 1B, C, F, K, supp 2, 3A: I found this use of colours confusing. Why did the authors use light blue for sCRZ, but show the mean and SE in pink, which is the colour for CRZ? Furthermore, the number of individuals used per replicate is not reported anywhere. The total number of insects and the number of replicates are given, but there is no indication of the minimum number of insects per replicate in this and many subsequent experiments.

      Thank you for identifying these inconsistencies in figure color coding and the missing details on sample allocation per replicate; we greatly appreciate your meticulous review of our data presentation.

      We have addressed these issues in the revised manuscript as follows:

      (1) Standardization of Color Coding

      We apologize for the confusing color mismatch between group labels and data points in Figure 1B, C, F, K, and Supplements 2 and 3A. We have unified the color scheme across some figures to ensure consistency:

      The sCRZ (control) group is now consistently represented by light blue for both labels and mean ± SE data points.

      The CRZ (treatment) group is now consistently represented by pink for both labels and mean ± SE data points.

      For Figures 1C, F, K and Supplementary Figure 2, we were concerned that the mean and s.e.m. bars might be visually obscured by the data points. To improve their visibility, we therefore used the opposite color to display the mean and s.e.m.

      All figure legends have been cross-checked and updated to reflect this standardized color coding.

      (2) Addition of Sample Size per Replicate

      We acknowledge that the lack of information on the minimum number of insects per replicate was a key gap in our experimental reporting. We have added this detail as follows:

      Figure Legends: For Figure 1B, C, F, K, and Supplements 2 and 3A (as well as all subsequent experiments), we have added explicit statements specifying the minimum number of insects per replicate, alongside the total sample size and number of replicates (e.g., “n = 3 replicates, with a minimum of 10 females per replicate; total N = 35 females”). All revised figures and their corresponding legends have been integrated into the updated manuscript, and we have cross-checked all other figures to avoid similar issues.

      (9) Figure 1C, F, K, Supplementary Figure 3B: Y axis labels - "Eggs numbers of per female...". I suggest changing it to "Number of eggs per female...".

      We have revised the Y-axis labels for Figure 1C, F, K and Supplementary Figure 3B to “Number of eggs per female...” as recommended. Additionally, we cross-checked all other oviposition-related figures in the manuscript to ensure uniform use of this standardized label, eliminating inconsistent phrasing across figures.

      (10) Legend Figure 1B: Mann Whitney test. How did the authors perform the test? Hour by hour? I am not sure this is the best way to analyse the data, because it is a case of multiple testing. Probably a linear model or a glm might be a better fit.

      Thank you very much for pointing out this issue. In Figure 1B, each concentration group was analyzed using data from independent individuals, and therefore the comparisons do not involve repeated measures across time; for this reason, we consider the Mann–Whitney test appropriate for this dataset. For Figure 1—Supplement 3A, however, our original analysis compared treatment and control groups hour by hour, which indeed raises concerns regarding multiple testing. Following your suggestion, we have removed the potentially misleading connecting lines and reanalyzed the dataset using a generalized linear model (GLM). The updated figure and revised legend have been included in the revised manuscript.
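      The reviewer's multiple-testing concern can be illustrated with a short, generic sketch. It computes the family-wise error rate for m independent tests, each run at the same α, and applies Holm step-down adjustment to a list of p-values. This is not the authors' GLM reanalysis. The 24 hourly comparisons and the p-values in the usage lines are hypothetical.

```python
from __future__ import annotations


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by (m - rank),
        # kept monotone non-decreasing, and capped at 1.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical: 24 hourly comparisons, each tested at alpha = 0.05.
print(round(familywise_error_rate(0.05, 24), 3))  # 0.708
# Hypothetical raw p-values from four hourly comparisons.
print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

      A single model fitted to all time points, such as the GLM adopted in the response, avoids running one test per hour. It tests the treatment effect, and any treatment-by-time interaction, within one fit.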

      (11) Legend Figure 1E: ANOVA test. These are proportions, not continuous variables of the samples. Tests for proportions might be a better fit (chi-square, etc.).

      To address this issue, we have re-analyzed the proportional data in Figure 1E using Pearson’s chi-square test of independence, which directly evaluates the association between treatment group (sCRZ vs. CRZ) and the binary mating status (mated vs. unmated) of females. This test is statistically robust for proportional data and avoids the assumptions of normality and homogeneity of variances required for ANOVA.
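      The Pearson chi-square test of independence described above can be sketched for a 2×2 table (group × mating status). The counts in the example are hypothetical, and no continuity correction is applied. When expected cell counts are small (a common rule of thumb is below 5), Fisher's exact test is usually preferred.

```python
import math


def pearson_chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table, without continuity correction.

    Rows are groups (e.g. sCRZ, CRZ); columns are outcomes (mated, unmated).
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            statistic += (table[r][c] - expected) ** 2 / expected
    # With 1 df, the chi-square upper-tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: sCRZ 30 mated / 10 unmated; CRZ 15 mated / 25 unmated.
stat, p = pearson_chi_square_2x2([[30, 10], [15, 25]])
print(round(stat, 3), p)  # statistic is 80/7, about 11.429
```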

      (12) Knockout experiments: I agree with the authors that the data are strong enough to sustain the conclusions. However, is the corazonin knockout haplosufficient or is it recessive? What is the behaviour of the heterozygotes?

      Thank you for this insightful question regarding the genetic basis of the corazonin (CRZ) knockout phenotype.

      To address your query, we performed additional phenotypic analyses of heterozygous CRZ knockout females (+/ΔCrz), and we clarify the genetic nature of the knockout as follows:

      (1) Genetic basis of the CRZ knockout:

      The CRZ knockout line was generated via CRISPR-Cas9-mediated deletion of the Crz coding region, resulting in a recessive loss-of-function mutation. Homozygous knockout females (ΔCrz) exhibited the full phenotypic suite reported in the manuscript (impaired post-mating suppression of remating, reduced oviposition rate, and disrupted CRZ signaling in the reproductive tract).

      (2) Phenotype of heterozygous females:

      Behavioral and physiological assays of +/ΔCrz heterozygotes revealed no significant differences compared to wild-type (+/+) females across all measured post-mating traits. Specifically:

      Remating rates of +/ΔCrz females were indistinguishable from wild-type controls at 48 h post-mating.

      Oviposition output of +/ΔCrz females matched wild-type levels over a 3-day assay period.

      (3) Updates to the manuscript:

      We have added these heterozygote data as Figure 1J and K in the revised manuscript, with corresponding descriptions in the Results and Methods sections. We have also explicitly noted the recessive nature of the Crz mutation in the Genetic Manipulation subsection, ensuring clarity for readers.

      These results confirm that the Crz knockout phenotype is fully recessive and that one functional copy of the Crz gene is sufficient to maintain normal post-mating responses, supporting our conclusion that CRZ signaling is required to mediate the female PMR.

      We thank you again for raising this important point, which has strengthened the genetic rigor of our study.

      (13) Figure 1, Supplementary 1: I do not understand why the authors point out the fact that these are Protostomia. These are all Arthropoda, there is not a single species outside this Phylum. Caerostris darvini should be Caerostris darwini.

      Thank you for this feedback regarding Figure 1 and Supplementary Figure 1. We fully agree and have addressed these issues in the revised manuscript:

      (1) Removal of the "Protostomia" designation

      We have deleted all references to Protostomia from the figure legends and associated text.

      (2) Spelling correction of Caerostris darwini

      We apologize for the typographical error in the species epithet. We have corrected the misspelling Caerostris darvini to the taxonomically accurate Caerostris darwini (Darwin's bark spider) across all instances in Figure 1, Supplementary Figure 1, and their corresponding legends. We have also cross-checked all other species names in the manuscript to eliminate similar typographical errors.

      (14) Line 299: CRZ expression: I found this confounding, given that the authors were talking about the expression of the gene. I would use the term localization, referring to the protein/peptide (is it what the authors were pointing at?).

      To resolve this ambiguity, we have revised Line 299 to replace “CRZ expression” with “CRZ peptide localization”, which accurately describes the experimental focus (immunofluorescence staining and confocal imaging of the CRZ peptide). We have also cross-checked the entire manuscript to standardize this terminology:

      We use “Crz gene expression” exclusively when referring to transcriptional analyses (e.g., qRT-PCR results).

      We use “CRZ peptide localization” when describing the spatial distribution of the peptide (e.g., immunostaining assays).

      (15) Figure 2C: The expression is relative to...? I would make it explicit on the axis.

      Thank you for this helpful comment. We apologize that the normalization reference was not sufficiently clear in the original version. In the revised manuscript, we now explicitly state that RT–qPCR data were first normalized to the reference genes Actin and 18SrRNA, and then expressed relative to the mean expression level of the tissue showing the highest Crz expression, which was set to 1. We have clarified this information in the figure legend and the Methods section.

      We have revised Figure 2C as follows:

      Updated the Y-axis label to explicitly state the reference: “Relative Crz gene expression”.

      Added a supplementary note in the figure legend to confirm that relative expression values were calculated using the 2^(−ΔΔCt) method, with the reference genes serving as internal controls for normalization.

      Additionally, we have cross-checked all other qRT-PCR-related figures in the manuscript to ensure that the reference for relative expression is clearly indicated on the corresponding axes, standardizing this key detail across all gene expression datasets.
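      The normalization described for Figure 2C can be implemented in several ways. The sketch below shows one of them and assumes roughly 100% amplification efficiency for every gene.

      - Averaging the Ct values of the two reference genes (Actin and 18S rRNA) is equivalent to using the geometric mean of their relative quantities.
      - Each replicate's 2^(−ΔCt) is then divided by the mean of the tissue with the highest expression, so that tissue averages exactly 1.

      The tissue names and Ct values below are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean reference Ct (log2 of the geometric-mean reference quantity)."""
    return target_ct - mean(reference_cts)


def relative_expression(samples):
    """samples: {tissue: [(target_ct, [reference_ct, ...]), ...]}, one tuple per biological replicate.

    Returns {tissue: [relative values]} scaled so that the tissue with the highest
    mean 2^(-dCt) has a mean of exactly 1 (a 2^(-ddCt)-style calibration against it).
    """
    linear = {
        tissue: [2.0 ** -delta_ct(target, refs) for target, refs in replicates]
        for tissue, replicates in samples.items()
    }
    calibrator = max(mean(values) for values in linear.values())
    return {tissue: [v / calibrator for v in values] for tissue, values in linear.items()}


# Hypothetical Ct values: (Crz, [Actin, 18S rRNA]) per replicate.
data = {
    "tissue_A": [(24.0, [18.0, 10.0]), (24.0, [18.0, 10.0])],
    "tissue_B": [(27.0, [18.0, 10.0])],
}
print(relative_expression(data))  # {'tissue_A': [1.0, 1.0], 'tissue_B': [0.125]}
```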

      (16) Figures 3B, E, I, L, M, N: Percentage and proportions, as in Figure 1; furthermore, please provide the minimum number of individuals per replicate. Furthermore, as in Figure 1, the data are proportions, and I would use statistical tests that are studied for this kind of data.

      Thank you for this helpful suggestion. We have reviewed and corrected the Y-axis labels and corresponding numerical ranges in these figures, and we have added the number of replicates and the minimum number of individuals per replicate to the figure legends. In addition, following your recommendation, we have reanalyzed these proportion data using chi-square tests for contingency tables.

      (17) Figure 3: As in Figure 1, it would be interesting to know which is the behaviour of the heterozygotes.

      Thank you for suggesting that we complement the data in Figure 3 with heterozygote phenotypic analyses.

      To address this, we have conducted additional behavioral and physiological assays of heterozygous CrzR knockout females (+/CrzR^M) and integrated these data into the revised Figure 3 and its legend:

      Phenotypic characterization of heterozygotes: Across all traits measured in Figure 3 (e.g., remating rate and oviposition efficiency), +/CrzR^M females exhibited no significant differences compared to homozygotes.

      This confirms that the CrzR knockout phenotype is dominant and that a single functional copy of the CrzR gene is insufficient to maintain a normal post-mating response (PMR).

      Manuscript updates:

      We added heterozygote data in Figure 3I and J. Accordingly, we updated the Results text to reflect the revised panel labeling.

      We supplemented the figure legend with statistical comparisons between heterozygotes and wild-type groups (using chi-square tests for proportional data).

      We included a brief description of heterozygote phenotypes in the Results section to contextualize the genetic basis of the CrzR-mediated PMR regulation.

      (18) Figure 3 Supplement 1: Can the authors indicate which model for maximum likelihood they chose? Did they perform a pre-test to assess which substitution model was the best for their data?

      Thank you for this critical question regarding the model selection for maximum likelihood (ML) phylogenetic analysis in Figure 3 Supplement 1. We fully agree that specifying the substitution model and validation process is essential for ensuring the reproducibility and rigor of phylogenetic inferences.

      To address this, we have supplemented the manuscript with detailed information on the model selection and validation steps, as follows:

      (1) Substitution model selection

      Prior to constructing the ML tree, we performed a model selection pre-test using the ModelFinder tool integrated in IQ-TREE 2, which evaluates the fit of candidate amino-acid substitution models to the CrzR amino-acid sequence alignment via the Bayesian Information Criterion (BIC). The model selection procedure identified the LG+G model as the best-fit substitution model for our dataset. This model uses the Le and Gascuel (LG) amino-acid substitution matrix and incorporates gamma-distributed rate variation among sites (G) to account for among-site rate heterogeneity.
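      The BIC comparison that ModelFinder performs can be illustrated with a short sketch. BIC = k·ln(n) − 2·ln L, where:

      - k is the number of free parameters, including branch lengths;
      - n is the sample size, taken here as the number of alignment sites (an assumption);
      - L is the maximized likelihood.

      The model with the lowest BIC is selected. The log-likelihoods and parameter counts below are hypothetical. They are chosen only to show how one extra gamma-shape parameter can be favored when it improves the fit enough. In IQ-TREE 2 itself, model selection and ultrafast bootstrapping are typically requested with the `-m MFP` and `-B` options.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def select_model(candidates: dict, n_sites: int):
    """candidates: {model_name: (log_likelihood, n_free_params)}; returns (best_name, scores)."""
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for a 20-taxon unrooted tree (37 branch lengths) on a 400-site alignment.
# LG has no free matrix parameters; +G adds one gamma shape; +F adds 19 amino-acid frequencies.
candidates = {
    "LG": (-10000.0, 37),
    "LG+G": (-9900.0, 38),
    "LG+F+G": (-9890.0, 57),
}
best, scores = select_model(candidates, n_sites=400)
print(best)  # LG+G
```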

      (2) Manuscript updates

      We have added this detailed model selection process and the final LG+G model specification to the legend of Figure 3 Supplement 1.

      We have also included information on bootstrap validation (10,000 ultrafast bootstrap replicates) used to assess node support in the phylogenetic tree.

      (19) Figure 4 Supplement 1: I would be explicit about what it is relative to (which gene).

      Thank you for this helpful comment. In the revised manuscript, we now explicitly state that RT–qPCR data were first normalized to the reference gene Actin, and then expressed relative to the mean expression level of the tissue showing the highest CrzR expression, which was set to 1. This normalization strategy provides a robust and biologically representative reference. We have clarified this information in the figure legend and the Methods section.

      (20) Line 518 and Line 525 and Figure 5: The authors show that injection of CRZ and RNAi of crz or mutant crz has the same effect on male fitness. How do the authors explain this contradiction? The CRZ injection should activate the pathway, and crz RNAi and mutant crz should inhibit the pathway, but nevertheless, they have the same effect. I would probably test the expression of some of the genes whose expression is altered in crz mutant males (next paragraph) to see if an altered CRZ signalling pathway (both ways) might affect gene expression in the MAGs in the same way.

      Thank you for raising this important point. As explained above, we have removed all data related to CRZ function in male BPHs from the current version.

      (21) Figure 5, Figure 7: As in Figures 1 and 3, please pay attention to the percentages and proportions and the statistical tests.

      Thank you for pointing out these issues. We have carefully reviewed and corrected the percentage/proportion labeling in the relevant figures, including the Y-axis descriptions and numerical ranges, as well as revised the corresponding figure legends. In addition, we have reanalyzed the data using statistical tests appropriate for proportion data. All corresponding revisions have been incorporated into the updated manuscript.

      (22) Line 728-745: As already stated for the abstract, the male effect of crz is, to me, a side product, and I am not sure the male crz signalling has something to do with the female crz signalling. It is interesting, nobody showed that CRZ affects expression in the MAGs, but this is not the main message of the paper, and it confuses the reader. I would reduce the discussion about this aspect and move it to the end, but this is my own take.

      We have removed all data related to CRZ function in males for the reasons outlined above.

      (23) Material and methods/results: as a general suggestion, I would be explicit about the timing of receptivity inhibition in the species. I've seen the authors have established this in precedent work, and I would refer to that work and make the reader aware of how the receptivity works in the species (i.e., that it is not permanent and lasts for a few days after first mating). This allows a better understanding of the experimental design.

      Thank you for this valuable and constructive suggestion. We fully agree that explicitly describing the timing of receptivity inhibition in Nilaparvata lugens, and linking it to our earlier work, will strengthen the rigor and clarity of the manuscript.

      To address this, we have revised the Materials and Methods and Results sections as follows:

      (1) Materials and Methods (Experimental Design subsection)

      We have added a dedicated paragraph that explicitly defines the temporal dynamics of post-mating receptivity inhibition in N. lugens, with direct reference to our prior work[1]. The text clarifies:

      “In N. lugens, mating induces a transient (non-permanent) suppression of female receptivity. Females typically begin to regain receptivity to remating 72 h after the first mating, as documented in our previous study[1]. This temporal window guided the design of our remating assays, in which females were paired with naive males at 48 h post-initial mating to capture both the suppressed and recovered phases of receptivity.”

      (2) Results (Post-mating Receptivity section)

      We have incorporated a brief contextual sentence at the start of the section to reinforce this key species-specific trait, ensuring that readers connect our assay timings to the temporal dynamics of receptivity in N. lugens.

      These revisions ensure that the rationale behind our experimental timing is transparent and well-supported, allowing readers to fully grasp how our assays were tailored to the biological characteristics of N. lugens.

      (24) Line 854: There is a typo "CRZ peptide. virgin female", the dot should be a comma.

      We have revised Line 854 to correct the punctuation: the dot has been replaced with a comma, resulting in the phrasing "CRZ peptide, virgin female". In addition, we have changed the wording in this sentence to ensure scientific rigor and to avoid colloquial expressions.

      (1) Zhang, Y.J., Zhang, N., Bu, R.T., Nässel, D.R., Gao, C.F., and Wu, S.F. (2025). A novel male accessory gland peptide reduces female post-mating receptivity in the brown planthopper. PLOS Genet 21, e1011699. 10.1371/journal.pgen.1011699.

    1. eLife Assessment

      This study addresses an important question about how large-scale brain networks interact, and specifically how the default mode network exchanges information with the sensory cortex. The analyses are sophisticated, but at present provide incomplete evidence for the claims made in the paper.

    2. Reviewer #1 (Public review):

      Summary:

      This paper leverages 7T fMRI data from the Natural Scenes Dataset to investigate whether retinotopic coding, the position-selective organization of visual responses, structures spontaneous resting-state interactions between the Default Network (DN) and the Dorsal Attention Network (dATN). Using individualized network parcellations and population receptive field (pRF) modeling, the authors show that DN voxels can be split into two subpopulations based on their response to visual stimulation: those with position-specific positive BOLD responses (+pRFs) and those with position-specific negative BOLD responses (-pRFs). Critically, these subpopulations relate differently to the dATN during rest: -pRFs are anticorrelated with the dATN, +pRFs are positively correlated, and non-retinotopic DN voxels show no coupling. The anticorrelation (and positive correlation) is enhanced when DN and dATN voxels share visual field preferences. An event-triggered analysis suggests that retinotopic coding shapes both "top-down" (DN-initiated) and "bottom-up" (dATN-initiated) spontaneous activity transients, supporting the claim that the retinotopic scaffold is intrinsic to the DN. These findings challenge the prevailing view of global DN-dATN antagonism and suggest retinotopic coding as an organizing principle for cross-network communication.

      Strengths:

      The central finding that what looks like network-level independence between DN and dATN decomposes into structured, bivalent interactions organized by voxel-level visual field preferences is a compelling demonstration that macro-scale network descriptions can hide meaningful substructure. The logic of the analysis is clean: pRF properties are estimated from retinotopic mapping data and then used to predict resting-state coupling in completely independent scanning sessions. This cross-session, cross-modality design rules out many circularity concerns.

      The use of individualized multi-session hierarchical Bayesian parcellation (Kong et al.) to define DN and dATN boundaries within each subject is the right methodological choice for this question. Network boundaries in posterior cortex, where DN and dATN interdigitate most closely, vary considerably across individuals, and group-average approaches would introduce exactly the kind of misassignment that would most confound the result.

      The matched-vs-random pRF analysis is well-controlled. The authors demonstrate that cortical distance between matched and randomly-matched dATN pRFs does not differ, effectively ruling out spatial proximity on the cortical surface as a confound. tSNR controls further show that signal quality differences do not drive the effect.
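      The visual-field matching behind the matched-vs-random comparison can be sketched as a nearest-neighbour search over pRF centers. This is a hypothetical illustration, and the function name and coordinates are illustrative. It makes two simplifying assumptions:

      - each voxel is summarized by its pRF center (x, y) in degrees of visual angle;
      - pRF size is ignored, although the authors' actual matching rule may take it into account.

```python
import math
import random


def match_prfs(dn_centers, datn_centers, seed=0):
    """Pair each DN pRF with a visual-field-matched and a randomly chosen dATN pRF.

    Centers are (x, y) tuples in degrees. Returns a list of (matched_index, random_index)
    into datn_centers; the random pairing serves as the control comparison.
    """
    rng = random.Random(seed)
    pairs = []
    for center in dn_centers:
        # Nearest dATN pRF center in the visual field (Euclidean distance in degrees).
        matched = min(range(len(datn_centers)),
                      key=lambda j, c=center: math.dist(c, datn_centers[j]))
        control = rng.randrange(len(datn_centers))
        pairs.append((matched, control))
    return pairs


dn = [(1.0, 1.0), (-3.0, 0.0)]
datn = [(-2.5, 0.5), (0.0, 5.0), (1.2, 0.8)]
print(match_prfs(dn, datn))  # matched indices are 2 and 0; random indices depend on the seed
```

      Correlating each DN voxel's resting-state time series with its matched and its random partner would then yield the two sets of coupling values being contrasted.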

      The event-triggered analysis (Figure 3) is creative and adds genuine value. Showing that retinotopically-specific coupling persists during DN-initiated activity transients, not only dATN-initiated ones, is the key piece of evidence for the claim that the code is intrinsic to the DN rather than passively inherited through bottom-up visual drive.

      The result is observed consistently across all individual participants, which provides strong evidence for the robustness of the qualitative pattern despite the small sample size inherent to densely-sampled designs.

      Weaknesses:

      (1) The nature of negative pRFs requires more scrutiny

      The entire interpretive framework depends on treating negative pRFs in the DN as genuine position-selective neural responses (suppression). However, negative BOLD signals are well known to arise from non-neural sources, specifically, vascular stealing (where activation in nearby tissue diverts blood from adjacent voxels) and macrovascular draining vein effects that produce spatially displaced signal inversions. These concerns are amplified at 7T, where T2*-weighted GE-EPI carries substantial macrovascular weighting. The DN and dATN interdigitate extensively in the posterior cortex, often within millimeters. A negative pRF in a DN voxel adjacent to a positive dATN voxel could, in principle, reflect the hemodynamic shadow of its neighbor rather than an independent neural response.

      The spatial dispersion control (matched vs. random pRFs have similar cortical distribution) is valuable but addresses long-range confounds, not *local* hemodynamic crosstalk. The reliability of sign and center position across runs is reassuring but does not exclude a vascular origin, as vascular architecture is itself stable across sessions. I would encourage the authors to test whether the matched-vs-random effect survives exclusion of voxels near large pial vessels (identifiable from T2* contrast or the venograms available in the NSD). Such an analysis would not be dispositive, but it would meaningfully strengthen the neural interpretation.

      (2) Amount of retinotopic mapping data and choice of pRF pipeline

      The NSD includes 6 runs of retinotopic mapping (~5 minutes each; 3 bar-aperture, 3 wedge/ring). The authors use only the 3 bar-aperture runs (~15 minutes total per subject) and fit their own pRFs using AFNI's 3dNLfim procedure, rather than using the pRF estimates provided as part of the NSD release (which were fitted using the analyzePRF toolbox with all 6 runs).

      Fifteen minutes of bar data is quite limited for reliable voxel-wise pRF estimation, especially in regions far from the early visual cortex, where signal-to-noise is inherently lower. Standard recommendations for robust pRF mapping in higher-order regions generally suggest substantially more data. The variance-explained threshold is close to the noise floor by design, meaning that a non-trivial number of the "retinotopic" DN voxels may be poorly estimated. Given that the core analyses depend on both the sign and the center position of these pRFs, the limited data is a significant concern.

      The authors do not explain why they chose to re-fit pRFs rather than use the NSD-provided estimates. If the motivation was methodological (e.g., the NSD pRF pipeline does not readily yield signed amplitude, or the bar-only fits were judged more appropriate for detecting negative responses), this should be made explicit. If the NSD-provided pRFs can reproduce the key findings, this would substantially increase confidence in the results. If they cannot, that divergence itself would be important to understand. I would ask the authors to address this choice and, if feasible, to report whether the core results replicate using the NSD-provided pRF estimates and/or whether using all 6 runs of retinotopy data changes the findings.

      (3) pRF model adequacy for the Default Network

      The isotropic Gaussian pRF model was developed for and validated in early and mid-level visual cortex, where it captures the dominant spatial selectivity of neuronal populations. In DN voxels where the model explains comparatively little variance, it is less clear that the model is capturing the right quantity. Specifically, the negative pRFs could conceivably be described by a model with a dominant suppressive surround (e.g., a difference-of-Gaussians model), in which what appears as a "negative pRF" in the standard model is actually the surround component of a center-surround mechanism whose center is poorly resolved. This distinction matters: a genuine inverted code (negative center response) implies a qualitatively different computation than inherited surround suppression from nearby visual cortex.

      The authors should consider discussing why the standard model is sufficient for the questions asked, or ideally, testing whether the sign distinction survives under alternative pRF model specifications.
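      A toy example can make this concern concrete. The sketch compares the linear prediction of an isotropic Gaussian pRF with that of a difference-of-Gaussians (DoG) pRF whose surround outweighs its center. All parameter values are arbitrary assumptions, the grid stands in for the stimulus display, and this is not the authors' AFNI 3dNLfim fitting procedure.

      In this toy setting, a bar passing through the DoG center yields a net negative response. A standard Gaussian model could capture that only with a negative amplitude, even though a small probe at the same location drives the unit positively.

```python
import math


def gaussian(x, y, x0, y0, sigma):
    """Unit-height isotropic Gaussian centered at (x0, y0)."""
    return math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma ** 2))


def dog(x, y, x0, y0, sigma_c, sigma_s, amp_c, amp_s):
    """Difference of Gaussians: excitatory center minus a broader suppressive surround."""
    return amp_c * gaussian(x, y, x0, y0, sigma_c) - amp_s * gaussian(x, y, x0, y0, sigma_s)


def predicted_response(prf, aperture, grid):
    """Linear pRF prediction: sum of pRF weights over the stimulated grid points."""
    return sum(prf(x, y) for x, y in grid if aperture(x, y))


def gauss_prf(x, y):
    return gaussian(x, y, 0.0, 0.0, 1.0)


def dog_prf(x, y):
    # Center sigma 1 deg, surround sigma 3 deg, surround amplitude half the center's.
    return dog(x, y, 0.0, 0.0, 1.0, 3.0, 1.0, 0.5)


def vertical_bar(x, y):
    return abs(x) <= 0.5  # 1-deg-wide bar through the pRF center


def small_probe(x, y):
    return x * x + y * y <= 0.25  # 0.5-deg-radius spot at the pRF center


# Visual field sampled every 0.5 deg from -10 to +10 deg.
coords = [i * 0.5 for i in range(-20, 21)]
grid = [(x, y) for x in coords for y in coords]

print(predicted_response(gauss_prf, vertical_bar, grid) > 0)  # True
print(predicted_response(dog_prf, vertical_bar, grid) < 0)    # True: looks like a "negative pRF"
print(predicted_response(dog_prf, small_probe, grid) > 0)     # True
```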

      (4) Interpreting resting-state transients as top-down vs. bottom-up

      The event-triggered analysis labels high-amplitude DN pRF activations as "top-down events" and dATN activations as "bottom-up events." This is a reasonable inference given experience-sampling studies showing that rest involves alternation between internal and external attention, but it remains an inference. Without concurrent experience sampling, eye-tracking, or physiological monitoring, we cannot establish that a spontaneous DN transient reflects memory retrieval or internally-directed thought rather than a global arousal fluctuation. Similarly, dATN transients during rest could reflect covert shifts of spatial attention to remembered or imagined locations rather than bottom-up processing per se. I would ask the authors to soften this framing or to discuss what additional data would be needed to validate the top-down/bottom-up attribution.

      (5) The "retinotopic code" vs. "visual field bias" distinction

      The paper uses the language of a "retinotopic code" throughout and correctly distinguishes this from a "retinotopic map," noting that DN voxels do not form a continuous topographic representation on the cortical surface. This distinction deserves greater emphasis. In vision science, retinotopic maps carry computational significance through their topographic continuity and relationship to cortical wiring. A distributed collection of voxels with coarse visual field preferences but no cortical topography is a fundamentally different organizational feature. Recent reviews have drawn an explicit distinction between *retinotopic maps* and *visual field biases* (Groen, Dekker, Knapen & Silson, TiCS 2022), and the present findings may be more accurately characterized as the latter. Perhaps the authors think that the distinction is merely a signal-to-noise distinction, in which case I would invite them to clearly speak to this interpretation. In any case, this is not a criticism of the findings themselves, but clarity on this point would prevent conflation of two different organizational principles and would help position the work for both the vision and network neuroscience communities.

    3. Reviewer #2 (Public review):

      Summary:

      Using a public dataset of retinotopic mapping and resting-state data, the authors find that the default mode network has voxels that respond (positively or negatively) to visual stimulation at specific retinotopic positions, and that resting-state activity in these voxels is correlated with activity in more traditional sensory voxels with the same visual-location preference. The retinotopic specificity is bidirectional, such that high activity in default mode voxels drives activity only in voxels with matching receptive fields in sensory cortex, and vice versa. These findings are at odds with traditional views of the default mode network as having abstract (non-retinotopic) representations and competing (rather than cooperating) with external sensory representations.

      Strengths:

      This study continues an intriguing line of research about how default mode regions interact with the sensory cortex. Demonstrating that there are structured interactions between these regions at rest, and that these interactions are in fact organized according to retinotopic location (as opposed to traditional views of representational format in the default mode network), provides a new framework for thinking about large-scale internal and external brain networks. The authors make use of a well-powered public dataset that allows for precise estimates of pRFs and individual-specific resting-state networks, and develop a number of interesting analyses that characterize the relationships between DN and dATN voxels. The findings are exciting and could have a major impact on future studies in cognitive neuroimaging.

      The authors mention that these findings could shed light on internal/external interactions such as "anticipatory saccades or memory-guided attention," which is true, though I would argue that constructing DN representations of external stimuli is in fact even more fundamental than these specific cases (e.g., see Barnett and Bellana, 2025, "Situation models and the default mode network"). The "highways" identified in this study could play a vital role in real-world perceptual processes that are constantly translating external input into internal mental models.

      Weaknesses:

      (1) The criterion used for defining voxels as retinotopic seems very liberal. The authors show that only 5% of voxels have R^2>0.14 in a null analysis, and therefore define voxels with R^2>0.14 as retinotopic. Although all the networks in 1C show voxel distributions that differ from the null, the number of false positives above R^2>0.14 seems problematic, especially for the DN positive pRFs (red distribution) and to a lesser extent the DN negative pRFs (blue distribution). From visual inspection of the plot, the false discovery rate (fraction of voxels labeled as retinotopic that are false positives) looks like it would be greater than 50% for the DN-positive pRFs. The authors do show that the positive pRF voxels have above-chance consistency across runs, again providing evidence that there are true positive voxels in this set, but perhaps a stricter criterion (such as having consistent negative fits across runs) would provide more targeted identification of the DN voxels with true retinotopic sensitivity.
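      The false-discovery concern can be checked with simple arithmetic. Suppose a fraction α of voxels exceeds the R² threshold under the null, and a fraction f of voxels exceeds it in the data. Then α/f is a rough upper bound on the share of supra-threshold voxels that are false positives; it conservatively assumes that every voxel could be null. Using the 5% null exceedance and the 9.42% of DN voxels with pRFs (reported in point (2) below) gives roughly 53%.

```python
def estimated_fdr(null_exceedance_rate: float, observed_fraction_above: float) -> float:
    """Rough upper-bound FDR: expected null exceedances / observed exceedances (assumes pi0 = 1)."""
    if observed_fraction_above <= 0:
        raise ValueError("observed_fraction_above must be positive")
    return min(1.0, null_exceedance_rate / observed_fraction_above)


# 5% of null voxels exceed R^2 = 0.14; 9.42% of DN voxels are labeled retinotopic.
print(round(estimated_fdr(0.05, 0.0942), 3))  # 0.531
```

      The same calculation could be applied separately to positive and negative pRFs if sign-specific null exceedance rates were available. The result is only approximate, because the null distribution need not match that of truly non-retinotopic voxels.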

      (2) The claim that "opponency at rest between the DN and dATN appears to be driven by the subset of DN voxels with negative retinotopic tuning" is not well supported. The fraction of DN voxels with negative pRFs is small: 9.42% of DN voxels have pRFs, and 58.77% are negative, so about 6% of DN voxels have negative pRFs. The fact that any DN voxels have negative pRFs is notable, but the authors do not provide evidence that these 6% are driving the overall behavior of the DN. They do show (e.g., in Figure 2B) that negative and positive pRFs have opposing influences, but the overall correlation with dATN does not look similar to the negative pRF connectivity. I'm also unsure whether "opponency" is a reasonable description for two networks that are "independent (i.e., not correlated)" in this analysis.

      (3) The event-triggered analysis is effective at testing the bidirectional relationship between DN and dATN, with high activity in either network triggering a response in the other network. However, it would be helpful to show more validation that these "events" are meaningful windows of time to study. First, is 13 TRs a typical length of time that activity is elevated during one of these events? Second, the top-down and bottom-up terminology is perhaps too loaded and not well-justified; if the negative pRFs in the DN reflect a meaningful coding system, then couldn't low (rather than high) activity indicate a top-down event?
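      The parameters questioned here (a 99th-percentile threshold and a 13-TR window) can be made explicit in a small event-triggered averaging sketch. The following assumptions are not taken from the paper:

      - an event begins at the first TR where the seed signal rises above its percentile threshold;
      - the window starts at that onset, with no pre-event baseline;
      - windows that run past the end of the scan are dropped.

      Passing a sign-flipped seed probes low-activity events instead, which is one way to explore the question raised above.

```python
from statistics import mean, quantiles


def event_triggered_average(seed, target, percentile=99, window=13):
    """Average `target` over windows starting where `seed` first exceeds its percentile threshold.

    Returns (average_time_course, n_events). `percentile` must be an integer from 1 to 99.
    """
    threshold = quantiles(seed, n=100, method="inclusive")[percentile - 1]
    # Event onsets: upward threshold crossings of the seed signal.
    onsets = [t for t in range(1, len(seed))
              if seed[t] > threshold and seed[t - 1] <= threshold]
    windows = [target[t:t + window] for t in onsets if t + window <= len(target)]
    if not windows:
        return [], 0
    return [mean(values) for values in zip(*windows)], len(windows)


# Synthetic example: two seed transients, each followed two TRs later by a target response.
seed = [0.0] * 200
seed[50] = seed[120] = 10.0
target = [0.0] * 200
target[52], target[122] = 4.0, 2.0
avg, n_events = event_triggered_average(seed, target)
print(n_events, avg[2])  # 2 3.0
```

      For example, `event_triggered_average([-v for v in seed], target)` would average the target around deactivation events of the seed.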

      (4) The framing of this paper relative to the authors' past work, such as Steel et al. 2024 ("A retinotopic code structures the interaction between perception and memory systems"), could be improved. The existence of negative pRFs in the DN and a functional relationship between these pRFs and the sensory pRFs have already been described in prior work. My understanding of the primary novelty here is that this paper examines resting-state data, showing that there are widespread spontaneous interactions between broad internal and external networks, but this distinction is not made explicit in the Introduction.

      (5) The definition of the default mode network (DN) in this study aligns with past research, but the definition of the dorsal attention network (dATN) seems at odds with standard terminology. For example, the authors cite Fox et al. 2006, which depicts the dATN as including regions such as IPS, FEF, SMA, and MT+. Here, however, the "dATN" seems to be primarily lateral and ventral visual cortex (e.g., Figure S5). The exact location of these sensory pRFs is not critical to the authors' claims, but this labeling seems incorrect, and the motivation for defining/selecting the sensory network in this way is not described.

    4. Reviewer #3 (Public review):

      Summary:

      This paper addresses an important question (the relationship between DN and dATN, and the role of retinotopic coding) and uses a set of novel analyses.

      Strengths:

      Important question, novel analytical approaches (pRF-informed functional connectivity analysis).

      Weaknesses:

      Some of the key claims are not fully supported by the data presented. There is also a concern about over-interpretation of the results. Key issues:

      (1) The authors claim that retinotopic coding scaffolds the interaction between DMN and dATN. However, retinotopically tuned voxels account for a mere 9% of DMN voxels. So this appears to be a major overstatement. For instance, the statement that "these findings would position retinotopy as a unifying framework for brain-wide information processing" is not justified given the presented data.

      (2) Given that positive pRF voxels in DMN positively correlate with dATN voxels and negative pRF voxels in DMN negatively correlate with dATN voxels, there is a concern that these results could partly reflect imprecise brain network parcellations. E.g., could some of the positive pRF voxels in DMN be erroneously assigned to DMN and actually belong to one of the other task-positive networks? There is insufficient validation of network parcellation to put this worry to rest, especially since it depends on ICA, which has a degree of arbitrariness built in.

      (3) The claim that retinotopic coding is intrinsic to the DN network is not supported by rigorous analysis and results. The analysis here has many arbitrary factors, including: the threshold at the 99th percentile of the resting-state distribution; the designation of DN as "top-down" and dATN as "bottom-up"; the definition of "anti-matched" voxels instead of using randomly selected voxels; and the statistics being paired between matched and anti-matched voxels instead of using comparisons to baseline. Overall, I do not think that the result supports the conclusion that retinotopic coding in DN is intrinsic instead of being bottom-up-driven, given the very high threshold (99%) used and the fact that many other networks could also send bottom-up input to DN. Furthermore, the idea that bottom-up inputs only occur when the dATN (or any other RSN)'s spontaneous BOLD activity is above a certain threshold is a huge and unvalidated assumption.

    1. eLife Assessment

      This important study addresses a discrepancy between population-level growth laws and single-cell correlations. It shows, for flagellar and synthetic genes in E. coli, that while gene expression of certain genes reduces population-average growth, expression levels positively correlate with growth at the single-cell level. The measurements are mostly convincing, and the proposed mechanism (inheritance of growth factors such as ribosomes during asymmetric division) explains this observation. The theoretical analysis would benefit from clearer explanations and robustness checks.

    2. Reviewer #1 (Public review):

      Summary:

      Garcia-Alcala, Kratz and Cluzel investigate to what extent our understanding of bacterial physiology in bulk experiments can be applied to single-cell observations. They find that intrinsic noise may be powerful enough to even invert the trends found in the bulk. The authors hypothesize that the asymmetric distribution of ribosomes to daughter cells during cell division plays the dominant role in the intrinsic noise and is able to generate the observed phenomenon. They do not show it directly, but the data and its agreement with the model are sufficient to support this claim.

      Strengths:

      The experimental part is convincing: the positive correlation between the elongation rate and the promoter activity of an unnecessary protein is clear, as is the negative correlation between the mean values when the promoter strength is changed. This was demonstrated in both rich and poor media. The causality between the growth rate and the promoter activity was shown using the negative lag time of the cross-correlation function. A simple, reasonable model accounts well for the data. This paper demonstrates an interesting phenomenon and provides a plausible theory for it, advancing our understanding of bacterial physiology on the single-cell level.

      Weaknesses:

      (1) Mean-reversion timescales were assumed to be longer than the simulation time and much longer than the cell cycle time. It is not clear whether the results are robust in case mean-reversion timescales become of the order of the cell-cycle or smaller. If not, is there an argument for such practically infinite reversion timescales?

      (2) It is not easy to understand the simulation part unless one reads Ref. [14]. Is k(t) assumed to follow Equation (1) from Reference [14]? Is it crucial that the ribosome noise appears only at division? Is the ribosome noise strength σ_R = 0.06 lower or higher than expected from naive binomial partitioning at division? Also, a more intuitive explanation of the Simpson's paradox would help the reader.

      (3) It would be useful for the reader to see the raw data and not only the filtered one to appreciate the measurement noise level.

      (4) Negative lag time of the cross-correlation function is visible, but consider adding a statistical test for it.

      (5) Can you make similar cross-correlation plots using the model? Can you use it to infer whether the data agree better with the assumption that ribosomal noise appears only at division or with continuous fluctuations during the cell cycle?

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Garcia-Alcala et al. reports an interesting paradox: the cost of gene expression slows the population-average growth rate, whereas at the single-cell level, expression levels from these genes positively correlate with the growth rate. The effect is observed in the expression of flagellar genes and a gene under a synthetic promoter in E. coli. The findings are explained by the inheritance of growth factors, including ribosomes, during asymmetric division.

      Strengths:

      (1) The manuscript adds strength to an emerging body of literature showing that the population-level bacterial growth laws do not match correlations based on single-cell data. The evidence presented here is more striking than in previous works (such as Pavlou et al., Nat. Commun. 2025), as the trends in population-level data and single-cell data are reversed.

      (2) A relatively simple model correctly explains the trends in the data.

      Weaknesses:

      (1) It is not clear whether flagellar proteins are expressed proportionally to the reporter signal. Furthermore, it is questionable whether E. coli bacteria in the mother machine channels are flagellated. If they are, they could potentially swim out of the channels, which is not the case when they do not carry the MotA E98K mutation. The authors should provide some evidence that E. coli expresses the actual filament proteins in the channels.

      (2) It is unclear what fraction of the total proteome mVenus represents in different measurements. Some quantification is needed (for example, using the Coomassie staining). Using f_U as high as 14.4% in simulations is questionable.

      (3) The data from the MC4100 strain does not directly match the trends of MG1655. The justification for filtering out the low-frequency components of MC4100 is not particularly convincing. It appears unlikely that ribosomes or other growth factors partition significantly differently in the MC4100 strain than in the MG1655 strain. Further discussion and a plot similar to Figure 1 (Left) for this strain are warranted.

      (4) The model needs to be described in more detail. A closed set of equations that has been simulated must be presented, along with all values of the model parameters and their sources. The authors should consider depositing their code on GitHub or another publicly accessible repository.

    1. eLife Assessment

      This important study measures single-unit activity in the middle temporal area (MT) of awake-behaving monkeys to test the idea that sensory adaptation contributes to flexible evidence accumulation during decision-making. Solid evidence is provided, showing that adaptation to different temporal contexts shapes both perceptual judgements and neural responses, but analyses aimed at establishing a direct link between them are less persuasive. This work has the potential to be of interest to a broad range of researchers working on visual perception, plasticity, and decision making.

    2. Reviewer #1 (Public review):

      Summary:

      Effective decision-making in dynamic environments requires the brain to flexibly adjust how sensory evidence is accumulated over time, a process often modeled as an adaptive "leak." McGaughey and Gold propose that this flexibility is not solely a property of downstream integrators but is also supported by stimulus-specific sensory adaptation in the middle temporal area (MT). By recording single-unit activity in rhesus macaques during a motion direction-discrimination task, the authors found that more rapidly changing environments lead to reduced sensory encoding and discriminability in MT, which they argue accounts partially for a "leakier" integration. Furthermore, the study identifies pupil-linked arousal as a parallel, independent mechanism contributing to this adaptive process.

      Strengths:

      The study addresses an important question in cognitive neuroscience by exploring the neural substrates of perceptual flexibility. A major strength is the novel focus on how sensory adaptation, rather than just downstream integration, contributes to behavioral changes in dynamic environments. By shifting the perspective toward the encoding stage, the authors provide a more comprehensive account of how the brain manages evidence accumulation. This conceptual advance is supported by a rigorous experimental approach that combines human-like psychophysics with large-scale single-unit recordings in the middle temporal area (MT) and pupillometry.

      Weaknesses:

      (1) Alternative mechanisms for performance differences

      The authors assume that the difference in performance between the low-switch (LS) and high-switch (HS) frequency conditions is explained by a change in the "leakiness" of integration. However, several other mechanisms could potentially explain this effect:

      (i) Temporal Uncertainty: Integration might start later in the HS condition, leading to lower performance.

      (ii) Reduced Efficiency: Integration could be less efficient in the HS condition (i.e., lower signal-to-noise ratio) without a change in the leak parameter itself.

      (iii) Evidence Contamination: Motion information from the adapting stimulus in the HS condition may be integrated rather than ignored, which might be the case since the transition from the adapting to the test stimulus is not externally cued.

      To distinguish between these alternatives, I suggest two possible analyses. First, a formal model comparison could be performed, though I acknowledge this may be inconclusive in the absence of response-time data. Second, an analysis of motion energy kernels could be revealing; the leak hypothesis makes the specific prediction that for long test stimuli, early samples should contribute more to the choice in the LS condition than in the HS condition, relative to late samples.

      (2) Independence of neural and pupil-linked signals


      The authors take the lack of session-wise correlation between context-dependent contributions from neural and pupil terms as evidence that these two signals provide independent contributions to the behavioral effect. However, could this lack of correlation simply be a result of high variability or noise in these estimates? The data shown in Figure 7B suggest that the measurements are very noisy, which might obscure a potential relationship.

    3. Reviewer #2 (Public review):

      McGaughey & Gold trained rhesus macaque monkeys to perform a motion-direction discrimination task in which a behaviorally irrelevant adapting stimulus with either fast or slow direction alternations preceded a variable-duration test stimulus, while simultaneously recording single-unit activity in area MT and pupil diameter. They report that adaptation to the more rapidly changing stimulus was associated with reduced behavioral sensitivity, attenuated test-evoked MT responses, and larger pupil-linked arousal signals. The authors interpret these behavioral changes as evidence for a more "leaky" evidence-accumulation process, and argue that this apparent leak is implemented in part through context-dependent sensory adaptation in MT and in part through arousal-related mechanisms. More broadly, they conclude that flexible evidence accumulation in dynamic environments arises from distributed adjustments across sensory encoding and neuromodulatory systems rather than solely from changes within a downstream accumulator. If correct, this interpretation has significant implications not only for our understanding of the neural mechanisms of perceptual decision-making but also for broader theories concerning the functional role of sensory adaptation.

      The conclusions of the paper are mostly well supported by the data. Evidence for robust adaptation-induced changes in sensory encoding, behavior, and pupil dynamics is convincing, but further clarification and refinement are needed to establish a clear mechanistic link between these effects and decision-making processes.

      Aspects of the behavioral analysis would benefit from a tighter connection between theoretical claims about evidence accumulation and the empirical features of the psychometric functions. For example, the rightward shifts observed across adapting conditions are interpreted as consistent with a reset of accumulation on switch trials, but similar patterns could also arise from failures to detect the test stimulus on a subset of trials, leading responses to default to the final adaptor direction. Likewise, changes in psychometric slope and asymptote are attributed to differences in evidence accumulation without explicit modelling or consideration of alternative explanations. Clarifying how specific features of the psychometric functions map onto distinct components of the decision process will strengthen the link between the theoretical framework and the behavioral data.

      A slight concern is the lack of a consistent analytical approach for relating behavioral changes to neural and pupil-linked measures. Different sections of the manuscript rely on different behavioral metrics, such as differences in accuracy within a selected stimulus-duration range (e.g., Figure 5C) or psychometric slope differences (Figure 6C), without clear justification for these choices. The analytical approach likewise varies between simple correlational analyses (Figure 5C, Figure 6C), pseudo-experimental group comparisons (Figures 5D, E), and the inclusion of neural or pupil terms in the behavioral psychometric regression model (Figure 7B). While each metric and approach may be defensible in isolation, adopting a more consistent framework will help convince readers that the reported effects are robust and not contingent on the selective choice of metric or analysis.

    4. Reviewer #3 (Public review):

      Summary:

      Environments change over time; therefore, optimal decision-making ought to discount older observations of the environment in favor of newer ones in a manner consistent with the amount of temporal instability. Computational models of perceptual decision-making model this temporal discounting with a 'leak' parameter that determines the rate at which older information is discarded. In this study, McGaughey and Gold examine the neurophysiological mechanisms that could underlie adaptation to different degrees of temporal instability. They developed a novel variant of the well-established perceptual decision-making random-dot-motion paradigm, in which the stimulus being evaluated was preceded by an 'adapting' stimulus with either high or low temporal stability. When the test stimulus was preceded by the adapting stimulus with lower temporal stability, the non-human primates (NHPs) showed reduced psychometric slopes, indicative of increased temporal discounting ('leak'). While the NHPs performed this task, single-unit neural activity was recorded in area MT, along with pupillometric data. The authors use these neural and pupil datasets to investigate two potential sources of adaptive discounting under varying amounts of temporal instability: sensory adaptation (changes in instantaneous evidence encoding) and arousal-related changes in evidence accumulation. MT neurons respond differently to the test stimulus under conditions of high vs low temporal stability of the adapting stimulus: when the adapting stimulus is more stable, MT neurons have larger and more selective responses to the test stimulus. In addition, evoked pupil responses to the test stimulus were modulated by the adapting stimulus. Both the strength of the difference in MT responses across contexts and the difference in pupil diameter across contexts were correlated with context-dependent modulation of the monkeys' behavior over sessions. The paper concludes that both sources appear to independently contribute to adaptive evidence accumulation, likely operating at different processing stages in the brain.

      Strengths:

      (1) While computational models of perceptual decision-making have been very useful for explaining behavior and neural responses in decision-making areas, we are still in search of some of the neural mechanisms that could implement such models. Studies such as this one, which aim to identify neural correlates of simplified model parameters, are quite crucial.

      (2) Analysis is generally careful and well-executed.

      (3) Prompts some interesting follow-up questions that could be answered with simultaneous recordings and causal manipulations, as the authors state in the Discussion - e.g., which areas are affected by arousal-related neuromodulation correlated with evoked pupil size and how.

      Weaknesses:

      (1) The task design may not be optimal. While the amount of time the monkey is exposed to each motion direction during the adapting stimulus is matched, it's hard to know if the reduced MT responses to the test stimulus are truly due to the greater frequency of switches during the HSF adapting stimulus or because the monkeys have been exposed to more repetitions of the stimulus. It's increased sensory adaptation in either case, but it makes it problematic to interpret this as temporal context-dependent adaptation specifically. I think this could potentially be partially addressed by an analysis that is in the paper, but could potentially be emphasized/fleshed out more, specifically the results shown in Figure 4D that seem to show that most of the reduction in neural response for adapting units occurs between the first and second stimuli.

      (2) The pupillometric analysis seems to be an indirect way of assessing whether the accumulator itself might be modulated by temporal context, but the link could be made clearer. The authors show that context-dependent behavior is related to pupil size, which is related to arousal/neuromodulation, but it would be helpful to have some idea of what neural mechanisms underlying adaptive decision-making are actually impacted by this neuromodulation. Lacking neural data to address this question (e.g., from a brain region proposed to be involved in the accumulation process), at least more discussion of this would be helpful. Essentially, I'm unsure of how to interpret the pupil results: the argument that temporal context affects instantaneous evidence encoding in MT that then drives the accumulator is very clear, but I am a bit confused about what, mechanistically, I should think about the effect of neuromodulation doing.

    1. eLife Assessment

      This valuable study aims to differentiate between foveal and peripheral attentional mechanisms in visual and frontal brain regions in monkeys engaged in a free-gaze visual search task. The authors interpret differences in responses between target and nontarget conditions as feature-based attention; however, this may not be the correct interpretation. The authors do not provide enough information on how they distinguish foveal and peripheral RFs. Consequently, the study provides only incomplete evidence that does not support the authors' conclusions, and the significance of the findings is not strong.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript aims to differentiate between foveal and peripheral attentional mechanisms in visual and frontal brain regions in monkeys engaged in a free-gaze visual search task.

      Strengths:

      The manuscript is clearly written, the question is important, and the behavioral task is interesting.

      Weaknesses:

      I have two major concerns.

      (1) The authors interpret divergence in neural responses to target vs nontarget as attention. But it is not. The subject has to attend to both target and nontarget stimuli to determine the stimulus category and thereby decide on the next action. Thus, divergence between target and nontarget responses could reflect categorical discrimination, but I am not sure this can be interpreted as attentional modulation. While it may be tempting to suggest that finding a stimulus of a specific category is "feature attention", analogous to, e.g., attending to the red stimulus, I don't believe this is correct. For the former, the animals have to attend to a stimulus, and examine the stimulus to determine the stimulus category, unlike a simpler discrimination, which may pop out. Given this, I am unconvinced that the interpretations in this manuscript are valid.

      (2) Regarding the RF classification of foveal and peripheral RFs for IT and PFC, prior work suggests that neurons in IT cortex (especially AIT) and PFC have RFs that largely include the foveal visual field. So, it would be important to include figures that show the RFs of neurons classified as foveal versus peripheral for all three areas.

    3. Reviewer #2 (Public review):

      Summary:

      In natural visual behavior, such as when one is looking for a face in the crowd, the eyes are moved from site to site, seeking possible matching targets. This involves attention both to the current view at the center of vision (the foveal location) as well as to upcoming views via attention to targets in the periphery. While it has been established that attention generally enhances neuronal response (compared to simple visual activation) at the attended spatial location, this study provides solid evidence that attention during active visual search leads to neuronal response enhancement only when the eye moves towards targets that exhibit the desired feature and category. This study thus moves the field towards understanding the neural encoding of active vision.

      This study examines the neuronal basis of feature-selective attention during active, freely behaving visual search. Traditional electrophysiological studies on visual attention in monkeys commonly used an eye fixation with a covert attention paradigm, but have not sufficiently addressed the roles of both foveal and peripheral attention in play during natural looking behavior. Here, the authors present a novel paradigm in which, during eye-movement mediated search, neuronal receptive fields are recorded in multiple cortical areas (sensory V4, temporal, and prefrontal areas). In this manner, as the eye foveates, items in the array fall into foveal or non-foveal recorded sites. Thus, the experimental paradigm is elegant, offering the opportunity to make multiple types of comparisons: target/distractor, towards/away from fovea, and areal. Specifically, following a category cue (face, house, hand, flower), freely initiated saccades are made to locate a categorically matching 'target' in an array of distractors. Feature attention is assessed by comparing eye saccades made to targets vs to distractors. Spatial attention is assessed by comparing saccades made 'towards' vs 'away' from targets. Statistics are rigorous and nicely designed. The detailed association of simultaneously obtained eye movement sequences and neural parameters is well done. These are valuable data that will contribute to our understanding of attentional modulation in visual search.

      Strengths:

      The significance of these findings is fundamental. Decades of attention research in vision have been based on the paradigm of visual fixation and covert peripheral attention. However, increasingly, the field has moved towards understanding how the visual system works during active vision. Here, the authors use an active visual search paradigm and record from multiple areas (V4, IT, PFC). They find enhancement of attention both in the foveal and peripheral locations, and, furthermore, a high degree of feature and categorical specificity. This provides valuable data for the concept of a foveal-peripheral attentional window in natural vision. The controls (comparisons of neuronal response during looks to targets vs distractors, and looks towards and away from the target) and statistical rigor make these findings quite compelling.

      Weaknesses:

      While the study is generally quite strong, there are a few weaknesses to be addressed.

      (1) Little rationale is provided for recording in the selected areas, V4, IT, and PFC. Given the respective roles in sensory, object recognition, and goal-directed behavior, some rationale for this design should be offered, and commonalities/distinctions between these areas should be discussed.

      (2) Given the reliance of all analyses on saccadic behavior (towards target/distractor, towards/away from target), additional description and summaries of eye movement behavior during single trials and across trials should be provided.

      (3) The dependency of findings on top-down (categorical & feature-specific) task design should be discussed.

    4. Reviewer #3 (Public review):

      In this manuscript, the authors investigate the role of attention in foveal processing during a naturalistic task. They record neural activity from extrastriate visual areas V4 and inferotemporal cortex, as well as from the lateral prefrontal cortex, in macaques performing a free-gaze visual search task. In this task, animals searched for a face or house target among multiple complex stimuli, with no constraints on eye movements. Unlike classic studies of visual attention, which often rely on controlled fixation, this work examines neural activity in both foveal and peripheral receptive fields during naturalistic eye movements.

      The main question addressed by the authors is how feature-based attention is distributed and coordinated across foveal and peripheral visual fields during active search, and how this attentional processing influences saccade behavior. The authors show that foveal units in visual areas exhibit feature-based attentional enhancement, with stronger responses when a fixated stimulus is a target compared to when the same stimulus serves as a distractor. Peripheral units in visual and prefrontal areas show both feature-based and spatial attentional modulation, consistent with prior work. Finally, the authors show that attentional modulation depends primarily on stimulus category rather than response magnitude, with neurons showing similar enhancement for all images within the target category regardless of how strongly individual images drive the cell.

      There are several notable strengths of this paper, including:

      (1) Disentangling feature-based and spatial attention during naturalistic vision remains a central challenge. This paper tackles both simultaneously, parsing neural populations by object selectivity (face-selective, house-selective, non-selective) and RF position (foveal vs. peripheral).

      (2) The unconstrained search task (Figure 1A) moves beyond the dominant fixed-gaze, cued-attention designs (Zhou & Desimone, 2011) to study attention as it operates during natural behavior, with sequential fixations and voluntary saccades.

      (3) The scale of the multi-area recordings is a major strength and is well aligned with current trends in primate and human neuroscience toward large-scale, multi-area recordings. Simultaneous recordings from visual and prefrontal areas, comprising over 4,900 foveal units and more than 1,500 peripheral units, enable meaningful cross-area latency comparisons and area-specific analyses of attentional modulation. This study builds on the authors' previous analyses of this dataset by expanding the scope to show that feature-based attention generalizes across neuronal classes and operates on categorical identity rather than response magnitude.

      (4) The combination of simultaneous multi-area recordings and a rich behavioral paradigm provides a dataset that is well-suited for population decoding, cross-area interaction analyses, and trial-by-trial prediction of saccade choices, which could substantially deepen mechanistic understanding beyond the largely univariate comparisons presented here.

      While the data broadly support the paper's main conclusions, several issues limit the strength of the mechanistic interpretation and should be taken into consideration:

      (1) Receptive field size is not explicitly quantified and may confound foveal-peripheral comparisons. Units are classified as foveal or peripheral based on responsiveness to the cue versus the search array (Methods, p. 17), but the manuscript lacks essential information about receptive field sizes, eccentricities, and the number of search stimuli falling within each receptive field, as well as appropriate controls for these factors. This is critical because receptive fields in visual area V4 at foveal eccentricities are relatively small (Gattass et al., 1988; Desimone & Schein, 1987), whereas receptive fields in inferotemporal cortex can span several degrees to tens of degrees and often include the fovea (Op de Beeck & Vogels, 2000; DiCarlo & Maunsell, 2003; Zoccolan et al., 2007). Given the 2° × 2° stimulus size, multiple search items could potentially fall simultaneously within peripheral receptive fields. This introduces a potential confound, as attentional modulation is known to be strongest when multiple stimuli appear within a single receptive field (Reynolds et al., 1999). Although the authors acknowledge this issue for visual area V4 (p. 17), it is neither quantified nor controlled for. Without explicit receptive field mapping relative to the search array, comparisons between foveal and peripheral units, as well as between visual areas, are difficult to interpret cleanly.

      (2) Attentional modulation is difficult to dissociate from saccade planning and decision-related signals. The free-gaze paradigm enhances ecological validity but introduces a temporal confound: mean distractor fixation durations are approximately 156 ms (p. 9), while attentional effects emerge between 137 and 170 ms after fixation onset (Figure 2). As a result, the reported attentional modulation coincides with the preparation of the subsequent saccade. Neural activity measured in the primary analysis window (150-225 ms; p. 19), therefore, likely reflects a mixture of visual, attentional, motor planning, target recognition, and behavioral relevance signals, all of which are known to modulate responses in visual areas at similar latencies (e.g., Chelazzi et al., 1998). Moreover, target fixations (~257 ms) and distractor fixations (~156 ms) occur on fundamentally different behavioral timescales, which may inflate apparent foveal attentional effects. While the authors suggest that these timing differences support the idea that foveal feature-based attention facilitates prolonged fixation on target stimuli, this interpretation is not fully supported by the current analyses. That said, the saccade-aligned analyses of peripheral units (Figure S3) partially mitigate this concern by demonstrating that feature-based modulation persists through saccade execution.

      (3) The "attention-out" condition for spatial attention lacks directional control. In the spatial attention analyses (Figures 4D-F), the "attention-out" condition appears to include all fixations followed by saccades directed away from the receptive field, regardless of saccade direction. This differs from classic spatial attention designs, which typically use controlled anti-saccades or saccades to fixed locations opposite the receptive field (e.g., Moore & Armstrong, 2003; Gregoriou et al., 2009). Saccades directed toward locations adjacent to, but outside, the receptive field may still partially engage spatial attention mechanisms near the receptive field via broad attentional fields or motor preparation gradients (Bisley & Goldberg, 2010). In addition, the "attention-out" condition likely contains a heterogeneous mixture of trials in which the stimulus in the receptive field is either a target or a distractor, since feature-based attention effects are derived from this same pool of trials. As a result, spatial and feature attention effects are not fully orthogonal, and variance related to feature attention may already be embedded in the spatial attention baseline.

    1. eLife Assessment

      This valuable study introduces a new framework for improving the automated sorting of extracellular action potentials. However, the evidence is incomplete: the benchmark relies on a single, simplified simulation that does not necessarily reflect real experimental data, the test datasets are insufficiently diverse, and essential algorithmic details are currently missing. This work will be of interest to neuroscientists using high-density multichannel electrophysiology.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents a flexible spike-sorting framework that allows users to run, swap, and benchmark individual modules commonly used in spike sorting. The paper argues and demonstrates that "opening the black box" is essential for understanding which components drive performance differences and for making progress toward more accurate and transparent spike sorting.

      Using this modular benchmarking pipeline, the work identifies electrode drift as a primary bottleneck for accurate sorting and introduces an end-to-end sorter ("Lupin") that combines the best-performing modules and is reported to outperform existing spike-sorting packages on their benchmark.

      Overall, this is a strong tool/resource contribution with clear potential to accelerate spike-sorting development and enable more rigorous comparisons. However, several claims, particularly around the superiority of Lupin or of individual modules, are not yet supported robustly enough to justify the strength of the stated conclusions.

      Strengths:

      This work has high community value and practical utility. The effort to make benchmarking and spike-sorting modules accessible and standardized is substantial and likely to be broadly useful.

      Treating spike sorting as a set of interchangeable modules is a useful approach to some extent: it enables targeted improvements rather than a proliferation of 'new sorters' that are difficult to fully understand.

      Implementing this resource within SpikeInterface, an already widely used tool, will facilitate uptake and community contributions.

      Overall, I am positive about this manuscript as a resource paper. The core framework is compelling and timely.

      Weaknesses:

      (1) The main concern is the limited support for the claim that 'Lupin' and individual modules outperform existing spike sorters.

      (2) Evidence is primarily from a single benchmark based on an intentionally simplified simulation. While the authors discuss the trade-offs between simulated and real data, the current evaluation does not provide enough diversity to justify claims of superiority.

      (3) While improving individual modules that run serially could aid overall spike-sorting performance, it would be fair to acknowledge that some end-to-end sorters iterate across several of these modules. The optimal spike sorter may not be a serial set of modules.

      (4) There is also a risk of benchmark overfitting. A modular approach makes it easy to select components that excel on specific benchmarks (or a specific project's data characteristics) without generalizing.

      Concrete ways to strengthen this work:

      (1) Evaluate on multiple simulation regimes, consider adding at least one biophysically detailed simulation, benchmark on multiple probe geometries with neurons clustered in different depth profiles (as this will affect drift solutions), and provide real-data validation. Even without full ground truth, real data can be evaluated with expert curation, functional validation (e.g., refractory violations, quality metrics, unit waveform consistency), agreement across sorters, and consistency across time.

      (2) Related to real-data applicability, it is also important to acknowledge that modular approaches can enable overfitting to the needs of individual projects. Without real-data benchmarking (or benchmark diversity), it is unclear how the framework will guide users towards generalizable 'best practices' rather than configurations optimized for their specific conditions.

    3. Reviewer #2 (Public review):

      Summary:

      Spike sorting, that is, assigning events detected in extracellular electrophysiology data to the firing of individual neurons, is an inherently difficult computational problem involving multiple steps. The difficulty arises from low signal-to-noise ratios, instability in the signal due to the relative motion of the tissue and recording sites, and large volumes of data. Experimental ground-truth data, in which the correct assignment of spikes is known, are not available in large enough quantities to test algorithms. This paper describes a tool for creating fully synthetic ground-truth data and benchmarking the individual steps of spike sorting to dissect the impact of signal-to-noise, firing rate, and motion correction on each step. This information is used to construct an optimized algorithm for sorting the ground-truth data. One result of particular interest is the dominant role of motion correction in degrading accuracy. Another important technical result is that motion correction via interpolation of the voltage traces yields similar accuracy to interpolation of the spike templates.

      Strengths:

      The paper clearly shows the benefits of analyzing the complex process of spike sorting step by step. While this analysis has also been done in papers presenting spike sorters (for example, reference [32]), the tools presented here allow users and developers to do similar studies for their own work. This toolset will be very useful to many labs, especially those working in less studied brain areas or model systems, cases where the tuning of standard spike sorting tools is not a good match to the data.

      Weaknesses:

      The model ground truth data used in the paper does not need to be a perfect match to experimental data to provide useful benchmarking. However, as with all measurements of spike sorting accuracy, extrapolation to experimental data can be complicated. Users of these tools will need to assess how well the simulated data matches their recordings.

    4. Reviewer #3 (Public review):

      Overview:

      In this manuscript, the authors describe two additions to an existing toolbox (SpikeInterface, Buccino et al., 2020, eLife). The first addition is an empirical simulator for extracellular recordings, in which spikes from predefined templates are added up with Gaussian noise. The second addition involves granting user-level access to intermediate processing steps along spike sorting algorithms. The authors demonstrate the toolbox by evaluating functions (e.g., event detection) or sets of functions (e.g., feature extraction + clustering) on their simulated data, and suggest that a specific combination of function implementations provides performance improvement relative to kilosort4 (Pachitariu et al., 2024, Nature Methods).

      If the authors are interested in making this manuscript a suitable scientific contribution, the entire work has to be revised extensively. In particular, the simulator has to be extended and improved; the implementation of existing spike sorters has to be improved; the feedforward architecture of the modules has to be extended; the results have to be reported according to standard conventions; new algorithms have to be explained in sufficient detail; and the manuscript has to undergo extensive proofreading.

      Notably, even assuming perfect implementation and descriptions, it is unclear to me whether the scope of the present work warrants publication in a scientific journal, or is better suited to an internal technical report or, e.g., a GitHub release. To go beyond a scientifically sound technical report, the authors may choose to demonstrate the utility of their new proposed sorter ("Lupin") and compare it to existing tools on multiple datasets.

      General comments:

      (1) The simulator itself has to be improved and extended. Right now, it simply generates, for every unit, a mother waveform from a sum of exponentials, scales that over channels, and then adds up multiple instantiations of every unit on every channel, along with noise. This is not a biophysical simulator: it is an ad hoc procedure, and the sentence "we firmly believe that..." (lines 482-483) does not make the procedure convincing. To make the simulator credible, the authors should: (1) use a set of biophysical equations, with multi-compartmental modeling of currents and return currents; (2) use noise taken from real extracellular recordings; or (3) use some combination thereof.

      (2) The simulated dataset has to be extended in time. Maybe I missed something, but with 500 units over 10 minutes and some units firing at rates as low as 0.1 spikes/s, some units are expected to fire only about 60 spikes (0.1 spikes/s × 600 s). This is clearly too short and does not replicate the standard situation in extracellular experiments.

      (3) The simulated dataset has to be extended in space. The choice of the Neuropixels 1.0 geometry is a poor one. Many labs use other monolithic electrode arrays (MEAs, silicon probes, other rigid arrays); tetrodes remain a major tool, and flexible probes (polyimide, mesh) are evolving. Assessing algorithms over a single spatial architecture is likely to lead to local maxima in performance and potentially erroneous conclusions.

      (4) The existing spike sorters evaluated are not completely described. Some sorters (e.g., SpyKING Circus and KS4) were described in previous publications, but it is unclear whether the implementations used for the present tests are exactly the same as those previously published. More importantly, some of the sorters evaluated (e.g., TDC, TDC2, SpyKING Circus 2) were never described in a peer-reviewed paper. This does not mean that they cannot be evaluated, but if they are, they must be described in full. The fact that the code is open source cannot replace a complete and accurate scientific description.

      (5) Related to the above, all relevant code should be made available online in permanent repositories, not only in author-controlled ones.

      (6) It is unclear why SpyKING Circus 2 and TDC2 are evaluated; these could potentially be described as straw men. I recommend reorganizing the manuscript so that after every module is evaluated separately on a limited ground-truth dataset, a single "best" sorter is constructed and then tested extensively (and compared to the de facto state of the art). Such a reorganization would both demonstrate the utility of a modular approach and clarify the general usefulness of the outcome.

      (7) The new algorithms developed, for example, clustering and template matching, have to be described in more detail, and demonstrated graphically on simple datasets. This can be done in supplementary material if the authors prefer not to extend the manuscript too much.

      (8) This reviewer finds the description and interpretation of the results to be inadequate. As an example, focusing on Figure 5: The results in Figure 5A have to be supplemented and summarized as a scalar point estimate (e.g., median accuracy), an estimate of dispersion (e.g., using MAD, IQR, or SD), evaluated over multiple runs, and compared using statistical tests between tools and conditions (e.g., using a multi-dimensional analysis of variance, a mixed effect model, etc.). The results in Figure 5D must have an indication of dispersion. Any conclusions based on the numerical experiments must be based on these metrics and statistical evaluations.

      (9) The entire manuscript would benefit from expert proofreading; there are many language errors, mostly involving indefinite articles and grammatical number.

    1. eLife Assessment

      This valuable study presents a real-time system for identifying multiple unrestrained marmosets in a home cage setting using a combination of face detection and color-coded beads. However, there is incomplete evidence regarding the generalizability and robustness of the system to unconstrained multi-animal environments.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Yang, Wang, and Cléry presents a lightweight pipeline for real-time identification of common marmosets in a laboratory setting. Models were trained and evaluated on data derived from a family of three closely related adults and a set of juvenile twins. Freely moving animals entered an enclosed space fixed to the housing cage door, which permitted the entry of individual animals for data acquisition. Using YOLOv8-nano, the authors improved identification by introducing uniquely colored collar beads. Analyses of facial similarity showed close morphological relatedness among individuals and highlighted the need for highly discriminative classification. Overall, the authors offer a framework for identity tracking that prioritizes real-time inference. They demonstrate that combining facial detection with visual markers enables adequate identity assignment under controlled laboratory conditions with minimal cross-individual misclassification.

      Strengths:

      (1) The proposed pipeline offers a solution for real-time identity tracking in common marmosets. Its lightweight design enables deployment across a wide range of hardware configurations. Furthermore, if similar strategies are employed, this methodology is likely adaptable for other species with minimal modification.

      (2) Evaluation of closely related individuals provides a necessary stress test for facial identity discrimination.

      Weaknesses:

      (1) The pipeline's reliance on controlled animal isolation and small visual markers raises questions about the approach's generalizability to unconstrained multi-animal environments. The provided confusion matrices (Figures 6-8) indicate that the most common misclassifications are background-related, possibly suggesting that detection specificity is the primary source of error. All things considered, these findings raise concerns about the system's performance in socially dynamic and visually complex environments.

      (2) The manuscript claims performance comparable to that of human experimenters but provides no explicit evidence to support this claim. While it is plausible that human experimenters may be less accurate in facial recognition tasks involving closely related marmosets, the authors do not provide evidence for this. Moreover, even if that is the case, the color-coded beads provide a salient identity cue for the model, which complicates interpreting this comparison as one grounded in facial recognition.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Yang et al. develop a real-time system for automatic face detection and identification of multiple unrestrained common marmosets in a home cage setting.

      Strengths:

      The study aims to address an unmet need in behavioral neuroscience: the ability to identify animals non-invasively is crucial to the automated and rigorous study of behavior; this is especially true for common marmosets, which are rapidly becoming a model system of choice for the study of complex social cognition. By using a YOLOv8 backbone, the study achieves human-level performance in terms of both the precision and recall of the trained models.

      Weaknesses:

      The robustness of the system is not clear from the limited datasets presented. The use of color-coded beads undercuts the study's premise that the system achieves truly non-invasive tracking. Although the system achieves good performance in face detection, it does not perform as well for classification using faces alone (especially when the faces are similar, as in twin animals). Here, too, the color-coded beads play a key role in identity discrimination. The stated goals of the study and the actual results presented are therefore at odds.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Yang et al. introduce a new method for automatically identifying marmosets in their home cage using a supervised deep learning method that recognizes the face and colored beads on marmoset collars. The authors report high precision in identifying marmosets, at levels comparable to a human experimenter. Overall, the method seems robust at identifying marmosets at different life stages and in different settings; however, in its current form, I am struggling to see the generalizability and experimental utility of this method.

      Strengths:

      (1) The authors provide a near-perfect automatic identification of marmosets in their home cage.

      (2) This method is robust across lighting, camera angles, etc., making it potentially useful for marmoset (and other NHP) identification outside the housing cage as well.

      Weaknesses:

      (1) Despite the almost perfect precision, in its current form, I'm failing to see how this method can be useful to other labs.

      (2) This is a nice methods manuscript, but the authors do not present results to show how their method can be used outside of identifying marmosets inside their home cages in a small field of view.

      (3) Reading the manuscript is strenuous, given its repetitive nature. Consolidating and shortening the results, as well as adding some definitions to the results section, would be helpful.

    1. eLife Assessment

      This useful study addresses the interesting question of how immune cells recognise infected erythrocytes in malaria. It proposes the parasite protein PfGBP-130 as an interaction partner of the human cell surface protein LFA-1, which could help explain how NK cells recognize infected erythrocytes. The conclusions are partially supported by pull-down and cell-based activation data. However, the overall evidence of direct interaction at the cell-cell interface and downstream effects is incomplete; stronger evidence is required to demonstrate surface exposure of PfGBP-130, as well as a direct role of this antigen in killing.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors aim to determine the ligand on Plasmodium falciparum-infected erythrocytes for the NK cell integrin, LFA-1, following up on previous evidence that LFA-1 is important for immune cell-mediated recognition of iRBCs.

      They start by incubating LFA-1 with iRBCs and show by flow analysis that a substantial population of these iRBCs binds to the LFA-1 (Figure 1C). They do conduct the control with uninfected RBCs, but put this in the supplementary material. As this is a critical control, I think that it should be moved to Figure 1C as it is essential to allow interpretation of the iRBC data. The authors also do not state which strain of P. falciparum they used (line 144). This is critical information as different strains have different variant surface antigens and should be included. With these changes, this data seems convincing.

      They next incubated LFA-1 with the iRBCs, cross-linked and conducted a pulldown, identifying GP130 as a binding partner. Using cross-linkers is a dangerous strategy as it risks non-specific cross-linking. Did they try without cross-linking and find an interaction?

      They raised antibodies to PfGBP and showed IFA, which reveals that these antibodies stain iRBCs (Figure 2Ciii). This experiment lacks a critical control of uninfected RBCs, which needs to be included to show that the staining is specific. Without this, it is not possible to conclude that there is iRBC-specific staining with PfGBP.

      They then conduct a pulldown using LFA-Fc, which does show GP130 only in the presence of the LFA-Fc, but not when empty beads are used. This is convincing. BLI measurements are also used to study this interaction (Figure 2Ci). The BLI data are presented in such a way that any association phase is obscured by the y-axis, which makes it impossible to know whether there is binding here. The data need to be shown with some baseline before the addition of the ligand so that the association can be seen. The data are also a bit messy, with a downward drift and curves of different shapes; for example, the 1.0 uM curve seems to have a different association rate. Also, is this n=1? I think these measurements need to be repeated and replicated. This is the only experiment showing a direct interaction between LFA1 and GBP, since the pulldowns were done with lysates, in which bridging components might mediate the interaction. It is therefore important to repeat the BLI or use additional biophysical methods to assess binding, to obtain more convincing data.

      The authors next do some modelling of the putative complex. This is done by homology modelling and docking, which is not the most up-to-date method and is overinterpreted. Personally, I would remove this data as I did not find it convincing, and it is not important for the story. If the authors wish to include it, then I think that they should validate the modelling by mutagenesis to show that the residues which the models indicate might bind are involved in the interaction.

      They next produced GP130 and tested its binding to THP-1 cells, which are often used as a model for macrophages. They observe greater binding of PfGBP-Fc to these cells when compared with hIgG and show that LFA-1 siRNA reduces this binding. I was a little confused about how the flow plots relate to the graph in the bottom right corner of Figure 3Bii. In the flow plots, the hIgG control shows 12.8% of cells in the gated region, while the unstained control has 5.63%, but the MFI data show a decrease in binding for hIgG vs unstained cells. How is this consistent? Also, the siRNA reduces the number of cells in the gated region from 66.6% to 25.9%, which is still substantially more than the 5.63% in the unstained control. This also doesn't seem quite consistent with the MFI data. Could the authors explain this? Also, an additional experiment would be to add soluble LFA-1 to this assay as a further control to determine whether it blocks PfGBP binding to the THP-1 cells. There may be additional binding mechanisms, which would explain why the siRNA has only a partial effect. The same is true for the NK cell experiments in Figure 3Ci, in which the siRNA has a partial effect. The authors also test binding to HEK, HepG2 and 'stem' cells and claim 'only background levels of binding', but in each case, there is more binding to these cells by PfGBP-Fc than by hIgG, albeit less than in THP-1 and NK cells. Why have the authors decided that these increases are not significant? All in all, these experiments do indicate a role for the GBP-LFA1 interaction in the binding of immune cells to iRBCs, but perhaps not as absolutely as is suggested.

      The authors next produce CHO cells with PfGBP on the surface. These cells bind to LFA-1 specifically. When these cells were incubated with primary NK cells, they did see increases in activation markers, which were reduced by the addition of anti-CD11a, suggesting these to be specific. They also conduct the same experiment with anti-GBP with iRBCs, but this is in a different figure. It would be easier for the reader if Figure 5B were in the same figure as Figure 4B, as it is related data using the same method. I found this data convincing, showing that the LFA1:GBP interaction does contribute to immune cell recognition and activation.

      The authors next conduct an experiment in which they assess parasite growth in the presence of NK cells and in the presence of anti-GBP. They use Hoechst staining as a measure of parasite growth and claim that NK cells reduce the number of parasites, but that anti-GBP abolishes this effect (Figure 5A). I found this experiment very unconvincing, as the effects are small and no statistical significance is demonstrated. More commonly used approaches to study parasite growth are lactate dehydrogenase GIA assays or calcein-AM labelling. I would either remove this experiment or supplement it with additional data from a more robust assay, with repeats and tests of statistical significance.

      In summary, the authors present a set of data which comes together to indicate an interaction between LFA1 and PfGBP on the Plasmodium-infected erythrocyte surface. Pulldown studies show convincingly that these two proteins co-precipitate, and BLI data suggest that this is direct. Also convincing is that NK cell activation can be reduced using antibodies against either LFA1 or PfGBP, indicating that this interaction does play a role in immune cell recognition of iRBCs.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used an LFA-1 αI-Fc fusion protein to pull down potential ligands and LC-MS/MS, leading to the selection of PfGBP-130 as a potential membrane protein on the surface of infected cells. PfGBP-130 antibodies were raised and used to support the surface localization. This putative ligand interacted strongly with LFA-1 (Kd = 15 nM). A presumed PfGBP-130 ectodomain interacts with monocytes and NK cells but not cells that lack LFA-1. PfGBP-130 antibodies also interfered with NK cell-mediated infected cell killing; the effect, although statistically significant, is modest. The authors propose that NK cells recognize infected cells via LFA-1 interaction with PfGBP-130 exposed on the host cell and that this interaction is critical to initiation of NK cell activation and killing of infected cells.

      Major points:

      (1) PfGBP-130 is proposed to be a membrane protein based on a single predicted transmembrane domain. Figures 2b and 3a show ribbon schematics with this TM domain at residues 51-68, in agreement with TM prediction algorithms such as TMHMM 2.0 and Phobius. However, this predicted TM is upstream of the PEXEL motif (residues 84-88, sequence RILAE), a conserved sequence for parasite protein export to host cytosol that is proteolytically processed at its 4th residue. Thus, residues 1-87 are removed from PfGBP-130 prior to export, yielding a mature protein without predicted TMs. Prior studies have determined that the mature PfGBP-130 lacks TMs and is retained as a soluble protein in host cell cytosol (PMID: 19055692, 35420481). Thus, the authors' model of PfGBP-130 as a surface-exposed membrane protein conflicts with both computational analysis of the mature protein and these prior reporter studies. An important simple experiment would be to evaluate PfGBP-130 membrane association in immunoblots using the authors' PfGBP-130 antibody after hypotonic lysis (PMID: 19055692) and after alkaline extraction (e.g., 100 mM Na2CO3, pH 11, as frequently used, PMID: 33393463). If the prior studies and computational analyses are correct, the protein will be predominantly in the soluble and/or alkaline supernatant fractions.

      (2) Many findings rely on the specificity of antibodies generated against PfGBP-130 or NK cell receptors. Although the authors have included key controls (use of isotype control antibodies, lack of anti-PfGBP-130 binding to uninfected cells), cross-reactivity between P. falciparum antigens is well recognized and could significantly undermine the interpretation of experiments (PMID: 2654292 and 1730474 provide key examples of antigens recognized by antibodies raised against other proteins). For example, the surface localization in IFA experiments (Figure 2B(iii)) could reflect anti-PfGBP-130 binding to an unrelated parasite surface antigen, a possibility not addressed by any of the authors' controls. As another example, the iRBC lysate immunoblot using this antibody in Fig. 2B(iv) suggests a MW of 95 kDa, which corresponds to the unprocessed pre-protein before export; cleavage in the PEXEL motif yields a processed mature protein of 85 kDa, which should be readily resolved from the pre-protein in immunoblots (PMID: 19055692). A better immunoblot using immature infected cell stages might show both the pre-protein and the mature protein as a doublet band.

      (3) PfGBP-130 is not essential for in vitro cultivation (PMID: 18614010 and MIS of 1.0 in the piggyBac mutagenesis screen as tabulated on plasmodb.org, indicating a highly dispensable gene). The authors should use the knockout line as a control in their IFA localization experiments to address antibody specificity. More fundamentally, their model predicts that NK cells should not recognize or kill infected cells from the knockout line when compared to their untransfected parent. Such results with the knockout line would compellingly support the authors' model without reliance on antibodies that may cross-react with other parasite antigens. PMID: 18614010 reported that the PfGBP-130 knockout exhibited increased membrane rigidity, suggesting an intracellular scaffolding protein rather than a surface localization and use as a ligand for LFA-1 interaction and NK cell-mediated killing.

      (4) PfGBP-130 non-essentiality raises the question of why the gene would be retained if it triggers NK cell-mediated killing of infected cells in vivo. Presumably, this killing would pose strong selective pressure against retention of PfGBP-130. Some speculation is warranted to support the model.

    4. Reviewer #3 (Public review):

      Summary:

      Malhotra and colleagues present evidence that the integrin LFA-1 on NK cells binds the Plasmodium falciparum protein GBP130 on the infected erythrocyte surface and that this interaction plays a role in the clearance of infected erythrocytes by NK cells.

      The authors first select a subdomain contained within the CD11a subunit of LFA-1 as a probe to discover possible binding proteins on the infected erythrocyte surface. Parasite-infected erythrocytes stained positively with this probe; the level of staining increased as the parasites progressed through the life cycle. Using the LFA-1-based probe in cross-linking pull-down experiments, GBP130 was identified by mass spectrometry as a co-purifying parasite protein. The N-terminal portion of GBP130 was recombinantly expressed and shown to interact with LFA-1 alpha-I by biolayer interferometry. The full-length extracellular domain of GBP130 was then recombinantly expressed and used to stain primary human NK cells and THP-1 cells. Knocking down LFA-1 by siRNA reduced staining by GBP130. To assess the contribution of GBP130 to the activation of NK cells, CHO cells exogenously expressing GBP130 were incubated with primary NK cells. Transfecting CHO cells with GBP130 led to increased activation of co-incubated NK cells compared with mock-transfected cells and with GBP130-transfected cells in the presence of anti-CD11a, which blocks NK cell adhesion. Finally, CHO cells expressing GBP130 led to increased activation of NK cells compared to mock-transfected CHO cells.

      Overall, although the authors present data from NK cell killing assays that include appropriate controls, the data suggesting a direct interaction between PfGBP-130 and LFA-1 do not include the same necessary controls, for example, the use of blocking antibodies. Most critically, the biolayer interferometry experiments use a recombinant fragment of PfGBP-130 that does not include the residues predicted to be important for mediating specific interaction with LFA-1. The biolayer interferometry data instead suggest non-specific interactions between PfGBP-130 and LFA-1, as binding does not reach saturation.

  2. Mar 2026
    1. eLife Assessment

      This article presents valuable findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is compelling, but indicates some factors that may temper the authors' claims, while the designs of experiments offer incomplete support for the current claims as they rely on one population under extreme conditions for only one year each while a confounding effect (time in a chamber) sometimes lacks a control.

    2. Reviewer #1 (Public review):

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

      Comments on revisions:

      This is the second round of review, and I am generally very satisfied with the authors' revisions. However, a few detailed issues still require attention:

      The authors identified the summer solstice (June 21) as a phenological "switch point", but the flexibility of this switch point remains poorly understood. A more precise explanation of what "flexibility" means in this context is needed, along with a description of the specific experimental results that would demonstrate this flexibility.

      The experiment did not directly measure the specific date of the phenological switch point. Instead, it was inferred by comparing temperature effects before and after the solstice. The manuscript should clearly state that this switch point remains an inferred conceptual node rather than a directly measured variable.

      In Experiment 1, the effect of bud type (terminal vs. lateral) was inconsistent across the overall model and the different leafing groups. The authors should provide a more thorough discussion of potential reasons for this inconsistency. In addition, the statistical model for Experiment 1 indicates that the measured variables (summer cooling and leaf emergence date) explain only 23.4% of the variation in bud formation timing. This leaves over 76% of the variation unexplained, suggesting that other important factors are involved. The discussion should address this limitation in greater depth, moving beyond a focus on the measured variables.

    3. Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set [their original title]' Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I was also very concerned by the revisions.

      I expand briefly on these concerns and a few others for readers of the paper (see 'The comments below relate to my original review'). Subsequent edits to the paper addressed some of these by providing a new figure and moving around the methods. Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this, as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through the end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but, again, we may need to be more cautious in how much we infer from the results.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, rather than just information on timing.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season (budset, end of radial expansion, leaf coloring, etc.) relates to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which often happens MONTHS before leaf coloring) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may differ with a different population of plants.

      Somewhat minor comments:

      (1) How can a bud type, which is either apical or lateral, be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      (2) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since the timing that mattered seemed to depend on the timing of treatment and was not clearly related to the solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end-of-season timing?

    4. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This article presents valuable findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework of the various ways in which warming can affect bud set timing. The support for the findings is incomplete, though extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses can make the conclusions more robust.

      We thank the editors and reviewers for their expert assessment of our findings and their interest in our conceptual framework. Below we respond to the specific reviewer and editor comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

      Thank you for your generous description of our study and the manuscript.

      Weaknesses:

      However, there are several issues that need to be addressed.

      (1) In Experiment 1, significant differences were observed in the impact of cooling in July versus August. July cooling induced a delay in bud set dates that was 3.5 times greater in late-leafing trees compared to early-leafing ones, while August cooling induced comparable advances in bud set timing in both early- and late-leafing trees.

      The study did not explain why the timing (July vs. August) resulted in different mechanisms. Can a link be established between phenology and photosynthetic product accumulation? Additionally, can the study differentiate between the direct warming effect and the developmental effect, and quantify their relative contributions?

      We thank the reviewer for pointing out that we could improve our explanation of the different responses to July and August cooling in experiment 1. Whilst we incorporated this in the conceptual model and the figure caption (Fig. 1b), we now also address this topic in more depth in the discussion section, focussing on daylength and photosynthetic assimilation as the possible mediators of this change in responses (L350-371).

      For the early-season development effect vs the late-season temperature effect we can use the leaf-out day-of-year (as a proxy for development), and the summer cooling treatments (direct temperature effect) to assess the relative importance of these two components of our model. We have now included a variance partitioning analysis following this logic, see L246-252 for methods, L278-281 for results.

      (2) The two experimental setups differed in photoperiod: one used a 13-hour photoperiod at approximately 4,300 lux, while the other used an ambient day length of 16 hours with a light intensity of around 6,900 lux. What criteria were used to select these conditions, and do they accurately represent real-world scenarios? Furthermore, as shown in Figure S1, significant differences in soil moisture content existed between treatments - could this have influenced the conclusions?

      This question may reflect a misunderstanding regarding light availability that we hope to address with improved clarification. The duration and intensity of the lighting in these experiments were always set to reflect the average conditions experienced in Zurich at the respective times of the year. Day length in spring is shorter than in summer, so the durations were simply adjusted to reflect this reality. The 13-hour, 4,300 lux conditions in experiment 1 applied only to the April-May period, when we reduced developmental rates for the late-leafing trees (L125-129). In July, the photoperiod was set to 16 hours and light intensity was approximately 7,300 lux (L150-154). This is equivalent to experiment 2, in which treatments were applied in June and July with a 16-hour photoperiod and a light intensity of approximately 6,900 lux (L206-207). These conditions reflect the average daylengths in Zurich and the maximum light intensity output by the chambers.

      As mentioned in our initial author response, we do not think small differences in soil moisture levels should influence our conclusions. All pots were watered sufficiently to avoid water deficit, and all efforts were made to minimise differences in water availability. A Tukey honest significant difference test showed that only one treatment pair (6 - Late_July_Extreme vs. 7 - Early_August_Moderate, difference = 6%, p < 0.05) had significantly different soil water content, a pair whose responses are not compared. We have added words to this effect in the figure legend of Fig. S1.

      (3) The authors investigated how changes in air temperature around the summer solstice affected primary growth cessation, but the summer solstice also marks an important transition in photoperiod. How can the influence of photoperiod be distinguished from the temperature effect in this context?

      We agree that photoperiod likely plays a central role. Our conceptual model (Fig. 1) explicitly incorporates photoperiod as the framework within which temperature responses are regulated (L72-75, L627-629 & L638-641). The Solstice-as-Phenology-Switch hypothesis assumes that the annual progression of daylength sets the physiological “window” for trees’ responsiveness to temperature. Our experiments therefore focused on how temperature responses differ before versus after the solstice, while recognising that this reversal is likely enabled by the photoperiod signal. In other words, photoperiod provides the regulatory backdrop, and our results identify how diel and seasonal temperature cues are interpreted within that photoperiodic framework.

      (4) The study utilized potted trees in a controlled environment, which limits the generalization of the results to natural forests. Wild trees are subject to additional variables, such as competition and precipitation. Moreover, climate differences between years (2022 vs. 2023) were not controlled. As such, the conclusions may be overgeneralized to "all temperate tree species", as the experiment only involved potted European beech seedlings. The discussion would benefit from addressing species-specific differences.

      We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”; L409-411) and explicitly call for follow-up studies across species and forest contexts (L413–414). At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.

      Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I enjoyed reading this paper and found it well written. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I next expand briefly on these concerns and a few others.

      Thank you for the kind comments. We appreciate your concerns regarding the severity of our treatments and the generalisability of our results, and you can find our detailed responses below.

      Concerns:

      (1) As I read the Results, I was surprised the authors did not give more information on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods, I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through the end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      We recognise that our temperature treatments were severe and do not mimic real-world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants (L129-L133). We have added text in the Methods to clarify this aim (L129-131 & L156-161).

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.

      (2) I also think the control is confounded with the growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2), so I think they need to be more upfront about this. The study is still very valuable, but again, we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods (L146-148) and Discussion (L345-346).

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent "warm chamber control" could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions (L141-145).

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      (3) I suggest the authors add a figure to explain their experiments, as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      (4) Given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that including more data on photosynthetic assimilation would be valuable for interpreting phenological responses. Indeed, it was our intention to collect this information. However, unfortunately, we experienced technical challenges with the equipment available to us during the experimental period, which prevented us from collecting a full dataset. Nevertheless, we were able to obtain measurements during pre-solstice cooling (now presented as Fig. S12, including data for all treatments), which show that cooling treatments strongly reduced assimilation rates compared to controls. Importantly, these strong reductions occurred across all cooling treatments, yet their phenological outcomes differed markedly, demonstrating that assimilation alone cannot explain the observed responses. As we discuss, our findings are consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      (5) Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late), so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1 (comment 4), our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously (L409-411) and highlight the need for further research across species (L413–414).

      (6) Another concern relates to measuring the end of season (EOS). It is well known that different parts of plants shut down at different times, and each metric of end of season (budset, end of radial expansion, leaf coloring, etc.) relates to different things. Thus, I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which often happens MONTHS before leaf coloring) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised that the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may be different with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker (L179-195 for methods, L296-311 for results). On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, bud set occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R² = 0.49) amongst the two phenometrics. In accordance with these revisions, we have updated the manuscript title to "Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech" (L1-2).

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. As mentioned above (see Reviewer 1 comment 3), photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced exactly the same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations rather than exerting complete control.

      (7) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since the timing that mattered seemed to depend on the timing of treatment and was not clearly related to the solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end-of-season timing?

      We interpret this concern as relating to the flexibility in reversal timing that we observed. Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21. Rather, the hypothesis implies that reversal occurs around the solstice, when photoperiod cues cause tree individuals to shift from accelerating to decelerating their seasonal development. Our conceptual model (Fig. 1) explicitly incorporates this flexibility by showing how the timing of the reversal depends on developmental speed: individuals that develop more slowly (or leaf out later) cross the compensatory point later in the summer, whereas fast-developing individuals reach it earlier.

      Our experiments support this framework: pre-solstice full-day cooling delayed bud set, whereas post-solstice full-day cooling advanced it, with differences between early- and late-developing individuals consistent with the model. Moreover, the contrasting impacts of daytime vs. nighttime cooling demonstrate how diel conditions can further shape when the reversal is expressed. Thus, rather than contradicting the Solstice-as-Phenology-Switch hypothesis, our findings reinforce it and extend it by showing how flexibility arises from interactions between developmental progression, diel temperature responses, and photoperiod.

      We have added an additional section in the Discussion that elaborates on how our results support the Solstice-as-Phenology-Switch hypothesis (L416-432).

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the authors):

      (1) The current strength of evidence is incomplete. Extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses could make the conclusions more solid.

      We agree with the vast majority of the reviewer comments and have made the relevant edits. We believe that these have dramatically improved the clarity of the manuscript. The revised analyses have not changed our conclusions, though we have toned down generalisations.

      (2) The Solstice as Switch hypothesis is about the effect of temperature warming. However, the two experiments did not simulate warming but rather cooling. Although a temperature difference can be obtained compared to the control in both cases, the impacts on plant physiology and phenology should still be different between the two scenarios.

      Thank you for raising this point, which requires clearer communication in our manuscript. The Solstice-as-Phenology-Switch hypothesis posits that changes in temperature before and after the summer solstice have opposite effects on the autumn phenology of northern forest trees. While the hypothesis has most often been framed in terms of warming, the underlying mechanism concerns whether development is accelerated or slowed relative to ambient conditions. In essence, we are exploring the effect of changes in temperature – not warming per se. In warmer springs, development begins earlier and/or proceeds faster, while in colder springs the opposite occurs; the same logic applies to post-solstice conditions. We have extended our explanation in the Introduction (L69-71).

      In our experiments, we applied cooling to create strong contrasts in developmental rates without damaging the trees. These treatments allow us to test the direction of phenological responses relative to ambient conditions. Thus, although we used cooling rather than warming, the results are directly informative for the Solstice-as-Phenology-Switch framework, which concerns the relative effect of temperature changes rather than the absolute direction of manipulation.

      (3) The number of groups for bud type and summer temperature treatment is too small to be used as a random effect; it would be more appropriate to treat them as fixed-effect terms.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9, see L271) and inferences are not altered. We also report the bud type effects for experiment 1 (L262-266) and experiment 2 (L292-293).

      (4) Please add more clarifications for Figure 4 about what this figure is for and how you derived this figure, whether the data were from your experiments or others.

      We have rewritten the caption for Figure 6 (Fig. 4 in the previous manuscript) to clarify where the data came from and how the figure was generated (L687-693). This figure serves as a visual guide to aid the understanding of the processes that may govern the patterns we have observed. Figure 6a uses data from previous studies on diel patterns in F. sylvatica, specifically growth (Zweifel et al., 2021) and photosynthetic assimilation rates (Urban et al., 2014). To aid visualisation, we linearly interpolated between measurement points, converted the values to a relative percentage (compared to the observed maximum), and then smoothed the resulting curves. Based on the evidence from experiment 2, we suggest there may be a temperature threshold below which overwintering responses (e.g. bud set) are induced in F. sylvatica. Figure 6b depicts a theoretical diel pattern of this potential threshold. In simple terms, the threshold must be lower at night because nights are typically colder than days.

      Reviewer #2 (Recommendations for the authors):

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect, so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      See point (3) in the Reviewing Editor’s recommendations for the authors.

      (2) Could the authors move the methods earlier and remind readers of them in the results?

      We have addressed this issue; please see the detailed response to Reviewer #2’s concerns.

      Urban O, Klem K, Holišová P, Šigut L, Šprtová M, Teslová-Navrátilová P, Zitová M, Špunda V, Marek MV, Grace J. 2014. Impact of elevated CO2 concentration on dynamics of leaf photosynthesis in Fagus sylvatica is modulated by sky conditions. Environmental Pollution 185: 271–280.

      Zweifel R, Sterck F, Braun S, Buchmann N, Eugster W, Gessler A, Häni M, Peters RL, Walthert L, Wilhelm M, et al. 2021. Why trees grow at night. New Phytologist 231: 2174–2185.

    1. eLife Assessment

      The authors previously identified SLAP as a key suppressor of the Src tyrosine kinase and a tumor suppressor. In this important study, the authors show that SLAP functions in a cell-autonomous fashion in colon stem cells and provide solid evidence that SLAP reduces tumorigenesis by inhibiting an EphB2-SRC axis.

    2. Reviewer #1 (Public review):

      Naim et al. use genetically engineered mouse models and tissue culture cell lines to investigate the role of the SLAP adaptor protein in colonic epithelium and colon tumour formation. The SLAP adaptor protein is known to be a negative regulator of tyrosine kinase signaling in hematopoietic cells, but its role outside the immune system is less well defined. Here, the authors use genetically engineered SLAP-deficient mice, tissue-specific SLAP KO, and colonic organoids to demonstrate that SLAP is expressed in cells of the colonic epithelium, where it acts as a cell-autonomous regulator of proliferation and differentiation. In addition, they provide biochemical evidence that loss of SLAP expression in cultured colonic organoids results in increased Src family kinase activity and global tyrosine phosphorylation, consistent with its known role as a suppressor of tyrosine kinase activity in immune cells. Consistently, treatment with an SRC kinase inhibitor reduced the growth of SLAP-deficient organoids. These data provide solid evidence of a cell-autonomous role of SLAP in the colonic epithelium.

      This work would be improved by further description and interpretation of the SLAP expression pattern shown in the constitutive and tissue-specific KO to further support the conclusions made. In Supplementary Figure 1, magnification of the colon epithelium areas with SLAP expression shown by β-gal and anti-SLAP staining, highlighting regions of interest, would better support the conclusions regarding SLAP expression in specific regions of the colon epithelium. In Supplementary Figure 1B, the authors should indicate that the SLAP staining referred to is epithelial and in resident immune cells, as is mentioned in the text. Also, magnification of the boxed area of LGR5 staining in Figure 1 would improve this figure.

      Using a chemically induced model of colitis-associated cancer, the authors demonstrate that inactivation of SLAP shows a trend toward increased tumor formation (though this did not reach significance) as well as increased Src family kinase activity within tumors. Tumor spheres from SLAP-deficient animals showed enhanced growth that was suppressed by treatment with a Src family kinase inhibitor. Of note, the latter effect was specific to SLAP-deficient tumor spheres. These observations are convincing and support the authors' conclusion that SLAP has a tumor suppressor role in CRC through inhibition of SFK signaling.

      Mechanistically, elevated expression of the RTK EphB2 was detected in immunoblots of SLAP KO colonic crypts, while overexpression of SLAP in CRC cell lines downregulated EphB2 protein levels. An EphB2 inhibitor was used to demonstrate the role of EphB2 in the growth of SLAP-deficient colonic organoids. While these data generally support the authors' conclusion that SLAP limits colonic organoid growth by downregulating RTKs such as EphB2 and downstream Src family kinase activity, they do not show which cell types/regions in the colonic epithelium have increased EphB2 protein and how this relates to SLAP and phospho-SRC expression, as shown in Figure 1 and Figure S1 immunocytochemistry. The expression of EphB2 and its role in colonic tumorsphere growth were not investigated.

      Overall, this work provides evidence of SLAP adaptor function in restricting tyrosine kinase signaling in the colonic epithelium, and suggests that loss of SLAP expression could promote tumorigenesis in this context.

    3. Reviewer #2 (Public review):

      Summary:

      Protein tyrosine kinases are subject to diverse regulatory mechanisms controlling their activity in normal situations. The authors previously identified SLAP (Src-like adaptor protein), a negative regulator of receptor tyrosine kinase (RTK) signaling, as a key suppressor of the cytoplasmic tyrosine kinase SRC in the normal colon and demonstrated that SLAP is downregulated in a majority of colorectal cancers (CRCs).

      In this study, the authors further explored SLAP functions in mouse models using constitutive and inducible epithelial-specific Slap deletion (villin-CreERT2 model). They found that loss of SLAP augments colonic epithelial cell proliferation and that induction of tumorigenesis by the AOM/DSS protocol mimicking CRC leads to more aggressive tumors in the absence of SLAP. This effect is apparently cell-autonomous, as growth of normal and tumoral colonic organoids is SLAP-dependent in vitro. Finally, the authors show that, in the colon, SLAP represses EphB2, an RTK lying upstream of SRC, and that inhibitors of EphB2 can partially limit tumorigenic development in vitro.

      Strengths:

      The manuscript is clearly and concisely written, making it easy to follow. The data obtained in the mouse models are very convincing.

      Weaknesses:

      Direct evidence that EphB2 is activated/phosphorylated in the absence of SLAP is lacking, as conclusions are only based on results obtained with inhibitors. Some other issues have to be addressed before acceptance, in particular, the relevance of the findings in CRC patients.

    4. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Naim et al. use genetically engineered mouse models and tissue culture cell lines to investigate the role of the SLAP adaptor protein in colonic epithelium and colon tumour formation. The SLAP adaptor protein is known to be a negative regulator of tyrosine kinase signaling in hematopoietic cells, but its role outside the immune system is less well defined. Here, the authors use genetically engineered SLAP-deficient mice, tissue-specific SLAP KO, and colonic organoids to demonstrate that SLAP is expressed in cells of the colonic epithelium, where it acts as a cell-autonomous regulator of proliferation and differentiation. In addition, they provide biochemical evidence that loss of SLAP expression in cultured colonic organoids results in increased Src family kinase activity and global tyrosine phosphorylation, consistent with its known role as a suppressor of tyrosine kinase activity in immune cells. Consistently, treatment with an SRC kinase inhibitor reduced the growth of SLAP-deficient organoids. These data provide solid evidence of a cell-autonomous role of SLAP in the colonic epithelium.

      This work would be improved by further description and interpretation of the SLAP expression pattern shown in the constitutive and tissue-specific KO to further support the conclusions made. In Supplementary Figure 1, magnification of the colon epithelium areas with SLAP expression shown by β-gal and anti-SLAP staining, highlighting regions of interest, would better support the conclusions regarding SLAP expression in specific regions of the colon epithelium. In Supplementary Figure 1B, the authors should indicate that the SLAP staining referred to is epithelial and in resident immune cells, as is mentioned in the text. Also, magnification of the boxed area of LGR5 staining in Figure 1 would improve this figure.

      We thank the reviewer for their positive and constructive evaluation of our work.

      We agree that a more detailed description and visualization of SLAP expression in the colonic epithelium would strengthen our conclusions. In response, we will revise Figures 1 and S1 to better highlight SLAP expression patterns. Specifically, we will include higher-magnification images of the colonic epithelial regions in Supplementary Figure 1, with clearly indicated regions of interest. We will also clarify in the legend of Supplementary Figure 1B that SLAP staining is observed in both epithelial and resident immune cells, as described in the text. Additionally, we will provide a magnified view of the boxed area showing LGR5 staining in Figure 1 to improve clarity.

      Using a chemically induced model of colitis-associated cancer, the authors demonstrate that inactivation of SLAP shows a trend toward increased tumor formation (though this did not reach significance) as well as increased Src family kinase activity within tumors. Tumor spheres from SLAP-deficient animals showed enhanced growth that was suppressed by treatment with a Src family kinase inhibitor. Of note, the latter effect was specific to SLAP-deficient tumor spheres. These observations are convincing and support the authors' conclusion that SLAP has a tumor suppressor role in CRC through inhibition of SFK signaling.

      Mechanistically, elevated expression of the RTK EphB2 was detected in immunoblots of SLAP KO colonic crypts, while overexpression of SLAP in CRC cell lines downregulated EphB2 protein levels. An EphB2 inhibitor was used to demonstrate the role of EphB2 in the growth of SLAP-deficient colonic organoids. While these data generally support the authors' conclusion that SLAP limits colonic organoid growth by downregulating RTKs such as EphB2 and downstream Src family kinase activity, they do not show which cell types/regions in the colonic epithelium have increased EphB2 protein and how this relates to SLAP and phospho-SRC expression, as shown in Figure 1 and Figure S1 immunocytochemistry. The expression of EphB2 and its role in colonic tumorsphere growth were not investigated.

      Overall, this work provides evidence of SLAP adaptor function in restricting tyrosine kinase signaling in the colonic epithelium, and suggests that loss of SLAP expression could promote tumorigenesis in this context.

      We also thank the reviewer for their positive comments regarding our tumor studies and the role of SLAP in regulating SFK signaling.

      Regarding the mechanistic insights involving EphB2, we appreciate the reviewer’s suggestion to further define its spatial expression and relationship with SLAP and phospho-SRC. To address this, we plan to extend our analysis to assess the effect of Slap depletion on EphB2 protein levels throughout the intestinal epithelium.

      We recognize that directly testing EphB2’s role in murine colonic tumorsphere formation would require a new cohort of SLAP knockout mice treated with AOM/DSS for 90 days, which is not feasible in the short term. To address this, we will instead use human colorectal cancer models to assess how SLAP modulation affects the response of tumoroids derived from cell lines to EphB2 inhibition, providing complementary mechanistic insights.

      Overall, we believe these additions will strengthen the manuscript and more fully address the reviewer’s concerns.

      Reviewer #2 (Public review):

      Summary:

      Protein tyrosine kinases are subject to diverse regulatory mechanisms controlling their activity in normal situations. The authors previously identified SLAP (Src-like adaptor protein), a negative regulator of receptor tyrosine kinase (RTK) signaling, as a key suppressor of the cytoplasmic tyrosine kinase SRC in the normal colon and demonstrated that SLAP is downregulated in a majority of colorectal cancers (CRCs).

      In this study, the authors further explored SLAP functions in mouse models using constitutive and inducible epithelial-specific Slap deletion (villin-CreERT2 model). They found that loss of SLAP augments colonic epithelial cell proliferation and that induction of tumorigenesis by the AOM/DSS protocol mimicking CRC leads to more aggressive tumors in the absence of SLAP. This effect is apparently cell-autonomous, as growth of normal and tumoral colonic organoids is SLAP-dependent in vitro. Finally, the authors show that, in the colon, SLAP represses EphB2, an RTK lying upstream of SRC, and that inhibitors of EphB2 can partially limit tumorigenic development in vitro.

      Strengths:

      The manuscript is clearly and concisely written, making it easy to follow. The data obtained in the mouse models are very convincing.

      Weaknesses:

      Direct evidence that EphB2 is activated/phosphorylated in the absence of SLAP is lacking, as conclusions are only based on results obtained with inhibitors. Some other issues have to be addressed before acceptance, in particular, the relevance of the findings in CRC patients.

      We thank the reviewer for their positive and constructive evaluation of our work.

      We agree that our conclusions regarding the SLAP–EphB2–SRC signaling axis rely in part on pharmacological inhibition. As outlined in the manuscript, EphB2 was selected primarily as a proof-of-concept receptor to illustrate how SLAP may indirectly regulate SRC activity through modulation of upstream receptor tyrosine kinases. We note that the use of two distinct classes of EphB inhibitors supports the robustness of our observations.

      To further strengthen this aspect of the study, we will assess EphB2 phosphorylation status in SLAP-deficient conditions, which will provide more direct evidence of its activation state and its contribution to SRC signaling.

    1. eLife Assessment

      This is an important study of the relationship between morphogen signaling and cell fate choices in the forming zebrafish neural tube, addressing a topical question in developmental biology. The authors provide a solid characterization of the precision limit for gene regulatory networks interpreting Shh, with single-cell resolution and state-of-the-art in vivo approaches. While the depth of analysis is restricted, particularly by the number of cell traces, the study will be of interest to developmental biologists interested in cellular decision-making.

    2. Reviewer #1 (Public Review):

      [Editors' note: This version has been assessed by the Reviewing Editor without further input from the original reviewers. Given the time elapsed since the original data collection, the authors have addressed the previous concerns by providing a more nuanced discussion of their results and acknowledging the limitations of the study to ensure the conclusions are supported by the existing data.]

      Throughout the paper, the authors do a fantastic job of highlighting caveats in their approach, from image acquisition to analysis. Despite this, some conclusions and viewpoints portrayed in this study do not appear well-supported by the provided data. Furthermore, there are a few technical points regarding the analysis that should be addressed.

      (1) Analysis of signaling traces

      - Relevance of "modeled signaling level": It is not clear whether this added complexity and potential for error (below) provides benefits over a simpler analysis such as taking the derivative (shown in Figure 3C). Could the authors provide evidence for the benefits? For example, does the "maximal response" given by a simpler metric correlate less well with cell fate than that calculated from the fitted response?

      - Assumptions for "modeled signaling level": According to equation (1) Kaede levels are monotonically increasing. This is assumed given the stability of the fluorescent protein. However, this only holds for the "totally produced Kaede/fluorescence". Other metrics such as mean fluorescence can very well decrease over time due to growth and division. Does "intensity" mean total fluorescence? Visual inspection of the traces shown in Figure 2 suggests that "fluorescence intensity" can decrease. What does this mean for the inferred traces?

      - Estimation of Kaede reporter half-life: It is not clear how the mRNA stability of Kaede is estimated. It sounds like it was just assessed visually, which seems not entirely appropriate given the quantitative aspects of the rest of the study. Also, given that Shh signaling was inhibited at the level of Smoothened, it is not obvious how the dynamics of signaling shutdown affect the estimate. Most results in Figure 7 seem to be quite robust to the estimate of the half-life. That they are might suggest that the whole analysis is unnecessary in the first place. However, not all are. Thus, it would be important to make this estimate more quantitative.

      (2) Assignment of fates and correlations

      - Error estimate for cell-type assignment: Trying to correlate signaling traces to cell fate decisions requires accurate cell fate assignment post-tracking. The provided protocol suggests a rather manual, expert-directed process of making those decisions. Can the authors provide any error-bound on those decisions, for example comparing the results obtained by two experts or something comparable? I am particularly concerned about the results regarding the higher degree of variability in the correlation between signaling dynamics and cell fate in the posterior neural tube. Here, the expression of Olig2 does not seem to segregate between different assigned fates, while it does so nicely in the anterior neural tube. This would suggest to me that cells in the posterior neural tube might not yet be fully committed to a fate or that there could be a relatively high error rate in assigning fates. Thus, the results could emerge from technical errors or differences in pure timing. Could the authors please comment on these possibilities?

      - Clustering and fates: One approach the authors use to analyze the correlation between signaling and fate is clustering of cell traces and comparison of the fate distributions in those clusters. There is a large number of clusters with only single traces, suggesting that the data (number of traces) might not be sufficient for this analysis. Furthermore, I am skeptical about clustering cells of different anterior-posterior identities together, given potential differences in the timing of signal reception and signaling. I am not convinced that this analysis reveals enough about how signaling maps to fate given the heterogeneity in traces in large clusters and the prevalence of extremely small clusters.

      - Signaling vector and hand-picked metrics: As an alternative approach, which might be better suited to their data, the authors then pick three metrics (based on their model-predicted signaling dynamics) and show that the maximal response is a very good predictor of fate for different anterior-posterior identities. Previous information-theoretic analysis of signaling dynamics has found that a whole time-vector of signaling can carry much more information than individual metrics (Selimkhanov et al., 2014, PMID: 25504722). Have the authors tried approaches that make use of the whole trace (such as simple classifiers; Granados et al., 2018, PMID: 29784812), or can they comment on why this is not feasible for their data? The authors should at least make clear that their results present a lower bound on how accurately cells can make cell-fate decisions based on signaling dynamics.

      (3) Consequences of signaling heterogeneity

      The authors focus heavily on portraying that signaling dynamics are highly variable, which seems visually true at first glance. However, there is no metric used or a description given of what this actually means. Mainly, the variability seems to relate to the correlation between signaling and fate. However, given the data and analysis, I would argue that the decoding of signaling dynamics into fate is surprisingly accurate. So signaling dynamics that seem quite noisy and variable by visual inspection can actually be very well discriminated by cells, which to me appears very exciting.

      Indeed, simple features of signaling traces can predict cell fate as well as position (for anterior progenitors). Given that signaling should be a function of position, it naively seems as if signaling read-out could be almost perfect. It might be interesting to plot dorsal-ventral position vs the signaling metrics, to also investigate how Shh concentration/position maps to signaling dynamics, this would give an even more comprehensive view of signal transmission.

      There remains the discrepancy between signaling traces and fate in the posterior neural tube. The authors point towards differences in tissue architecture and difficulties in interpreting a "small" Shh gradient. However, the data seem consistent with differences in the timing of cell-fate decisions between anterior and posterior cells. The authors show that fate does not initially correlate well with position in the posterior neural tube. So, signaling dynamics likely should not either, as they should rather be a function of position, given that they are downstream of the Shh gradient. As mentioned above, not even Olig2 expression segregates the assigned fates well. All this points towards a difference in the time of fate assignment between the anterior and posterior. Given likely delays in reporter protein production and maturation, it thus cannot be expected that signaling dynamics correlate with cell fate better than the reporter does ("83%"). Can the authors please discuss this possibility in the paper?

      Thus, while this paper represents an example of what the community needs to do to gain a better understanding of robust patterning under variability, the provided data is not always sufficient to make clear conclusions regarding the functional consequences of signaling dynamics.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, Xiong and colleagues examine the relationship between the profile of the morphogen Shh and the resulting cell fate decisions in the zebrafish neural tube. For this, the authors combine high-resolution live imaging of an established Shh reporter with reporter lines for the different progenitor types arising in the forming neural tube. One of the key observations in this manuscript is that, while, on average, cells respond to differences in Shh activity to adopt distinct progenitor fates, at the single cell level there is strong heterogeneity between Shh response and fate choices. Further, the authors showed that this heterogeneity was particularly prominent for the pMN fate, with similar Shh response dynamics to those observed in neighboring LFP progenitors.

      Strengths:

      It is important to directly correlate Shh activity with the downstream TFs marking distinct progenitor types in vivo and with single-cell resolution. This additional analysis is in line with previous observations from these authors (Xiong et al., 2013). Further, the authors show that cells in different anterior-posterior positions within the neural tube show distinct levels of heterogeneity in their response to Shh, which is a very interesting observation and merits further investigation.

      Weaknesses:

      This is convincing work; however, adding a few more analyses and clarifications would, in my view, strengthen the key finding of heterogeneity between Shh response and the resulting cell fate choices.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Throughout the paper, the authors do a fantastic job of highlighting caveats in their approach, from image acquisition to analysis. Despite this, some conclusions and viewpoints portrayed in this study do not appear well-supported by the provided data. Furthermore, there are a few technical points regarding the analysis that should be addressed.

      We thank the reviewer for the comments. Due to the age of the work and logistical constraints, we are unable to perform further experiments and analysis to address some of the concerns. We have revised our conclusions and viewpoints accordingly to reflect the reviewer's concerns.

      (1) Analysis of signaling traces

      Relevance of "modeled signaling level": It is not clear whether this added complexity and potential for error (below) provides benefits over a simpler analysis such as taking the derivative (shown in Figure 3C). Could the authors provide evidence for the benefits? For example, does the "maximal response" given by a simpler metric correlate less well with cell fate than that calculated from the fitted response?

      We think the benefit of the modeled signaling level is that it is as conceptually accurate as the data allow. It is true that the assumptions it introduces may cause certain biases. We therefore present both this analysis and the simplest one (raw data averaging, Fig. 2). Intermediate metrics between the two (such as the first derivative in Fig. 3C) may correlate more or less well with fate, but cannot be interpreted biologically.

      Assumptions for "modeled signaling level": According to equation (1) Kaede levels are monotonically increasing. This is assumed given the stability of the fluorescent protein. However, this only holds for the "totally produced Kaede/fluorescence." Other metrics such as mean fluorescence can very well decrease over time due to growth and division. Does "intensity" mean total fluorescence? Visual inspection of the traces shown in Figure 2 suggests that "fluorescence intensity" can decrease. What does this mean for the inferred traces?

      Yes. The segmentation measures intensity in a fixed volume inside each cell, so it is a spatial average (a concentration) and is susceptible to changes in cell volume. This has been noted in the revision. The raw measurement does fluctuate and can decrease; we think the short-timescale fluctuations are likely measurement variations or errors rather than large underlying changes in concentration.
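      The distinction raised here, between total reporter fluorescence and fluorescence measured as a spatial average, can be made concrete with a minimal sketch. It assumes a perfectly stable reporter (no degradation), discrete time steps, and invented production and volume series; the function name `simulate_reporter` is hypothetical and this is not the model used in the study. Under these assumptions the total amount can only rise, whereas the concentration (total divided by cell volume) falls whenever the cell grows faster than it produces reporter.

```python
import numpy as np


def simulate_reporter(production, volume):
    """Hypothetical model of a perfectly stable reporter.

    production[i]: reporter produced during step i (arbitrary units).
    volume[i]: cell volume at the end of step i (arbitrary units).
    Returns the accumulated total and the concentration (total / volume).
    """
    production = np.asarray(production, dtype=float)
    volume = np.asarray(volume, dtype=float)
    # With no degradation, the total is a running sum and never decreases.
    total = np.cumsum(production)
    # A fixed-volume intensity measurement reports a concentration instead.
    concentration = total / volume
    return total, concentration


# Invented example: production stops while the cell doubles in volume.
total, concentration = simulate_reporter([1, 1, 0, 0], [1, 1, 2, 2])
```

      In this toy case the concentration halves between the second and third steps although no reporter is lost, which is the kind of decrease a volume-sensitive measurement can show even for a stable protein.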

      Estimation of Kaede reporter half-life: It is not clear how the mRNA stability of Kaede is estimated. It sounds like it was just assessed visually, which seems not entirely appropriate given the quantitative aspects of the rest of the study. Also, given that Shh signaling was inhibited at the level of Smoothened, it is not obvious how the dynamics of signaling shutdown affect the estimate. Most results in Figure 7 seem to be quite robust to the estimate of the half-life. That they are might suggest that the whole analysis is unnecessary in the first place. However, not all are. Thus, it would be important to make this estimate more quantitative.

      Yes, we agree. Unfortunately, we do not have the quantitative data required to better estimate Kaede mRNA stability. The delay from Cyc inhibition to the cessation of ptch mRNA production is only roughly estimated and is not necessarily precise in this context.

      (2) Assignment of fates and correlations

      Error estimate for cell-type assignment: Trying to correlate signaling traces to cell fate decisions requires accurate cell fate assignment post-tracking. The provided protocol suggests a rather manual, expert-directed process of making those decisions. Can the authors provide any error-bound on those decisions, for example comparing the results obtained by two experts or something comparable? I am particularly concerned about the results regarding the higher degree of variability in the correlation between signaling dynamics and cell fate in the posterior neural tube. Here, the expression of Olig2 does not seem to segregate between different assigned fates, while it does so nicely in the anterior neural tube. This would suggest to me that cells in the posterior neural tube might not yet be fully committed to a fate or that there could be a relatively high error rate in assigning fates. Thus, the results could emerge from technical errors or differences in pure timing. Could the authors please comment on these possibilities?

      This is a very insightful point. We re-examined the posterior data (cross-checked by two co-authors) to make sure that cell fates were correctly assigned in this mixed situation. As established by previous studies from others and from us (see also Fig. 1A), the identification of MFPs and LFPs in the zebrafish spinal cord is very robust. The MFPs are the apically constricted single column of cells along the midline on top of the notochord, and the LFPs are the two columns of cells on either side of the MFPs. Expression of olig2:gfp in LFPs did vary more in the posterior (timing of response/commitment could be a factor, as the reviewer pointed out), but the cells at those positions eventually become V3 interneurons or floor plate cells and have not been observed to make motoneurons. We carefully checked the 3 low-Olig2:GFP pMNs in the anterior dataset (Fig. 2B’) and the 3 high-Olig2:GFP LFPs in the posterior dataset (Fig. 2D’). The heterogeneity argument is based on the verified tracking and final positioning of these cells.

      Clustering and fates: One approach the authors use to analyze the correlation between signaling and fate is clustering of cell traces and comparison of the fate distributions in those clusters. There is a large number of clusters with only single traces, suggesting that the data (number of traces) might not be sufficient for this analysis. Furthermore, I am skeptical about clustering cells of different anterior-posterior identities together, given potential differences in the timing of signal reception and signaling. I am not convinced that this analysis reveals enough about how signaling maps to fate given the heterogeneity in traces in large clusters and the prevalence of extremely small clusters.

      We agree. Due to the age of the work and logistic constraints, we are unable to perform further experiments and analysis to enrich the tracks for this revision. We are aware of upcoming, independent studies with many more systematic tracks and analysis which will address these concerns. We have added the caveats the reviewer raised.

      Signaling vector and hand-picked metrics: As an alternative approach, which might be better suited to their data, the authors then pick three metrics (based on their model-predicted signaling dynamics) and show that the maximal response is a very good predictor of fate for different anterior-posterior identities. Previous information-theoretic analysis of signaling dynamics has found that a whole time-vector of signaling can carry much more information than individual metrics (Selimkhanov et al., 2014, PMID: 25504722). Have the authors tried approaches that make use of the whole trace (such as simple classifiers; Granados et al., 2018, PMID: 29784812), or can they comment on why this is not feasible for their data? The authors should at least make clear that their results present a lower bound on how accurately cells can make cell-fate decisions based on signaling dynamics.

      Thank you for these suggestions. The measurement noise, the limited time window covered by the traces, and the number of tracks prevent us from making use of the full dynamics in a more informative manner.

      (3) Consequences of signaling heterogeneity

      The authors focus heavily on portraying that signaling dynamics are highly variable, which seems visually true at first glance. However, there is no metric used or a description given of what this actually means. Mainly, the variability seems to relate to the correlation between signaling and fate. However, given the data and analysis, I would argue that the decoding of signaling dynamics into fate is surprisingly accurate. So signaling dynamics that seem quite noisy and variable by visual inspection can actually be very well discriminated by cells, which to me appears very exciting.

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here. We added this point to make our presentation more balanced.

      Indeed, simple features of signaling traces can predict cell fate as well as position (for anterior progenitors). Given that signaling should be a function of position, it naively seems as if signaling read-out could be almost perfect. It might be interesting to plot dorsal-ventral position vs the signaling metrics, to also investigate how Shh concentration/position maps to signaling dynamics, this would give an even more comprehensive view of signal transmission.

      We refer readers to our earlier study (Xiong et al., 2013), in which ptch2:kaede, nkx2:gfp and olig2:gfp were plotted against position over time in single-cell tracks. Position was found not to be a good predictor of signaling levels or cell fates at the early stages when cell fates were specified.

      There remains the discrepancy between signaling traces and fate in the posterior neural tube. The authors point towards differences in tissue architecture and difficulties in interpreting a "small" Shh gradient. However, the data seem consistent with differences in the timing of cell-fate decisions between anterior and posterior cells. The authors show that fate initially does not correlate well with position in the posterior neural tube. So signaling dynamics should likely not correlate well with fate either, as they should rather be a function of position, given that they are downstream of the Shh gradient. As mentioned above, not even Olig2 expression segregates the assigned fates well. All this points towards a difference in the time of fate assignment between the anterior and posterior. Given likely delays in reporter protein production and maturation, signaling dynamics cannot be expected to correlate better with cell fate than the reported "83%". Can the authors please discuss this possibility in the paper?

      Yes, this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescence imaging, it is difficult to determine when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either their spatial or temporal responses than the anterior cells, and we are not able to distinguish between these. However, signaling dynamics are not necessarily a good function of position or time either; there is no evidence for that in our results here. The 83% correlation is thus striking for the posterior progenitors, indicating a certain robust logic in the GRN that captures a strong (even short-lived) response to Shh, regardless of position or time. This is an interesting possibility (we do not claim it as a mechanism, as we have not tested it with perturbations) that challenges the prevailing view in the field that these progenitors integrate Shh exposure over time, or that they acquire positional information by reading a gradient.

      The discussion has been modified to be more nuanced about these points.

      Thus, while this paper represents an example of what the community needs to do to gain a better understanding of robust patterning under variability, the provided data is not always sufficient to make clear conclusions regarding the functional consequences of signaling dynamics.

      We quite agree. Together with the reviewer, we look forward to seeing the publication of recent, independent progress by other colleagues that overcomes the challenges in our work.

      Reviewer #2 (Public Review):

      Summary:

      In this work, Xiong and colleagues examine the relationship between the profile of the morphogen Shh and the resulting cell fate decisions in the zebrafish neural tube. For this, the authors combine high-resolution live imaging of an established Shh reporter with reporter lines for the different progenitor types arising in the forming neural tube. One of the key observations in this manuscript is that, while, on average, cells respond to differences in Shh activity to adopt distinct progenitor fates, at the single cell level there is strong heterogeneity between Shh response and fate choices. Further, the authors showed that this heterogeneity was particularly prominent for the pMN fate, with similar Shh response dynamics to those observed in neighboring LFP progenitors.

      Strengths:

      It is important to directly correlate Shh activity with the downstream TFs marking distinct progenitor types in vivo and with single cell resolution. This additional analysis is in line with previous observations from these authors, namely in Xiong et al., 2013. Further, the authors show that cells in different anterior-posterior positions within the neural tube show distinct levels of heterogeneity in their response to Shh, which is a very interesting observation and merits further investigation.

      Weaknesses:

      This is convincing work; however, adding a few more analyses and clarifications would, in my view, strengthen the key finding of heterogeneity between Shh response and the resulting cell fate choices.

      We thank the reviewer for the comments. Due to the age of the work and logistical constraints, we are unable to perform further experiments and analyses to address some of the concerns. We have revised our conclusions and viewpoints accordingly to reflect the reviewer's concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      Minor comments:

      The y-axis label suddenly changes to Ptch2-reporter level in Figure 5. Is what is plotted different from what is shown as examples in Figure 3?

      Thanks! Figure 5 tracks are as Figure 3B, this has been annotated in the figure legends.

      There are random bounding boxes in some of the figures.

      Sometimes the m in "More dorsal" is stylized with a capital M and sometimes not. It is somewhat confusing as a name for cell types but it is fine if no alternative can be found.

      This study unfortunately does not include markers that distinguish the interneurons dorsal to pMNs. We categorized them collectively as “more dorsal”.

      Response-time is defined as "the amount of time with an above-basal Shh response". This seems to me as the definition of response duration. I would assume that response-time, means the time it takes until a response is first observed. Please consider changing this.

      We did not use “duration” because a response time course recorded in these tracks may include multiple durations (on and off periods). In the field, the duration of exposure/response has specifically been used to mean a single period of response. Response time here is therefore the sum of all periods of active response. This has been clarified in the text.
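      The distinction between response time (summed over all on-periods) and a single duration can be illustrated with a short Python sketch. It assumes a trace sampled at a fixed interval `dt` and a single basal threshold; both are hypothetical parameters, since the manuscript's basal definition and sampling interval are not given here.

```python
def response_time(trace, basal, dt=1.0):
    """Total time with an above-basal response, summed over all on-periods."""
    return sum(dt for x in trace if x > basal)

def longest_duration(trace, basal, dt=1.0):
    """Length of the single longest contiguous above-basal period."""
    best = run = 0
    for x in trace:
        run = run + 1 if x > basal else 0  # extend or reset the current on-period
        best = max(best, run)
    return best * dt

def on_periods(trace, basal):
    """Number of separate above-basal periods (off-to-on transitions)."""
    count, was_on = 0, False
    for x in trace:
        is_on = x > basal
        count += is_on and not was_on
        was_on = is_on
    return count

# Hypothetical reporter trace sampled every 10 min, with two on-periods.
trace = [0, 2, 3, 0, 0, 4, 1, 0]
print(response_time(trace, basal=0.5, dt=10.0))     # 40.0
print(longest_duration(trace, basal=0.5, dt=10.0))  # 20.0
print(on_periods(trace, basal=0.5))                 # 2
```

      For a trace with a single on-period the two measures coincide; they diverge as soon as the response switches off and on again.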

      Reviewer #2 (Recommendations for The Authors):

      (1) The authors address several possible pitfalls of transforming the measured fluorescence intensity of the patched reporter into a readout of Shh signaling activity over time; however, one aspect that isn't directly addressed is the potential effect of differences in the z position of analyzed cells. These could, at least in principle, be sufficient to introduce significant noise in the fluorescence measurements. Can the authors subset their datasets by initial, as well as average, z position and then re-examine the measured trends for both Shh activity and the intensity of the cell fate reporters used in the study?

      The zebrafish early neural plate/tube is thin in z in dorsal-ventral imaging, and the tissue is transparent. Depth-associated scattering contributes very little, if at all, to the fluorescence signals in the imaged time window. This can be seen in the nuclear/membrane signals of the movies, which are largely uniform in z across the neural tissue. It can also be seen that the notochord cells, further ventral, appear to be dimmer.
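      The stratification the reviewer proposes can be sketched in a few lines of Python. The data layout (one record per cell holding its z positions and reporter intensities over time, in µm and arbitrary units) and the bin edges are hypothetical. If depth effects are as small as the authors argue, the per-bin means should be similar whichever z summary is used.

```python
from statistics import mean

def stratify_by_z(cells, edges, key="initial"):
    """Mean reporter intensity per z bin.

    cells: list of dicts with 'z' (z positions over time, in µm) and
           'intensity' (reporter intensities at the same time points).
    edges: increasing bin edges in µm; bin i covers edges[i] <= z < edges[i+1].
    key:   'initial' bins by the first z position, 'mean' by the time-averaged z.
    """
    bins = {i: [] for i in range(len(edges) - 1)}
    for cell in cells:
        z = cell["z"][0] if key == "initial" else mean(cell["z"])
        # each cell goes into at most one bin; cells outside all edges are skipped
        for i in range(len(edges) - 1):
            if edges[i] <= z < edges[i + 1]:
                bins[i].append(mean(cell["intensity"]))
                break
    return {i: (float(mean(v)) if v else None) for i, v in bins.items()}

# Hypothetical cells: the second one drifts deeper over time.
cells = [
    {"z": [2.0, 3.0], "intensity": [10, 12]},
    {"z": [4.0, 6.0], "intensity": [9, 13]},
    {"z": [7.0, 7.0], "intensity": [8, 10]},
]
edges = [0, 5, 10]
print(stratify_by_z(cells, edges, "initial"))  # {0: 11.0, 1: 9.0}
print(stratify_by_z(cells, edges, "mean"))     # {0: 11.0, 1: 10.0}
```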

      (2) It is critical for the validity of this study that the intensity of the patched reporter introduced by the authors in 2012, and used again in this study, faithfully represents the signaling activity of Shh. In this study, the authors provide measurements of the transcriptional rate of Kaede and additional modeling for this purpose. However, an important point is to determine how sensitive the reporter is to changes in Shh signaling of different magnitudes.

      We consider this BAC reporter line a good one (probably still the best live reporter), as it resolves the endogenous gradient up to the dorsal interneuron domains (Huang et al., 2012; Xiong et al., 2013) and responds well to perturbations (Notch, cyclopamine, etc.). But it is true that we do not have information on how sensitively it responds to changes of different magnitudes. As far as we know, there is no in vivo, single-cell information on how Shh targets respond to signaling of different magnitudes.

      (3) To strengthen the previous point, it would be nice to extend the analysis in Figure 2, at least partially, using other readouts of Shh activity (e.g., GBS-GFP).

      We have used a GBS-RFP line previously and found it to be lower resolution in terms of showing the DV gradient, compared to ptch2:kaede.

      (4) It is unclear to me what the relevant time window is during which cells respond to Shh in the anterior versus posterior domains to determine progenitor specification. This is a concern to me, since: i) the average heterogeneity of Shh activity seems to increase strongly over time (Figure 2A/C); and ii) it is important to exclude that the finding of a heterogeneous relationship between Shh activity and fate choices is largely driven by later time points, when its activity is potentially no longer relevant for cell fate specification. Can this point be clarified when these data are introduced in the manuscript and further discussed?

      Yes this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescent imaging, it’s difficult to determine the time when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either spatial or temporal responses compared to the anterior and we are not able to distinguish that.

      (i) The ptch2:kaede reporter variability is higher in terms of magnitude (the signal gets brighter) at later times, but the heterogeneity (overlap between different cell fate groups) is lower at later times.

      (ii) Similarly, the heterogeneous relationship is more pronounced at early time points. Since we do not know exactly when the activity is no longer relevant (from our earlier studies, we do think that the cells become specified early, when Shh signaling is noisy), we modelled the response profile and searched for a good predictor. The maximum response stands out, particularly as a good indicator for the posterior cells, suggesting an early window/time of specification.

      Discussion has been modified to clarify these points.
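      The search for a good predictor can be sketched as scoring each candidate metric by how well a single threshold on it separates two fates. This Python sketch assumes two fate groups, a basal level of 1.0, and the three metrics discussed for Figure 7 (maximum, average, and response time); the traces are hypothetical and the authors' modelling step is not reproduced. Best-threshold accuracy is one simple reading of a "predictability score", not necessarily the one used in the manuscript.

```python
from statistics import mean

def best_threshold_accuracy(values, labels):
    """Best fraction of cells separated by one threshold, in either direction."""
    n = len(labels)
    best = 0.0
    for thr in sorted(set(values)) + [float("inf")]:
        correct = sum((v >= thr) == bool(y) for v, y in zip(values, labels))
        # the metric may be higher or lower in fate 1, so also score the flipped rule
        best = max(best, max(correct, n - correct) / n)
    return best

def response_time(trace, basal=1.0):
    """Number of time points with an above-basal response."""
    return sum(1 for x in trace if x > basal)

METRICS = {"max": max, "mean": mean, "response_time": response_time}

def rank_metrics(traces, labels):
    """Score each metric by its best single-threshold accuracy, best first."""
    scores = {name: best_threshold_accuracy([f(t) for t in traces], labels)
              for name, f in METRICS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical traces for two fates (1 and 0).
traces = [[0, 6, 0, 0], [0, 5, 1, 0], [2, 2, 2, 0], [1, 3, 1, 1]]
labels = [1, 1, 0, 0]
print(rank_metrics(traces, labels))
# [('max', 1.0), ('response_time', 0.75), ('mean', 0.5)]
```

      Because each threshold is fitted and scored on the same cells, these accuracies are optimistic; held-out cells would be needed for a fair comparison between metrics.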

      (5) Is the response of the patched reporter, as well as cell fate reporters, to defined concentrations of exogenously provided Shh heterogeneous, for instance, in in vitro experiments?

      Well-controlled (e.g., microfluidics and labeled Shh molecules) in vitro experiments will be fantastic future directions. Existing tissue explant + Shh dose approaches do not resolve the heterogeneity of exposure at single cell level but may be helpful in testing the limits and variabilities at different magnitudes.

      (6) The source of noise in this system is not entirely clear to me. The authors seem to attribute the heterogeneity they observe to the way cells respond to Shh, but can it be excluded that the morphogen profile is itself noisy to start with? It is currently difficult to distinguish between these two possibilities, given that the Shh activity reporter used in this study is itself a transcriptional output of the pathway. Can the distribution of Shh itself be analyzed (even if in immunostainings) during neural tube formation?

      Yes, we fully agree. More quantitative analysis may help dissect the sources of noise. The morphogen profile (particularly over time) would be great to have, but currently no reagent is available to achieve that. Studies using an engineered or tagged morphogen suggest that the pattern across the tissue reasonably captures simple diffusion dynamics. However, considerable randomness may still remain at the single-cell level, which would be difficult to compare quantitatively with static staining.

      (7) It is unclear to me how the authors define the ultimate cell fate of cells in their analysis in Figure 6. The brief description in the methods and in the manuscript seems to suggest that, in combination with marker expression, cell position is used as a criterion to assign fates to the progenitors. If this is the case, I guess the observed relationship in Figure 6 with LMDV distance is almost a control? This could be clarified for the readers.

      Yes, indeed, Figure 6 is a control, as LMDV distances lead to final positions, which form part of our determination of cell fates.

      As established by others’ and our previous studies (See also Fig.1A), the identification of MFPs and LFPs in zebrafish spinal cord is very robust. The MFPs are the apical constricted single column of cells along the midline on top of the notochord, and the LFPs are the 2 columns of cells next to MFP on both sides. LFPs’ expression of olig2:gfp did vary more in the posterior (timing of response/commitment could be a factor as the reviewer pointed out), but eventually the cells at those positions will be V3 interneurons or floor plates and have not been observed to make motoneurons. There are 3 low Olig2:GFP pMNs in the anterior dataset (Fig.2B’) and 3 high Olig2:GFP LFPs in the posterior dataset (Fig.2D’) that we checked carefully.

      The methods of fate determination are described in detail in the Methods.

      (8) The graphs in Figures 6 and 7 are difficult to interpret. What proportion, and absolute number, of cells are "mis-specified" when the authors show the distinct colored lines in the pMN, LFP or more dorsal domains? How do the authors determine where each cell fate domain begins and ends to assess "mis-specified" cells? Can the authors also provide the corresponding experimental images in the figure?

      We apologize for the difficulty in interpreting these figures. The graphs are a ranked list of all cells using the specified metric. The visual is intended to help build an intuition of how mixed or clear-cut the pattern is for the tested metric. The graphs are not to be interpreted as the actual pattern in the tissue, and there are no data images that show these patterns.
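      One hypothetical way to put a number on the reviewer's question about "mis-specified" cells in such a ranked list is sketched below in Python. It assumes that a higher metric value corresponds to a more ventral fate and that domain boundaries are placed so that each domain holds as many cells as were observed with that fate. Both assumptions and the function name are illustrative, not the authors' procedure.

```python
def ranked_mismatches(values, fates, order):
    """Count cells whose observed fate differs from the block they land in
    when cells are ranked by a metric (highest first) and the ranked list is
    cut into consecutive blocks sized like the observed fate groups.

    order lists fates from the one expected at the highest metric values to
    the one expected at the lowest; every fate in `fates` must appear in it.
    """
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    predicted, start = {}, 0
    for fate in order:
        size = fates.count(fate)
        for i in ranked[start:start + size]:
            predicted[i] = fate
        start += size
    return sum(predicted[i] != fates[i] for i in range(len(values)))

# Hypothetical metric values; one LFP/pMN pair is out of order.
values = [9, 5, 8, 6, 2, 1]
fates = ["LFP", "LFP", "pMN", "pMN", "more dorsal", "more dorsal"]
order = ["LFP", "pMN", "more dorsal"]
n_mis = ranked_mismatches(values, fates, order)
print(n_mis, n_mis / len(values))  # 2 0.3333333333333333
```

      A single swapped pair counts as two mis-specified cells, one on each side of the boundary.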

      (9) Given the experimental limitations/technical challenges discussed by the authors in the paper, the score of around 90% predictability of cell fate choices is rather high in the anterior domain, suggesting a minor functional role for heterogeneity in this region. Even for the posterior domain, the score of 83% predictability based on the maximum response to Shh is still relatively high. In my view, the authors' conclusions should be adjusted to make this difference clearer in the abstract and discussion, highlighting that the heterogeneity between Shh response and cell fate choices, particularly for the pMN fate, is stronger in the posterior domain, affecting the precision of cell fate decisions particularly in this region. Can the authors further comment on potential mechanisms driving this difference?

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here.

      We have added the fact that the Shh response is still the main determinant of the pattern despite the heterogeneity in the Discussion. We also further discussed possibilities of the anterior posterior differences.

      (10) Following up from the previous point, the data in Figure 7 suggests that there might be different underlying mechanisms in how anterior and posterior cells interpret the Shh profile, with anterior cells potentially responding to the integrated concentration of Shh (since response time, average response, or maximum response to Shh all provide similar predictability scores for cell fate choices). In contrast, only the maximum response to Shh can provide a good prediction of posterior cell fate, consistent with a more instantaneous response to morphogen concentration (and thus potentially more error-prone measurement of the Shh profile?). This is a very interesting observation in my view. Could this be further tested?

      Thank you. Yes, we found this very interesting too. We discussed the possibilities, including the reviewer's suggestion that these cells may have different contexts or strategies to interpret the signal. It is also possible that the anterior cells use the same strategy (maximum response at an early time) and that the subsequent response/duration does not matter for their fate commitment. A precise approach to shut down Shh response dynamics in single cells (e.g., optogenetics) would enable testing of these ideas. We hope follow-up studies will take such approaches.
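      The two readouts contrasted above, integrated exposure and maximum response, can diverge sharply. In this hypothetical Python sketch, a short strong pulse and a long weak response have the same integrated response (a rectangle-rule sum, used here as an assumed proxy for integration) but very different maxima, so a decoder that integrates would treat the two cells alike while a decoder that reads the maximum would not.

```python
def integrated_response(trace, dt=1.0):
    """Integrated exposure proxy: rectangle-rule sum of the trace over time."""
    return sum(trace) * dt

def max_response(trace):
    """Peak response, as read by an instantaneous decoder."""
    return max(trace)

pulse = [0, 8, 0, 0, 0]      # hypothetical short, strong response
sustained = [0, 2, 2, 2, 2]  # hypothetical long, weak response

print(integrated_response(pulse), integrated_response(sustained))  # 8.0 8.0
print(max_response(pulse), max_response(sustained))                # 8 2
```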

    1. eLife Assessment

      In this important study, DNA and RNA are co-imaged in single cells to show that the proximity of topologically associated domain (TAD) boundaries is uncoupled from the transcriptional activity of nearby genes. The evidence supporting these conclusions is convincing for the regions examined, with high-throughput imaging providing robust statistics. This work will be of interest to researchers studying genome architecture and its relationship to gene regulation.

    2. Reviewer #2 (Public review):

      Summary:

      Almansour et al., investigate whether the proximity of TAD boundaries is directly linked to gene activity. The authors use high-throughput imaging to simultaneously measure the gene activity and physical distances between boundary regions in an allele-specific manner. Using transcriptional inhibitors, expression induction, and acute depletion of CTCF and cohesin, they test whether proximity of boundaries affects, or is affected by, gene activity.

      Strengths:

      The combined use of DNA and RNA imaging enabled simultaneous measurement of boundary proximity and transcriptional status at individual alleles. This allows single-allele correlation between boundary proximity and gene activity at multiple loci across thousands of alleles.

      The use of both transcription inhibitors and transcription stimulation provides compelling and consistent evidence that boundary proximity can be disconnected from a gene's activity. The data convincingly support the conclusion that stable proximity between boundary regions is not required for ongoing transcription at the loci and timescales examined.

      This work strengthens the emerging view that genome organization at the level of domain boundaries does not impose a deterministic control over transcription.

      Strong disruption of boundary distances is only observed upon depletion of cohesin. Notably, this corresponds with the largest changes in gene activity. In contrast, depletion of CTCF actually had minimal impact on boundary distances and also had minimal impact on gene activity. This makes sense in light of previous work, where live cell imaging demonstrated that cohesin is more important for domain-structure, whereas CTCF is only important for blocking cohesin from continuing on, such that the fully formed loop occurs in a very small percentage of cells. Therefore, the fact that disruption of cohesin (more important for internal domain structure) affects gene activity while disruption of CTCF does not is exceptionally interesting.

      Weaknesses:

      In untreated cells, the distribution of distance measurements between boundary probes is exceptionally narrow. While depletion of RAD21 clearly demonstrates an ability to detect changes in this distribution, this tight baseline distribution may limit sensitivity to more subtle changes (like those one might expect from transcriptional influences).

      This approach primarily tests the role of boundary interactions rather than domain organization as a whole.

    3. Reviewer #3 (Public review):

      Summary:

      This study addresses a central question in genome organization: whether the positions of chromosomal domain boundaries are functionally coupled to gene activity. The authors use high-throughput imaging to simultaneously measure distances between boundary markers and nascent RNA production in thousands of individual cells, enabling direct comparison of boundary positions and transcriptional status at single chromosomal copies. This approach is applied across multiple loci, genes, and cell types, and is combined with acute transcriptional perturbations and depletion of architectural proteins to test the relationship between chromosome structure and gene activity in both directions.

      This work makes a meaningful contribution by providing direct, single-cell evidence that domain boundary positions and gene activity are largely uncoupled in this system.

      Strengths:

      A major strength of the work is its single-cell, single-allele resolution, which overcomes the averaging inherent to population-based assays. The authors consistently find that boundary proximity is largely independent of transcriptional status: active and inactive alleles have similar boundary distances, transcriptional perturbations do not shift boundary distributions, and depletion of the boundary factor CTCF does not alter gene expression, whereas cohesin depletion affects both boundary organization and transcription. These conclusions are supported by large numbers of alleles, multiple loci and cell types, and internal controls that distinguish boundary-specific effects from broader chromatin influences. The study offers a robust, scalable imaging pipeline that will be valuable for future studies linking genome organization and transcription at single-cell resolution.

      Weaknesses:

      The study has important limitations that are acknowledged by the authors. Measurements are restricted to distances between flanking boundaries and do not capture internal domain architecture, sub-domain structure, or finer-scale regulatory contacts. Resolution is limited by probe size and imaging, potentially masking subtle positional changes, and only a small set of loci is examined, leaving open how broadly the uncoupling generalizes. Some perturbation effects, particularly for RAD21, may involve mechanisms beyond boundary disruption.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Conceptual framing and interpretation:

      The central conclusion may require more precise framing to avoid potential overreach. The authors' interpretation equating "physical distance between TAD boundaries" with overall "TAD boundary architecture," and "transcriptional bursting events" with broader "gene activity," could benefit from clarification. This framing may not fully capture the temporal dynamics of transcription or the regulatory complexity within TADs. Furthermore, the broad conclusion of an uncoupled relationship appears to challenge extensive prior evidence from perturbation studies showing that disrupting TAD boundaries can alter gene expression. The authors' own observation of reduced gene activity upon RAD21 degradation suggests that global TAD disruption can affect transcription. A more precise and limited conclusion, acknowledging that their data demonstrate a lack of detectable correlation between boundary distance and bursting activity in their system, would be more accurate and help reconcile these findings with the existing literature.

      We have modified statements throughout the manuscript, including in the title, to enhance the precision of our conclusions to avoid overreach. We have also added on p. 16 of our Discussion, a separate section on the limitations of the study, noting that our conclusions are limited to TAD boundary distances and do not reflect the structure of TAD boundaries or of TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      (2) Technical methods and data presentation:

      (2.1) Accuracy and dimensionality of distance measurements: The manuscript does not clearly state whether distances are measured in 2D or 3D, nor does it sufficiently address precision limits. The stated Z-step size (1 µm) may be inadequate for accurately measuring sub-micron chromatin distances in 3D.

      We state in both the Results and Methods that our data represent 2D distances derived from maximal-intensity projections of 3D image stacks. We previously published a detailed analysis of the precision of this measurement approach applied to chromatin interactions and documented the effect of 2D vs 3D analysis on these types of measurements. This study by Finn et al., 2022 is cited in the text. We also show in Figure S3 and mention on p. 6 and 10 that we observe similar results using either 2D or 3D analysis.
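      How a maximal-intensity projection affects a distance measurement can be shown with a minimal Python sketch. It assumes spot centroids with x and y in nanometres and z given as a slice index, with a z-step of 1000 nm to match the stated 1 µm step; the coordinates are hypothetical, and the authors' actual localization pipeline is not reproduced.

```python
import math

def distance_2d(a, b):
    """xy distance between two spot centroids, as on a maximal-intensity projection."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def distance_3d(a, b, z_step):
    """3D distance with x, y in nm and z given as a slice index (z_step in nm)."""
    return math.dist((a[0], a[1], a[2] * z_step), (b[0], b[1], b[2] * z_step))

a = (0, 0, 3)           # hypothetical centroid: x, y in nm; z as slice index
b_same = (300, 400, 3)  # same slice as a
b_next = (300, 400, 4)  # one slice deeper

print(distance_2d(a, b_same), distance_3d(a, b_same, 1000))            # 500.0 500.0
print(distance_2d(a, b_next), round(distance_3d(a, b_next, 1000), 1))  # 500.0 1118.0
```

      In this toy case the projection reports 500 nm for both pairs, although the second pair is more than twice as far apart in 3D. The authors report similar results with 2D and 3D analysis (Figure S3).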

      (2.2) Probe design and systematic error: The genomic coverage size of the BAC probes used for DNA FISH is not explicitly stated. Large probe coverage could inherently blur the precise spatial location of adjacent DNA loci. The reported average distance (~300 nm) may be influenced by the physical size of the probes, as well as systematic expansion or distortion introduced by sample fixation and FISH processing. Although such technical limitations are currently unavoidable, the authors should clarify how these factors might affect their ability to detect subtle distance changes.

      The genomic location and size of all probes are provided in Supplementary Table 1. We deliberately use relatively large BAC probes both to generate robust, highly reproducible signals and to eliminate effects arising from local chromatin behavior. In line with earlier characterization of BAC probes (Finn et al., Cell, 2019; Finn et al., Methods, 2022), we find a strong correlation between micro-C/Hi-C interaction frequency and distance measurements. Systematic errors arising from sample fixation and FISH processing have previously been evaluated by comparison to live-cell data (see Finn et al., 2019) and found to be negligible, especially as all our analyses involve pairwise comparisons in which both conditions would be similarly affected by systematic errors. We discuss resolution limits due to probe size in our new section on study limitations on p. 16.

      (2.3) Data Visualization: The manuscript would benefit from including representative, zoomed-in regions of interest from the raw imaging data. This would allow readers to visually assess measured distance differences against background noise.

      Raw images for inspection at any magnification are available at https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      (2.4) Potential impact of resolution limits: In Figure 5, the micro-C data reveal a clear difference in interaction patterns inside versus outside the VARS2 locus TAD, yet the imaging data show no corresponding distance difference. This strongly suggests that the current imaging system, limited by optical resolution, probe size, and localisation accuracy, may be unable to resolve finer-scale spatial reorganizations associated with specific chromatin conformations (e.g., enhancer-promoter loops). The authors should explicitly discuss that their conclusion of "no coupling observed" may be constrained by the resolution and sensitivity of their method and does not preclude the possibility of detecting such associations with higher-precision measurements or in live-cell dynamics.

      We generally see good agreement between micro-C/Hi-C data and distance measurements. Specifically, we consistently find closer proximity of boundaries than non-boundaries and larger boundary distances for larger TADs than for smaller ones, as presented throughout the study. Contrary to the reviewer’s statement, this is also true for the VARS2 TAD, where we find statistically significant shorter boundary distances for boundary probes (350 nm) vs the outside control region (390 nm), which correlates with the difference in micro-C interaction score of 5847 vs 2308. These data are shown in Figure 3. Regardless, we mention the issue of resolution due to probe size in the study limitation section on p. 16.
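      The response does not say which statistical test underlies the boundary-versus-control comparison. As a minimal, assumption-laden illustration of how such a difference in distance distributions can be tested, here is an exact one-sided permutation test in Python on small hypothetical samples; it is not the authors' analysis.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p(boundary, control):
    """One-sided exact permutation p-value for mean(control) - mean(boundary).

    Enumerates every relabelling of the pooled distances into groups of the
    original sizes; only practical for small samples.
    """
    pooled = boundary + control
    observed = mean(control) - mean(boundary)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(boundary)):
        chosen = set(idx)
        b = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        hits += mean(c) - mean(b) >= observed
        total += 1
    return hits / total

boundary = [340, 345, 350, 355]  # hypothetical boundary-probe distances (nm)
control = [385, 390, 395, 400]   # hypothetical control-region distances (nm)
print(exact_permutation_p(boundary, control))  # 1/70, about 0.0143
```

      With thousands of alleles per condition, exhaustive enumeration is infeasible; a random sample of relabellings or a rank-based test would be used instead.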

      Reviewer #2 (Public review):

      In untreated cells, the distribution of distance measurements between boundary probes is exceptionally narrow. While depletion of RAD21 clearly demonstrates an ability to detect changes in this distribution, this tight baseline distribution may limit sensitivity to more subtle changes (like those one might expect from transcriptional influences). In addition, the correlation analysis is asymmetric, primarily stratifying by transcriptional status and then comparing boundary distances. Given the central claim that boundary architecture does not influence gene activity, the analysis should be done from the opposite perspective (stratifying by boundary distance).

      We mention the limitations on resolution of our approach in our discussion of study limitations on p. 16. An example of an analysis of stratifying by boundary distance is presented in Figure S3C. The conclusion is the same as stratifying by activity status.
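      The two directions of analysis can be made concrete with a short Python sketch. One function stratifies alleles by transcriptional status and compares boundary distances; the other stratifies by boundary distance and compares the fraction of active alleles. The alleles and bin edges are hypothetical and deliberately uncoupled, so both directions give the same null answer.

```python
from statistics import mean

def by_activity(alleles):
    """Mean boundary distance for active and for inactive alleles."""
    active = [d for d, on in alleles if on]
    inactive = [d for d, on in alleles if not on]
    return float(mean(active)), float(mean(inactive))

def by_distance(alleles, edges):
    """Fraction of active alleles in each boundary-distance bin [lo, hi)."""
    fractions = []
    for lo, hi in zip(edges, edges[1:]):
        states = [on for d, on in alleles if lo <= d < hi]
        fractions.append(sum(states) / len(states) if states else None)
    return fractions

# Hypothetical alleles: (boundary distance in nm, nascent RNA detected?)
alleles = [(250, True), (250, False), (350, True), (350, False),
           (450, True), (450, False)]
print(by_activity(alleles))                        # (350.0, 350.0)
print(by_distance(alleles, [200, 300, 400, 500]))  # [0.5, 0.5, 0.5]
```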

      Strong disruption of boundary distances is only observed upon depletion of cohesin. Notably, this corresponds with the largest changes in gene activity. In contrast, depletion of CTCF actually had minimal impact on boundary distances and also had minimal impact on gene activity. This makes sense in light of previous work, where live cell imaging demonstrated that cohesin is more important for domain-structure, whereas CTCF is only important for blocking cohesin from continuing on, such that the fully formed loop occurs in a very small percentage of cells. Therefore, the fact that disruption of cohesin (more important for internal domain structure) affects gene activity while disruption of CTCF does not is exceptionally interesting but is lacking from the discussion.

      We mention the stronger effect of cohesin depletion compared to CTCF loss on gene expression in multiple places in the Results and Discussion.

      On a related note, this approach primarily tests the role of boundary interactions rather than domain organization as a whole, and it should be acknowledged that internal domain structures are not directly assessed.

      We have modified statements throughout the manuscript to clearly indicate that our conclusions relate to boundary interactions rather than domain organization as a whole. We also discuss this in our section on study limitations.

      The comparison to work in other organisms (particularly the comparisons made to Drosophila) should be handled with care. The mechanisms underlying domain formation differ substantially across these systems, particularly regarding the differences in CTCF's role.

      We have modified our discussion of the data on Drosophila TADs, particularly as it relates to CTCF.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I couldn't locate the image data from figshare with the information provided (DOI: 10.6084/m9.figshare.30728354)

      The link has been updated:

      https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      Reviewer #2 (Recommendations for the authors):

      Some of the conclusions overreach. I recommend revising the claims and discussion to focus solely on the proximity of boundaries, instead of TADs themselves. This would match better with your experiments.

      We have modified statements throughout the manuscript, including in the title, to enhance the precision of our conclusions and avoid overreach. We have also added, on p. 16, a separate section on the limitations of our study, noting that our conclusions are limited to TAD boundary distances and do not reflect on the structure of the TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      I do disagree with the interpretation of the data in some parts, particularly at the end, where you state that disruption of TADs does not impact gene activity. For example, "Altogether, these results demonstrate that disruption of TAD boundary architecture is insufficient to alter gene expression" doesn't seem to match the results. Sure, depletion of CTCF minimally impacted gene expression, but it also minimally impacted the boundary distances. I think it is interesting that depletion of RAD21 had a bigger impact on both gene expression and boundary distances, and this should be discussed.

      We have deleted this statement and now mention on p. 13 that RAD21 depletion affected gene expression, whereas loss of CTCF did not, and on p. 15 that loss of RAD21 had a greater impact on boundary distances than loss of CTCF. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      Related to this, I also recommend expanding the discussion of prior live-cell imaging work (ref 32) that showed that the fully formed CTCF loop is a rare event.

      We have expanded the discussion of prior live-cell imaging work in several locations.

      All the analysis is done from the perspective of the gene expression (e.g. group by expression and then measure distances). It would help to show that the inverse analysis is consistent (e.g. group by distances and measure gene expression).

      Analysis of data stratified by distance measurements is shown in Figure S3C.

      The discussion of the Drosophila work is strange, given that CTCF in Drosophila has a very different N-terminus, explaining why it doesn't really form loops. Sure, maybe it contributes to domains in some way, but probably no more than the dozens of other architectural proteins that have been found in that system. This work clearly focuses on CTCF-loop domains, so I would be specific about that. In the introduction, you do a good job of saying "in human cells, TADs are.... marked by binding sites for the CTCF protein". However, you then overgeneralize and state that TADs form via a process of loop extrusion. I think a simple statement before this would help, saying that TADs in human cells have become somewhat synonymous with CTCF loop domains and that this is how you will use the term here, while acknowledging that other organisms have TADs despite the lack of conservation of the CTCF protein.

      We have modified the text accordingly.

      On a related note, in the discussion, you cite two papers in Drosophila to state that "TADs form prior to the establishment of cell-type-specific gene expression programs", but that's not entirely accurate for those papers. They actually show that TADs occur coincident with ZGA, but loops form before that (ref 23: Espinola et al), or that there are indeed a few boundaries that show up before ZGA, but these correspond to RNA Polymerase (ref 24: Ing-Simmons et al.).

      We have corrected this statement.

    1. eLife Assessment

      The manuscript presents important findings on how C. elegans can utilize distinct molecular mechanisms and circuit engagements to regulate tactile-dependent locomotory behaviours through the AFD thermosensory neuron. The authors use multiple techniques including microfluidics, genetic manipulations and single-copy rescue experiments, to provide compelling evidence for the role of AFD/AIB electrical synaptic connections in this behaviour. The reviewers are satisfied with the comprehensive revisions made by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Rosero and Bai examined how the well-known thermosensory neuron in C. elegans, AFD, regulates context-dependent locomotory behavior based on tactile experience. Here they show that AFD uses discrete cGMP signalling molecules and, independently of its dendritic sensory endings, regulates this locomotory behavior. The authors also show that AFD's connection to one of the hub interneurons, AIB, through gap junctions/electrical synapses, is necessary and sufficient for the regulation of this context-dependent locomotion modulation.

      Strengths:

      This is an interesting paper showcasing how a sensory neuron in C. elegans can employ a distinct set of molecular strategies and different physical parts to regulate a completely distinct set of behaviors, which had not previously been shown to be regulated by AFD. The experiments were well performed and the results are clear. However, there are some questions about the mechanism of this regulation. This reviewer thinks that the authors should address these concerns before the final published version of this manuscript.

      Comments on revisions:

      In this revised manuscript, Rosero and Bai satisfactorily addressed all the concerns raised by this reviewer regarding their original manuscript. This reviewer appreciates the authors' effort. This revised and improved manuscript demonstrates that a sensory neuron in C. elegans can utilize distinct molecular strategies and circuit engagements to regulate distinct sets of behaviors. This reviewer believes that the manuscript is suitable for final acceptance in eLife.

    3. Reviewer #2 (Public review):

      The goal of the study was to uncover the mechanisms mediating tactile-context-dependent locomotion modulation in C. elegans, which represents an interesting model of behavioral plasticity. Starting from a candidate genetic screen focusing on guanylate cyclase (GCY) mutants, the authors identified the AFD-specific gcy-18 gene as essential for tactile-context-dependent locomotion modulation. AFD has been primarily characterized as a thermosensory neuron. However, key thermosensory transduction genes and the sensory ending structure of AFD were shown here to be dispensable for tactile-context locomotion modulation. AFD actuates tactile-context locomotion modulation via the cell-autonomous actions of GCY-18 and the CNG-3 cyclic nucleotide-gated channel, and via AFD's connection with AIB interneurons through electrical synapses. At the circuit level, AIB also receive inputs from the mechanosensory neuron FLP, which was also shown to be relevant for tactile-context-dependent locomotion modulation.

      For this study, the authors combined a very clever microfluidic-based behavioral assay with a large set of genetic manipulations to dissect the molecular and cellular pathways involved. Rescue experiments with single-copy transgenes are particularly convincing. The study is very clearly written, and the figures are nicely illustrated with diagrams that effectively convey the authors' interpretation. Overall, the convergence of behavioral assays, genetics, and circuit analysis provides convincing support for the proposed role of the AFD-AIB connection, potentially downstream of FLP via synaptic communication and of other mechanosensory neurons via extrasynaptic communication.

      The facts that AFD mediates tactile-context locomotion modulation, that this role relies on GCY-18, and on electrical synapses linking AFD to AIB are new, somewhat unexpected, and interesting. The study raises intriguing and addressable questions about the role of innexin-based cellular communication in a multimodal sensory-behavior microcircuit, including the direction and nature of the signal(s) transmitted through these electrical synapses. These questions remain difficult to address in most experimental systems. The compact and genetically tractable nervous system of C. elegans provides a powerful entry point for addressing them in the context of an intact in vivo circuit.

    4. Reviewer #3 (Public review):

      Summary:

      Rosero and Bai report an unconventional role of AFD neurons in mediating tactile-dependent locomotion modulation, independent of their well-established thermosensory function. They partially elucidate the signaling mechanisms underlying this AFD-dependent behavioral modulation. The regulation does not require the sensory dendritic endings of AFD but rather the AFD neurons themselves. This process involves a distinct set of cGMP signaling proteins and CNG channel subunits separate from those involved in thermosensation or thermotaxis. Furthermore, the authors demonstrate that AIB interneurons connect AFD to mechanosensory circuits through electrical synapses. They conclude that, beyond its primary function in thermosensation, AFD contributes to context-dependent neuroplasticity and behavioral modulation via broader circuit connectivity.

      While the discovery of multifunctionality in AFD is not entirely unexpected, given the limited number of neurons in C. elegans (302 in total), the molecular and cellular mechanisms underlying this AFD-dependent behavioral modulation, as revealed in this study, provide valuable insights into the field.

      Strengths:

      (1) The authors uncover a novel role of AFD neurons in mediating tactile-dependent locomotion modulation, distinct from their well-established thermosensory function, providing an important conceptual contribution to our understanding of how individual neurons can support multiple, mechanistically separable behavioral functions.

      (2) They provide meaningful mechanistic insight into how AFD, GCY-18-dependent cGMP signaling, and AFD-AIB electrical coupling contribute to this AFD-dependent behavioral modulation.

      (3) The neural behavior assays utilizing two types of microfluidic chambers (uniform and binary chambers) are innovative and well designed. In the revised manuscript, the authors introduce a removable-barrier assay that physically separates exploration and assay phases. This independent behavioral approach addresses prior concerns about ongoing sensory input and confirms that tactile experience alone is sufficient to modulate locomotion.

      (4) By comparing AFD's role in locomotion modulation to its thermosensory function throughout the study, the authors present strong evidence supporting these as two independent functions of AFD.

      (5) The finding that AFD contributes to context-dependent behavioral modulation is significant, further reinforcing the growing evidence that individual neurons can serve multiple functions through broader circuit connectivity.

      Weaknesses:

      While the requirement for AFD, GCY-18, and AFD-AIB electrical coupling is well supported, the directionality of information flow and the precise mode of interaction between mechanosensory neurons, AIB, and AFD remain unclear and are an area for future study.

      Overall, the authors successfully achieve their primary aim of identifying and characterizing a novel role for AFD in tactile experience-dependent locomotion modulation. This work contributes meaningfully to the growing body of literature demonstrating multifunctionality and context-dependent reconfiguration of individual neurons within compact nervous systems.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Although the reviewers agree on the potential importance of this study, they have raised multiple pertinent questions about the interpretation of some of the results presented in the manuscript, which the authors should consider addressing. The reviewers have also suggested modifications that would increase the clarity of the manuscript.

      We appreciate the thoughtful evaluation of our manuscript by the reviewers and the editor. We are encouraged by their recognition of the importance of our study and have carefully considered all the points raised. In response, we have added new data and revised the text to address the concerns and improve the clarity of the manuscript. Our detailed responses to the reviewers’ comments are provided below.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Rosero and Bai examined how the well-known thermosensory neuron in C. elegans, AFD, regulates context-dependent locomotory behavior based on tactile experience. Here they show that AFD uses discrete cGMP signaling molecules and, independently of its dendritic sensory endings, regulates this locomotory behavior. The authors also show that AFD's connection to one of the hub interneurons, AIB, through gap junctions/electrical synapses, is necessary and sufficient for the regulation of this context-dependent locomotion modulation.

      Strengths:

      This is an interesting paper showcasing how a sensory neuron in C. elegans can employ a distinct set of molecular strategies and different physical parts to regulate a completely distinct set of behaviors, which had not previously been shown to be regulated by AFD. The experiments were well performed and the results are clear. However, there are some questions about the mechanism of this regulation. This reviewer thinks that the authors should address these concerns before the final published version of this manuscript.

      Weaknesses:

      (1) The authors argued about the role of prior exposure to different physical contexts which might be responsible for the difference in their locomotory behavior. However, the worms in the binary chamber (with both non-uniformly sized and spaced pillars) experienced both sets of pillars for one hour prior to the assay and they were also free to move between two sets of environments during the assay. So, this is not completely a switch between two different types of tactile barriers (or not completely restricted to prior experience), but rather a difference between experiencing a more complex environment vs a simple uniform environment. They should rephrase their findings. To strictly argue about the prior experience, the authors need to somehow restrict the worms from entering the uniform assay zone during the 1hr training period.

      We agree that, in the original design, worms in the binary chamber experience a more complex physical environment while retaining access to both exploration and assay zones. We have therefore revised the manuscript to more clearly distinguish between behavioral differences due to exposure to a complex environment and modulation driven by prior experience.

      To directly test whether locomotion modulation can be sustained by prior physical experience in the absence of continued access to the exploration zone, we introduced a barrier-based assay that prevents worms from re-entering the exploration zone before locomotion is measured. The results section has been revised accordingly to explicitly address this point.

      Revisions to the manuscript:

      Lines 122-139: Added two paragraphs describing the new assay and summarizing the corresponding results.

      “Because worms in the binary chamber are exposed to both pillar types and remain free to move between exploration and assay zones, the behavioral differences described above could reflect exposure to a more complex physical environment rather than prior experience alone. To directly test whether locomotion is modulated by prior physical experience independently of continued access to the exploration zone, we designed microfluidic chambers in which the assay zone could be separated from the exploration zone by a removable barrier (Fig. 1–Supplement 1A). In these chambers, worms were initially allowed to explore the entire device, including exploration zones that either matched or differed from the assay zone. A barrier was then inserted to prevent worms in the assay zone from re-entering the exploration zones.

      Under these conditions, locomotion immediately after barrier insertion was higher in worms that had previously explored physical settings matching the assay zone (205 ± 8 µm/s) than in worms that had explored non-matching settings (151 ± 7 µm/s; p = 0.006; Fig. 1–Supplement 1B). This difference persisted when worms were recorded 40 minutes after barrier insertion, with animals in matching chambers retaining their higher locomotion rates (218 ± 11 µm/s) compared to those in non-matching chambers (185 ± 8 µm/s; p = 0.02; Fig. 1–Supplement 1B). These findings demonstrate that prior exploration of distinct physical environments can modulate locomotion even when worms are prevented from returning to those environments, supporting a role for prior physical experience independent of ongoing sensory input.”

      Figure 1–Supplement 1: New figure showing the experimental design and behavioral results.

      (2) The authors here argued that the sensory endings of AFD are not required for this novel role of AFD in context-dependent locomotion modulation. However, gcy-18 has been shown to be exclusively localized to the ciliated sensory endings of AFD, and even misexpression of GCY-18 in other sensory neurons leads to localization in sensory endings (Nguyen et al., 2014 and Takeishi et al., 2016). They should check whether gcy-18 or tax-2 gets mislocalized in kcc-3 or tax-1 mutants.

      As the reviewer suggested, we examined GCY-18 localization in wild-type animals and in mutants with defective sensory microvilli using a split-GFP strategy (He et al., 2019). We generated a gcy-18::gfp11×7 knock-in strain using CRISPR–Cas9 to visualize endogenous GCY-18 localization. Consistent with prior studies, GCY-18 localized strongly to the AFD dendritic ending in wild-type animals (Figure 4–Supplement 1A, A′, A′′), with an additional weaker signal detectable near the soma and axon (Figure 4–Supplement 1A′′′).

      In kcc-3 mutants, GCY-18 remained localized to the distal dendrite despite disruption of sensory microvillar morphology (Figure 4–Supplement 1B–B′′). Similarly, in ttx-1 mutants, which completely lack AFD sensory microvilli, GCY-18 still localized to the distal dendrite (Figure 4–Supplement 1C–C′′) and remained detectable near the soma and axon (Figure 4–Supplement 1C′′′).

      In the revised manuscript, we clarify both the implications and the limitations of these imaging experiments, noting that “although these experiments do not identify the precise subcellular site at which GCY-18 acts, they show that disruption of sensory microvilli does not substantially alter GCY-18 localization within AFD.” The exact site at which GCY-18 functions to support locomotion modulation therefore remains an important open question for future investigation.

      Revisions to the manuscript:

      Figure 4-Supplement 1: Added a new figure reporting GCY-18 localization in wild type and mutant worms.

      Lines 268-280: Added a new paragraph reporting GCY-18 localization in wild type, kcc-3, and ttx-1 mutants and clarifying its relevance to the reviewer’s concern.

      “Given that gcy-18 is required for context-dependent locomotion modulation and that GCY-18 localizes to the distal dendrite of AFD, we next examined how disruption of sensory microvilli affects its localization in AFD. We used a split-GFP strategy to visualize endogenous GCY-18 [73]. A tandem array of seven GFP11 β-strands (GFP11x7) was inserted at the C-terminus of GCY-18 using CRISPR-Cas9. When complemented with GFP1-10, GCY-18::GFP11x7 fluorescence was strongly enriched at the AFD sensory microvilli near the nose (Fig. 4–Supplement 1A-A′′), consistent with previous reports [42,74,75]. In addition, weaker but reproducible GCY-18 signal was detected near the AFD soma and axon (Fig. 4–Supplement 1A′′′). Importantly, in kcc-3 mutants, which exhibit disrupted sensory microvilli, and in ttx-1 mutants, which lack sensory microvilli, GCY-18 remained localized to the distal dendrite and was still detectable near the soma and axon (Fig. 4–Supplement 1B-B′′′ and 1C-C′′′). Although these experiments do not identify the precise subcellular site at which GCY-18 acts, they show that disruption or loss of sensory microvilli does not substantially alter GCY-18 localization within AFD.”

      (3) MEC-10 was shown to be required for physical space preference through its action in FLP and not the TRNs (PMID: 28349862). Since FLP is involved in harsh touch sensation while TRNs are involved in gentle touch sensation, which are the neuron types responsible for tactile sensation in the assay arena? Does mec-10 rescue in TRNs rescue the phenotype in the current paper?

      We performed cell-specific rescue experiments for mec-10. Single-copy expression of mec-10 cDNA in either FLP neurons alone (egl-44p) or TRNs alone (mec-18p) did not restore context-dependent locomotion modulation (Fig. 5A). In contrast, co-expression in both FLP and TRNs (egl-44p::mec-10 + mec-18p::mec-10), as well as expression from the mec-10 promoter, rescued the phenotype.

      These results indicate that input from multiple mec-10-expressing neurons, including both FLP and TRNs, is required for context-dependent locomotion adjustment. This requirement differs from spatial preference behavior, where mec-10 acts specifically in FLP (Han et al., 2017), suggesting distinct mechanosensory circuits are engaged by different tactile-driven behaviors.

      Revisions to the manuscript:

      Fig. 5A: Updated to include the cell-specific rescue data.

      Lines 317-331: Added a new paragraph describing these findings.

      “The mec-10 gene is expressed in several mechanosensory neurons, including the six touch receptor neurons (TRNs) and the polymodal nociceptors FLP and PVD [77,79]. To determine which neurons are required for tactile-dependent locomotion modulation, we expressed mec-10 cDNA under cell-specific promoters: mec-18p (TRNs) [80], egl-44p (FLP) [81], or mec-10p (TRNs, FLP, and PVD) [79]. Expression in either FLP or TRNs alone did not restore modulation, as worms carrying egl-44p::mec-10 (Δspeed: -11 ± 4%) or mec-18p::mec-10 (Δspeed: -13 ± 4%) transgenes showed significantly reduced Δspeed compared to wild type (Δspeed, N2: 33 ± 6%; p < 0.0001 for both; Fig. 5A). By contrast, mec-10 co-expression in both FLP and TRNs (Δspeed: 16 ± 4%), or expression from the mec-10 promoter (Δspeed: 23 ± 4%), restored Δspeed to wild-type levels (p = 0.20 and p = 0.57, respectively; Fig. 5A). These findings indicate that mec-10 expression across multiple mechanosensory neuron types is required for context-dependent locomotion modulation. It is also worth noting that, while both tactile-dependent locomotion modulation and previously reported spatial preference require FLP, only the former depends on TRNs. Together, these findings suggest that distinct subsets of mechanosensory neurons differentially contribute to behaviors shaped by tactile experience.”

      (4) The authors mention that the most direct link between TRNs and AFD is through AIB, but as far as I understand, there are no reports of synapses between TRNs and AIB. However, FLP and AIB are connected through both chemical and electrical synapses, which would make more sense given their mec-10 data (the authors mention the FLP-AIB-AFD circuit in their discussion but discuss TRNs as the sensory modality). A mec-10 rescue experiment in TRNs would clarify this ambiguity.

      We agree with the reviewer that there are no reported synapses between TRNs and AIB, and we have revised Fig. 5 and the corresponding text to clarify this point. In the revised manuscript, we removed any implication of a direct TRN-AIB connection and instead focus on the established FLP-AIB-AFD pathway, while considering potential indirect contributions from TRNs.

      As the reviewer suggested, we performed cell-specific mec-10 rescue experiments. Expression of mec-10 in either FLP alone or TRNs alone was insufficient to restore tactile-dependent locomotion modulation, whereas co-expression in both cell types rescued the phenotype (revised Fig. 5A). These results indicate that FLP is essential for this behavior, consistent with the known FLP-AIB-AFD connectivity, and that TRNs are also required.

      Given that TRNs lack direct synapses with AIB, the TRN requirement suggests the involvement of indirect communication, likely mediated through modulatory mechanisms such as neuropeptide signaling. Accordingly, we have revised the model (revised Fig. 5C) and the corresponding text to clarify that tactile-dependent locomotion modulation integrates inputs from multiple mec-10-expressing neurons and does not rely on a direct TRN-AIB synaptic connection.

      Revisions to the manuscript:

      Lines 334–345: Revised paragraph to clarify circuit logic and remove implication of direct TRN-AIB synapses.

      “Touch-sensitive neurons that express mec-10, including TRNs, FLP, and PVD, do not form direct synapses with AFD, suggesting that tactile information is relayed through intermediary neurons. Because the interneuron AIB receives synaptic input from FLP and forms electrical synapses with AFD, we hypothesized that AIB could serve as a conduit for mechanosensory signals to reach AFD. To test whether AIB is required for tactile-dependent modulation, we examined locomotion in worms with genetically ablated AIB neurons using npr-9p::caspase expression [82]. AIB-ablated worms failed to adjust locomotion speed, showing a near-complete loss of modulation (∆speed: -1 ± 5%) compared to wild type (30 ± 8%, p = 0.001, Fig. 5B). These results demonstrate that AIB is required for AFD-mediated tactile-dependent locomotion modulation. However, because mec-10-expressing TRNs are also required, additional pathways beyond AIB likely contribute to transmitting tactile information to AFD, potentially involving indirect synaptic connections through other interneurons or long-distance signaling via neuropeptides or other modulators (Fig. 5C).”

      Fig. 5: Updated to include new cell-specific mec-10 rescue data and revised model.

      (5) Do inx-7 or inx-10 rescue in AFD and AIB using cell-specific promoters rescue the behavior?

      Yes. We tested this during revision. Using the AFD-specific srtx-1b promoter, we expressed inx-10 cDNA selectively in AFD neurons of inx-10 mutant worms. This manipulation significantly restored tactile-dependent locomotion modulation compared to non-transgenic inx-10 mutants (Fig. 6D), demonstrating that inx-10 expression in AFD alone is sufficient to rescue the behavioral defect.

      Revisions to the manuscript:

      Lines 366–370: Added a description of the AFD-specific inx-10 rescue results.

      “We next tested whether restoring inx-10 specifically in AFD would be sufficient to rescue the behavioral defect. Using the AFD-specific srtx-1b promoter, we expressed inx-10 cDNA in inx-10 mutant worms. These transgenic animals displayed significantly improved locomotion modulation (∆speed: 42 ± 5%) compared to non-transgenic inx-10 mutants (15 ± 4%; p = 0.018; Fig. 6D), indicating that inx-10 expression in AFD alone is sufficient to restore function.”

      Fig. 6D: Updated to include new cell-specific inx-10 rescue data.

      (6) How is the function of the guanylyl cyclase GCY-18 related to the electrical synapse activity between AFD and AIB? Is AFD downstream or upstream of AIB in this context?

      At present, the precise relationship between GCY-18 signaling and the AFD-AIB electrical synapse is not fully resolved. Given that AIB receives mechanosensory input from FLP, it is likely that AIB acts upstream of AFD during tactile-dependent locomotion modulation. However, because the AIB-AFD connection is mediated by gap junctions, communication could also be bidirectional, especially since small signaling molecules such as cGMP and Ca²⁺ are known to diffuse through electrical synapses.

      We have therefore revised the manuscript to state explicitly that the directionality of information flow between AFD and AIB remains open, and that this will be an important question for future investigation (Lines 455–458).

      “Together, these findings support a model in which AIB functions as a hub neuron that relays mechanosensory input from FLP to AFD to modulate locomotion (Fig. 5C). However, because electrical synapses are often bidirectional, information flow may also occur in the opposite direction, from AFD to AIB.”

      Reviewer #2 (Public review):

      Summary:

      The goal of the study was to uncover the mechanisms mediating tactile-context-dependent locomotion modulation in C. elegans, which represents an interesting model of behavioral plasticity. Starting from a candidate genetic screen focusing on guanylate cyclase (GCY) mutants, the authors identified the AFD-specific gcy-18 gene as essential for tactile-context-dependent locomotion modulation. AFD is primarily characterized as a thermosensory neuron. However, key thermosensory transduction genes and the sensory ending structure of AFD were shown here to be dispensable for tactile-context locomotion modulation. AFD actuates tactile-context locomotion modulation via the cell-autonomous actions of GCY-18 and the CNG-3 cyclic nucleotide-gated channel, and via AFD's connection with AIB interneurons through electrical synapses. This represents a potentially relevant synaptic connection linking AFD to the mechanosensory-behavior circuit.

      Strengths:

      (1) The fact that AFD mediates tactile-context locomotion modulation is new, rather surprising, and interesting.

      (2) The authors have combined a very clever microfluidic-based behavioral assay with a large set of genetic manipulations to dissect the molecular and cellular pathways involved. Rescue experiments with single-copy transgenes are very convincing.

      (3) The study is very clearly written, and figures are nicely illustrated with diagrams that effectively convey the authors' interpretation.

      Weaknesses:

      (1) Whereas GCY-18 in AFD and the AFD-AIB synaptic connection clearly play a role in tactile-context locomotion modulation, whether and how they actually modulate the mechanosensory circuit and/or locomotion circuit remains unclear. The possibility of non-synaptic communication linking mechanosensory neurons and AFD (in either direction) was not explored. Thus, in the end, we have not learned much about what GCY-18 and the AFD-AIB module are doing to actuate tactile context-dependent locomotion modulation.

      We agree with the reviewer that although GCY-18 in AFD and the AFD-AIB connection are clearly required for tactile context-dependent locomotion modulation, the precise mechanisms by which they influence mechanosensory and locomotor circuits remain unresolved. In particular, the possibility of nonsynaptic communication or bidirectional signaling between mechanosensory neurons and AFD cannot be addressed by the current experiments and warrants future investigation.

      At the same time, we believe this study reveals several previously unrecognized aspects of tactile-dependent locomotion modulation that provide a foundation for future mechanistic investigation.

      Specifically, we show that (i) GCY-18 functions in AFD to support tactile-dependent locomotion modulation; (ii) the cGMP-gated channel TAX-4, required for thermosensation, is dispensable for this process, whereas CNG-3 is required, revealing functional specialization within AFD; (iii) the interneuron AIB is necessary for this modulation; and (iv) restoring a single electrical connection between AFD and AIB using mammalian Cx36 is sufficient to rescue tactile-dependent modulation in innexin mutants.

      Accordingly, we now explicitly state in the revised Discussion that “a limitation of this study is that the directionality and mode of information flow between AFD and AIB remain unresolved, and defining this relationship will be an important goal for future investigation” (Lines 472–475).

      (2) The authors only focused on speed readout, and we don't know if the many behavioral parameters that are modulated by tactile context are also under the control of AFD-mediated modulation.

      We used locomotion speed as the primary behavioral readout because it provides a robust measure for detecting whether behavior is modified by prior tactile experience, rather than to capture the full spectrum of motor outputs. This strategy is often used to assess experience-dependent behavioral plasticity across sensory modalities and enabled us to uncover the unexpected role of AFD in tactile-dependent plasticity.

      In the revised manuscript, we expanded our analysis to include additional behavioral parameters. As described in the Results, AFD-ablated worms showed a complete loss of context-dependent modulation not only in speed, but also in idle time and turning frequency, with no detectable differences between uniform and binary chambers (Fig. 4E). These data strengthen the conclusion that AFD broadly supports tactile-dependent behavioral modulation rather than selectively affecting a single locomotor parameter.

      Revisions to the manuscript:

      Fig. 4E: Revised panel to include additional locomotion parameters, including idle time and turning frequency, in wild type and AFD-ablated worms.

      Lines 283–285: Expanded the Results to describe locomotion speed, idle time, and turning frequency in AFD-ablated worms. “These animals showed no detectable differences between uniform and binary chambers in locomotion speed, idle time, or turning frequency (Fig. 4E).”

      (3) The AFD-AIB gap junction reconstruction experiment was conducted in an innexin double mutant background, in which the whole nervous system's functioning might be severely impaired, and its results should be interpreted with this limitation in mind.

      We appreciate the reviewer’s concern that the innexin double-mutant background may broadly affect nervous system function, and we agree that loss of innexins is not restricted to the AFD-AIB synapse and could introduce global circuit perturbations.

      Importantly, however, the specificity of the rescue is informative. In an innexin double-mutant background, where electrical coupling is broadly disrupted, re-establishing a single electrical synapse between AFD and AIB using Cx36 was sufficient to restore tactile-dependent locomotion modulation (Fig. 6D). The ability of a targeted AFD-AIB connection to rescue behavior despite the absence of many other electrical synapses argues against a purely global network defect and instead identifies the AFD-AIB electrical synapse as a critical locus for this modulation.

      To further address this concern, we performed an additional rescue experiment in a less perturbed genetic background. In the revised manuscript, we show that AFD-specific expression of inx-10 rescues locomotion modulation in inx-10 single mutants (Fig. 6D). Together, these complementary rescue approaches, one restoring endogenous innexin function in AFD and the other reconstituting an electrical synapse using Cx36, support the conclusion that AFD-AIB electrical coupling is sufficient to enable tactile-dependent locomotion modulation, rather than reflecting nonspecific recovery of global circuit function.

      Revision to the manuscript:

      Fig. 6D and Lines 366-370: Added new data and revised text showing that AFD-specific inx-10 expression restores tactile-dependent locomotion modulation.

      “We next tested whether restoring inx-10 specifically in AFD would be sufficient to rescue the behavioral defect. Using the AFD-specific srtx-1b promoter, we expressed inx-10 cDNA in inx-10 mutant worms. These transgenic animals displayed significantly improved locomotion modulation (∆speed: 42 ± 5%) compared to non-transgenic inx-10 mutants (15 ± 4%; p = 0.018; Fig. 6D), indicating that inx-10 expression in AFD alone is sufficient to restore function.”

      Reviewer #3 (Public review):

      Summary:

      Rosero and Bai report an unconventional role of AFD neurons in mediating tactile-dependent locomotion modulation, independent of their well-established thermosensory function. They partially elucidate the signaling mechanisms underlying this AFD-dependent behavioral modulation. The regulation does not require the sensory dendritic endings of AFD but rather the AFD neurons themselves. This process involves a distinct set of cGMP signaling proteins and CNG channel subunits separate from those involved in thermosensation or thermotaxis. Furthermore, the authors demonstrate that AIB interneurons connect AFD to mechanosensory circuits through electrical synapses. They conclude that, beyond its primary function in thermosensation, AFD contributes to context-dependent neuroplasticity and behavioral modulation via broader circuit connectivity.

      While the discovery of multifunctionality in AFD is not entirely unexpected, given the limited number of neurons in C. elegans (302 in total), the molecular and cellular mechanisms underlying this AFD-dependent behavioral modulation, as revealed in this study, provide valuable insights into the field.

      Strengths:

      (1) The authors uncover a novel role of AFD neurons in mediating tactile-dependent locomotion modulation, distinct from their well-established thermosensory function.

      (2) They provide partial insights into the signaling mechanisms underlying this AFD-dependent behavioral modulation.

      (3) The neural behavior assays utilizing two types of microfluidic chambers (uniform and binary chambers) are innovative and well-designed.

      (4) By comparing AFD's role in locomotion modulation to its thermosensory function throughout the study, the authors present strong evidence supporting these as two independent functions of AFD.

      (5) The finding that AFD contributes to context-dependent behavioral modulation is significant, further reinforcing the growing evidence that individual neurons can serve multiple functions through broader circuit connectivity.

      Weaknesses:

      (1) Limited Behavioral Assays: The study relies solely on neural behavior assays conducted using two types of microfluidic chambers (uniform and binary chambers) to assess context-dependent locomotion modulation. No additional behavioral assays were performed. To strengthen the conclusions, the authors should validate their findings using an independent method, at the very least by testing AFD-ablated animals and gcy-18 mutants with a second behavioral approach.

      The reviewer points out that the original study relied on locomotion assays in two microfluidic environments (uniform and binary chambers) and suggests validation using an independent behavioral approach, particularly for AFD-ablated animals and gcy-18 mutants.

      To address this concern, we developed an independent behavioral assay in which the exploration and assay environments are physically separated by a removable barrier (Figure 1–Supplement 1A). In this design, worms first explored distinct physical settings, after which a barrier was inserted to confine them to an identical assay zone. This approach allowed us to directly test whether context-dependent locomotion modulation can be maintained when worms are prevented from re-entering the exploration environment and must rely solely on prior experience.

      Using this assay, we found that wild-type worms that had previously explored environments matching the assay zone moved significantly faster than those that had explored non-matching environments (Figure 1–Supplement 1B–C). These results demonstrate that context-dependent locomotion modulation is retained even when ongoing sensory input from the exploration zone is eliminated, independently validating our original findings using a distinct behavioral paradigm.

      Further, using this same assay, we found that locomotion modulation was significantly impaired in both gcy-18 mutants and AFD-ablated worms (Figure 4–Supplement 2A). Together, these results provide independent behavioral evidence supporting the conclusion that AFD and gcy-18 are required for context-dependent locomotion modulation.

      Revision to the manuscript:

      Figure 1–Supplement 1A: Added schematic and results from the removable-barrier assay in wild type animals.

      Lines 120-137: Added corresponding Results text describing the new assay and wild-type behavior.

      “Because worms in the binary chamber are exposed to both pillar types and remain free to move between exploration and assay zones, the behavioral differences described above could reflect exposure to a more complex physical environment rather than prior experience alone. To directly test whether locomotion is modulated by prior physical experience independently of continued access to the exploration zone, we designed microfluidic chambers in which the assay zone could be separated from the exploration zone by a removable barrier (Fig. 1–Supplement 1A). In these chambers, worms were initially allowed to explore the entire device, including exploration zones that either matched or differed from the assay zone. A barrier was then inserted to prevent worms in the assay zone from re-entering the exploration zones.

      Under these conditions, locomotion immediately after barrier insertion was higher in worms that had previously explored physical settings matching the assay zone (205 ± 8 µm/s) than in worms that had explored non-matching settings (151 ± 7 µm/s; p = 0.006; Fig. 1–Supplement 1B). This difference persisted when worms were recorded 40 minutes after barrier insertion, with animals in matching chamber retaining their higher locomotion rates (218 ± 11 µm/s) compared to those in non-matching chambers (185 ± 8 µm/s; p = 0.02; Fig. 1–Supplement 1B). These findings demonstrate that prior exploration of distinct physical environments can modulate locomotion even when worms are prevented from returning to those environments, supporting a role for prior physical experience independent of ongoing sensory input.”

      Figure 4–Supplement 2A: Added data for gcy-18 mutants and AFD-ablated worms in the removable-barrier assay.

      Lines 288-296: Added text describing behavioral defects in gcy-18 mutants and AFD-ablated worms using the new assay.

      “Building on our finding that locomotion modulation can be driven by prior physical experience even after worms are prevented from re-entering the exploration zones, we next tested whether AFD is required for this modulation using chambers in which the exploration and assay zones were separated by a removable barrier (Fig. 1–Supplement 1A). Under these conditions, locomotion modulation was significantly reduced in AFD-ablated worms (∆speed: -AFD = 1 ± 6% vs. N2 = 23 ± 7%; p = 0.036; Fig. 4–Supplement 2A). Similarly, gcy-18 mutants showed defective locomotion modulation (∆speed: gcy-18 = -1 ± 8% vs. N2 = 23 ± 7%; p = 0.034; Fig. 4–Supplement 2A). These results indicate that AFD and gcy-18 are required to generate locomotion modulation in response to recent physical experience, even when continued access to surrounding environments is restricted.”

      (2) Clarity in Behavioral Assay Methodology: The methodology for conducting the behavioral assays is unclear. It appears that worms were free to move between the exploration and assay zones, with no control over the duration each worm spent in either zone. This lack of regulation may introduce variability in tactile experience across individuals, potentially affecting the reproducibility and quantitativeness of the method. The authors should clarify whether and how they accounted for this variability.

      In the primary assay, worms were allowed to move freely between the exploration and assay zones for one hour, and each animal’s tactile experience depended on its exploratory trajectory. To address the resulting variability, we performed an a priori power analysis, which determined that approximately 160 worms distributed across more than 20 chambers per condition were sufficient to obtain reliable population-level measurements. This sampling strategy was applied consistently across all experiments. Accordingly, analyses emphasize well-powered population means rather than individual trajectories, ensuring robust and reproducible comparisons despite variability in individual experience.
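      For readers less familiar with a priori power analysis, the sketch below shows the general form of such a calculation. The response does not report the effect size, significance level, power, or test used to arrive at approximately 160 worms per condition, so everything here is an assumption: a two-sided comparison of two group means, α = 0.05, 80% power, a standardized effect size d, and the normal approximation n = 2((z₁₋α/₂ + z₁₋β)/d)². The function name and the effect size are hypothetical, and the sketch does not reproduce the authors' calculation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    Hypothetical helper: uses the normal approximation
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2, rounded up.
    It treats every worm as independent and ignores clustering of worms within chambers.
    """
    if effect_size_d <= 0:
        raise ValueError("effect size must be positive")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)  # about 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)


# Hypothetical medium effect size (d = 0.5)
print(n_per_group(0.5))  # 63
```

      Smaller expected effects drive the required n up quadratically. Because worms are nested within chambers, a design that accounts for between-chamber variation would generally need more animals than this independent-sample approximation suggests.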

      In addition, as described above, we developed a removable-barrier assay that eliminates variability from ongoing exploration by confining worms to the assay zone after a defined exploration period. The consistency of behavioral effects across both assays further supports the robustness and reproducibility of the approach.

      (3) Potential Developmental and Behavioral Confounds in Mutant Analysis: Several neuronal mutant strains were used in this study, yet the effects of these mutations on development and general behavior (e.g., movement ability) were not discussed. Although young adult worms were used for behavioral assays, were they at similar biological ages? To rule out confounding factors, locomotion assays assessing movement ability should be conducted (see reference PMID 25561524).

      To address the possibility that behavioral phenotypes in mutant strains arise from developmental defects or impaired general locomotion, we directly measured locomotion speed on agar plates and body length in gcy-18 mutant and AFD-ablated worms. Neither genotype showed defects in basal locomotion speed or body length compared to wild type animals (Figure 4–Supplement 2B-C), indicating that the observed modulation defects are not explained by impaired development or gross motor ability.

      To further control for developmental variability, all behavioral assays were performed using age-synchronized populations. Animals were selected at a defined gravid adult stage, identified by the presence of 5-10 eggs arranged in a single row within the gonad. All mutant strains reached this developmental stage approximately three days after egg laying, comparable to wild type animals.

      Revision to the manuscript:

      Figure 4–Supplement 2B-C: Added quantification of locomotion speed on agar plates and body length for gcy-18 mutants and AFD-ablated worms.

      Lines 297-304: Added text describing the data presented in Figure 4–Supplement 2B-C.

      “Finally, to determine whether the modulation defects observed in gcy-18 mutants and AFD-ablated worms could be attributed to developmental abnormalities or gross motor impairments, we measured locomotion speed and body length on standard NGM plates. Both day-1 adult AFD-ablated worms (speed: 281 ± 10 µm/s; p = 0.33; body length: 1.12 ± 0.01 mm; p = 0.76) and gcy-18 mutants (speed: 291 ± 13 µm/s; p = 0.22; body length: 1.15 ± 0.02 mm; p = 0.86) showed locomotion speeds and body lengths comparable to wild type controls (speed: 252 ± 30 µm/s; body length: 1.14 ± 0.02 mm; Fig. 4–Supplement 2B, C). These results indicate that the loss of context-dependent locomotion modulation is not due to developmental defects or gross impairments in locomotion.”

      (4) Definition and Baseline Measurements for Locomotion Categories: The finding that tax-4 and kcc-3 contribute to basal locomotion but not to context-dependent locomotion modulation is intriguing. The authors argue that distinct mechanisms regulate these two processes; however, the study does not clearly define the concepts of "basal locomotion" and "context-dependent locomotion," nor does it provide baseline measurements. A clear definition and baseline data are needed to support this conclusion.

      We define basal locomotion as the locomotion speed of worms measured in the binary chamber, where wild-type animals consistently exhibit lower locomotion rates. Measurements from the binary chamber therefore serve as the baseline reference for locomotion speed in our microfluidic assays. Context-dependent locomotion modulation is defined as the quantified difference in locomotion speed between worms in uniform chambers and those in binary chambers. These definitions are now stated in:

      Lines 199-201: “We examined the locomotion speed of mutant worms in the binary chambers, which we refer to as the basal speed because wild type worms consistently move slowest in this environment.”

      Lines 645–646: “Asterisks above horizontal black lines indicate statistically significant differences in basal speed, defined as speed of worms in the binary chamber”
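      The definitions above can be made concrete with a small calculation. The response reports ∆speed as a percentage but does not spell out the normalization, so the sketch below assumes ∆speed is the difference between mean speed in uniform chambers and mean basal speed in binary chambers, expressed as a percentage of the basal speed. The function name and the example speeds are hypothetical.

```python
from statistics import mean


def delta_speed_percent(uniform_speeds: list[float], binary_speeds: list[float]) -> float:
    """Hypothetical ∆speed: percent change of mean uniform-chamber speed over mean basal speed.

    Assumption (not confirmed by the source): basal speed is the mean speed in the
    binary chamber, and ∆speed = (mean_uniform - mean_binary) / mean_binary * 100.
    """
    basal = mean(binary_speeds)
    if basal == 0:
        raise ValueError("basal speed must be non-zero")
    return (mean(uniform_speeds) - basal) / basal * 100.0


# Hypothetical speeds in µm/s for one strain
print(delta_speed_percent([240.0, 260.0, 250.0], [200.0, 190.0, 210.0]))  # 25.0
# A strain moving twice as fast in both chambers gives the same ∆speed
print(delta_speed_percent([480.0, 520.0, 500.0], [400.0, 380.0, 420.0]))  # 25.0
```

      Because the difference is divided by each strain's own basal speed, uniform shifts in overall locomotion level cancel out. This is the property that makes a baseline-relative metric suitable for comparing modulation across genotypes, although it cannot by itself rule out ceiling or floor effects on absolute speed.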

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The availability of strains has not been mentioned. This should be addressed.

      The revised Methods section now includes a complete list of strains used in this study, and we have added a statement indicating that all strains are available upon request.

      Minor comment:

      Figure 1C - it should be Idle, not Idel.

      We have corrected the y-axis label in Figure 1C to ‘Idle.’

      Reviewer #2 (Recommendations for the authors):

      This is an interesting and well-written article, which I greatly appreciated reading. There are a few concerns that the authors should address, in my opinion, to provide a more complete and convincing story.

      Major points:

      (1) Maybe the material transmitted to me was incomplete, but I did not find the gcy gene screen results. It seems important to present the screen results in full, together with the description of the alleles tested for the 24 gcy genes.

      The revised manuscript now includes the complete results of the gcy mutant screen in Figure 2–Supplement 1, with the alleles tested for all 24 gcy genes listed in Table S1.

      (2) I did not find the actual p-values, sample sizes for each condition, or raw data; nor a data availability statement indicating where to retrieve these.

      Statistical significance is indicated by asterisks in all figures, with definitions provided in each figure legend (n.s., p > 0.05; *, p < 0.05; **, p < 0.01; ***, p < 0.001). Sample sizes are shown as individual data points in the plots, and we have now added explicit n values to each figure legend for clarity. A Data Availability Statement has also been added to indicate where the raw data can be accessed. Where possible, we have included exact p-values. For analyses using Tukey-Kramer post hoc tests, p-values are reported to four decimal places, reflecting the output limits of the statistical software used.

      (3) It is not clear why the authors only quantified animal speed for most of the study. What about idle time, turns, and reversals? This choice limits the reach of the study, as we only partly understand what AFD is doing, notably to explain the phenotype in the preference assay.

      Data on idle time, turning frequency, and reversal frequency for wild-type worms are now included in Figure 1F. In addition, we present new data showing that AFD ablation disrupts context-dependent modulation of locomotion speed, idle time, and turning frequency (Figure 4E).

      (4) Figure 2D and related text: these conclusions are based on a single mutant analysis. Were the million-mutation project lines outcrossed? It would be much more convincing if more gcy alleles were tested (this should be relatively easy since classical alleles are available at the CGC for gcy-8 and gcy-18).

      The million-mutation project lines used in this study were outcrossed prior to analysis. In addition, we confirmed that the observed defects were specifically due to loss of gcy-18 function by rescuing the phenotype through expression of gcy-18 cDNA under AFD-specific promoters. This cell-specific rescue shows that the behavioral defects arise from disruption of gcy-18 rather than from background mutations.

      (5) It is hard to interpret the speed phenotype when the authors switch between Delta speed and absolute speed display from one figure to another, or even from one panel to another. If only tax-4 and kcc-3 display a constitutive speed phenotype, then there should be no problem showing the absolute speed data in every panel. This is important to convince the reader that major speed changes in mutants are not biasing the interpretation based on Deltas. Indeed, if some mutants move very fast, there might be a ceiling effect. Conversely, if they move very slowly, there might be a 'sickness' effect. Both effects could prevent seeing a tactile-context-dependent modulation, and the results would need to be interpreted much more carefully. Providing the full view on absolute speed levels would also really help support the whole discussion paragraph about the differential regulation of constitutive versus context-dependent locomotion (from L339 onward).

      We focus on ∆speed because it directly quantifies experience-dependent locomotion modulation relative to each strain’s own baseline, making it an appropriate metric for comparing tactile plasticity across genotypes. This approach avoids confounding effects from strain-specific differences in overall locomotion levels.

      At the same time, we agree that absolute locomotion speed is important to consider when interpreting behavioral phenotypes. To address this, we added plate-based locomotion speed and body length measurements for two key genotypes that lack modulation, gcy-18 mutants and AFD-ablated worms (Figure 4–Supplement 2B–C). Both exhibit normal locomotion on agar plates, indicating that their defects in tactile-dependent modulation are not due to impaired motor ability or general sickness.

      In addition, among the mutants tested in microfluidic chambers, tax-4 mutants display elevated basal speed yet retain robust context-dependent modulation, indicating that ceiling effects do not limit detection of modulation.

      (6) The gap junction expression is a nice experiment. But there is a major limitation that should be stated: the electrical synapse re-construction is made in a double mutant background in which the whole animal circuitry might be severely affected. It might well be that the restoration of behavioral plasticity represents something totally irrelevant to wild-type nervous system functioning. A cell-specific innexin knockout is needed to fully support the relevance of the AFD-AIB connection.

      We agree that reconstruction of an electrical synapse in an innexin double-mutant background carries the limitation that global circuit function may be broadly affected. To address this concern, we performed an additional rescue experiment in a less perturbed genetic background.

      As described above, we show that AFD-specific expression of inx-10 is sufficient to restore tactile-dependent locomotion modulation in inx-10 single mutants (Fig. 6D). This cell-specific rescue does not rely on a double-mutant background and converges on the same outcome as the Cx36-based electrical synapse reconstruction. Together, these complementary approaches support the conclusion that restoring AFD-AIB coupling is sufficient to enable tactile-dependent locomotion modulation, rather than reflecting nonspecific recovery from global circuit disruption.

      (7) How was developmental age controlled? It seems that all genotypes were grown for a fixed duration (72h). Some mutants, like gcy-8, might grow slower. It would be useful to at least provide control data in wildtype animals showing that behavioral performance is similar even in slightly younger animals (covering the developmental age of the youngest mutant).

      Developmental age was controlled by strict age synchronization and staging criteria rather than growth duration alone. Worms were synchronized by allowing 40-50 young adults to lay eggs on OP50-seeded NGM plates for two hours, after which adults were removed. Developmental stage was further assessed by gonadal morphology, and only young adult animals with 5-10 eggs arranged in a single row were selected for behavioral assays. Using these criteria, all strains, including mutants, consistently reached the assayed stage approximately three days after egg laying, comparable to wild type animals.

      To further address the possibility that subtle developmental differences could influence behavior, we measured locomotion speed on agar plates and body length for genotypes that show defects in context-dependent modulation. gcy-18 mutants and AFD-ablated worms exhibited normal locomotion rates and body size, indicating that their behavioral phenotypes are unlikely to arise from developmental delay or impaired general motor ability. These control data are now included in the revised manuscript (Figure 4–Supplement 2B–C).

      (8) Plasmid construction description is entirely lacking.

      Description of plasmid construction has been added to the revised Methods.

      Minor points:

      (1) 'Context-dependent locomotion' should be replaced by 'tactile context-dependent locomotion' or something similar throughout the manuscript when referring to the impact of the pillar environment.

      Presently, this phrasing shortcut makes the communication too vague throughout, and even confusing when presenting the result of supplementary Figure 2 (where both thermal and tactile contexts are manipulated).

      We appreciate this suggestion and have revised the terminology for clarity where appropriate. Prior to introducing the mechanosensory origin of the modulation (that is, before presenting the mec-10 data), we retain the broader term “context-dependent modulation” to avoid presupposing a tactile mechanism before it is experimentally established.

      (2) L97: Suggested change along the same lines: "prior experience" -> "prior tactile experience".

      We have made this change as suggested.

      (3) Figure 1A: Would the author consider swapping the order of conditions displayed in this diagram? It would make more sense to have the same left-to-right order in the whole figure with the binary chamber on the left, particularly since the author describes the results considering the binary chamber as the 'reference point'.

      The order of chambers in Figure 1A has been revised as suggested, with the binary chamber now shown on the left.

      (4) Figure 1C: 'idel' typo in the axis label.

      The y-axis label has been updated from “idel” to “idle.”

      (5) Without AFD-specific manipulations, the data with tax-4 and tax-2 mutants provide limited information regarding TAX-4 and TAX-2 role in AFD. It should be explicitly mentioned in the Results section that they might work in other neurons.

      The revised manuscript now explicitly states that the tax-2(p694) allele affects multiple neurons, including BAG, ASE, ADE, and AFD (Lines 421–422).

      (6) L220-222: The strict meaning of this sentence implies that one attributes a role to AFD in controlling constitutive locomotion, but none of the presented data directly shows this (both kcc-3 and tax-4 mutant phenotypes could arise from additional neurons, regardless of any perturbation in AFD). This should be corrected.

      To avoid implying that AFD directly controls constitutive locomotion, we have removed the sentence in question, “Together, these findings suggest that the role of AFD neurons in modulating context-dependent locomotion is distinct from their thermosensory functions and differs from the mechanisms controlling basal locomotion”, from the revised manuscript.

      (7) L328-329: Overstatement. Without AFD-specific manipulation of TAX-2 and TAX-4, the different mutant phenotypes could be due to different cell types, rather than different protein pairs in the channel heteromers. I would recommend addressing this alternative possibility directly in the discussion, rather than focusing only on one (I agree, very cool) possibility.

      We have clarified this point in the revised text. We now explicitly note that the tax-2(p694) mutation affects tax-2 expression in multiple neurons (AFD, BAG, ASE, and ADE) (Lines 421–422).

      Reviewer #3 (Recommendations for the authors):

      (1) Clarification of inx Gene Expression Analysis (Lines 270-271): The authors should specify how the expression of inx genes in individual neurons was identified.

      The revised manuscript now specifies that innexin expression patterns were identified using the CeNGEN single-cell transcriptomic database (Lines 352–354).

      (2) Cx36 Expression in AFD and AIB (Lines 287-288): Further clarification is needed on how Cx36 expression was achieved in AFD and AIB.

      We have clarified that Cx36 was expressed specifically in AFD using the srtx-1b promoter and in AIB using the inx-1 promoter, as stated in the main text (Lines 372–373) and the Fig. 6 legend.

    1. eLife Assessment

      This important study deepens our understanding of how populations of a given species may diverge in their molecular and physiological patterns as a result of adaptation to different thermal regimes. By approaching this question from multiple directions, the authors provide solid evidence for adaptive changes in three strains of the diamondback moth after only three years of experimental evolution, and support the causal involvement of the PxSODC gene in thermal adaptation to both cold and hot temperatures. This work would benefit from more sophisticated phylogenetic analyses, better statistical support, and a more detailed discussion of the differences in the three strains at the pathway level.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Lei and co-workers aim to uncover the genetic underpinnings of thermal adaptation across three strains of the diamondback moth (Plutella xylostella) through experimental evolution over three years under three different thermal regimes. They identify systematic differences in trait responses (e.g., survival, fecundity), metabolic profiles, gene expression, and in the amino acid sequence of the PxSODC gene, among others. These results suggest that the diamondback moth has a strong potential for rapid physiological adaptation to different thermal regimes. Overall, this is a comprehensive and generally well-executed study that addresses an important question in the face of ongoing climate change.

      Strengths:

      The authors employ multiple approaches to identify signatures of thermal adaptation across the three strains, such as trait performance comparisons, metabolomics, transcriptomics, and amino acid sequence comparisons. All these different angles form a convincing picture of the underlying factors that underpin thermal adaptation in this experimental system. The manuscript is also generally well written and easy to understand.

      Weaknesses:

      I am unable to judge the validity of all aspects of this work; I will focus only on areas within my core expertise.

      (1) The authors identify pathways that are enriched in different strain comparisons (Figure 3E), but do not provide a detailed interpretation of these results. It would be great if the authors could explain in more detail how the physiological processes of a cold-adapted strain of this species may differ from those of a warmer-adapted strain.

      (2) The authors reconstruct a phylogenetic tree of the PxSODC gene using the neighbor-joining algorithm. The limitations of this algorithm have been known for many years now, especially for sequences separated by long evolutionary distances. According to Wang et al. (2016), the last common ancestor of the species shown in Figure S4C occurred 392-350 million years ago. Given this, I would strongly recommend that the authors infer a phylogenetic tree using model-based methods, such as those implemented in RAxML-NG or IQ-TREE. Also, in the absence of a valid outgroup sequence, I would show the gene tree as unrooted or rooted based on the corresponding species tree.

      (3) There is a key piece of the puzzle that is currently missing: the structural mechanism behind the mutational effects described in this study (e.g., Figure 5). The authors could leverage AlphaFold to generate structural models of different mutants and conduct molecular dynamics simulations to examine their conformational dynamics.

      References:

      Wang, Yh., Engel, M., Rafael, J. et al. Fossil record of stem groups employed in evaluating the chronogram of insects (Arthropoda: Hexapoda). Sci Rep 6, 38939 (2016). https://doi.org/10.1038/srep38939

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors set out to better understand the genetic mechanisms underlying thermal adaptation in insects. They experimentally evolved diamondback moth (Plutella xylostella) populations (a pest species with a wide distribution) under both hot (12h:12h 32°C/27°C) and cold (15°C/10°C) thermal conditions, and conducted phenotypic assays and metabolic and transcriptomic profiling to analyze how populations changed to deal with this thermal stress compared to the nonevolved ancestral population (constant 26°C). Phenotypic assays showed that evolved hot populations had increased survival at high temperatures (42-43°C), while evolved cold populations had lower freezing points compared to the ancestral population. When measured at the constant 26°C, metabolic and transcriptomic profiles of 3rd instar larvae from the evolved populations were distinct from those of the ancestral population, with a set of overlapping metabolic and transcriptomic pathways that were significantly differentially expressed in both hot and cold evolved populations compared to the ancestral population. The authors narrowed down this set of candidate genes further by focusing on genes with high overall expression levels, whose expression profiles were correlated with differentially expressed metabolites, and that carried mutations in both hot and cold strains. From this set, they chose the PxSODC gene for further functional validation, as it has previously been shown to be involved in the response of insects to abiotic stress through its antioxidative role in cellular defense. At the constant 26°C, this gene showed lower expression across development in evolved strains compared to the ancestral population, while it showed similar expression patterns under thermal stress. Knockdown of PxSODC resulted in decreased survival rates at high temperatures and higher freezing points compared to the ancestral population. Based on this validation, the authors hypothesize that the non-synonymous mutation in the PxSODC gene that they found in the cold and hot evolved populations might alter the conformation of the PxSODC protein, increasing enzyme capacity. Their experimental evolution study furthermore indicates the capacity of this pest species to adapt to a wide range of temperatures, providing insights into its capacity for global dispersal.

      Strengths:

      (1) The authors did a tremendous amount of work to characterize the mechanisms underlying thermal adaptation in the diamondback moth, artificially selecting populations for three years in the lab and characterizing how they evolved as a result at different biological levels: from phenotypes in different life stages, to larval metabolites and gene transcription, to functionally validating how one of the resulting gene candidates influences the capacity to deal with thermal stress.

      (2) The paper identifies and provides further evidence for candidate genetic mechanisms that might be particularly important for thermal adaptation in insects, including lipid metabolism, oxidoreductase activity, and DNA methylation. It is furthermore interesting that the authors found similar mechanisms to be involved in both the adaptation to cold and hot environments. Their functional validation of some of the genes involved in these mechanisms is very useful to understand how these genes might be causally involved in insect thermal adaptation.

      (3) The paper also has applied value: the diamondback moth is a pest species with a wide distribution, so understanding its adaptive capacity to different thermal environments is important for predicting the prevalence and potential further range expansion of this species under future climate change.

      Weaknesses:

      (1) The paper in its current form is hard to digest and would benefit from a clearer storyline, as well as tighter integration between the phenotypic, omics, and functional validation data. Currently, it is not always clear what the relevance of all the reported results is, why certain decisions were made, or how the different methods the authors used fit together. For example, the authors functionally validated a second gene, PxDnmt1, but it is unclear why this particular gene was chosen or how it relates to their selection regimes in light of the phenotyping and omics results. Given how much work the authors did, this lack of integration makes the paper overwhelming and difficult to read.

      (2) The authors at times stretch their results too far, as the ecological relevance of their study design and results is not clear, limiting the generalizability and value of the results for understanding species' adaptive potential under climate change. For example, the selection regimes used represent the minimum and maximum known temperatures at which the species can survive and develop, but it is unclear how these temperatures relate to the natural environment of the source population, to what extent wild populations might experience them, and whether they would experience them for the extended duration used (12h at the maximum or minimum temperature). Moreover, I wonder whether the comparisons made would identify the genes that matter under natural conditions: unevolved populations were kept under constant conditions, whereas evolved populations experienced 12h:12h temperature regimes, and the metabolic and transcriptomic profiling was done at a constant, favorable 26°C rather than under thermal stress, and in a life stage (the larval stage) that, as far as I can tell, was chosen arbitrarily.

      (3) The paper in its current form does not adequately describe the statistical analyses underlying the results, nor do the authors share their code, making it very hard to judge whether the analyses used are appropriate and the results trustworthy. I have concerns about the inappropriate use of t-tests, the lack of correction for confounding variables, and the absence of multiple-testing corrections where they are needed.

    4. Author Response:

      Public Review:

      We thank you and the reviewers for the thoughtful and constructive comments. The feedback will help us strengthen the manuscript substantially, and we plan to address the key points in the revised version as follows.

      Reviewer #1 (Public review):

      First, in response to the request for a clearer biological interpretation of the pathway enrichment results, we will expand the Discussion to more directly integrate these findings with the observed life-history divergence between strains.

      Second, we agree with the concern regarding the phylogenetic inference of PxSODC. We will therefore re-infer the phylogeny using a model-based Maximum Likelihood approach implemented in IQ-TREE, and, in the absence of an appropriate outgroup, the revised tree will be presented as unrooted.

      Third, to address the suggestion for a structural explanation of the mutational effects, we will add new structural analyses using AlphaFold modeling and 100 ns molecular dynamics simulations of the wild-type and mutant PxSODC proteins across three physiologically relevant temperatures.

      Reviewer #2 (Public review):

      First, we will restructure the Results and streamline the presentation to better emphasize the central narrative. Extensive descriptive datasets will be moved to the Supplementary Materials, and the rationale linking different analytical layers will be stated more explicitly.

      Second, we will also revise the manuscript to better frame the ecological relevance and limitations of the experimental design. Specifically, we will clarify that the thermal selection regimes were chosen to reflect ecologically relevant extremes for the source population from subtropical Fuzhou, where summer and winter temperatures can approach the ranges used in the experiment. We will further explain that the cycling temperature treatments were designed to approximate severe but naturally occurring diurnal fluctuations.

      Third, in response to concerns about statistical rigor and reproducibility, we will substantially expand the statistical methods throughout the manuscript. The revised version will provide a clearer description of the analyses used for each dataset, including sample sizes, comparison structure, and statistical thresholds. We will also clarify the application of multiple-testing correction for both transcriptomic and metabolomic analyses, specify the criteria used in network analyses, and more clearly distinguish the statistical approaches used for pairwise versus multi-group comparisons.