  1. Apr 2026
    1. Reviewer #1 (Public review):

      Summary:

      Spinal projection neurons in the anterolateral tract transmit diverse somatosensory signals to the brain, including touch, temperature, itch, and pain. This group of spinal projection neurons is heterogeneous in their molecular identities, projection targets in the brain, and response properties. While most anterolateral tract projection neurons are multimodal (responding to more than one somatosensory modality), it has been shown that cold-selective projection neurons exist in lamina I of the spinal cord dorsal horn. Using a combination of anatomical and physiological approaches, the authors discovered that the cold-selective lamina I projection neurons are heavily innervated by Trpm8+ sensory neuron axons, with calb1+ spinal projection neurons primarily capturing these cold-selective lamina I projection neurons. These neurons project to specific brain targets, including the PBNrel and cPAG. This study adds to the ongoing effort in the field to identify and characterize spinal projection neuron subtypes, their physiology, and functions.

      Strengths:

      (1) The combination of anatomical and physiological analyses is powerful and offers a comprehensive understanding of the cold-selective lamina I projection neurons in the spinal cord dorsal horn. For example, the authors used detailed anatomical methods, including EM imaging of Trpm8+ axon terminals contacting the Phox2a+ lamina I projection neurons. Additionally, they recorded stimulus-evoked activity in Trpm8-recipient neurons, carefully selected by visual confirmation of tdTomato and GFP juxtaposition, which is technically challenging.

      (2) This study identifies, for the first time, a molecular marker (calb1) that labels cold-selective lamina I projection neurons. Although calb1+ projection neurons are not entirely specific to cold-selective neurons, using an intersectional strategy combined with other genes enriched in this ALS group or cold-induced FosTRAP may further enhance specificity in the future.

      (3) This study shows that cold-selective lamina I projection neurons specifically innervate certain brain targets of the anterolateral tract, including the NTS, PBNrel, and cPAG. This connectivity provides insights into the role of these neurons in cold sensation, which will be an exciting area for future research.

      Weaknesses:

      (1) The sample size for the ex vivo electrophysiology is small. Given the difficulty and complexity of the preparation, this is understandable. However, a larger sample size would have strengthened the authors' conclusions.

      (2) The authors used tdTomato expression to identify brain targets innervated by these cold-selective lamina I projection neurons. Since tdTomato is a soluble fluorescent protein that fills the entire cell, using synaptophysin reporters (e.g., synaptophysin-GFP) would have been more convincing in revealing the synaptic targets of these projection neurons.

      (3) The summary cartoon shown in Figure 7 can be misleading because this study did not determine whether these cold-selective lamina I projection neurons have collateral branches to multiple brain targets or if there are anatomical subtypes that may project exclusively to specific targets. For example, a recent study (Ding et al., Neuron, 2025) demonstrated that there are PBN-projecting spinal neurons that do not project to other rostral brain areas. Furthermore, based on the authors' bulk labeling experiments, the three main brain targets are NTS, PBNrel, and cPAG. The VPL projection is very sparse and almost negligible.

    2. Reviewer #2 (Public review):

      Summary:

      In this study, the authors took advantage of a semi-intact ex vivo somatosensory preparation that includes hindlimb skin to characterize the response of projection neurons in the dorsal horn of the spinal cord to peripheral stimulation, including cold thermal stimuli. The main aim was to characterize the connectivity between peripheral afferents expressing the cold-sensing receptor TRPM8 and a set of genetically tagged neurons of the anterolateral system (ALS). These ALS neurons expressed high levels of the calcium-binding protein calbindin 1.

      In addition, combining different viral tracing methods, the authors could identify the anatomical targets of this specific subset of projection neurons within the brainstem and diencephalon.

      Strengths:

      The use of a relatively new (seldom used previously) transgenic line to label TRPM8-expressing afferents, combined with the genetic characterization of a previously identified subset of projection neurons, adds specificity to the characterization. The transgenic line appears to capture the subpopulation of Trpm8-expressing neurons well.

      In addition, the use of electron microscopy techniques makes the interpretation of the structural contacts more compelling.

      The writing is clear, and the presentation of findings follows a logical flow.

      Overall, this study provides solid, novel information about the brain circuits involved in cold thermosensation.

      Weaknesses:

      The number of recorded neurons, both those in close contact with TRPM8 afferents and those lacking such contact, is relatively low. In addition, the strength of the thermal stimuli is not very well controlled, preventing a more precise characterization of the connectivity.

      The authors could provide some sense of the effort needed to record from the 6 cold-activated neurons described. For example, how many preparations were needed?

    3. Reviewer #3 (Public review):

      Summary:

      Razlan and colleagues provide a detailed anatomical characterization of lamina I projection neurons in the mouse spinal cord that are densely innervated by primary afferents activated by cooling of the skin. The authors, building on their previous anatomical work, validate a Trpm8-Flp mouse line, show synaptic contacts between Trpm8⁺ boutons and projection neurons at the ultrastructural level, and demonstrate at the physiological level that these neurons specifically respond to cooling stimuli. Next, by taking advantage of their previous transcriptomic analysis of ALS neurons, they identify calbindin as a marker for cold-activated lamina I projection neurons and map their ascending projections to the rostral lateral parabrachial area, caudal periaqueductal gray, and ventral posterolateral thalamus, well-known thermosensory and thermoregulatory centers. Altogether, these findings provide strong anatomical and functional evidence for a direct line of transmission from Trpm8⁺ sensory afferents through Calb1⁺ lamina I neurons to key supraspinal centers controlling perception of cold and thermoregulatory responses.

      Strengths:

      The combination of mouse genetics, electron microscopy, ex vivo physiology, and viral tracing provides convincing evidence for a direct cold pathway. The work validates the Trpm8-Flp line by extensive anatomical and molecular characterization. Integration with previous transcriptomic and anatomical data neatly links the cold-selective lamina I neurons to a molecularly defined cluster of ALS neurons, strengthening the bridge between molecular identity, anatomy, and physiological function.

      Weaknesses:

      While anatomical evidence for direct synaptic connectivity between Trpm8+ afferents and lamina I projection neurons is compelling, a physiological demonstration of strict monosynaptic transmission is not shown. The conclusion that these inputs are exclusively monosynaptic should be toned down. Similarly, the statement that "Lamina I ALS neurons that are surrounded by Trpm8 afferents are cold-selective" should also be toned down as only a few neurons have been tested and it cannot be excluded that other neurons with similar characteristics may be polymodal.

    1. eLife Assessment

      This study presents data suggesting that excitatory cholecystokinin (CCK)-expressing neurons in hippocampal area CA3 influence hippocampal-dependent memory using multiple methods to manipulate excitatory CCK-expressing CA3 neurons. The study is valuable, particularly considering that most past studies of CCK-expressing neurons have focused on those neurons that co-express CCK and GABA. Currently, the strength of evidence is incomplete, but it would improve if evidence of specificity was provided and other concerns were addressed. If this is not possible, the conclusions, particularly those requiring evidence of specific targeting of excitatory neurons, should be modified accordingly.

    2. Reviewer #1 (Public review):

      Summary:

      CCK is the most abundant neuropeptide in the brain, and many studies have investigated the role of CCK and inhibitory CCK interneurons in modulating neural circuits, especially in the hippocampus. The manuscript presents interesting questions regarding the role of excitatory CCK+ neurons in the hippocampus, which has been much less studied compared to the well-known roles of inhibitory CCK neurons in regulating network function. The authors adopt several methods including transgenic mice and viruses, optogenetics, chemogenetics, RNAi, and behavioral tasks to explore these less-studied roles of excitatory CCK neurons in CA3. They find that the excitatory CCK neurons are involved in hippocampal-dependent tasks such as spatial learning and memory formation, and that CCK-knockdown impairs these tasks.

      However, these questions are very dependent on ensuring that the study is properly targeting excitatory CCK neurons (and thus their specific contributions to behavior).

      There needs to be much more characterization of the CCK transgenic mice and viruses to confirm the targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Strengths:

      This field has focused mainly on inhibitory CCK+ interneurons and their role in network function and activity, and thus this manuscript raises interesting questions regarding the role of excitatory CCK+ neurons, which have been much less studied.

      Weaknesses:

      (1a) This manuscript is dependent on ensuring that the study is indeed investigating the role of excitatory CCK-expressing neurons themselves and their specific contribution to behavior. There needs to be much more characterization of the CCK-expressing mice (crossed with Ai14 or transduced with various viruses) to confirm the excitatory-cell targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      (2) The methods and figure legends are still extremely sparse, leaving many questions regarding methodology and accuracy. More details would be useful in evaluating the tools and data, and the lack of proper quantification is still prevalent throughout the paper. In many places, only % values are noted, or only images are presented, and the number of cells counted is almost never reported.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors have demonstrated, through a comprehensive approach combining electrophysiology, chemogenetics, fiber photometry, RNA interference, and multiple behavioral tasks, the necessity of projections from CCK+ CAMKIIergic neurons in the hippocampal CA3 region to the CA1 region for regulating spatial memory in mice. Specifically, the authors have shown that CA3-CCK CAMKIIergic neurons are selectively activated by novel locations during a spatial memory task. Furthermore, the authors have identified the CA3-CA1 pathway as crucial for this spatial working memory function, thereby suggesting a pivotal role for CA3 excitatory CCK neurons in influencing CA1 LTP. The data presented appear to be well-organized and comprehensive.

      Strengths:

      (1) This work combined various methods to validate the excitatory CCK neurons in the CA3 area; these data are convincing and solid.

      (2) This study demonstrated that the CA3-CCK CAMKIIergic neurons are involved in the spatial memory tasks; these are interesting findings, which suggest that these neurons are important targets for manipulating the memory-related diseases.

      (3) This manuscript also measured the endogenous CCK from the CA3-CCK CAMKIIergic neurons; this means that CCK can be released under certain conditions.

      Weaknesses:

      In summary, this work can be formally accepted after revision. As for the remaining limitations, the distinct neural effects of the cholecystokinin (CCK) receptors (CCK-1R, CCK-2R, and CCK-3R) on hippocampal function have not been fully elucidated. Recent studies indicate that CCK-2R can modulate hippocampal activity at CA3-Schaffer collateral synapses; however, the roles of CCK-1R and CCK-3R in hippocampal function remain poorly characterized, with limited experimental evidence supporting their involvement. Overall, this study provides an interesting and novel perspective on the role of excitatory CCK signaling in hippocampus-dependent navigation learning.

    4. Reviewer #3 (Public review):

      Summary:

      Fengwen Huang et al. used multiple neuroscience techniques (transgenic mice, immunochemistry, bulk calcium recording, neural sensors, hippocampal-dependent tasks, optogenetics, chemogenetics, and RNA interference) to elucidate the role of excitatory cholecystokinin-positive pyramidal neurons in the hippocampus in regulating hippocampal functions, including navigation and neuroplasticity.

      Strengths:

      (i) The authors provided the distribution profiles of excitatory cholecystokinin in the dorsal hippocampus via transgenic mice (Ai14::CCK Cre mice), immunochemistry, and retrograde AAV.

      (ii) The authors used the neural sensor and light stimulation to monitor the CCK release from the CA3 area, indicating that CCK can be secreted by activation of the excitatory CCK neurons.

      (iii) The authors showed that the activity of the excitatory CCK neurons in CA3 is necessary for navigation learning.

      (iv) The authors demonstrated that inhibition of the excitatory CCK neurons and knockdown of the CCK gene expression in CA3 impaired the navigation learning and the neuroplasticity of CA3-CA1 projections.

      Weaknesses:

      (i) The causal relationship between navigation learning and CCK secretion remains nebulous; answering this question will require a more sensitive CCK-BR sensor in future work.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      CCK is the most abundant neuropeptide in the brain, and many studies have investigated the role of CCK and inhibitory CCK interneurons in modulating neural circuits, especially in the hippocampus. The manuscript presents interesting questions regarding the role of excitatory CCK+ neurons in the hippocampus, which has been much less studied compared to the well-known roles of inhibitory CCK neurons in regulating network function. The authors adopt several methods, including transgenic mice and viruses, optogenetics, chemogenetics, RNAi, and behavioral tasks to explore these less-studied roles of excitatory CCK neurons in CA3. They find that the excitatory CCK neurons are involved in hippocampal-dependent tasks such as spatial learning and memory formation, and that CCK-knockdown impairs these tasks.

      However, these questions are very dependent on ensuring that the study is properly targeting excitatory CCK neurons (and thus their specific contributions to behavior). There needs to be much more characterization of the CCK transgenic mice and viruses to confirm the targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Strengths:

      This field has focused mainly on inhibitory CCK+ interneurons and their role in network function and activity, and thus, this manuscript raises interesting questions regarding the role of excitatory CCK+ neurons, which have been much less studied.

      Weaknesses:

      (1a) This manuscript is dependent on ensuring that the study is indeed investigating the role of excitatory CCK-expressing neurons themselves and their specific contribution to behavior. There needs to be much more characterization of the CCK-expressing mice (crossed with Ai14 or transduced with various viruses) to confirm the excitatory-cell targeting. Without this, it is unclear whether the study is looking at excitatory CCK neurons or a more general heterogeneous CCK neuron population.

      Thank you for this constructive comment. Indeed, the current study lacks comprehensive strategies to unequivocally distinguish excitatory CCK neurons from heterogeneous CCK neuronal populations. Nevertheless, we provide multiple lines of evidence supporting the distribution of CaMKIIα/Vglut1-expressing CCK⁺ neurons in the hippocampus (Figure 1F), using complementary approaches including transgenic mouse models as well as viral and antibody-based labeling (Figure 1A, Figure 1H-I). In addition, we demonstrate that 635 nm light reliably evokes field excitatory postsynaptic potentials (fEPSPs) at CA3-Schaffer collateral synapses expressing DIO-CaMKIIα-ChrimsonR in vitro (Figure 2A-F). Importantly, these light-evoked excitatory synaptic responses are abolished by AMPA and NMDA receptor antagonists (CNQX and APV), confirming the excitatory nature of the DIO-CaMKIIα-ChrimsonR-expressing synapses. To outline future work that could further support our findings and conclusions, we have added the following strategies to the Discussion section of the revision:

      “Due to technical limitations at the current stage, we were unable to perform whole-cell recordings or pharmacological manipulations using CCK receptor antagonists. In future studies, the application of these approaches to directly record and selectively block EPSPs from excitatory CCK neurons in the hippocampus will further strengthen and validate our conclusions.” (Line 265 - line 269 in the revision).

      (1b) For the experiments that use a virus with the CCK-IRES-Cre mouse, there is no information or characterization on how well the virus targets excitatory CCK-expressing neurons. Additionally, it has been reported that CaMKIIa-driven protein expression delivered by viruses can be seen in both pyramidal and inhibitory cells.

      We thank the reviewer for this insightful comment regarding the specificity of viral targeting in CCK-IRES-Cre mice.

      To address this concern, we performed additional characterization of viral expression in CA3. We found that DIO-CaMKIIα-mCherry expression showed a high degree of colocalization with CaMKIIα immunoreactivity, indicating preferential targeting of excitatory neurons (sFigure 1A-B; sFigure 2A-B; sFigure 3A-B). We show an example confirming the high specificity of the AAV for infecting excitatory CCK neurons in the CA3 area.

      In addition, we acknowledge prior reports showing that CaMKIIα-driven viral expression can, in some cases, be detected in a small subset of inhibitory neurons. However, because CA3-Schaffer collateral projections to CA1 arise exclusively from excitatory CA3 pyramidal neurons, any potential expression in inhibitory CCK⁺ interneurons is unlikely to directly contribute to the recorded CA1 synaptic responses in our electrophysiological experiments. That said, we cannot fully exclude the possibility that a minor population of inhibitory CCK⁺ neurons could indirectly modulate CA3 pyramidal neuron activity via local circuit mechanisms, particularly in experiments involving optogenetic manipulation or shRNA expression. We now explicitly acknowledge this limitation in the revised manuscript:

      “Importantly, to further improve cell-type specificity, we propose an intersectional genetic strategy using CCK-IRES-Cre × VGlut1-Flp mice combined with a Cre-On/Flp-On (Con/Fon) AAV, which would restrict expression exclusively to excitatory CCK-expressing neurons and eliminate potential contributions from inhibitory CCK⁺ cells. This approach will be implemented in future studies to refine circuit specificity.” (Line 269 - line 273 in the revision).

      (2) The methods and figure legends are extremely sparse, leading to many questions regarding methodology and accuracy. More details would be useful in evaluating the tools and data. Additionally, further quantification would be useful; e.g., in some places, only % values are noted, or only images are presented.

      Thank you for these constructive comments. We have expanded the methodological descriptions in both the Methods section and the figure legends to provide sufficient detail for evaluating the experimental tools and data accuracy. In addition, we have added quantitative analyses where previously only representative images or percentage values were shown. Specifically, quantification has now been included for each AAV condition in the corresponding figures in the revised manuscript.

      (3) It is unclear whether the reduced CCK expression is correlated, or directly causing the impairments in hippocampal function. Does the CCK-shRNA have any additional detrimental effects besides affecting CCK-expression (e.g., is the CCK-shRNA also affecting some other essential (but not CCK-related) aspect of the neuron itself?)? Is there any histology comparison between the shRNA and the scrambled shRNA?

      Recent studies from our lab demonstrated that knocking out CCK gene expression significantly attenuates hippocampus-dependent spatial learning and CA3-CA1 LTP, indicating that CCK plays a critical role in modulating hippocampal functions[1,2]. Additionally, neither CCK-shRNA nor CCK-scramble significantly affected excitatory synaptic transmission in the CA3-CA1 projections, suggesting that CCK-shRNA has no obvious adverse effect on other neural components.

      Finally, we have provided the histology comparison between the shRNA and the scrambled shRNA regarding the expression level of the CCK protein (Pro-CCK) in the revision. Our result shows that CCK-shRNA (left panel) significantly reduced CCK expression in CA3 CCK-positive neurons compared with the CCK-Scramble group (right panel).

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      https://doi.org/10.7554/eLife.109001.1.sa2

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors have demonstrated, through a comprehensive approach combining electrophysiology, chemogenetics, fiber photometry, RNA interference, and multiple behavioral tasks, the necessity of projections from CCK+ CAMKIIergic neurons in the hippocampal CA3 region to the CA1 region for regulating spatial memory in mice. Specifically, the authors have shown that CA3-CCK CAMKIIergic neurons are selectively activated by novel locations during a spatial memory task. Furthermore, the authors have identified the CA3-CA1 pathway as crucial for this spatial working memory function, thereby suggesting a pivotal role for CA3 excitatory CCK neurons in influencing CA1 LTP. The data presented appear to be well-organized and comprehensive.

      Strengths:

      (1) This work combined various methods to validate the excitatory CCK neurons in the CA3 area; these data are convincing and solid.

      (2) This study demonstrated that the CA3-CCK CAMKIIergic neurons are involved in the spatial memory tasks; these are interesting findings, which suggest that these neurons are important targets for manipulating the memory-related diseases.

      (3) This manuscript also measured the endogenous CCK from the CA3-CCK CAMKIIergic neurons; this means that CCK can be released under certain conditions.

      Weaknesses:

      (1) The authors do not mention which receptors of the CCK modulate these processes.

      We appreciate the reviewer for raising this important question. Based on our recent work, CCK-B receptors are the primary neural components mediating CCK functions in the hippocampus at both the synaptic plasticity and behavioral levels (Su et al., 2023; Zhang et al., 2024; Wang et al., 2025). To clarify this mechanism, we have added the following content to the revised manuscript:

      “Based on our recent work, CCK signaling in the hippocampus is predominantly mediated by CCK-B receptors, which play a critical role in regulating synaptic plasticity and spatial memory-related behaviors.” (Line 105 - line 106 in the revision).

      (2) The authors do not test CCK gene knockout mice or CCK receptor knockout mice in these neural processes.

      Thank you for this insightful comment. We performed these experiments in an earlier study. Our results showed that high-frequency electrical stimulation failed to induce significant LTP in the CA3-CA1 pathway in both CCK gene knockout (CCK-KO) mice and CCK-B receptor knockout (CCK-BR-KO) mice in vitro (Su et al., 2023; Zhang et al., 2024; Wang et al., 2025). These findings indicate that CCK mediates its synaptic effects predominantly through CCK-B receptors in the CA3-CA1 pathway. Accordingly, we have added this description to the revised manuscript:

      “Additionally, high-frequency electrical stimulation fails to induce LTP in the CA3-CA1 pathway in both CCK-KO and CCK-BR-KO mice, indicating that CCK-dependent synaptic plasticity in this circuit is primarily mediated by CCK-B receptors.” (Line 170 - line 173 in the revision).

      (3) The authors do not test the source of CCK release during the behavioral tasks.

      We thank the reviewer for raising this important point. In our previous work, we directly monitored CCK release in the hippocampus during an object-exploration task using a GPCR-based CCK-BR sensor combined with fiber photometry (Su et al., 2023). During object exploration, we observed a rapid and robust increase in CCK-BR sensor fluorescence, indicating activity-dependent CCK release in the hippocampus. Based on these findings, we infer that hippocampal CCK release plays a critical role in hippocampus-dependent behavioral tasks.

      We acknowledge that hippocampal neurons receive CCK-positive projections from multiple brain regions, making it technically challenging to isolate and monitor the precise source of CCK release in the CA1 area during behavioral tasks in vivo. One potential strategy to address this limitation is selective overexpression of CCK in CA3 neurons (e.g., AAV-CCK delivery), followed by assessment of CCK-BR sensor responses during hippocampal-dependent behaviors. We have added this discussion to the revised manuscript to clarify the source and functional relevance of CCK release during behavioral tasks.

      “Besides, using a GPCR-based CCK-BR sensor combined with fiber photometry, our previous work demonstrated rapid, activity-dependent CCK release in the hippocampus during object-exploratory behavior, supporting a functional role for hippocampal CCK signaling in cognitive tasks (Su et al., 2023). Given that hippocampal neurons receive CCK-positive projections from multiple brain regions, it remains technically challenging to precisely identify the cellular source of CCK release in CA1 during behavior. Future studies employing selective CCK overexpression in CA3 neurons, together with CCK-BR sensor recordings, may help further delineate the contribution of CA3-derived CCK to hippocampal-dependent behaviors.” (Line 313 - line 321 in the revision).

      Citation:

      (1) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (2) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      (3) Su, J., Huang, F., Tian, Y., Tian, R., Qianqian, G., Bello, S. T., ... & He, J. (2023). Entorhinohippocampal cholecystokinin modulates spatial learning by facilitating neuroplasticity of hippocampal CA3-CA1 synapses. Cell Reports, 42(12).

      https://doi.org/10.7554/eLife.109001.1.sa1

      Reviewer #3 (Public review):

      Summary:

      Fengwen Huang et al. used multiple neuroscience techniques (transgenic mice, immunochemistry, bulk calcium recording, neural sensors, hippocampal-dependent tasks, optogenetics, chemogenetics, and RNA interference) to elucidate the role of excitatory cholecystokinin-positive pyramidal neurons in the hippocampus in regulating hippocampal functions, including navigation and neuroplasticity.

      Strengths:

      (1) The authors provided the distribution profiles of excitatory cholecystokinin in the dorsal hippocampus via transgenic mice (Ai14::CCK Cre mice), immunochemistry, and retrograde AAV.

      (2) The authors used the neural sensor and light stimulation to monitor the CCK release from the CA3 area, indicating that CCK can be secreted by activation of the excitatory CCK neurons.

      (3) The authors showed that the activity of the excitatory CCK neurons in CA3 is necessary for navigation learning.

      (4) The authors demonstrated that inhibition of the excitatory CCK neurons and knockdown of the CCK gene expression in CA3 impaired the navigation learning and the neuroplasticity of CA3-CA1 projections.

      Weaknesses:

      (1) The causal relationship between navigation learning and CCK secretion?

      Thank you for pointing out this important issue. Previous studies have shown that CCK can be rapidly secreted during exploratory behaviors, as detected by the CCK-BR sensor. In parallel, CCK-positive neurons have been demonstrated to play a critical role in the precise execution of hippocampus-dependent spatial learning. Together, these findings suggest that exploratory behavior induces CCK secretion, which in turn contributes to the accuracy of hippocampal-dependent learning and memory processes. Based on this evidence, we propose that CCK secretion serves as a functional link between behavioral exploration and spatial learning. We have added these explanations in the revised manuscript to better clarify the causal relationship between behavioral exploration and CCK secretion:

      “Besides, using a GPCR-based CCK-BR sensor combined with fiber photometry, our previous work demonstrated rapid, activity-dependent CCK release in the hippocampus during object-exploratory behavior, supporting a functional role for hippocampal CCK signaling in cognitive tasks (Su et al., 2023). Given that hippocampal neurons receive CCK-positive projections from multiple brain regions, it remains technically challenging to precisely identify the cellular source of CCK release in CA1 during behavior. Future studies employing selective CCK overexpression in CA3 neurons, together with CCK-BR sensor recordings, may help further delineate the contribution of CA3-derived CCK to hippocampal-dependent behaviors.” (Line 313 - line 321 in the revision)

      (2) The effect of overexpression of the CCK gene on hippocampal functions?

      We thank the reviewer for this comment. In fact, an earlier study from our laboratory demonstrated that intraperitoneal injection of exogenous CCK-4 significantly improved performance in hippocampus-dependent spatial learning tasks in both CCK gene knockout (CCK-KO) mice and Alzheimer’s disease (AD) mouse models. These findings suggest that enhancing CCK signaling can ameliorate hippocampal dysfunction at both the behavioral and synaptic plasticity levels (Zhang et al., 2024; Wang et al., 2025). Accordingly, although direct genetic overexpression of CCK in the hippocampus has not yet been extensively characterized, the observed benefits of exogenous CCK delivery support the notion that increased CCK availability positively modulates hippocampal function and spatial learning. We have cited this study in the revised manuscript to support this interpretation.

      “Interestingly, an earlier study demonstrated that intraperitoneal injection of exogenous CCK-4 significantly improved performance in hippocampus-dependent spatial learning tasks in both CCK gene knockout (CCK-KO) mice and Alzheimer’s disease (AD) mouse models (Zhang et al., 2024). These findings suggest that enhancing CCK signaling can ameliorate hippocampal dysfunction at both the behavioral and synaptic plasticity levels.” (Line 291 - line 297 in the revision)

      (3) What are the functional differences between the excitatory and inhibitory CCK neurons in the hippocampus?

      In the hippocampus, CCK-expressing neurons consist of two major populations with distinct functions: excitatory (glutamatergic) and inhibitory (GABAergic) neurons. Excitatory CCK neurons are relatively sparse and intermingled with pyramidal cells. By releasing glutamate, they directly contribute to excitatory transmission and are thought to participate in synaptic plasticity and information processing related to learning and memory. In contrast, inhibitory CCK neurons are more abundant and include well-characterized interneuron subtypes such as CCK-positive basket cells. These neurons release GABA and primarily target the perisomatic region of pyramidal neurons, providing strong control over neuronal firing. Notably, inhibitory CCK interneurons are highly sensitive to neuromodulatory signals, particularly endocannabinoids via CB1 receptors, enabling dynamic regulation of inhibitory tone and network activity. Together, excitatory CCK neurons mainly support hippocampal excitation and plasticity, whereas inhibitory CCK neurons regulate network dynamics and spike timing. As the focus of the present study is on excitatory CCK neurons, a detailed comparison between these two populations was not included in the original manuscript.

      (4) Do CCK sources come from the local CA3 or entorhinal cortex (EC) during the high-frequency electrical stimulation?

      Thank you for this insightful comment. Our data indicate that the CCK detected during high-frequency stimulation originates from CA3 neurons rather than the entorhinal cortex (EC). As shown in Figure 2, we used an optogenetic approach combined with a GPCR-based CCK sensor to selectively examine CCK release from the CA3-CA1 pathway. ChrimsonR was specifically expressed in CA3 neurons projecting to CA1, restricting light stimulation to CA3 axon terminals. In parallel, the CCK sensor was locally expressed in CA1, allowing real-time detection of CCK release at CA3 presynaptic sites. High-frequency light stimulation robustly evoked CCK signals in CA1, demonstrating activity-dependent CCK release from CA3 terminals. Importantly, EC inputs were neither genetically targeted nor optically stimulated in this experiment, excluding the EC as a source of the detected CCK. Together, these results support the conclusion that CCK released during high-frequency stimulation is derived from local CA3 projections to CA1. Similarly, as the focus of the present study is on excitatory CCK neurons in the CA3 area, a detailed comparison between these two CCK sources was not included in the original manuscript.

      Citation:

      (4) Wang, J. L., Sha, X. Y., Shao, Y., Zhang, Z. H., Huang, S. M., Lin, H., ... & Sun, J. P. (2025). Elucidating pathway-selective biased CCKBR agonism for Alzheimer’s disease treatment. Cell.

      (5) Zhang, N., Sui, Y., Jendrichovsky, P., Feng, H., Shi, H., Zhang, X., ... & He, J. (2024). Cholecystokinin B receptor agonists alleviates anterograde amnesia in cholecystokinin-deficient and aged Alzheimer's disease mice. Alzheimer's research & therapy, 16(1), 109.

      (6) Su, J., Huang, F., Tian, Y., Tian, R., Qianqian, G., Bello, S. T., ... & He, J. (2023). Entorhinohippocampal cholecystokinin modulates spatial learning by facilitating neuroplasticity of hippocampal CA3-CA1 synapses. Cell Reports, 42(12).

    1. eLife Assessment

      Using isolated frog brainstem preparations, pharmacological manipulation of excitability, systematic extracellular unit mapping, and focal microinjections, this study provides important findings on whether the buccal rhythm generator is a discrete anatomical nucleus or a distributed, state-dependent network. The question is conceptually significant and of interest to researchers working within respiratory neurobiology and rhythmogenicity in general, and the preparation and experimental strategy are generally appropriate. However, the evidence for the strongest architectural claims is incomplete, with pseudoreplication in pooled unit-mapping analyses, inconsistent statistical reporting, and limited controls in necessity/sufficiency experiments. Overall, although data are largely convincing, substantial revision and more nuanced interpretation of the results are required before claims of state-dependent architectural reorganization can be considered well-supported.

    2. Reviewer #1 (Public review):

      Summary:

      The authors test whether the frog buccal ventilatory rhythm generator behaves as a discrete, anatomically localized oscillator or as a distributed, state-dependent network. They combine reduced preparations (segment/subsegment work), systematic extracellular unit surveys over a defined grid, and local AMPA/GABA microinjections in a hemisected brainstem preparation. Based on these approaches, the authors conclude that mild global excitation (bath AMPA) broadens the distribution of rhythmically active units and renders a previously defined "buccal area" functionally non-identifiable as a unique necessary/sufficient locus.

      The central idea is plausible, and the overall experimental strategy is appropriate for the question being asked. However, in its current form, the manuscript overstates the strength of inference supporting the "expansion" and "loss of necessity/sufficiency" conclusions. This is primarily due to (a) statistical treatment of unit-mapping data that does not respect clustering by preparation/animal, (b) inconsistent statistical reporting across sections, and (c) limited interpretability of focal inhibitory perturbations under a globally excited state.

      Strengths:

      (1) The manuscript addresses a clear mechanistic question with broader relevance: whether rhythm generation is best conceptualized as a localized kernel or as an emergent distributed property that changes with excitatory state.

      (2) The authors use convergent approaches (reduced preparations, mapping, and necessity/sufficiency-style pharmacological perturbations), which is appropriate for circuit-level inference.

      (3) A strong element is the within-unit analysis supporting state-dependent changes in phase coupling for a subset of units ("lung" units adopting a buccal-like pattern). The authors' offline PCA-based spike sorting (with cluster-quality selection via silhouette score) provides some reassurance that the reported pre/post injection changes are not simply driven by unit misidentification.

      Weaknesses:

      (1) Pseudoreplication in unit-survey statistics undermines the main mapping inference. The Methods state that "Units were pooled from multiple preparations" and that chi-squared tests were used to compare proportions across conditions (baseline vs 60 nM AMPA). The Results similarly report proportion changes (e.g., 110 units pooled from three preparations vs 137 units pooled from three additional animals) analyzed with chi-squared tests. Because many units come from the same preparation/animal, independence is unlikely to hold; therefore, inference about state-dependent reorganization at the systems level should be made at the preparation/animal level or via hierarchical models that explicitly account for clustering.

      (2) Statistical methods are inconsistently described and need harmonization. In the segment dose-response "Analysis," values are described as compared to zero using a "One-sample t-test." Yet Table 1 is titled as using a "Wilcoxon One-sample Test." These discrepancies must be resolved throughout (Methods, Results, figure legends, and tables), including clear reporting of the unit of n and exact test statistics.

      (3) Unit classification and operational definitions raise interpretational concerns. The unit classification scheme defines "buccal units" as those firing during buccal bursts as well as lung bursts, and explicitly notes that "no units were found which fired only during buccal bursts." This is a consequential result, and it currently reads more like a limitation of detection/classification (or state-space sampled) than a robust biological conclusion. Without additional evidence, it weakens claims about a distinct buccal rhythmogenic module and complicates the interpretation of "buccal identity" changes under excitation.

      (4) Microinjection mapping: high exclusion rate and alternative explanations for 'loss of necessity' under excitation. The manuscript reports that 15 experiments were conducted, but 9 were excluded because the buccal area was not found or the preparation was "overdriven." This exclusion rate is too high to leave implicit; it raises concerns about selection bias and demands transparent accounting. Moreover, under baseline conditions, GABA (or AMPA-GABA) microinjections reliably reduce/abolish buccal bursts, but under bath 60 nM AMPA, the same injections produce no significant change in instantaneous frequency. This pattern can be interpreted as network redistribution, but it can also reflect state-dependent changes in gain, dynamic range, or local pharmacological impact (e.g., inhibition being comparatively underpowered in the globally excited state). Additional controls/analyses are required to distinguish these explanations.
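      To illustrate the concern in Weakness (1), the following minimal Python sketch contrasts a pooled chi-squared statistic, which treats every recorded unit as independent, with per-preparation proportions, in which the preparation is the unit of inference. The unit counts and preparation labels are hypothetical placeholders, not data from the manuscript, and the helper names are introduced here purely for illustration.

```python
# Hypothetical (preparation, condition, buccal units, total units) records.
# These counts are illustrative placeholders, NOT data from the manuscript.
RECORDS = [
    ("prep1", "baseline", 2, 20),
    ("prep2", "baseline", 1, 20),
    ("prep3", "baseline", 3, 20),
    ("prep4", "ampa", 10, 20),
    ("prep5", "ampa", 12, 20),
    ("prep6", "ampa", 11, 20),
]


def pooled_chi2(records):
    """Pearson chi-squared (no continuity correction) on the pooled 2x2 table."""
    counts = {"baseline": [0, 0], "ampa": [0, 0]}  # [buccal, non-buccal]
    for _, cond, buccal, total in records:
        counts[cond][0] += buccal
        counts[cond][1] += total - buccal
    a, b = counts["baseline"]
    c, d = counts["ampa"]
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def per_prep_proportions(records, condition):
    """One proportion per preparation, so n equals the number of preparations."""
    return [buccal / total for _, cond, buccal, total in records if cond == condition]


# Pooled analysis: every unit counts as an independent observation (120 here).
chi2_pooled = pooled_chi2(RECORDS)

# Preparation-level analysis: three values per condition enter any later test.
baseline_props = per_prep_proportions(RECORDS, "baseline")
ampa_props = per_prep_proportions(RECORDS, "ampa")

print(chi2_pooled, baseline_props, ampa_props)
```

      In this toy example, the pooled statistic draws on 120 units, whereas the preparation-level comparison rests on three values per condition. A hierarchical model with preparation as a grouping factor would be another way to respect the clustering.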

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigate the response of the amphibian respiratory rhythm generator under varying excitability conditions. They use pharmacological agents to increase and/or decrease synaptic excitability and demonstrate the resilience of buccal rhythms under different conditions. They employ these results to formulate their primary thesis, that there is no obligatory locus of the buccal respiratory rhythm in the frog, and that its respiratory rhythmogenic mechanisms should be considered diffuse and anatomically distributed across a larger brainstem region.

      Strengths:

      This manuscript is well written, with a sufficiently large number of experiments, for which the authors should be congratulated.

      Weaknesses:

      The presented results don't support the authors' main conclusions, and the interpretation of the data is heavily biased toward their hypothesis. This bias produces an unsubstantiated narrative in the Abstract, Introduction, and Discussion of this manuscript, which must be reexamined with the following points in mind:

      (1) The authors seem to confuse degeneracy with redundancy. For instance, at line 54, they state, "These findings support the broader hypothesis that respiratory rhythm-generating circuits can switch to being diffuse and redundant, with discrete oscillators quickly drowning in a sea of excitations."

      Redundancy means having the same component repeated multiple times to buffer the failure of any single component, whereas degeneracy means different functional components that compensate for one another under perturbations (Goaillard and Marder, ARN 2021).

      Since the premotor-lung units get converted to buccal units under high excitability, this suggests a degenerate mechanism for respiratory rhythm generation rather than a redundant mechanism, in which multiple buccal units would be recruited under different excitability conditions.

      (2) Line 83, "but the essential requirement for a discrete, rudimentary buccal oscillator is also lost".

      This statement is not supported by the data presented in this study. How does the expansion of the buccal unit imply that the essential requirement for discreteness is lost? Under increased excitability, does the burst/rhythm initiation zone also expand? Or does it still remain centered around the location of buccal units under physiological conditions? Increased excitability can lead to recruitment of a larger area, without a change in the location of the rhythmogenic kernel.

      (3) Line 86, "... oscillators should be viewed as promiscuous flexible functional entities that expand or contract...".

      Oscillators can be regarded as promiscuous only if, under physiological conditions, they switch positions. Under high excitability, only the flexibility argument holds, which has been established in mammals before (e.g., CA Del Negro, K Kam, JA Hayes, JL Feldman, The Journal of Physiology 587 (6), 1217-1231; CA Del Negro, C Morgado-Valle, JL Feldman, Neuron 34 (5), 821-830; NA Baertsch, LJ Severs, TM Anderson, JM Ramirez, Proceedings of the National Academy of Sciences 116 (15), 7493-7502; NA Baertsch, HC Baertsch, JM Ramirez, Nature Communications 9 (1), 843).

      Results:

      (4) Interpretation of data in Figure 6.

      How do buccal activity and the L2 power stroke change with 60 nM AMPA (in CN5)? Are the increase in buccal neurons and the decrease in power-stroke neurons also reflected in CN5 activity? Also see comments on Figure 9 data below.

      (5) Interpretation of data in Figure 7.

      Here, classifying buccal neurons solely by spiking may obscure the fact that the 'silent' neurons under baseline conditions were part of the rhythmic network but could not spike due to subthreshold inputs. 60 nM AMPA increased their firing in response to previously subthreshold synchronous inputs during the buccal burst. Intracellular recordings are required to negate this possibility and establish that the neuronal classification is robust.

      (6) Interpretation of data in Figure 8.

      "Lung units can transform into buccal units under excitation".

      CN5 buccal and lung bursts need to be compared before and after AMPA injection. From Figure 8 A-D, it is apparent that the activity of the example Unit2 increases during the buccal bursts after AMPA injection. However, its spikes are also present during buccal bursts pre-AMPA, albeit at a lower frequency.

      It is striking that the pre-AMPA epoch (panel A) is less than half the length of the post-AMPA epoch. This would, in itself, lead to a biased estimate of lung units that are active under the baseline condition during the buccal bursts.

      Figure 8G: a meta-analysis of lung units spiking during the baseline buccal bursts is warranted to interpret the main claim of this figure. Similarly, analysis of spiking per lung burst for the post-AMPA condition is essential for comparing the lung units' contribution under high excitability.

      (7) Interpretation of data in Figure 9

      "Buccal area loses importance under increased excitation."

      This interpretation is not fully supported by the data presented in this manuscript. Under 60 nM AMPA, does the ratio of lung bursts to buccal bursts change in CN5? This analysis is crucial for determining whether the lung units are indeed converted into buccal units at the expense of lung activity or whether their appearance during buccal bursts is incidental due to increased excitability. In the baseline, there are 4-5 buccal bursts per lung burst, whereas under high excitability, there are 2-3 buccal bursts per lung burst (Figure 9 A-B). This seems inconsistent with the conclusion that increased excitability converts lung units into buccal units (Figures 6 & 7).

      Could the authors comment on the connectivity between the lung and the buccal units? Results in Figure 9A-B indicate that lung units may receive an efference copy of buccal units, and under high excitability, their spikes may generate negative feedback onto the buccal units, terminating their bursts. This could explain the decrease in the buccal-to-lung burst ratio in high-AMPA conditions. This type of circuit interaction resembles the mammalian breathing CPG, in which the parafacial/RTN (which controls the abdominal muscles) and preBötC (which controls the diaphragm) interact and cross-inhibit each other.

      (8) Line 382.

      "Buccal-like bursting produced from two independent slices".

      The two "independent" slices have portions of the same anatomical kernel, the buccal rhythm generator. This experiment is like the sandwich slice preparation of preBötC (Del Negro Lab), in which two thinner slices exhibit rhythmic activity. Thus, the two slices are not independent; they are anatomically adjacent and functionally overlapping.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses isolated frog brainstem preparations to test whether inspiratory rhythm generation is confined to a narrowly defined neural center or instead reflects the activity of a distributed and adaptable network. Building on prior rodent work, the authors examine structural and functional parallels between the frog Buccal Area and the mammalian preBötzinger complex. By increasing excitatory drive, they assess whether a localized rhythmogenic region can expand into a broader network that participates in buccal rhythm generation, providing insight into how respiratory circuits are dynamically reconfigured across physiological states.

      Strengths:

      The work presents compelling evidence that ventilatory rhythm generation is supported by a flexible, state-dependent network rather than a fixed anatomical locus. The experimental preparation is well-suited to address these questions, and the data are generally of high quality. The demonstration that increased excitation recruits a more distributed network parallels observations in mammalian systems and strengthens the translational relevance of the findings. Overall, the analyses are thoughtful, and the interpretations are largely well supported by the results.

      Weaknesses:

      Some issues limit the strength of the conclusions. First, the study does not address the transition from eupnea to gasping in mammals, which could provide important physiological context for the observed AMPA-induced network reorganization. Second, the reported transformation of lung-active neurons into buccal-active neurons would benefit from additional analyses to clarify whether neurons switch identities or acquire dual activity. Finally, the necessity and sufficiency experiments in Figure 9 require further support, particularly through AMPA dose-response analyses and more comprehensive GABA manipulations, to confirm that network expansion does not obscure the continued functional importance of the core buccal region.

    5. Author response:

      Reviewer #1 (Public review):

      Hierarchical Inference (Unit Survey)

      We agree that pooling units across preparations can overstate the strength of inference if preparation-level clustering is ignored. We will therefore reanalyze the unit-survey dataset using a hierarchical approach in which the preparation/animal is treated as the unit of inference. Our pooled dataset was derived from three chunk preparations exposed to AMPA and three baseline preparations, allowing us to report per-preparation proportions and variability as requested.

      A preliminary reanalysis of the buccal segment preparations is summarized below. In this analysis, the unit of inference is shifted from individual recorded units to the preparation level (n = 3 baseline; n = 3 at 60 nM AMPA), thereby accounting for potential within-preparation dependence.

      Author response table 1.

      The distribution of units for each of the three preparations per condition is as follows:

      Using the proportion of buccal units per preparation as the dependent variable:

      Baseline (n = 3): mean proportion of buccal units = 6.5% (SD 5.7%).

      60 nM AMPA (n = 3): mean proportion of buccal units = 53.2% (SD 6.0%).

      Absolute difference in proportions = 46.7% (95% CI 33.4% to 59.8%).

      Independent-samples t-test on per-preparation proportions: t(4) = 9.77, p = 0.0006.

      Thus, this preliminary hierarchical reanalysis indicates that the observed recruitment is consistent across preparations and is not driven by outlier data from a single animal. These results support substantial expansion of the buccal oscillator with excitation.
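      As a rough check, the preparation-level comparison above can be approximately reproduced from the reported summary statistics alone. The sketch below is not the analysis code used for the reanalysis. It assumes an equal-variance (Student's) independent-samples t-test, uses the rounded means and SDs quoted above, and hard-codes the two-sided 95% critical value for 4 degrees of freedom (2.776). The function name is hypothetical. Because the inputs are rounded, the recomputed interval may differ slightly from the reported one.

```python
import math


def pooled_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's t (equal variances assumed) from per-group summary statistics."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / se, df, se


# Rounded per-preparation summary values quoted in the response (percent).
BASELINE = (6.5, 5.7, 3)  # mean, SD, number of preparations
AMPA = (53.2, 6.0, 3)

t_stat, df, se = pooled_t_from_summary(*BASELINE, *AMPA)

# Assumption: two-sided 95% critical value of Student's t with 4 df.
T_CRIT_DF4 = 2.776
diff = AMPA[0] - BASELINE[0]
ci = (diff - T_CRIT_DF4 * se, diff + T_CRIT_DF4 * se)

print(f"t({df}) = {t_stat:.2f}; difference = {diff:.1f}%; "
      f"95% CI = {ci[0]:.1f}% to {ci[1]:.1f}%")
```

      With only three preparations per condition, the test has 4 degrees of freedom, so the width of the interval is driven largely by the between-preparation SDs rather than by the number of recorded units.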

      Statistical Standardization: In the revision, we will better justify our use of parametric and non-parametric versions of the one-sample tests and review usage in the Methods, Table 1, and figure legends for consistency.

      Exclusion criteria for microinjection experiments: We will extend the description of these experiments by including a flow diagram summarizing the 15 attempted microinjection experiments and documenting the technical reasons for the 9 exclusions. These exclusions reflected the technical requirements of the preparation: (a) the buccal area had to be localized before AMPA excitation so that the effects of buccal-area manipulation during excitation could be interpreted reliably, which was not always possible; and (b) preparations had to exhibit sufficiently sustained periods of consecutive buccal bursting to permit quantification of buccal burst frequency, whereas some preparations expressed motor patterns dominated by lung bursts.

      Pharmacological Potency and Necessity: We will revise the wording of this section to make the causal interpretation more precise. Our data already show that local GABA microinjections can reverse the excitatory effects of local AMPA microinjections, providing an internal control for local pharmacological efficacy of GABA when the local network is excited. Notably, the local AMPA concentration used in these experiments (5 µM) is nearly two orders of magnitude greater than the 60 nM concentration used in bath application. We therefore interpret the failure of focal GABA inhibition to abolish rhythm during global excitation as being consistent with expansion of rhythmogenic capacity beyond the spatial reach of the local injection, rather than with failure of the GABA manipulation itself.

      Finding an inhibitory site that remains sensitive in bath-applied AMPA would be an interesting experiment, but it would require identifying the anatomical substrate of a non-ventilatory brainstem circuit in Rana that is guaranteed not to undergo reconfiguration with AMPA. This is beyond the scope of the current manuscript; based on our work to identify the neuronal substrate for ventilation in Rana, this would take at least five years to complete. In addition, even having identified such a circuit, there would be no guarantee that AMPA would not cause reconfiguration in this case too. With regard to transection boundaries and the location of injections, we agree these would be useful refinements. We used the location of nerves as reliable landmarks to guide transections. To locate the buccal area, we used stereotactic coordinates to guide micropipette insertion and functional criteria (AMPA and GABA sufficiency and necessity tests), based on our previous work, to determine its exact position.

      Unit Classification: We will review the nomenclature we use to define units to ensure it does not cause confusion and provide more explicit criteria for unit classes. This will include clarification of the absence of “buccal-only” units as currently defined. Specifically, when both buccal and lung rhythms are present, units active during buccal bursts are also active during lung bursts in our preparation. This does not conflict with the multiple interacting oscillator model we have proposed previously. Rather, recruitment of buccal-area neurons during lung bursts is consistent with a model in which the lung oscillator excites the buccal oscillator. It is also consistent with prior evidence that lung bursts persist after buccal-area ablation. In addition, burst frequency during lung episodes exceeds buccal burst frequency during intervening buccal periods. We will revise the text to make this logic clearer.

      Reviewer #2 (Public review):

      (1) Degeneracy vs. Redundancy

      We agree that degeneracy is the more precise term for the phenomenon our data demonstrate, in which structurally and functionally distinct neurons (lung units) acquire the capacity to participate in buccal rhythm generation under excitation. The Discussion already uses this language (e.g., "necessity and sufficiency may not work in a large degenerate network where rhythm generation is distributed across many elements"), but we used the word "redundant" in the Key Points Summary and Abstract in the broader sense of distributed robustness that a wider readership could grasp. Nonetheless, we recognize the distinction drawn by Goaillard and Marder (2021) and, considering the reviewer's concerns, we will revise the Abstract and Key Points to adopt the degeneracy framework consistently.

      (2) Loss of Essential Requirement for a Discrete Oscillator

      The reviewer asks whether expansion of the rhythmically active region necessarily implies loss of the rhythmogenic kernel. We believe our necessity and sufficiency experiments (Figure 9) directly address this. Under baseline conditions, GABA microinjection into the buccal area reliably abolishes buccal bursting; under 60 nM bath AMPA, the same injection at the same location and volume has no significant effect on buccal frequency. If the kernel remained essential and the surrounding recruitment were merely supplementary, local inhibition of the kernel should still slow or abolish the rhythm. It does not. We interpret this as evidence that the essential requirement for the discrete buccal area is lost under excitation, not merely that a larger area has been recruited around a still-critical core. We acknowledge, however, that the word "lost" could be read as implying permanent elimination rather than state-dependent suspension, and we will temper this language in the revision.

      (3) Novelty Relative to Mammalian Studies

      We appreciate the reviewer drawing attention to the cited mammalian literature (Del Negro et al., 2002, 2009; Baertsch et al., 2018, 2019), which we discuss in detail in the manuscript. However, we respectfully note that our findings extend this literature in several ways that the public review does not acknowledge. First, Baertsch et al. demonstrated recruitment of tonic or silent neurons to become phasically active during inspiration; we show that neurons already assigned to one oscillator phase (lung) can be dynamically reassigned to another (buccal), which represents a qualitatively different form of reconfiguration. Second, we developed a novel approach to functionally ablate motor neuron pools using high-frequency nerve stimulation, enabling the unit survey to be interpreted at the premotor level, which was not achieved in the mammalian studies cited. Third, our data provide the first demonstration of state-dependent oscillator expansion in a non-mammalian tetrapod, offering evolutionary context that strengthens the generality of the principle. We will revise the term "promiscuous" if it overstates the claim, but we maintain that our data support the conclusion that oscillator boundaries are flexible, which goes beyond what has been shown in mammals.

      (4) Figure 6, CN5 Output Under AMPA

      The reviewer asks whether the shift in premotor unit composition is reflected in CN5 motor output. This is a reasonable question. As noted in the manuscript, 60 nM AMPA produces only minor changes in the overt motor pattern as recorded from CN5, which is precisely why we interpret the premotor changes as a reorganization of the network's internal architecture that is not readily apparent from motor output alone. This is in sharp contrast to observations of substantive network reconfiguration in mammals in which eupnea is replaced by the pathological condition of gasping. We will add quantification of CN5 burst parameters (amplitude, duration, frequency) under baseline and 60 nM AMPA to make this point explicit.

      (5) Subthreshold Recruitment vs. Network Expansion

      The reviewer suggests that neurons classified as newly rhythmic under AMPA may have been part of the rhythmic network all along, receiving subthreshold inputs at baseline. We are grateful to the reviewer for highlighting this and hope they would agree that the literature clearly demonstrates that all respiratory neurons receive subthreshold phasic inputs of one kind or another, perhaps providing a clue that reconfiguration is a common feature of respiratory networks generally. Regardless of the implications for other animals, we agree this is likely the mechanism at work in the frog, and indeed our manuscript states that "this increase in the number and proportion of premotor buccal units is due in part to recruitment of sub-threshold buccal neurons that, under low excitability, only fire during lung bursts," citing intracellular evidence from Kogo and Remmers (1994) that lung neurons in this region receive subthreshold buccal-timed input. We note that this observation does not diminish our conclusion and likely explains the mechanism by which network expansion occurs. Whether one calls these neurons "newly recruited" or "pushed above threshold," the functional consequence is the same: a larger population of neurons is now rhythmically active during buccal bursts, and the necessity of the original buccal area is lost. We will clarify this reasoning in the revision and acknowledge the limitation that additional intracellular recordings from our preparation would be needed to fully characterize the subthreshold dynamics.

      (6) Figure 8, Epoch Length and Meta-analysis

      The reviewer notes that the pre-AMPA epoch appears shorter than the post-AMPA epoch in Figure 8A, which could bias unit classification. We will address this in the revision by reporting epoch durations explicitly and discussing their implications for spike counts where appropriate. Regarding the request for meta-analysis of lung unit spiking during baseline buccal bursts: this analysis is part of the rationale for the phase-recruitment panels, and we will expand Figure 8 to include the requested cross-condition comparisons (lung unit activity during baseline buccal bursts, and during post-AMPA lung bursts) as also suggested by Reviewer 3.

      (7) Figure 9, Buccal-to-Lung Burst Ratio

      The reviewer observes that the ratio of buccal to lung bursts decreases from approximately 4-5:1 under baseline to 2-3:1 under 60 nM AMPA, and suggests this is inconsistent with conversion of lung units into buccal units. We do not believe this is inconsistent. The buccal-to-lung burst ratio reflects the overt motor pattern, which is determined by the interaction of multiple oscillators and is influenced by AMPA at both buccal and lung levels. A change in this ratio does not speak to whether individual premotor units have acquired buccal-timed activity; the unit survey and the single-unit transformation data (Figure 8) address that question directly. Regarding the alternative model involving efference copy and cross-inhibition: this is an interesting hypothesis, but it is speculative and not tested by the current dataset. We are happy to discuss lung-buccal interactions more fully in the revision, including the parallels to parafacial/preBötC interactions in mammals, but we note that our data on unit transformation are better explained by network reconfiguration than by a feedback model that remains to be tested.

      (8) "Independent" Slices

      The reviewer compares our Level 2 transection to the preBötC sandwich slice preparation and argues the two resulting slices are not independent. We take the reviewer's point that "independent" may be read as implying no shared developmental or functional origin, which is not our intent. By "independent" we mean that the two physically separated slices can each generate rhythmic output without being synaptically connected to each other. This is, in fact, our central point: rhythmogenic capacity is distributed across a region broad enough to endow two separated slices with independent rhythm-generating capability when excited. We note that the analogy to the sandwich slice is imperfect because in our Level 1 cuts, only the rostral slice containing the buccal area generates rhythm -- the caudal slice does not -- whereas Level 2 cuts that bisect the buccal area produce rhythmicity in both halves, consistent with distributed capacity specifically within the buccal region. We will revise the wording to clarify what we mean by "independent" in this context.

      Reviewer #3 (Public review):

      Physiological Parallels: We will expand the Discussion to place these findings in a broader comparative context, including the eupnea-to-gasping transition in mammals as an example of state-dependent reconfiguration of respiratory networks. This will also allow us to clarify two advances that may otherwise be missed when comparing our work to that in mammals: (a) we developed a novel approach to functionally eliminate motor neurons, allowing mapped units to be interpreted as premotor; and (b) the state-dependent reconfiguration of the buccal oscillator occurred without qualitative changes in the overt lung-buccal motor pattern.

      Unit Transformation Analysis: We will revise Figure 8 to improve clarity around the observed lung-to-buccal transformation by expanding the phase-recruitment panels as suggested and will revisit the operational definitions of lung and buccal unit identity to reduce ambiguity. The central observation is that some units active only during lung bursts under one condition become active during buccal bursts when network excitation is increased.

      Saturation vs. Network Expansion: We will directly address the possibility that 60 nM bath-applied AMPA simply pushes the network toward a frequency ceiling. Two observations strongly argue against this interpretation: (a) 60 nM global AMPA produced only mild changes in buccal frequency, whereas local AMPA injection at much higher concentrations produced larger effects; and (b) local GABA was sufficient to reverse the effects of high-concentration local AMPA microinjections but insufficient to abolish rhythm during low-concentration global AMPA application. Together, these findings are more consistent with global AMPA endowing the network with distributed rhythm-generating capacity than with simple saturation of a discrete local oscillator. Notwithstanding these arguments, we will attempt to extend the AMPA/GABA dose-response experiments as suggested, or else add the lack of such experiments as a caveat to our interpretation.

      Figure 9C Correction: We will correct the statistical markings in Figure 9C to align with the text in the Results regarding the significance of frequency changes under 60 nM AMPA.

      In total, we believe these revisions will improve the rigor and clarity of the manuscript while preserving the central conclusion supported by the data: that the organization of the frog respiratory rhythmogenic network is state dependent and becomes more distributed under excitation.

    1. eLife Assessment

      This valuable study addresses a timely question regarding the contribution of transposable elements to splice isoform diversity in the Drosophila brain, directly engaging with recent conflicting findings in the field. The work provides convincing evidence that TE-gene chimeric transcripts are detectable and that prior discrepancies largely arise from methodological differences in computational pipelines and experimental design. The combination of reanalysis, methodological clarification, and targeted validation represents a technical contribution that will be of interest to researchers studying transcriptome complexity and transposable elements. However, the strength of evidence would be further enhanced by increased methodological transparency, more rigorous experimental controls, and a more cautious interpretation of functional implications.

    2. Reviewer #1 (Public review):

      Summary:

      Choucri and Treiber have reassessed their previous study on TE-gene chimeric transcripts in neural genes in response to Azad et al (2024). Azad and colleagues argued that, contrary to Choucri and Treiber's findings, chimeric TE-mRNAs are relatively infrequent, and they cautioned that further optimization of bioinformatics pipelines is needed to detect TE insertions from RNAseq accurately. In this short response, Choucri and Treiber clearly demonstrate that differences in the tools used between their study and that of Azad et al. likely account for the contrasting results, along with RT-PCR primers that were not designed to match the chimeric transcripts and the use of different Drosophila lines. The authors emphasize the need for uniform, standardized criteria in such analysis, which would ultimately strengthen and advance the field.

      Strengths:

      The addition of a ratio comparing the number of splice reads specific to the chimeric transcript with the number of exon-exon splice reads is really interesting because it opens the door to finally quantifying the contribution of chimeric TEs to overall gene expression, although this is not within the scope of the present article. The clear dissection of chimeric transcripts, along with the results from Azad et al, allows us to understand the differences between the two studies confidently. Finally, the discussion on Drosophila lines is indeed essential, given that lines, and even individuals, show high TE polymorphism.

      Weaknesses:

      I think it is necessary to add more detail to this article. For instance, the differences between TEchim and TIDAL could be laid out more precisely. Regarding the roo example, one caveat of this family, as of others, is the presence of simple repeats; it would be important to show that the simple repeats do not interfere with read mapping. Regarding the experiments, if we are looking for a standardized protocol, then we should have a detailed materials and methods section, with every experiment, replicate, and PCR temperature clearly defined. Finally, and in my opinion most importantly, RT-negative controls for the RT-PCRs, along with DNA PCRs to show insertion presence, are mandatory for testing the presence of chimeric genes. Water-only negative PCR controls are also needed and are unfortunately absent from Figure 3.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Choucri and Treiber aims to directly address a recent critique regarding the role of transposable elements (TEs) in diversifying the neural transcriptome of Drosophila. The authors seek to demonstrate that TEs are not merely genomic "noise" but are frequently and reliably "exonized" into brain-specific mRNA. By introducing an upgraded computational pipeline, TEchim, and conducting precise experimental validations, the authors set out to show that TE-mediated splicing represents a genuine biological phenomenon that expands the molecular repertoire of the nervous system.

      Strengths:

      The study's primary strength lies in its rigorous technical "forensic" analysis of previous failed replication attempts. The authors convincingly demonstrate that the lack of signal in the opposing study stemmed from a fundamental methodological mismatch: the software used by the critics (TIDAL) is logically incapable of detecting splice sites located within TE sequences. Importantly, the authors complement this computational clarification with definitive experimental evidence through an effective "experimental rescue." By employing correctly designed primers and matching the genetic backgrounds of the fly strains, thereby accounting for genomic polymorphisms, they successfully validated all seven loci that were previously reported as undetectable. This dual-pronged strategy, addressing both algorithmic bias and experimental design, establishes a more robust technical benchmark for the detection and validation of TE-derived exons in neural tissues.

      Weaknesses:

      While the technical rebuttal is highly convincing, the scope of the study remains primarily defensive. As a response to a prior critique, the work focuses on establishing the existence and detectability of chimeric TE-derived transcripts rather than exploring their broader functional consequences. As a result, there is limited new insight into how these TE-modified isoforms influence neural circuit function or organismal behavior. In addition, the detection and validation of these events remain technically demanding, requiring deep sequencing and specialized bioinformatic expertise, which may limit broader adoption by laboratories without dedicated computational resources.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Choucri and Treiber responds to a recent paper by Azad et al., which in turn responded to a paper by Treiber and Waddell (Genome Research, 2020). The controversy relates to the detection of transcripts with transposable elements (TEs) spliced into them in the Drosophila brain.

      Strengths:

      The authors now argue convincingly that these transcripts exist using an improved, updated version of their pipeline. They also validate some of their findings using RT-PCR and explain why Azad et al. failed to detect these transcripts due to methodological errors. Overall, I am convinced that these transcripts exist and that the TE-derived transcripts described by Choucri and Treiber are real.

      Weaknesses:

      The authors should mention that combining PCR-amplified cDNA generation with short-read sequencing is suboptimal for detecting TE-fusion transcripts. Recently, direct long-read ONT RNA sequencing, which does not require amplification and spans the entire transcript, has been used to detect similar transcripts in human stem cells and the human brain (PMID: 40848716 & Garza et al, bioRxiv). Had the authors used this technology to validate their findings, there would be no question about these transcripts. If they do not perform such experiments, they should at least discuss this possibility and the advantages of the approach.

    1. eLife Assessment

      This study presents an important methodological advance, Liver-CUBIC combined with multicolor metallic nanoparticle perfusion, that enables high-resolution 3D visualization of the liver's complex multi-ductal architecture. The identification of the Periportal Lamellar Complex (PLC) as a novel perivascular structure with distinct cellular composition and low-permeability characteristics is convincing, supported by rigorous imaging data. The observed scaffolding role during fibrosis offers intriguing biological insights, though the functional claims would benefit from direct experimental validation.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the minor comments raised in the previous round of review.]

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns and comments.

    3. Reviewer #3 (Public review):

      Xu, Cao and colleagues aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the adult liver portal veins.

      Using available scRNAseq data, the authors assessed the CD34⁺Sca-1⁺ cells' expression profile, highlighting mRNA presence of genes linked to neurodevelopment, bile acid transport, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl4-induced liver fibrosis, implying PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.

      The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists. The importance of the CD34+/Sca1+ endothelial cell population and claims based on transcriptomic re-analysis require future assessment by functional experimental approaches to decipher the functional molecules involved in PLC formation, maintenance, and the involvement in injury response before establishing their role in biliary, arterial, and neural liver systems.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - PLCs.

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell sub-population for PLC formation and function was not tested and warrants further validation.

    4. Author Response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns and comments.

      We thank the reviewer for the positive evaluation and helpful feedback.

      Reviewer #3 (Public review):

      Xu, Cao and colleagues aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the adult liver portal veins.

      Using available scRNAseq data, the authors assessed the CD34<sup>+</sup>/Sca-1<sup>+</sup> cells' expression profile, highlighting mRNA presence of genes linked to neurodevelopment, bile acid transport, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl4-induced liver fibrosis, implying PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.

      The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists. The importance of the CD34+/Sca1+ endothelial cell population and claims based on transcriptomic re-analysis require future assessment by functional experimental approaches to decipher the functional molecules involved in PLC formation, maintenance, and the involvement in injury response before establishing their role in biliary, arterial, and neural liver systems.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - PLCs.

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell sub-population for PLC formation and function was not tested and warrants further validation.

      We thank the reviewer for the valuable comment regarding the potential role of the CD34<sup>+</sup>/Sca-1<sup>+</sup> endothelial cell sub-population in PLC function.

      We agree that direct functional validation would be a crucial next step to confirm the contribution of this specific sub-population to PLC formation and function. The focus of the present study remains on the spatial localization and reproducible characterization of PLC structures based on 3D imaging, as well as the relevant transcriptional features revealed by single-cell analysis.

      To avoid overinterpretation, we have revised the Discussion section accordingly, providing a more focused and cautious description of the related findings.

      Comments on revisions:

      I appreciate the authors' effort to revise the text so it more rigorously adheres to the presented evidence. Following a thorough read of the revised text, a few remaining minor issues were identified in the Discussion.

      (1) Where does the hard evidence for the PLC being a stem cell niche come from in the two following statements?

      This suggests that the PLC may not only provide structural support but also serve as a perivascular stem cell niche specific to the portal region, potentially involved in hematopoiesis and tissue regeneration.

      The PLC serves as a directional scaffold for ductal growth, a specialized stem cell niche, and a potential site of neurovascular coupling.

      We thank the reviewer for this important comment. We agree that the term “stem cell niche” may imply functional evidence for direct stem cell regulation, which was not demonstrated in this study. Our conclusions were based on the spatial enrichment and transcriptional features of CD34<sup>+</sup>/Sca-1<sup>+</sup> endothelial populations expressing hematopoiesis-related genes in the portal region.

      To avoid overinterpretation, we have revised the sentence to remove the term “stem cell niche” and instead describe the PLC as being enriched in perivascular endothelial cell populations with hematopoiesis-related gene expression features. The revised text now reads:

      “These results suggest that, beyond structural support, the PLC in the portal region is enriched with perivascular endothelial cell populations exhibiting hematopoiesis-related gene expression features.”

      We have also modified the corresponding statement later in the Discussion. It now reads:

      “The PLC serves as a directional scaffold for ductal growth, displays distinct perivascular endothelial transcriptional features in the portal region, and may represent a potential site of neurovascular coupling.”

      We believe this wording more accurately reflects the descriptive and transcriptomic nature of our data without implying functional niche activity.

      (2) In the following paragraph, I lack references to the previously published evidence of liver innervation guidance mechanisms, such as the mesenchyme-mediated guidance (CD31- population) Gannoun et al., 2023 https://doi.org/10.1242/dev.201642, an important context for your finding.

      Further analysis showed significant upregulation of genes involved in neurodevelopment and axonal guidance in the CD34<sup>+</sup>/Sca-1<sup>+</sup> cluster, along with activation of neuronal signaling pathways. Immunostaining confirmed the presence of TH<sup>+</sup> sympathetic nerve fibers wrapping around the PLC in a "beads-on-a-string" pattern (Fig. 6), consistent with a classic neurovascular unit(Adori et al., 2021). Previous studies have shown that sympathetic nerves enter the liver along collagen fibers of Glisson's capsule and interact with hepatic arteries, portal veins, and bile duct epithelium, supporting the PLC as a scaffold for intrahepatic neurovascular integration.

      We thank the reviewer for highlighting the importance of previously published evidence regarding liver innervation guidance mechanisms. We agree that these studies provide important context for interpreting the neurodevelopmental and axon guidance–related transcriptional signatures observed in our dataset. Accordingly, we have revised the Discussion section to incorporate reference to mesenchyme-mediated axon guidance mechanisms in the portal region during liver development (Gannoun et al., 2023). This addition better situates our findings within the existing literature.

      (3) Several sentences have issues with a lack of space between words.

      We have carefully re-examined the entire manuscript for spacing and formatting inconsistencies and corrected minor typographical issues to ensure uniform formatting throughout the text.

    1. eLife Assessment

      This manuscript presents a valuable study of the activity and functional relevance of different circuits in the dentate gyrus of mice performing a pattern separation task. Solid evidence is presented to support the paper's central conclusions. The study is likely to be of interest to those studying the subregional organization and cell type-specific functions of the dentate gyrus.

    2. Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      Comments on revisions:

      I appreciate the authors' careful and thorough revisions. They have addressed all of my previous concerns satisfactorily, and the manuscript is now significantly strengthened. I have no further concerns.

    3. Reviewer #2 (Public review):

      In this study, the authors investigate how increasing cognitive demand shapes activity patterns in the dorsal dentate gyrus (DG). Using a touchscreen-based TUNL task combined with TRAP/c-Fos tagging, birth-dating of adult-born granule cells (abDGCs), and chemogenetic inhibition, they show that higher task demand increases mature granule cell (mGC) recruitment and enhances suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Functionally, mGC inhibition reduces overall activity and impairs performance without disrupting blade bias, whereas inhibition of ≤7-week-old abDGCs increases mGC activity, abolishes blade bias, and impairs discrimination under high-demand conditions. These findings suggest that effective pattern separation depends not only on overall DG activity levels but also on the spatial organization of recruited ensembles.

      The integration of touchscreen TUNL with temporally controlled activity tagging and birth-dated cohorts is technically strong. Quantification of SB-IB bias and radial/apical distributions adds anatomical precision beyond bulk activity measures. The comparison between mGC and abDGC inhibition is conceptually compelling and supports dissociable functional roles. Overall, the data convincingly demonstrate that increasing cognitive demand amplifies blade-biased DG recruitment and that mGCs and abDGCs differentially contribute to both behavioral performance and network organization.

      However, how abDGCs are integrated into the mGC network under high cognitive demand remains unresolved. Additional experiments are needed to clarify how abDGCs shape spatial recruitment patterns and whether they directly inhibit or indirectly regulate mGC activity to maintain high performance.

      Furthermore, the authors frame "high cognitive demand" as a multidimensional construct encompassing broad behavioral challenge. It would strengthen the work to delineate how local abDGC-mGC circuit interactions regulate specific task components in real time. This will require higher temporal resolution approaches, as TRAP and c-Fos labeling integrate activity over prolonged windows and primarily reflect sustained engagement rather than moment-to-moment computations.

      The central conclusion that dentate function depends on coordinated spatial recruitment rather than total activity magnitude is supported by the data, although mechanistic interpretations should be tempered given methodological limitations.

      Overall, this work advances models of adult neurogenesis by emphasizing a critical-period modulatory role of abDGCs in organizing DG network activity during high-demand discrimination. The combined behavioral and circuit-level framework is likely to be influential in the field.

    4. Reviewer #3 (Public review):

      This study examines the role of dentate gyrus neuronal populations, reflecting neurogenesis and anatomical location (suprapyramidal vs infrapyramidal blade), in a mnemonic discrimination task that taxes the pattern separation functions of the dentate. The authors measure dentate gyrus activity resulting from cognitive training and test whether adult neurogenesis is required for both the anatomical patterns of activity and performance in the cognitive task. The authors find that more cognitively challenging variants of the task evoked more dentate activity, but also distinct patterns of activity (more activity in the suprapyramidal blade, less in the infrapyramidal blade). Using chemogenetic approaches, they silence mature vs immature dentate gyrus neurons and find that only mature neurons (either the general population or specifically mature adult-born neurons), and not immature adult-born neurons, are required for the difficult version of the task. Inhibition of mature adult-born neurons furthermore increased overall activity in the dentate and reduced the biased pattern of activity across the blades, consistent with evidence that adult-born neurons broadly regulate dentate gyrus activity.

      Comments on revisions:

      I appreciate the efforts the authors have taken to revise this manuscript. I have only minor concerns with this revised version of the manuscript:

      Methods state that significance is defined as P<0.05 but some results are interpreted as significant when P=0.05. Either the alpha value needs to change or the interpretation needs to change.

      I believe the statistical results for group and blade effects for the ANOVAs, in Figs 2,3 & 4, appear to be switched (blade should be significant, not group).

      I appreciate that sometimes there is not a perfect overlap between immunohistochemical signals, but I continue to believe that the spatially-non-overlapping TRAP and EDU signals in Fig 3 are caused by these 2 markers being in different cells. A Z-stack or orthogonal projection could verify/disprove this concern.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This manuscript investigates how dentate gyrus (DG) granule cell subregions, specifically suprapyramidal (SB) and infrapyramidal (IB) blades, are differentially recruited during a high cognitive demand pattern separation task. The authors combine TRAP2 activity labeling, touchscreen-based TUNL behavior, and chemogenetic inhibition of adult-born dentate granule cells (abDGCs) or mature granule cells (mGCs) to dissect circuit contributions.

      This manuscript presents an interesting and well-designed investigation into DG activity patterns under varying cognitive demands and the role of abDGCs in shaping mGC activity. The integration of TRAP2-based activity labeling, chemogenetic manipulation, and behavioral assays provides valuable insight into DG subregional organization and functional recruitment. However, several methodological and quantitative issues limit the interpretability of the findings. Addressing the concerns below will greatly strengthen the rigor and clarity of the study.

      Major points:

      (1) Quantification methods for TRAP+ cells are not applied consistently across panels in Figure 1, making interpretation difficult. Specifically, Figure 1F reports TRAP+ mGCs as density, whereas Figure 1G reports TRAP+ abDGCs as a percentage, hindering direct comparison. Additionally, Figure 1H presents reactivation analysis only for mGCs; a parallel analysis for abDGCs is needed for comparison across cell types.

      In Figure 1G and 1H we report TRAP+ abDGCs as a percentage rather than density because we are analyzing colocalization of the two markers, which are very sparse in this population. Given the very low number of double-labeled abDGCs, calculating density would not be practical. In the revised manuscript we have clarified the rationale for using these measures. As noted in the current text, we did not observe abDGCs co-expressing TRAP and c-Fos; we have made this point more explicit to guide interpretation of these data.

      (2) The anatomical distribution of TRAP+ cells is different between low- and high-cognitive demand conditions (Figure 2). Are these sections from dorsal or ventral DG? Is this specific to dorsal DG, as it is preferentially involved in cognitive function? What happens in ventral DG?

      The sections shown in Figure 2 were obtained from the dorsal dentate gyrus (see Methods, “Histology and imaging”: stereotaxic coordinates −1.20 to −2.30 mm relative to bregma, Paxinos atlas). From a feasibility standpoint, it is not possible to analyze the entire longitudinal extent of the hippocampus with these low-throughput histological approaches. We therefore focused on the dorsal DG, for which there is a strong functional rationale. A large body of work indicates that the dorsal hippocampus, and specifically the dorsal DG, is preferentially involved in spatial memory and in the fine contextual discrimination that underlies pattern separation. The dorsal hippocampus is critical for encoding and distinguishing similar spatial representations, a core component of the high-cognitive demand task used here. In contrast, the ventral DG is more strongly associated with emotional regulation and affective memory processing and is less implicated in high-resolution spatial encoding. For these reasons, the present study was designed to assess TRAP+ cell distributions specifically in the dorsal DG.

      (3) The activity manipulation using chemogenetic inhibition of abDGCs in AsclCreER; hM4 mice was performed; however, because tamoxifen chow was administered for 4 or 7 weeks, the labeled abDGC population was not properly birth-dated. Instead, it consisted of a heterogeneous cohort of cells ranging from 0 to 5-7 weeks old. Thus, caution should be taken when interpreting these results, and the limitations of this approach should be acknowledged.

      We agree that prolonged tamoxifen administration results in labeling a heterogeneous population of abDGCs spanning approximately 0 to 5–7 weeks of age, rather than a precisely birth-dated cohort. This is a limitation of the approach, and we now discuss it in more detail in the revised manuscript.

      (4) There is a major issue related to the quantification of the DREADD experiments in Figure 4, Figure 5, Figure 6, and Figure 7. The hM4 mouse line used in this study should be quantified using HA, rather than mCitrine, to reliably identify cells derived from the Ascl lineage. mCitrine expression in this mouse line is not specific to adult-born neurons (off-targets), and its expression does not accurately reflect hM4 expression.

      We agree that mCitrine is not a marker that allows localization of hM4Di, as it is well known that mCitrine can be expressed in a Cre-independent manner in this mouse. As suggested, we have removed the figure that showed mCitrine and have performed immunohistochemical localization of the DREADD with an antibody against the HA tag. This is now shown in Figure 5.

      (5) Key markers needed to assess the maturation state of abDGCs are missing from the quantification. Incorporating DCX and NeuN into the analysis would provide essential information about the developmental stage of these cells.

      The goal of this study was to examine activity patterns of adult-born versus mature granule cells, rather than to assess maturation state. The adult-born neurons analyzed were 25–39 days old, an age at which most cells have progressed beyond the DCX⁺ stage and are expected to express NeuN based on prior work. We therefore do not think that including DCX or NeuN quantification would provide additional information relevant to the aims or interpretation of this study.

      Minor points:

      (1) The labeling (Distance from the hilus) in Figure 2B is misleading. Is that the same location as the subgranular zone (SGZ)? If so, it's better to use the term SGZ to avoid confusion.

      We have updated Figure 2B, the Methods, and the main text to state more explicitly that this reference point is the boundary between the subgranular zone (SGZ) and the hilus.

      (2) Cell number information is missing from Figures 2B and 2C; please include this data.

      We have now added the cell number information to the figure legends. In Figures 2B and 2C, each point corresponds to a single cell, with an equal number of mice per group. The total number of TRAP⁺ cells per mouse is shown in Figure 1F, which reports TRAP⁺ cell densities by group.

      (3) Sample DG images should clearly delineate the borders between the dentate gyrus and the hilus. In several images, this boundary is difficult to discern.

      We made the DG-hilus boundaries clearer in the sample images to improve visualization and interpretation.

      (4) In Figure 6, it is not clear how tamoxifen was administered to selectively inhibit the more mature 6-7-week-old abDGC population, nor how this paradigm differs from the chow-based approach. Please clarify the tamoxifen administration protocol and the rationale for its specificity.

      We apologize for the confusion here. The protocol used in Figure 6 is the same tamoxifen chow–based approach as in Figure 5, differing only in the duration of tamoxifen exposure. Mice in Figure 5 received tamoxifen chow for 7 weeks, whereas mice in Figure 6 received it for 4 weeks, restricting labeling to a younger and narrower cohort of adult-born DGCs. Thus, the population targeted in Figure 6 is younger than that in Figure 5 and does not correspond to mature 6–7-week-old neurons. By contrast, the experiment in Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells, which are Dock10-positive and express Cre endogenously, allowing selective manipulation of this later-stage population.

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

      Reviewer #2 (Public review):

      Summary

      In this manuscript, the authors combine an automated touchscreen-based trial-unique nonmatching-to-location (TUNL) task with activity-dependent labeling (TRAP/c-Fos) and birth-dating of adult-born dentate granule cells (abDGCs) to examine how cognitive demand modulates dentate gyrus (DG) activity patterns. By varying spatial separation between sample and choice locations, the authors operationally increase task difficulty and show that higher demand is associated with increased mature granule cell (mGC) activity and an amplified suprapyramidal (SB) versus infrapyramidal (IB) blade bias. Using chemogenetic inhibition, they further demonstrate dissociable contributions of abDGCs and mGCs to task performance and DG activation patterns.

      The combination of behavioral manipulation, spatially resolved activity tagging, and temporally defined abDGC perturbations is a strength of the study and provides a novel circuit-level perspective on how adult neurogenesis modulates DG function. In particular, the comparison across different abDGC maturation windows is well designed and narrows the functionally relevant population to neurons within the critical period (~4-7 weeks). The finding that overall mGC activity levels, in addition to spatially biased activation patterns, are required for successful performance under high cognitive demand is intriguing.

      Major Comments

      (1) Individual variability and the relationship between performance and DG activation.

      The manuscript reports substantial inter-animal variability in the number of days required to reach the criterion, particularly during large-separation training. Given this variability, it would be informative to examine whether individual differences in performance correlate with TRAP+ or c-Fos+ density and/or spatial bias metrics. While the authors report no correlation between success and TRAP+ density in some analyses, a more systematic correlation across learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB) could strengthen the interpretation that DG activity reflects task engagement rather than performance only.

      As mentioned, we previously reported no correlation between task success and TRAP+ density. We have now performed additional analyses examining correlations with learning rate, final performance, and DG activation patterns (mGC vs abDGC, SB vs IB), and found no significant relationships. Because we did not find any positive correlations, the original interpretation that DG activity primarily reflects task engagement rather than performance level seems the most parsimonious.

      (2) Operational definition of "cognitive demand".

      The distinction between low (large separation) and high (small separation) cognitive demand is central to the manuscript, yet the definition remains somewhat broad. Reduced spatial separation likely alters multiple behavioral variables beyond cognitive load, including reward expectation, attentional demands, confidence, engagement, and potentially motivation. The authors should more explicitly acknowledge these alternative interpretations and clarify whether "cognitive demand" is intended as a composite construct rather than a strictly defined cognitive operation.

      We agree that reducing spatial separation between stimuli likely engages multiple behavioral and cognitive processes beyond a single, strictly defined operation. We have now clarified this point in the manuscript and explicitly state that our use of the term “cognitive demand” reflects a multidimensional behavioral challenge rather than a singular cognitive process (see Discussion).

      (3) Potential effects of task engagement on neurogenesis.

      Given the extensive behavioral training and known effects of experience on adult neurogenesis, it remains unclear whether the task itself alters the size or maturation state of the abDGC population. Although the focus is on activity and function rather than cell number, it would be useful to clarify whether neurogenesis rates were assessed or controlled for, or to explicitly state this as a limitation.

      While the primary goal of this study was to examine activity and functional recruitment of adult-born granule cells, we also quantified the survival of birth-dated neurons at the end of behavioral training. Density measurements of BrdU⁺ and EdU⁺ cells revealed no differences across experimental groups, indicating that engagement in the pattern separation task, across low to high cognitive demand conditions, did not significantly alter survival of adult-born neurons. In addition, we examined the spatial distribution of BrdU⁺ and EdU⁺ neurons between the suprapyramidal and infrapyramidal blades of the dentate gyrus. The proportion of newborn neurons was consistent across all groups, with approximately 60% located in the suprapyramidal blade and 40% in the infrapyramidal blade. These findings indicate that behavioral training did not alter the baseline distribution of adult-born neurons. We have now clarified these points in the manuscript (see Results).

      (4) Temporal resolution of activity tagging.

      TRAP and c-Fos labeling provide a snapshot of neural activity integrated over a temporal window, making it difficult to determine which task epochs or trial types drive the observed activation patterns. This limitation is partially acknowledged, but the conclusions occasionally imply trial-specific or demand-specific encoding. The authors should more clearly distinguish between sustained task engagement and moment-to-moment trial processing, and temper interpretations accordingly. While beyond the scope of the current study, this also motivates future experiments using in vivo recording approaches.

      We agree and have made changes to the manuscript to discuss these points (see Discussion and Limitations).

      (5) Interpretation of altered spatial patterns following abDGC inhibition.

      In the abDGC inhibition experiments, Cre+ DCZ animals show delayed learning relative to controls. As a result, when animals are sacrificed, they may be at an intermediate learning stage rather than at an equivalent behavioral endpoint. This raises the possibility that altered DG activation patterns reflect the learning stage rather than a direct circuit effect of abDGC inhibition. Additional clarification or analysis controlling for the learning stage would strengthen the causal interpretation.

      We agree that differences in learning stage could in principle confound the interpretation of DG activation patterns. However, although Cre+ DCZ-treated mice exhibited delayed learning, they ultimately reached the same performance criterion as control animals. Thus, adult-born DGC inhibition did not prevent learning but increased the time required to reach criterion, indicating that these neurons are beneficial for learning efficiency rather than strictly necessary for task acquisition. Importantly, all animals were sacrificed only after reaching the predefined success criterion. Therefore, the immunohistochemical analyses were performed at the same behavioral endpoint for Cre+ DCZ and control groups, even though the number of training days differed. Consequently, the observed differences in DG activation reflect circuit recruitment at equivalent task mastery rather than differences in learning stage.

      (6) Relationship between c-Fos density and behavioral performance.

      The study reports that abDGC inhibition increases c-Fos density while impairing performance, whereas mGC inhibition decreases c-Fos density and also impairs performance. This raises an important conceptual question regarding the relationship between overall activity levels and task success. The authors suggest that both sufficient activity and appropriate spatial patterning are required, but the manuscript would benefit from a more explicit discussion of how different perturbations may shift the identity, composition, or coordination of the active neuronal ensemble rather than simply altering total activity levels.

      We agree that our findings highlight that successful performance is not determined solely by the overall level of dentate gyrus activity, but rather by the composition and spatial organization of the active neuronal ensemble. In our study, inhibition of abDGCs increased overall mGC activity while disrupting the spatially organized, blade-biased activation pattern and impaired performance. In contrast, direct inhibition of mGCs reduced global excitability but preserved the relative spatial organization of active neurons in animals that continued to perform the task. These findings suggest that different perturbations alter task performance by shifting the identity and coordination of the active neuronal ensemble, rather than simply increasing or decreasing total activity levels. We have now expanded the Discussion to more explicitly address how dentate gyrus computations may depend on the structured recruitment of granule cell ensembles and how distinct manipulations differentially disrupt this organization.

      Reviewer #3 (Public review):

      Summary:

      The authors used genetic models and immunohistochemistry to identify how training in a spatial discrimination working memory task influences activity in the dentate gyrus subregion of the hippocampus. Finding that more cognitively challenging variants of the task evoked more and distinct patterns of activity, they then investigated whether newborn neurons in particular were important for learning this task and regulating the spatial activity patterns.

      Strengths:

      The focus on precise anatomical locations of activity is relatively novel and potentially important, given that little is known about how DG subregions contribute to behavior. The authors also use a task that is known to depend on this memory-related part of the brain.

      Weaknesses:

      Statistical rigor is insufficient. Many statistical results are not stated, inappropriate tests are used, and sample sizes differ across experiments (which appear to potentially underlie null results). The chemogenetic approach to inhibit adult-born neurons also does not appear to be targeting these neurons, as judged by their location in the DG.

      Please refer to the updated statistical analyses in response to the recommendations below.

      Recommendations for the authors:

      Reviewing Editor Comments

      Please note that reviewers agreed that appropriate revisions are needed to increase the strength of evidence for the paper's claims. Concerns were raised about a lack of statistical rigor in the statistical analyses used. Results of statistical tests were not consistently provided (i.e., statistic applied, value of statistic, degrees of freedom, p-value), and seemingly inappropriate statistical tests were used in some instances. Also, some comparisons had lower statistical power than others. When clarifying the statistical approaches used in the manuscript, we also encourage you to consider reading this article that outlines common statistical mistakes (Makin TR, Orban de Xivry JJ. Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. Elife. 2019 Oct 9;8:e48175. doi: 10.7554/eLife.48175.), such as the importance of not basing conclusions on a significant p-value for one pair-wise comparison vs a non-significant p-value for another pairwise comparison (i.e., groups that are being compared should be included in the same statistical analysis, and interaction effects should be reported when appropriate). We hope that you find this information to be helpful should you decide to submit a revised manuscript to eLife.

      Reviewer #1 (Recommendations for the authors):

      (1) Standardize TRAP+ quantification across Figure 1.

      Please report TRAP+ cell numbers using consistent metrics (e.g., density or percentage) to enable comparison across cell types. In addition, extend the TRAP+ reactivation analysis in Figure 1H to include abDGCs so that reactivation dynamics can be compared directly between mGCs and abDGCs.

      Reply in Public Review

      (2) Clarify whether dorsal or ventral DG was analyzed in Figure 2.

      The differing anatomical distributions of TRAP+ cells under low- and high-demand conditions raise important questions about DG axis specificity. Please indicate whether analyses were performed in dorsal DG, ventral DG, or both, and provide data or justification accordingly.

      Reply in Public Review

      (3) Acknowledge limitations of the tamoxifen-chow labeling strategy in AsclCreER; hM4 experiments.

      Since tamoxifen chow administered over 4-7 weeks labels a heterogeneous abDGC population spanning a broad age range, this approach does not generate birth-dated cohorts. This limitation should be clearly addressed in the text and interpretations, particularly related to cell age-dependent effects, should be tempered.

      Reply in Public Review

      (4) Revise DREADD quantification using HA rather than mCitrine.

      The hM4 mouse line requires HA immunostaining to accurately identify Ascl-lineage cells expressing the DREADD receptor. Because mCitrine is not specific to adult-born neurons and does not reliably reflect hM4 expression, quantification based on mCitrine should be revised.

      Reply in Public Review

      (5) Include markers to assess abDGC maturation state.

      Adding quantification of DCX and NeuN would help define the developmental stage of abDGCs in key experiments and improve the interpretation of cell-age-dependent effects.

      Reply in Public Review

      (6) Clarify DG layer boundaries and terminology in Figure 2.

      If the metric labeled "Distance from the hilus" corresponds to the subgranular zone (SGZ), using SGZ terminology would prevent confusion. Additionally, please provide clearer delineation of DG and hilus borders in sample images.

      Reply in Public Review

      (7) Provide missing cell number data for Figures 2B and 2C.

      Reply in Public Review

      (8) Clarify the tamoxifen administration protocol in Figure 6.

      Please describe how the protocol selectively targets 6-7-week-old abDGCs and how it differs from the chow-based approach. This will help readers understand the intended specificity of the manipulation.

      Reply in Public Review

      Reviewer #2 (Recommendations for the authors):

      (1) EdU birth-dating timeline

      The manuscript would benefit from a clearer description of the EdU birth-dating timeline, ideally with a schematic similar to that provided for BrdU in Supplementary Figure 1.

      We appreciate the suggestion. However, we did not include a separate schematic for EdU because its use and birth-dating logic are identical to BrdU (both are thymidine analogs administered systemically and incorporated during S-phase). Therefore, the timeline shown in Supplementary Figure 1 applies equally to both markers. We have clarified this point in the Methods section to avoid confusion.

      (2) Clarity of TUNL task description.

      The description of the TUNL task, particularly for readers unfamiliar with touchscreen-based paradigms, is difficult to follow without consulting prior literature. A simplified schematic or a clearer step-by-step explanation in the main text or supplementary material would improve accessibility.

      We note that the main steps of the TUNL protocol are illustrated in Figure 1A and Supplementary Figures 2A and 2B. Nevertheless, we agree that the description in the text can be made clearer for readers less familiar with touchscreen-based tasks. Thus, we have now revised the Methods section to provide a clearer step-by-step description of the TUNL task.

      (3) Influence of outliers in Figure 1G.

      In Figure 1G, the reported trend that ~1% of 25-39-day-old abDGCs are TRAP+ during LS trials appears to be driven by a small number of outliers. This should be acknowledged, and the wording of the conclusion moderated to reflect the variability in the data.

      We agree with the reviewer that the apparent outliers reflect the inherent sparsity of TRAP labeling in this population. In absolute terms, this corresponds to between 0 and 2 TRAP⁺ 25–39-day-old abDGCs per mouse, such that the presence or absence of a small number of labeled cells can appear as outliers when expressed as a percentage. We have revised the text to acknowledge this (see Results).
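
      The arithmetic behind this point can be sketched in a few lines of Python. This is an illustration only: the 0 to 2 range of double-labeled cells per mouse comes from the response above, while the per-mouse totals of birth-dated cells and the function name are hypothetical.

```python
def percent_double_labeled(trap_positive, birth_dated_total):
    """Percentage of birth-dated cells that are also TRAP+."""
    if birth_dated_total == 0:
        raise ValueError("no birth-dated cells counted")
    return 100.0 * trap_positive / birth_dated_total


# Hypothetical per-mouse counts: (TRAP+ birth-dated cells, all birth-dated cells).
# The 0-2 range of double-labeled cells follows the response; totals are assumed.
counts = [(0, 80), (0, 100), (1, 100), (0, 120), (2, 100)]

percentages = [percent_double_labeled(k, n) for k, n in counts]
```

      With totals of this assumed size, a single additional labeled cell moves a mouse's value by about one percentage point, so a mouse with one or two labeled cells stands apart from the many mice at 0%.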

      (4) Presentation of learning curves.

      Rather than focusing primarily on "days before criterion" (DBC), it would be helpful to show full learning curves across the entire training period. This would provide a clearer picture of acquisition dynamics and inter-animal variability.

      We agree that learning curves can be informative in many behavioral paradigms. However, in our protocol, mice do not undergo the same number of training days because training stops individually once each animal reaches criterion. As a result, plotting full learning curves would produce trajectories of different lengths, making group comparisons difficult and visually cluttered. For this reason, we aligned animals based on days before criterion (DBC), which allows direct comparison of learning dynamics relative to task acquisition. We consider the cumulative probability representation, which is also included in the figures, to be the most appropriate way to summarize learning progression across animals in this context.
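
      The alignment described above can be illustrated with a short Python sketch. It assumes that each mouse's record is a list of daily scores ending on the day criterion was reached, and that the criterion day is counted as DBC 0. The data and function names are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def align_to_criterion(scores):
    """Map days-before-criterion (0 = criterion day) to the daily score."""
    last = len(scores) - 1
    return {last - day: score for day, score in enumerate(scores)}


def mean_by_dbc(mice):
    """Average across mice at each days-before-criterion offset."""
    pooled = defaultdict(list)
    for scores in mice.values():
        for dbc, score in align_to_criterion(scores).items():
            pooled[dbc].append(score)
    return {dbc: mean(values) for dbc, values in sorted(pooled.items())}


# Hypothetical daily percent-correct scores; training lengths differ by mouse.
mice = {
    "mouse_a": [50.0, 60.0, 72.0],
    "mouse_b": [48.0, 52.0, 58.0, 66.0, 74.0],
}

aligned = mean_by_dbc(mice)
```

      Counting backwards from the criterion day gives every mouse a shared reference point, so trajectories of different lengths can be averaged at matching offsets. Offsets far from criterion are supported by fewer animals.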

      (5) Clarification of Figure 3B labeling

      In Figure 3B, the identity of the orange-labeled group above the LS condition is unclear. Clarification in the figure legend would improve interpretability.

      Figure 3B includes two experimental groups. One group performed both the large- and small-separation conditions; this group is shown in orange and labeled LS. Within this group, the upper orange trace corresponds to performance in the large-separation condition, while the lower orange trace corresponds to performance in the small-separation condition. The second group is a control group that performed only the large-separation configuration, and therefore only a single green trace is shown. We agree that this distinction was not sufficiently clear and have revised the figure legend and text to clarify the identity of each trace.

      Reviewer #3 (Recommendations for the authors):

      (1) Please label figures and, even better, put the legends on the same page.

      (2) Just to confirm, in establishing the task, mice performed above 70% for the small separation trials in one of the sessions on 2 consecutive days, for each criterion? Performance seems to be below 70%.

      Yes. To meet the criterion, each mouse had to reach ≥70% correct performance in at least one of the two daily sessions on two consecutive days. We then averaged the performance across both sessions for each of those days. As a result, if one session was ≥70% but the other was lower, the daily average could fall below 70%. The values shown in the figure correspond to these daily averages, further averaged across mice.
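
      The criterion rule and the averaging step described above can be sketched as follows. This is a minimal Python illustration rather than the authors' analysis code; the function names and the example session scores are hypothetical.

```python
from statistics import mean

THRESHOLD = 70.0  # percent correct, per the stated criterion


def day_meets_threshold(sessions):
    """A day counts if at least one of its sessions is >= 70% correct."""
    return max(sessions) >= THRESHOLD


def reached_criterion(days):
    """True if two consecutive days each contain a session >= 70%."""
    return any(
        day_meets_threshold(first) and day_meets_threshold(second)
        for first, second in zip(days, days[1:])
    )


def daily_averages(days):
    """Average of both sessions per day, as plotted in the figure."""
    return [mean(sessions) for sessions in days]


# Hypothetical scores (percent correct) for one mouse, two sessions per day.
example_days = [(55.0, 60.0), (72.0, 62.0), (70.0, 64.0)]

criterion_met = reached_criterion(example_days)
averages = daily_averages(example_days)
```

      In this example the mouse meets criterion on the second and third days, yet both daily averages are 67%, which is below the 70% threshold.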

      (3) mGC needs to be explicitly defined. Am I assuming any non-birthdated GC is an mGC according to the authors? (which means it is unknown whether they are in fact mature, though likely most of them are).

      In this study, “mature granule cells” (mGCs) refer operationally to granule cells that are not birth-dated with BrdU or EdU and therefore are not classified as adult-born neurons within the defined labeling window. We agree that this population is not directly age-defined, and that while the majority are expected to be mature based on their birth timing relative to the labeling period, we cannot exclude the possibility that a small fraction may include younger, unlabeled neurons. We have now explicitly defined this usage of mGCs in the Methods and clarified this point in the text to avoid ambiguity.

      (4) Methods state that Kruskal-Wallis tests were used when more than 3 groups were compared, but I don't see these stats presented (e.g., for trap data in Figure 1, blade x task TRAP expt in Figure 3 (should be 2-way RM anova here and elsewhere), etc) or any corrections for multiple comparisons. I appreciate that the mean rates of TRAPed abGCs are higher in the S and LS groups than in the shaping group, but most mice do not have any BrdU+ cells that are also TRAPed, and there are no statistics here to support the claim. I don't think there is enough sampling to accurately quantify activation of abGCs. Also, no stats to support the claim that TRAPing increases at the "tip of the SB after the more demanding LS task".

      We agree with this comment. We have now systematically tested all datasets for normality (by group) and applied parametric tests when the data met normality assumptions, and non-parametric tests otherwise. The statistical analyses have been revised accordingly. We added the appropriate tests (including two-way ANOVA where relevant, such as for blade × group comparisons) and now report full statistics in the figure legends and results sections. For the TRAP analyses in adult-born DGCs, we explicitly acknowledge the very low number of BrdU⁺/TRAP⁺ cells, which limits statistical power and, in some cases, precludes robust statistical testing. These limitations are now clearly stated in the Results and Discussion, and the corresponding interpretations have been tempered. For all Kruskal–Wallis tests, post hoc pairwise comparisons were performed using Dunn’s test, with Bonferroni correction for multiple comparisons, as now specified in the Methods section. We also expanded the Methods to describe the statistical workflow in detail. In addition, we have added the previously missing statistical analysis for Figure 2C. Comparisons were performed between the 0–50% and 50–100% portions of the blade, where 0% corresponds to the apex and 100% corresponds to the distal tip of the blade.
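
      The post hoc step named above (Dunn's test with Bonferroni correction) can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the authors' code: the group names and values are hypothetical, the omnibus Kruskal–Wallis test and the normality checks are omitted, and an established statistics package would normally be used in practice.

```python
from itertools import combinations
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def dunn_bonferroni(groups):
    """Dunn's pairwise test on pooled ranks with Bonferroni-adjusted p-values."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Mean rank per group, walking the pooled list in group order.
    mean_rank, sizes, pos = {}, {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[pos:pos + size]) / size
        sizes[name] = size
        pos += size

    # Tie correction term for the rank variance.
    tie_sum = sum(c ** 3 - c for c in (pooled.count(v) for v in set(pooled)))
    variance = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))

    n_pairs = len(names) * (len(names) - 1) // 2
    results = {}
    for a, b in combinations(names, 2):
        se = (variance * (1 / sizes[a] + 1 / sizes[b])) ** 0.5
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = 2 * (1 - NormalDist().cdf(abs(z)))
        results[(a, b)] = min(1.0, p_raw * n_pairs)  # Bonferroni adjustment
    return results


# Hypothetical cell densities for three task groups (arbitrary units).
groups = {
    "shaping": [1.0, 2.0, 3.0, 4.0],
    "S": [5.0, 6.0, 7.0, 8.0],
    "LS": [9.0, 10.0, 11.0, 12.0],
}

adjusted = dunn_bonferroni(groups)
```

      Each raw pairwise p-value is multiplied by the number of comparisons (three here) and capped at 1, which is what keeps the family-wise error rate at the chosen alpha across all pairs.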

      (5) Figure 3I: I can't figure out which effect is statistically significant here (what does the asterisk signify?). Why no individual data points in this graph?

      We agree that the absence of individual data points reduced interpretability, and we have now updated the figure to include individual data points to better illustrate data distribution and variability.

      (6) The gradient of activity (shap < S < LS) could be due to how long they've been trained on a given stage (e.g. less activity during shaping because they have habituated, and neurons encoding that task phase have already been selected)

      We agree that task duration and habituation could, in principle, influence activity levels. Under this interpretation, higher activity would primarily reflect task novelty rather than cognitive demand. However, our data do not support this explanation. Specifically, we found no correlation between the number of training days required to reach criterion and c-Fos–positive or TRAP-positive cell density within a given stage. Thus, animals that reached criterion rapidly did not show higher activity levels than animals that required more days of training and were presumably more habituated to the task demands. This suggests that the observed activity gradient (shaping < S < LS) is not driven by exposure duration or habituation, but rather reflects differences in cognitive demand across task stages.

      (7) The TRAP+ EDU+ cell in Figure 3 looks odd because the BrdU signal is (a lot) larger than the TRAP signal, but BrdU is in the nucleus and should be smaller.

      We agree that the example in Figure 3 is not optimal. In dividing cells, BrdU/EdU signals can sometimes appear broader or closely apposed, which may affect their apparent size.

      (8) For the Ascl-HM4Di experiment, HM4Di appears to be expressed in all of the areas of the granule cell layer where abGCs are NOT located (i.e. no expression in the deep cell layer, near the sgz). This is problematic because it suggests perhaps abGCs are not inhibited as expected.

      As noted in our response to Reviewer #1, we did not use mCitrine to localize the DREADD receptor, as it has been demonstrated that mCitrine is expressed in a Cre-independent manner and is not correlated with hM4Di expression. In the revised manuscript, we include a representative image in which we performed immunostaining with an HA antibody to directly visualize hM4Di and confirm its expression in adult-born granule cells (Figure 5).

      (9) Line 267: "6-7 week old neurons by themselves do not influence either the performance of mice in the task". I don't think this is fair because this experiment wasn't designed with as much power to detect an effect. The group trends are in the same direction, but there are many fewer mice in this experiment (n=6/group) than in the =<7w experiment (n=11/group), where the effect just reached statistical significance.

      We apologize for this confusion, which arose from an incorrect version. The experiment shown in Figure 6 does not target 6–7-week-old neurons specifically. It uses the same tamoxifen chow–based protocol as Figure 5, but with a shorter exposure (4 weeks vs. 7 weeks), thereby labeling a younger and more restricted cohort of adult-born DGCs. By contrast, Figure 4 targets a more mature population, consisting predominantly of ~5-week-old adult-born neurons as well as mature granule cells (Dock10+).

      We have corrected the paragraph accordingly and clarified the age range of the labeled populations in the revised manuscript.

    1. eLife Assessment

      This paper describes Unbend - a new method for measuring and correcting motions in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamellae and whole cells. The method, which fits a B-spline model using cross-correlation-based local patch alignment of micrograph frames, represents an important tool for the cryo-EM community. The authors elegantly use 2D template matching to provide convincing evidence that Unbend outperforms the previously reported method of Unblur by the same authors. Comparison to alternative programs for motion correction shows smaller gains, but with interesting differences between data sets.

    2. Reviewer #1 (Public review):

      Kong et al.'s work describes a new approach that does exactly what the title states, "Correction of local beam-induced sample motion in cryo-EM images using a 3D spline model." It is, therefore, a more elaborate approach than current methods in the field for the "movie alignment" stage. Additionally, the work uses 2DTM (2D Template Matching)-related measurements to quantify the improvement of the new method compared to other methods in the field. I find both parts very compelling (the new method and the testing approach).

      On a "focused" view, the strengths of the work rest on presenting a better approach for motion correction and on measuring its performance very well at the 2D level in a compelling manner.

      On a more "general" view, the authors introduce the important notion that even one of the most worked-out steps in the processing workflow can still be done better in a measurable way, and that this could lead to better results beyond the 2DTM metrics used for testing, reflected in better results along the processing pipeline (although the manuscript does not explore this notion further).

      On the "usability" side, the method is still CPU-based and is slower than standards in the field. This may pose significant limitations in practical work, although the authors are aware of this issue and are working on it.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a new method, Unbend, for measuring motion in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamella and whole cells (that can be more prone to overall motion and/or variability in motion across a field of view). Building on their previous approach of full-frame alignment (Unblur), they now perform full-frame alignment followed by patch alignment, and then use these outputs to generate a 3D model of the motion. This model allows them to estimate a continuous, per-pixel shift field for each movie frame that aims to better describe complex motions and so ultimately generate improved motion-corrected micrographs. Performance of Unbend is evaluated using the 2D template matching (2DTM) method developed previously by the lab, and results are compared to using full-frame correction alone and to the leading local motion correction methods. Several different in situ samples are used for evaluation covering a broad range that will be of interest to the rapidly growing in situ cryo-EM community.

      Strengths:

      The method appears to be an elegant way of describing complex motions in cryo-EM samples, and the authors present sound data that Unbend generally improves the SNR of aligned micrographs and increases detection of particles matching the 60S ribosome template when compared to full-frame correction alone and, since review, to the leading local motion correction methods. The authors also give interesting insights into how different areas of a lamella behave with respect to motion by using Unbend on a montage dataset collected previously by the group. There is growing interest in imaging larger areas of in situ samples at high resolution and these insights contribute valuable knowledge. Additionally, the availability of data collected in this study through the EMPIAR repository will be much appreciated by the field.

      Weaknesses:

      A major weakness was comparing this method to full-frame approaches only, but this has since been addressed by the authors during review, and Unbend is now compared to MotionCor2, MotionCor3, CryoSPARC and Warp. The improvements here are smaller: in general, Unbend seems to perform on par with the above methods, but there are significant gains for certain samples (e.g. the M. pneumoniae sample). A comment from this reviewer about using an adaptive approach to decide if/when to proceed to the full Unbend pipeline, over full-frame alone, has been addressed by the authors.

    4. Reviewer #3 (Public review):

      Summary

      Kong and coauthors describe and implement a method to correct local deformations due to beam induced motion in cryo-EM movie frames. This is done by fitting a 3D spline model to a stack of micrograph frames using cross-correlation-based local patch alignment to describe the deformations across the micrograph in each frame, and then computing the value of the deformed micrograph at each pixel by interpolating the undeformed micrograph at the displacement positions given by the spline model. A graphical interface in cisTEM allows the user to visualise the deformations in the sample, and the method is proved to be successful by showing improvements in 2D template matching (2DTM) results on the corrected micrographs using five in situ samples.
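      The resampling step described above, in which each output pixel is obtained by interpolating the input micrograph at a displaced position, can be sketched as follows. This is a minimal illustration under stated assumptions, not the Unbend implementation: `shift_field` is a hypothetical callable standing in for the evaluated spline model, and bilinear interpolation is used here for brevity.

```python
def bilinear_sample(image, y, x):
    """Sample a 2D list at fractional (y, x), clamping to the image border."""
    height, width = len(image), len(image[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    fy, fx = y - y0, x - x0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp(image, shift_field):
    """Build a corrected frame: output pixel (y, x) reads the input at
    (y + dy, x + dx), where (dy, dx) = shift_field(y, x)."""
    height, width = len(image), len(image[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            dy, dx = shift_field(y, x)
            row.append(bilinear_sample(image, y + dy, x + dx))
        output.append(row)
    return output


if __name__ == "__main__":
    # Toy frame with a horizontal intensity ramp and a uniform half-pixel shift.
    frame = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
    print(warp(frame, lambda y, x: (0.0, 0.5)))
```

      In a continuous model of the kind described, the shift returned for each pixel would vary smoothly across the image and from frame to frame instead of being constant as in this toy example.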

      Impact

      This method has great potential to further streamline the cryo-EM single particle analysis pipeline by shortening the required processing time as a result of obtaining higher quality particles early in the pipeline, and is applicable to both old and new datasets, therefore being relevant to all cryo-EM users.

      Strengths

      (1) The key idea of the paper is that local beam induced motion affects frames continuously in space (in the image plane) as well as in time (along the frame stack), so one can obtain improvements in the image quality by correcting such deformations in a continuous way (deformations vary continuously from pixel to pixel and from frame to frame) rather than based on local discrete patches only. 3D splines are used to model the deformations: they are initialised using local patch alignments and further refined using cross-correlation between individual patch frames and the average of the other frames in the same patch stack.

      (2) Another strength of the paper is using 2DTM to show that correcting such deformations continuously using the proposed method does indeed lead to improvements, as evidenced by the number of particles found and the quality of the detections (measured using 2DTM SNR). This is shown using five in situ datasets, where local motion is quantified using statistics based on the estimated motions of ribosomes. The same analysis is performed using other deformation correction tools, with Unbend showing superior performance in terms of particles detected or 2DTM SNR of the detections.
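      The refinement idea in point (1), comparing each patch frame against the average of the other frames in the same patch stack, can be sketched in one dimension as follows. This is a simplified illustration, not the authors' code: it assumes integer shifts, circular boundaries, and an exhaustive search, and all function names are hypothetical.

```python
def circular_shift(signal, shift):
    """Shift a 1D signal to the right by `shift` samples with wrap-around."""
    n = len(signal)
    return [signal[(i - shift) % n] for i in range(n)]


def cross_correlation(a, b):
    """Zero-lag cross-correlation (dot product) of two equal-length signals."""
    return sum(x * y for x, y in zip(a, b))


def best_shift(frame, reference, max_shift):
    """Integer shift that, applied to `frame`, best matches `reference`."""
    candidates = range(-max_shift, max_shift + 1)
    return max(
        candidates,
        key=lambda s: cross_correlation(circular_shift(frame, s), reference),
    )


def leave_one_out_shifts(frames, max_shift=3):
    """For each frame, align it to the average of all the other frames."""
    shifts = []
    for i, frame in enumerate(frames):
        others = [f for j, f in enumerate(frames) if j != i]
        reference = [sum(column) / len(others) for column in zip(*others)]
        shifts.append(best_shift(frame, reference, max_shift))
    return shifts


if __name__ == "__main__":
    # Toy patch stack: the third frame is displaced by two samples.
    base = [0, 0, 1, 5, 1, 0, 0, 0]
    stack = [base, base, circular_shift(base, 2), base]
    print(leave_one_out_shifts(stack))
```

      Excluding the frame under test from the reference avoids correlating a frame with its own noise, which is the usual motivation for a leave-one-out average.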

    5. Author Response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments. A central concern raised is the comparison of performance with existing motion-correction methods. In response, we performed motion correction using several widely used approaches and compared results using the number of particles detected by 2DTM and their associated SNR. To minimize potential bias, we selected parameters to give each method a comparable level of model flexibility so that the results are as directly comparable as possible. Overall, Unbend performs the best. We note that extensive, method-specific parameter optimization could further affect absolute performance, and a comprehensive benchmarking study is therefore beyond the scope of this work.

      Public Reviews:

      Reviewer #1 (Public review):

      Kong et al.'s work describes a new approach that does exactly what the title states: "Correction of local beam-induced sample motion in cryo-EM images using a 3D spline model." I find the method appropriate, logical, and well-explained. Additionally, the work suggests using 2DTM-related measurements to quantify the improvement of the new method compared to the old one in cisTEM, Unblur. I find this part engaging; it is straightforward, accurate, and, of course, the group has a strong command of 2DTM, presenting a thorough study.

      However, everything in the paper (except some correct general references) refers to comparisons with the full-frame approach, Unblur. Still, we have known for more than a decade that local correction approaches perform better than global ones, so I do not find anything truly novel in their proposal of using local methods (the method itself- Unbend- is new, but many others have been described previously). In fact, the use of 2DTM is perhaps a more interesting novelty of the work, and here, a more systematic study comparing different methods with these proposed well-defined metrics would be very valuable. As currently presented, there is no doubt that it is better than an older, well-established approach, and the way to measure "better" is very interesting, but there is no indication of how the situation stands regarding newer methods.

      Regarding practical aspects, it seems that the current implementation of the method is significantly slower than other patch-based approaches. If its results are shown to exceed those of existing local methods, then exploring the use of Unbend, possibly optimizing its code first, could be a valuable task. However, without more recent comparisons, the impact of Unbend remains unclear.

      We thank the reviewer for this important point. We agree that comparing against modern local motion-correction approaches is a valuable task. To address this, we added a new benchmarking section (pp. 17–18, lines 444–492, Fig. 8, Fig. 8—figure supplement 1) that compares Unbend against widely used patch-based local correction methods, including MotionCor2, MotionCor3, Warp, and CryoSPARC. Using the same 2DTM-based metrics described in the manuscript (detections per micrograph and SNR distributions for commonly detected particles), we find that Unbend provides the most stable performance across the tested datasets and, in most cases, yields higher detection counts and higher SNR than the alternative methods.

      Regarding runtime, the current implementation is CPU-based and is therefore slower than some optimized GPU-accelerated packages. We now clarify this limitation in the manuscript (lines 498–499). Our primary goal in this study is to improve motion-correction accuracy and quantify its impact using 2DTM-based measures. Importantly, higher-quality motion-corrected micrographs can reduce downstream processing cost (e.g., by increasing particle detection efficiency and reducing ambiguous candidates), so modest additional compute times at the motion-correction stage can be offset later in the workflow. We also note that GPU acceleration and additional code-level optimizations are planned for future releases (lines 501–503); however, they are not required to evaluate the methodological contribution and the benchmarking results presented here.

      Reviewer #2 (Public review):

      Summary:

      The authors present a new method, Unbend, for measuring motion in cryo-EM images, with a particular emphasis on more challenging in situ samples such as lamella and whole cells (that can be more prone to overall motion and/or variability in motion across a field of view). Building on their previous approach of full-frame alignment (Unblur), they now perform full-frame alignment followed by patch alignment, and then use these outputs to generate a 3D cubic spline model of the motion. This model allows them to estimate a continuous, per-pixel shift field for each movie frame that aims to better describe complex motions and so ultimately generate improved motion-corrected micrographs. Performance of Unbend is evaluated using the 2D template matching (2DTM) method developed previously by the lab, and results are compared to using full-frame correction alone. Several different in situ samples are used for evaluation, covering a broad range that will be of interest to the rapidly growing in situ cryo-EM community.

      Strengths:

      The method appears to be an elegant way of describing complex motions in cryo-EM samples, and the authors present convincing data that Unbend generally improves SNR of aligned micrographs as well as increases detection of particles matching the 60S ribosome template when compared to using full-frame correction alone. The authors also give interesting insights into how different areas of a lamella behave with respect to motion by using Unbend on a montage dataset collected previously by the group. There is growing interest in imaging larger areas of in situ samples at high resolution, and these insights contribute valuable knowledge. Additionally, the availability of data collected in this study through the EMPIAR repository will be much appreciated by the field.

      Thank you for this positive assessment.

      Weaknesses:

      While the improvements with Unbend vs. Unblur appear clear, it is less obvious whether Unbend provides substantial gains over patch motion correction alone (the current norm in the field). It might be helpful for readers if this comparison were investigated for the in situ datasets. Additionally, the authors are open that in cases where full motion correction already does a good job, the extra degrees of freedom in Unbend can perhaps overfit the motions, making the corrections ultimately worse. I wonder if an adaptive approach could be explored, for example, using the readout from full-frame or patch correction to decide whether a movie should proceed to the full Unbend pipeline, or whether correction should stop at the patch estimation stage.

      We thank the reviewer for suggesting an adaptive criterion to decide whether to proceed to patch alignment. We agree that such an approach could be valuable for efficiency and for avoiding unnecessary model flexibility. However, our results indicate that a simple criterion based on the magnitude of estimated local patch motion is unlikely to be sufficient. For example, in the BS-C-1 cell lysate dataset (see lines 412–417 on page 16), we observe minimal local motion (Figure 4b), with mean patch shifts of only 0.7 Å, and full-frame alignment already yields comparable detection counts, yet local correction still produces a measurable SNR gain (13.84 ± 0.04 to 14.25 ± 0.04, 3%) and improves SNR for ~70% of the commonly detected targets (Figure 6c). This suggests that residual local distortion can remain even when overall local motion appears small. Establishing a robust, dataset-agnostic stopping rule would therefore require a dedicated, systematic benchmarking study across many samples and acquisition conditions.
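      The kind of comparison quoted above (relative SNR gain and the fraction of commonly detected targets that improve) can be sketched as follows. This is an illustrative calculation, not the authors' benchmarking package: it assumes detections from two runs have already been matched and keyed by a shared identifier, and the per-particle SNR values in the example are hypothetical. Only the mean SNR values 13.84 and 14.25 are taken from the response.

```python
def percent_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def compare_detections(snr_a, snr_b):
    """Summarise two detection sets given as {particle_id: snr} dicts.

    Matching particles by a shared id is an assumption made for brevity;
    in practice detections would be matched by position and orientation.
    """
    common = sorted(set(snr_a) & set(snr_b))
    gains = [snr_b[k] - snr_a[k] for k in common]
    improved = sum(1 for g in gains if g > 0)
    return {
        "n_a": len(snr_a),
        "n_b": len(snr_b),
        "n_common": len(common),
        "fraction_improved": improved / len(common) if common else 0.0,
    }


if __name__ == "__main__":
    # Mean SNR values quoted in the response: 13.84 -> 14.25.
    print(round(percent_gain(13.84, 14.25), 2))  # about 2.96, reported as 3%

    # Hypothetical per-particle SNRs for full-frame vs. local correction.
    full_frame = {"p1": 12.0, "p2": 15.0, "p3": 9.5}
    local = {"p1": 12.6, "p2": 14.8, "p3": 10.1, "p4": 8.9}
    print(compare_detections(full_frame, local))
```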

      Reviewer #3 (Public review):

      Summary

      Kong and coauthors describe and implement a method to correct local deformations due to beam-induced motion in cryo-EM movie frames. This is done by fitting a 3D spline model to a stack of micrograph frames using cross-correlation-based local patch alignment to describe the deformations across the micrograph in each frame, and then computing the value of the deformed micrograph at each pixel by interpolating the undeformed micrograph at the displacement positions given by the spline model. A graphical interface in cisTEM allows the user to visualise the deformations in the sample, and the method has been proven to be successful by showing improvements in 2D template matching (2DTM) results on the corrected micrographs using five in situ samples.

      Impact

      This method has great potential to further streamline the cryo-EM single particle analysis pipeline by shortening the required processing time as a result of obtaining higher quality particles early in the pipeline, and is applicable to both old and new datasets, therefore being relevant to all cryo-EM users.

      Strengths

      (1) One key idea of the paper is that local beam induced motion affects frames continuously in space (in the image plane) as well as in time (along the frame stack), so one can obtain improvements in the image quality by correcting such deformations in a continuous way (deformations vary continuously from pixel to pixel and from frame to frame) rather than based on local discrete patches only. 3D splines are used to model the deformations: they are initialised using local patch alignments and further refined using cross-correlation between individual patch frames and the average of the other frames in the same patch stack.

      (2) Another strength of the paper is using 2DTM to show that correcting such deformations continuously using the proposed method does indeed lead to improvements. This is shown using five in situ datasets, where local motion is quantified using statistics based on the estimated motions of ribosomes.

      Thank you for this positive assessment.

      Weaknesses

      (1) While very interesting, it is not clear how the proposed method using 3D splines for estimating local deformations compares with other existing methods that also aim to correct local beam-induced motion by approximating the deformations throughout the frames using other types of approximation, such as polynomials, as done, for example, in MotionCor2.

      We thank the reviewer for this suggestion. We agree that positioning Unbend relative to existing local motion-correction methods is important. In the revised manuscript, we added a dedicated benchmarking section comparing Unbend with widely used local correction approaches, including MotionCor2, MotionCor3, Warp, and CryoSPARC, using the same 2DTM-based metrics (Fig. 8, Fig. 8—figure supplement 1). This section is included on pp. 17–18, lines 444–492. To make the comparison as fair as possible, we matched nominal model flexibility across methods and otherwise used default parameters to reduce method-specific tuning. This expanded comparison provides a direct baseline against current patch-/spline-based approaches and shows that Unbend performs consistently across the in situ datasets evaluated here, with improvements in detection counts and/or SNR in multiple cases.

      (2) The use of 2DTM is appropriate, and the results of the analysis are enlightening, but one shortcoming is that some relevant technical details are missing. For example, the 2DTM SNR is not defined in the article, and it is not clear how the authors ensured that no false positives were included in the particles counted before and after deformation correction. The Jupyter notebooks where this analysis was performed have not been made publicly available.

      We agree that these technical details improve clarity and reproducibility. We have therefore made three changes.

      (1) Definition of 2DTM SNR. We added an explicit definition of the 2DTM SNR (Section “2DTM provides a one-step verification for motion correction”, p. 11, lines 277–287). Briefly, at each image location we compute cross-correlation values over the searched orientation space and define the 2DTM SNR as the maximum per-location z-score across orientations.

      (2) False-positive control / detection threshold. We clarified how detection thresholds were set to control false positives (p. 11, lines 285–287). Specifically, we used the standard 2DTM statistical framework in which the threshold is chosen using the one-false-positive (1-FP) criterion (or equivalently, a specified expected false-positive rate). We applied the same thresholding procedure consistently across all motion-corrected micrographs. This ensures that particle counts before/after correction reflect changes in signal recovery.

      (3) Reproducibility of the analysis. We have made the script used for the benchmarking and figure generation publicly available (p. 24, lines 622–623), and we provide a link in the Data Availability statement (p. 25, line 650). The repository includes sample .star files and a Python package that computes detections per micrograph, commonly detected particles, and SNR comparisons.
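      Points (1) and (2) above can be illustrated with a short sketch. It rests on two assumptions that are not spelled out in the response: that the per-location z-score is normalised by the mean and standard deviation of the cross-correlation values over the searched orientations at that location, and that the one-false-positive threshold is derived from a standard Gaussian noise model over the total number of search positions. The function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def snr_at_location(cc_values):
    """Maximum z-score of the cross-correlation values at one image location.

    cc_values: cross-correlation values over all searched orientations.
    Assumes normalisation by the mean and standard deviation over orientations.
    """
    mu = mean(cc_values)
    sigma = pstdev(cc_values)
    if sigma == 0:
        raise ValueError("cross-correlation values have zero variance")
    return max((value - mu) / sigma for value in cc_values)


def one_false_positive_threshold(n_tests):
    """Threshold t such that n_tests * P(Z > t) = 1 for standard normal Z.

    n_tests: total number of search positions (locations x orientations).
    The lower tail is used for numerical accuracy when n_tests is large.
    """
    return -NormalDist().inv_cdf(1.0 / n_tests)


if __name__ == "__main__":
    # Toy example: five orientations at one location, one of them a strong match.
    print(snr_at_location([0, 0, 0, 0, 10]))
    # Threshold for a hypothetical search of one million positions.
    print(round(one_false_positive_threshold(1e6), 3))
```

      Because the threshold depends only on the size of the search, applying the same search settings to every motion-corrected micrograph gives the same threshold for all methods being compared.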

      (3) It is also not clear how the proposed deformation correction method is affected by CTF defocus in the different samples (are the defocus values used in the different datasets similar or significantly different?) or if there is any effect at all.

      We thank the reviewer for raising this point. In the revised manuscript, we now report the defocus ranges used for each dataset (Table 1) and clarify that all motion-correction comparisons were performed within each dataset using the same CTF estimation and 2DTM settings (p. 23, lines 615–618). Across the five datasets, four were collected at similar defocus ranges (1.0 µm to 1.5 µm), whereas one dataset includes near-focus (0.4 µm) micrographs (Table 1). Because Unbend operates on frame alignment/warping rather than CTF modeling, we do not expect a defocus-specific effect beyond indirect influences through image SNR and reliability of cross-correlation-based alignment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The obvious recommendation would be to use their 2DTM approach for a comparison of their new method with other currently used ones.

      We agree and added a new comparison section (pp. 17–18, lines 444–492). Addressed above in Response to Reviewer #1 Public Review.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 29, typo. 3 ~ 8% > 3 - 8%.

      Corrected.

      (2) Lines 220 and 226. Should this be e-/Angstrom squared for the exposure?

      Corrected to e<sup>-</sup>/Å<sup>2</sup> (Now pp. 9 lines 230, 236).

      (3) Figure 2 c-d. These are good for instinctively seeing the movement, but I found the legend confusing, as a 10 x 10 pixel array is mentioned, yet the schematics show a higher sampling (30 x 30 pixels? in c-e).

      Thank you for pointing this out. The “10×10” annotation refers to the physical scale, whereas the grid represents pixel sampling. We removed the “10×10” label and now show only the pixel grid to avoid confusion. The caption has been updated to state that the grid corresponds to a 30×30 pixel sampling. (Fig. 2c, d; pp. 31, line 766)

      (4) Figure 4. It would be good if the n of movies analyzed was given in the figure legend.

      Thank you for noticing this. We report the number of movies per dataset in the corresponding summary table (Table 1).

      (5) Figure 5. X/Y axes labels missing (assume pixels). Also, suggest changing the strain scale to % to match the main text description of this figure.

      We added X/Y axis labels, changed the strain scale to % (Figure 5), and specified that the strains are per pixel on p. 14, line 367. The X/Y labels and strain scale in the strain plots in Figure 4—figure supplements 1 to 5 have been changed correspondingly.

      (6) Unify labelling of Figure 4 and 6 (i.e., Bacteria vs. M. pneumoniae, etc.).

      Corrected. Sample labels are now consistent across figures. (Figures 4 and 6)

      Reviewer #3 (Recommendations for the authors):

      Some recommendations related to the points mentioned in the 'Weaknesses' section in the public review:

      (1) If feasible, it would be useful to see a comparison with other existing methods that estimate local deformations (e.g., MotionCor2), at least on some of the datasets. For example, does the proposed method lead to better 2DTM SNR in the detected particles compared to other methods, or higher detection numbers? Alternatively, if such a comparison would require too much additional work and the authors have good reasons to believe that the results are evident, it would be helpful to include a discussion about why the proposed method is expected to perform better, both in terms of the general approach and specific implementation details.

      We agree that this comparison is important. (pp. 17–18, lines 444–492). Addressed above in Response to Reviewer #3 Public Review (1).

      (2) It would be useful to define the 2DTM SNR in the main text of the paper, as well as to address the point about false positives in the picked particles.

      We added an explicit definition of 2DTM SNR and clarified the detection thresholding/false-positive control used in our analysis (pp. 11, lines 277–287). Addressed above in Response to Reviewer #3 Public Review (2.1 and 2.2).

      (3) Regarding the results shown in Figures 4 and 6: do the authors have any insight about how the CTF defocus affects the deformation estimation and correction across the different sample types?

      We now report the defocus ranges used for each dataset (Table 1). We have addressed this problem in Response to Reviewer #3 Public Review (3).

      (4) Will the Jupyter notebooks used for the 2DTM analysis be made publicly available?

      Yes. We have deposited a Python script used for the 2DTM benchmarking and figure generation in a public repository and added the link in the Data Availability statement (p. 23, line 622; p. 25, line 650). Addressed above in Response to Reviewer #3 Public Review (2.3).

      (5) I would also appreciate a few words about the implementation details of the 3D spline model (e.g., what libraries have been used, if any, or if the authors have implemented their own code for this).

      The 3D spline model and warping code were implemented by us (no external spline library was used) and the relevant implementation details are described in the “Sample distortion modeling and correction” section (pp. 7–10, lines 174–246). For optimization, we used the L-BFGS implementation provided by the dlib library, which is now explicitly cited (pp. 10, line 264).

      Some comments regarding the presentation of the work:

      (1) I found the mathematical background on splines on pages 7-9 a little distracting from the main ideas of the paper, and I believe it could be moved to the methods section. A short description of this in the main text of the paper would suffice, and it would be useful to state clearly when this is background material and when it is the authors' contribution.

      We appreciate the suggestion. Because Unbend includes an in-house spline implementation (no external spline library) and it is the central part of this work, we retained the spline description to support reproducibility. (pp. 7–10, lines 174–246).

      (2) More generally, I found the whole method very interesting, but understanding exactly what all the steps involved were was a bit cumbersome, as they are spread across different sections of the main text. I think it would be useful to have a dedicated section giving the exact steps taken in the algorithm, possibly pointing to the relevant section in the text for more details about each step. This could be, for example, in the form of an 'Algorithm' box or a flowchart.

      We added an Algorithm box as Figure 2 supplement summarizing the end-to-end workflow and pointing to the relevant sections for details (Figure 2—figure supplement 1 Algorithm, pp. 4, line 96–103, pp. 32 line 799). This is intended to make the sequence of steps easier to follow.

      (3) In Figure 3, panels (b) and (c), the difference between the two micrographs, before and after correction, is not very noticeable, particularly the Thon rings in the spectra. I don't know if this is due to the image quality in the paper or if a better example could be shown. For example, the differences are clear in some of the supplementary figures.

      Thank you for the suggestion. We revised the figure by adding annotations to show the recovered Thon rings. This figure shows a vertex motion and is intended not only to show improvement but also to illustrate complex, spatially varying deformation patterns that motivate the 3D spline model (p. 12, lines 304–308). The supplementary figures display the micrographs with the highest motion in each sample type, so the recovery of higher-frequency Thon rings in the motion-corrected micrographs is more obvious there. We also refer readers to the supplementary examples where the differences are more pronounced (p. 12, lines 310–312).

    1. eLife Assessment

      This is a valuable study that integrates behavioral and molecular approaches to identify neuromodulators influencing blood-feeding behavior in the disease vector Anopheles stephensi. Through gene expression analyses across blood-seeking life stages and RNA interference experiments, the authors present solid evidence that co-knockdown of the neuromodulators short Neuropeptide F and RYamide affects blood-seeking states in A. stephensi. However, evidence demonstrating that these neuropeptides are sufficient to promote host-seeking is lacking.

    2. Reviewer #2 (Public review):

      Summary:

      In this study, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated-females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour after imbibing an initial bloodmeal. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides (particularly in the brain, but not in the abdomen since knockdown outside the brain did not affect feeding behaviour) appear to promote blood-feeding while having no impact on sugar feeding. Interestingly, when either of these two neuropeptide gene transcripts were reduced independently by RNAi, the proportion of females acquiring a blood meal was not affected, whereas simultaneous knockdown of both sNPF and RYa led to a reduction in blood feeding behaviour but did not impact sugar feeding.

      Given that the expression of both neuropeptide genes was found mostly in non-overlapping brain neurons, this suggests that these two neuropeptides may elicit at least partially complementary actions promoting blood feeding in A. stephensi. Indeed, their putative receptors appear to be colocalized within several neurons within the brain, which could explain why knockdown of both sNPF and RYa transcripts was required to affect blood feeding behaviour (although the authors could not confirm whether either of these neuropeptides acts independently, as only partial knockdown was achieved in the brain). Finally, while sNPF was mapped to brain neurons and midgut enteroendocrine cells, the authors mapped RYa only in the brain while reporting expression in the abdomen by qPCR, but that was not localized to the midgut EECs (like sNPF). Therefore, the source of RYamide in the abdomen remains unknown in this mosquito species, but could involve the abdominal ganglia where this neuropeptide has been localized in Ae. aegypti.

      Strengths and/or weaknesses:

      Overall, the manuscript was effectively communicated. Previous concerns and requested clarifications have been addressed in the revised manuscript. While advanced cell-specific tools are lacking in this mosquito species, one weakness here is that peptides could have been applied ectopically in attempts to rescue the deficit in blood feeding behaviour following knockdown by RNAi. Further insight in this regard may be provided in future studies by this and other research groups.

      Reviewing editor comment:

      Inclusion of a schematic in Supplementary Figure S9B addresses the point raised by reviewer 1 in the previous round.

    3. Author Response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study not only characterizes the fundamental changes in blood feeding behaviors of this crucially understudied vector, but also uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find that A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host-cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host-cues for the blood-fed and mated individuals which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host-cues while navigating in flight, but something much more exciting happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood feeding stages of the mosquito's life cycle to identify a list of 9 candidates which have a role in regulating the host-seeking status of A. stephensi. Then through investigations of gene knockdown of candidates they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up rich lines of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article I continued to think how many crucial details I may have missed if I were the scientist conducting these experiments. That attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF, starting from first-principles behavioral experiments, is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      I believe the authors have adequately addressed all of my concerns; however, I think an accompanying figure to match the explained methods of the tissue-specific knockdown would help readers. The methods are now explicitly written for the timing and concentrations required to achieve tissue-specific knockdown, but seeing the data as a supplement would be especially reassuring given the critical nature of tissue-specific knockdown to the final interpretations of this paper.

      We thank the reviewer for the suggestion and have now incorporated a schematic in the supplementary figure S9B, explaining our methodology for achieving tissue-specific knockdowns.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated-females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar fed, blood fed and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding) although the impact was observed only after both neuropeptide genes underwent knockdown.

      While the authors have addressed most of the concerns of the original manuscript, a few issues remain. Particularly, the following two points:

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer's point or there has been a misunderstanding. In Figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      NEW-

      In both the dsRNA treatments where animals were fed, neither was significantly different from control. Therefore, there is no change, and indeed this is confirmed by the authors' labelling of the figure stats in panel 4D.

      We agree with the reviewer and thank them for pointing it out. We have now revised the figure legend and the text to reflect these results (see lines 351-354).

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,...

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.

      NEW-

      The authors are claiming that there is no variation between individual qPCR experiments (particularly in their controls)? Normally, one uses a known standard value (or calibrator) across multiple experiments/plates so that variation across biological replicates can be assessed. This has an impact on statistical analyses since there is no variation in the control data. Indeed, this impacts all figures/datasets in the manuscript where qPCR data is presented. All the controls have zero variation!

      We are truly thankful to this reviewer for insisting on this point. It has made us revisit what we thought we understood and now realise we were doing wrong (though many in the literature do it this way!). We were – incorrectly – setting each control to 1 and calculating relative fold changes for each replicate independently. While this is often seen in the literature, we now realise that it is incorrect. We have revisited all our analyses and normalized all samples to the mean ΔCt of the control group, which captures biological variation in both control and experimental groups. All data are now re-plotted to show individual data points for both control and experimental groups, and the error bars on controls represent the biological variation across replicates (Figure 4D, 4F, 4G, S8, S9). Statistical analyses were also revised accordingly, and, importantly, they do not change any conclusions. Please note that the abdominal expression of sNPF and RYa is so low that the controls show very variable baseline expression values.
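
      As an illustration of the corrected normalization described in the response above, the following minimal Python sketch computes relative expression by normalizing every replicate to the mean ΔCt of the control group. The Ct values, the function names `delta_ct` and `fold_changes`, and the assumption of 100% amplification efficiency (base 2) are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def delta_ct(target_ct, reference_ct):
    """ΔCt for one biological replicate: Ct(target gene) - Ct(reference gene)."""
    return target_ct - reference_ct


def fold_changes(sample_dcts, control_dcts):
    """Relative expression 2^-(ΔCt - mean control ΔCt), one value per replicate.

    Assumes perfect doubling per cycle (base 2); this is an assumption of the sketch.
    """
    calibrator = mean(control_dcts)  # shared baseline for every replicate
    return [2 ** -(d - calibrator) for d in sample_dcts]


# Hypothetical (target Ct, reference Ct) pairs, for illustration only.
control_pairs = [(24.0, 18.0), (24.5, 18.0), (23.5, 18.0)]
knockdown_pairs = [(25.0, 18.0), (25.5, 18.0), (26.0, 18.5)]

control_dcts = [delta_ct(t, r) for t, r in control_pairs]
knockdown_dcts = [delta_ct(t, r) for t, r in knockdown_pairs]

# Controls are normalized to their own group mean, so they keep their spread.
control_fc = fold_changes(control_dcts, control_dcts)
knockdown_fc = fold_changes(knockdown_dcts, control_dcts)

# Normalizing a replicate to itself always yields exactly 1, which is
# how a zero-variance control group arises.
self_normalized = fold_changes(control_dcts[:1], control_dcts[:1])
```

      With these made-up values the control replicates spread around 1 rather than all equalling 1, so the control group contributes variance to any downstream statistical test.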

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (2) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (3) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (4) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated and some conclusions appear premature based on the current data. The support for this conclusion would be strengthened with functional validation using peptide injection or genetic manipulation.

      (2) The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      (3) Some important caveats, such as variation in knockdown efficiency and the possibility of off-target effects, are not adequately discussed.

      These comments were addressed in the previous round.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Awesome paper everyone. A delight to read and review.

      Thank you very much! We appreciated your comments too!

    1. eLife Assessment

      This study presents valuable findings and employs modern analytical approaches on how transient absence of visual input (darkness) affects tactile encoding in the rat somatosensory cortex (S1). The evidence supporting the authors' claims is solid, as population-level neural activity recorded in S1 and decoded by a CNN carries more discriminable texture information in darkness. The underlying basis of this effect remains only partly resolved, however, because it is still unclear which neural features from the CNN drive the decoding and if visual interference is appropriately accounted for, which might confound true neural representational change.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate how short-term visual deprivation influences tactile processing in the primary somatosensory cortex (S1) of sighted rats. They justify the study based on previous studies that have shown that long-term blindness can enhance tactile perception, and aim to investigate the change in neural representations underlying rapid, short-term cross-modal effects. The authors recorded local field potentials from S1 as rats encountered different tactile textures (smooth and rough sandpaper) under light and dark conditions. They used deep learning techniques to decode the neural signals and assess how tactile representations changed across the four different conditions. Their goal was to uncover whether the absence of visual cues leads to a rapid reorganization of tactile encoding in the brain.

      Strengths:

      The study effectively integrates high-density local field potential (LFP) recordings with convolutional neural network (CNN) analysis. This combination allows for decoding high-dimensional population-level signals, revealing changes in neural representations that traditional analyses (e.g., amplitude measures) failed to detect. The custom treadmill paradigm permits independent manipulation of visual and tactile inputs under stable locomotion conditions. Gait analysis confirms that motor behavior was consistent across conditions, strengthening the conclusion that neural changes are due to sensory input rather than movement artifacts.

      Weaknesses:

      (1) While the study interprets the emergence of more distinct texture representations in the dark as evidence of rapid cross-modal plasticity, the claim rests on correlational data from a short-term manipulation and decoding analysis. The authors show that CNN-derived feature embeddings cluster more clearly by texture in the dark, but this does not directly demonstrate plasticity in the classical sense (e.g., synaptic or circuit-level reorganization). The authors have noted this as a limitation and have clarified that the observed changes reflect functional reorganization rather than structural plasticity.

      (2) Although gait was controlled, changes in arousal or exploratory behavior in light versus dark conditions might play a role in the observed neural differences. The authors have controlled for various factors in relation to locomotion, but future studies would benefit from more direct behavioural readouts of arousal states (e.g., via pupillometry or cortical state indicators).

      (3) It should be noted that the time course of the observed changes (within 10 minutes) is quite rapid, and while intriguing, the study does not include direct evidence that the underlying circuits were reorganized, only that population-level signals become more discriminable. The authors have adequately discussed this as an avenue for more mechanistic future research.

      (4) The authors have adequately discussed that, while these findings are consistent with somatotopy and context-dependent dynamics, they do not provide strong independent evidence for novel spatial or temporal organization.

      (5) The authors have also discussed that, while the neural data suggest enhanced tactile representations, the study does not assess whether rats' actual tactile perception improved. Future studies including an assessment of a behavioral readout (e.g., discrimination accuracy), would be insightful.

      (6) The authors' discussion of the implications for sensory rehabilitation, including Braille training and haptic feedback enhancement, was somewhat premature; they have amended this, and the translational potential remains interesting to explore in future studies.

      (7) While the CNN showed good performance, more transparent models (e.g., linear classifiers or dimensionality reduction) appear to not exceed chance level. The implications of this are that there is an underlying complex structure in the LFPs that has yet to be fully uncovered, on the mechanistic level. This would be important to push the findings forward in future studies.

      Therefore, while the authors raise interesting hypotheses around rapid plasticity, somatotopic dynamics, and rehabilitation, the evidence for each is indirect. Stronger claims will require future causal experiments, behavioral readouts, and mechanistic specificity beyond what the current data provides. However, the work represents an interesting starting point to a more mechanistic understanding in the future.

    3. Reviewer #2 (Public review):

      Summary:

      Yamashiro et al. investigated how transient absence of visual input (i.e. darkness) impacts tactile neural encoding in the rat primary somatosensory cortex (S1). They recorded local field potentials (LFPs) using a 32-channel array implanted in forelimb and hindlimb primary somatosensory cortex while rats walked on smooth or rough textures under illuminated and dark conditions. Employing a convolutional neural network (CNN), they successfully decoded both texture and lighting conditions from the LFPs. The authors conclude that subtle differences in LFP patterns underlie the tactile representation of surface roughness and become more distinct in darkness, suggesting a rapid cross-modal reorganization of the neural code for this sensory feature.

      Strengths:

      • The manuscript addresses a valuable question regarding how sensory cortices dynamically adapt to changes in sensory context.
      • The use of machine learning (CNNs) enables the analysis to go beyond conventional amplitude-based metrics, potentially uncovering subtle but meaningful effects.
      • The authors have substantially improved the manuscript with clearer figures, additional statistical analyses (including permutation tests and cross-validation), and greater methodological transparency.

      Weaknesses:

      • The new analyses (grand-average LFPs, correlation maps, wavelet decompositions, attribution-score correlations) improve transparency but do not yet clarify which specific neural features the CNN exploits, leaving the central interpretability question unresolved.
      • A plausible alternative explanation for the increased discriminability in darkness remains insufficiently ruled out: visually driven activity in the light condition (e.g., ambient illumination changes or self-motion-induced visual input) could contaminate S1 LFPs and account for the effect without reflecting a true neural representational change.
      • Behavioural and order controls have been improved but remain somewhat limited in sample size.

      Overall assessment:

      The revised manuscript is clearer, more transparent, and technically strengthened. However, the true nature of the signal changes underlying the observed differences in discriminability remains unclear, limiting the scientific strength of the conclusions. The possibility that visual interference contributes to the observed effects remains a plausible and untested alternative interpretation. Additional experiments or analyses quantifying visually evoked activity in S1 would be required to confirm the claim of genuine reorganization of neural representation depending on the illumination condition.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) While the study interprets the emergence of more distinct texture representations in the dark as evidence of rapid cross-modal plasticity, the claim rests on correlational data from a short-term manipulation and decoding analysis. The authors show that CNN-derived feature embeddings cluster more clearly by texture in the dark, but this does not directly demonstrate plasticity in the classical sense (e.g., synaptic or circuit-level reorganization).

      Thank you for this insightful comment. We acknowledge that our claim of “rapid cross-modal plasticity” is based on correlational evidence and does not directly address synaptic or circuit-level reorganization, which would require more invasive methods. Our study instead focuses on changes in the representational structure of tactile stimuli when visual input is temporarily removed, highlighting the adaptability of sensory coding to environmental context. We agree that this distinction is important and have revised the manuscript to clarify that the observed changes reflect functional reorganization rather than structural plasticity, as indicated by the enhanced separability of texture representations in S1 during darkness.

      (2) Although gait was controlled, changes in arousal or exploratory behavior in light versus dark conditions might contribute to the observed neural differences. These factors are acknowledged but not directly measured (e.g., via pupillometry or cortical state indicators).

      Thank you for your insightful comment. We agree that arousal and exploratory behavior could influence neural differences and have considered these factors in our study. While gait was controlled, we did not directly measure arousal (e.g., via pupillometry or cortical indicators).

      To partially address this, we reviewed locomotor-speed traces (Supplementary Figure 1), which showed no significant differences between light and dark conditions, suggesting movement speed did not drive the neural differences. We also reversed the order of light and dark conditions, and although the separability of textures was not significantly different, it further supports that motivation did not confound our results.

      However, we acknowledge that arousal may still affect cortical dynamics, especially in the dark condition, where the lack of visual input might alter exploratory behavior. Due to technical limitations, we could not directly measure arousal states, and this is now discussed in the revised manuscript. While we cannot rule out the influence of arousal, the enhanced separability of texture representations suggests that sensory reorganization due to visual deprivation likely played a substantial role.

      (3) Moreover, the time course of the observed changes (within 10 minutes) is quite rapid, and while intriguing, the study does not include direct evidence that the underlying circuits were reorganized - only that population-level signals become more discriminable. As such, the term "plasticity" may overstate the conclusions and should be interpreted with caution unless validated by additional causal or longitudinal data.

      Thank you for your important comment. We agree that the term "plasticity" may overstate our conclusions, as our study focuses on population-level signal changes rather than direct evidence of circuit-level reorganization.

      To address this, we have revised the manuscript to clarify that while the observed changes in neural separability suggest functional reorganization of sensory representations, they do not confirm structural plasticity. We have updated the wording throughout the manuscript to emphasize that these findings reflect functional reorganization in response to short-term visual input loss, rather than structural or long-term plasticity.

      We also updated the discussion to highlight the need for future research with more invasive approaches to validate the causal mechanisms behind these rapid changes in neural dynamics.

      (4) The study highlights the forelimb region of S1 and a post-contact temporal window as particularly important for decoding texture, based on occlusion and integrated gradient analyses. However, this finding may be somewhat circular: The LFPs were aligned to forelimb contact, and the floor textures were sensed primarily via the forelimbs, making it unsurprising that forelimb electrodes were most informative. The observed temporal window corresponds directly to the event-aligned epoch, and while it may shift slightly in duration in the dark, this could reflect general differences in sensory gain or arousal, rather than changes in stimulus-specific encoding. Thus, while these findings are consistent with somatotopy and context-dependent dynamics, they do not provide strong independent evidence for novel spatial or temporal organization.

      Thank you for your insightful comment. We understand your concern that the finding of forelimb electrodes being most informative might seem circular, given that the LFPs were aligned to forelimb contact, and the floor textures were primarily sensed by the forelimbs. This design choice was intentional, as the task focused on texture perception through the forelimb, and the forelimb subregion of S1 is naturally expected to play a dominant role in this process. While this somatotopic specificity may make the results predictable, our aim was to emphasize the changes in temporal dynamics of neural processing under visual deprivation.

      We observed a shift in the temporal window's duration in the dark condition, which we interpret as a change in how texture information is processed without visual input. While this could reflect sensory gain or arousal differences, the lack of significant differences in locomotor speed or other behavioral measures (Supplementary Figure 1) suggests that these changes are more likely due to functional reorganization of sensory processing.

      We have clarified in the discussion that the shift in the temporal window is consistent with previous research on sensory reorganization involving both spatial and temporal cortical adjustments. While we do not claim novel spatial or temporal organization, we emphasize that the shift in temporal dynamics suggests adaptation in encoding strategy for texture perception in the absence of visual input. Future studies measuring arousal states (e.g., pupil diameter or cortical state markers) would help distinguish the contributions of arousal versus sensory reorganization to these dynamics.

      (5) While the neural data suggest enhanced tactile representations, the study does not assess whether rats' actual tactile perception improved. Without a behavioral readout (e.g., discrimination accuracy), claims about perceptual enhancement remain speculative.

      Thank you for raising this important point. We agree that while the neural data suggest enhanced separability of tactile representations in the dark condition, we do not directly assess whether these changes translate into improved tactile perception behaviorally.

      However, the primary aim of our study is not to claim perceptual enhancement, but to demonstrate that neural representations in the somatosensory cortex can rapidly reorganize in response to visual deprivation. To clarify this distinction, we have revised the manuscript to emphasize that the observed neural changes in S1 are consistent with functional reorganization of tactile representations, rather than a direct indication of perceptual improvement.

      Future studies will be crucial to directly test whether the enhanced separability of tactile representations in S1 correlates with improved tactile perception in a behavioral task. We have highlighted this as an avenue for future research to better understand the link between neural changes and perceptual outcomes.

      (6) In addition to point 4, the authors discuss implications for sensory rehabilitation, including Braille training and haptic feedback enhancement. However, the lack of actual chronic or even more acute pathological sensory deprivation, behavioral data, or subsequent intervention in this study limits the ability to draw translational conclusions. It remains unknown whether the more distinct neural representations observed actually translate into better tactile performance, discriminability, or perception. Additionally, extrapolating from rats walking on sandpaper in the dark to human rehabilitative contexts is speculative without a clearer behavioral or mechanistic bridge. The potential is certainly there, but the claim is currently aspirational rather than empirically grounded.

      Thank you for raising this important point. Upon careful consideration, we have decided to remove the discussion of sensory rehabilitation implications from the revised manuscript. We have refocused the manuscript to concentrate solely on the neural findings related to tactile encoding reorganization in response to short-term sensory deprivation, avoiding speculative extrapolation to human rehabilitative contexts. This revised approach ensures that the manuscript emphasizes the empirical findings without overstating the translational potential.

      (7) While the CNN showed good performance, details on generalization robustness and validation (e.g., cross-validation folds, variance across animals) are not deeply discussed. Also, while explainability tools were used, interpretability of CNNs remains limited, and more transparent models (e.g., linear classifiers or dimensionality reduction) could offer complementary insights.

      We appreciate the reviewer’s valuable feedback. In response to the concern about generalization robustness and validation, we have now conducted 5-fold cross-validation to assess the model's performance within animals (Figure 6C). We also have added supplementary information on the average silhouette scores across the different folds and animals (Supplementary Table 1, 2). These details are provided in the methods section and discussed in the results to offer a clearer picture of the model's robustness and consistency across rats.

      Regarding the interpretability of CNNs, we acknowledge that deep learning models can lack transparency. We also attempted classification using more transparent models such as PCA and SVM, but their performance did not exceed chance level (Supplementary Figure 2). This indicates that while these simpler models are more interpretable, they cannot capture the complex representations in the LFPs, making deep learning models like CNNs necessary for extracting these insights.
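
      For readers unfamiliar with the silhouette score used above to quantify how well texture classes separate, the following is a minimal pure-Python sketch of the standard definition (mean of (b - a) / max(a, b) over all points, with Euclidean distance). The 2-D points stand in for CNN feature embeddings and are entirely hypothetical; the authors' actual embedding dimensionality and software are not stated in this section. The sketch expects at least two distinct labels.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        same = [
            dist(p, q)
            for j, (q, other_lab) in enumerate(zip(points, labels))
            if other_lab == lab and j != i
        ]
        if not same:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = mean(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(p, q) for q, l in zip(points, labels) if l == other)
            for other in set(labels)
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Hypothetical embeddings: two tight, well-separated texture clusters...
labels = ["smooth", "smooth", "rough", "rough"]
separated = [(0, 0), (0, 1), (10, 0), (10, 1)]
# ...versus the same points with the labels effectively interleaved.
interleaved = [(0, 0), (10, 1), (10, 0), (0, 1)]

score_separated = silhouette_score(separated, labels)
score_interleaved = silhouette_score(interleaved, labels)
```

      Scores near 1 indicate compact, well-separated classes, while scores near or below 0 indicate overlapping classes; averaging such scores over cross-validation folds and animals gives the kind of summary reported in the supplementary tables.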

      Reviewer #2 (Public review):

      (1) Despite applying explainability techniques to the CNN-based decoder, the study does not clearly demonstrate the precise "subtle, high-dimensional patterns" exploited by the CNN for surface roughness decoding, limiting the physiological interpretability of the results. Additional analyses (e.g., detailed waveform morphology analysis on grand averages, time-frequency decompositions, or further use of explainability methods) are necessary to clarify the exact nature of the discriminative activity features enabling the CNN to decode surface roughness and how these change with the sensory context (i.e., in light or darkness).

      Thank you for your insightful comment. We recognize the importance of clarifying the exact nature of the high-dimensional neural patterns that the CNN exploits for surface roughness decoding. In response, we have performed additional analyses to provide a more detailed explanation of the CNN's decision-making process and the discriminative features it learned:

      Grand-Average LFP Waveforms Analysis: We calculated the grand-average LFP waveforms for each texture × lighting condition (Figure 4A). While visual inspection did not reveal distinct features in the averaged waveforms, we explored the channel-wise correlations between textures under both light and dark conditions (Figure 4B). We found that the correlation between textures was lower in the dark condition, suggesting that LFPs become more distinct between textures when visual input is absent, which aligns with the CNN’s output.

      Time-Frequency Decomposition (Wavelet Analysis): We also performed time-frequency decomposition of the LFPs using wavelet transforms (Figure 4D). No prominent differences emerged across texture × lighting conditions in the spectral domain. However, upon computing differences in wavelet features between light and dark conditions and analyzing the relationship with the CNN's attribution scores (Supplementary Figures 5A-C), we observed a negative correlation in the 50-60 Hz range and a positive correlation in the 80-90 Hz range. This suggests frequency-specific modulation in LFP activity that may contribute to texture representations, providing further support for the CNN’s learned features.

      (2) The claim regarding cross-modal representation reorganization heavily relies on a silhouette analysis (Figure 5C), which shows a modest effect size and borderline statistical significance (p≈0.05 with n=9+2). More rigorous statistical quantification, such as permutation tests and reporting underlying cluster distances for all animals, would strengthen confidence in this finding.

      Thank you for your thoughtful comment. We appreciate your suggestion to strengthen the statistical rigor of our analysis regarding the cross-modal representation reorganization. In response, we have implemented several additional analyses to more rigorously quantify the separability of neural representations between light and dark conditions:

      (1) Permutation Test for Cluster Separability: We performed a permutation test to assess whether the observed differences in cluster separability between light and dark conditions were statistically significant or could have arisen by chance. The results showed that the silhouette scores for the dark condition consistently exceeded the 95th percentile of the null distribution (Supplementary Figure 4). This permutation test strengthens the validity of our findings, indicating that the enhanced separability in darkness is a systematic reorganization of neural representations, not due to random fluctuations.

      (2) Reporting Cluster Distances: To address concerns about the modest effect size and borderline significance, we have explicitly reported the underlying cluster distances in the form of silhouette scores for each individual animal (Supplementary Table 1, 2). These values reflect the Euclidean distance between clusters within each rat, providing a clearer understanding of the separability observed.

      (3) Additional Statistical Analysis on Silhouette Scores: To further enhance the rigor of our statistical analysis, we recalculated the silhouette scores using 5-fold cross-validation within each animal, ensuring that our results are robust across multiple data splits (Figure 6C).

      By incorporating these additional analyses and reporting detailed cluster distances, we believe we have significantly strengthened the confidence in our claim of cross-modal reorganization.

      (3) While the authors recorded in the somatosensory cortex, primarily known for its tactile responsivity, I would be cautious not to rule out a priori the presence of crossmodal (visual) responses in the area. In this case, the stronger texture separation in darkness might be explained by the absence of some visually-evoked potentials (VEPs) rather than genuine cross-modal reorganization. Clarification is needed to rule out visual interference and this would strengthen the claim.

      Thank you for raising this important point. In response to your concern, we carefully considered whether visually-evoked potentials (VEPs) could be present in the S1 recordings, particularly under the light condition. However, this experiment did not involve any cue-guided visual stimulation, such as flashing lights or visual cues aligned with the LFP recordings. Without such external visual stimuli, it is unlikely that VEPs would be reliably evoked in S1. Therefore, we believe the stronger texture separation observed in the dark condition is not due to visual interference, but rather reflects a genuine sensory reorganization in response to the absence of visual input.

      (4) Behavioural controls are limited to gross gait parameters; more detailed analyses of locomotor behavior and additional metrics (e.g., pupil size or locomotor variance) would robustly rule out potential arousal or motor confounds.

      Thank you for your insightful comment regarding behavioral controls. In response, we have added locomotor speed traces aligned with corresponding LFPs (Supplementary Figure 1) to demonstrate that locomotion remained consistent across trials, irrespective of environmental condition (light vs. dark). Additionally, we report locomotor speed variance over 10-minute blocks to confirm no significant motor changes affecting neural recordings. These analyses indicate that LFP differences are unlikely due to locomotor confounds.

      While measuring pupil size could be useful for assessing arousal, the camera resolution in our study was insufficient for reliable measurements. We have noted this limitation in the Discussion and recommend that future studies with high-resolution eye-tracking explore arousal's role in sensory processing in S1.

      (5) The consistent ordering of trials (10 minutes of light then 10 minutes of dark) could introduce confounds such as fatigue or satiation (and also related arousal state), which should be controlled by analyzing sessions with reversed condition ordering.

      Thank you for highlighting the potential confounds due to trial ordering. To address this, we reversed the condition order (dark before light) in a subset of sessions from six rats and reanalyzed the data (Supplementary Figure 3). The results showed a non-significant increase in separability in the dark condition, suggesting that the enhanced separability in the dark condition is not due to trial order effects like fatigue or satiation. While order effects may contribute to trial-to-trial variability, the consistent pattern of enhanced separability in the dark further supports the interpretation that visual deprivation directly influences the reorganization of tactile representations in S1.

      (6) The focus on forelimb-aligned LFP analyses raises the possibility that hindlimb-aligned data might yield different conclusions, suggesting alignment effects might bias the results.

      Thank you for your insightful comment on the potential bias of forelimb-aligned LFP analyses. We acknowledge that the choice of alignment event can influence the results and appreciate the suggestion to consider hindlimb-aligned data. However, our experimental design specifically focused on forelimb S1. The forelimb region of S1 was oversampled in our array, and as expected, we observed larger responses there, consistent with the known somatotopic organization of S1.

      While hindlimb-aligned data could provide additional insights, it is not directly relevant to the primary question of how forelimb S1 codes tactile information under visual deprivation. We do not believe the forelimb alignment introduces a bias, as it aligns with the sensory task being investigated. However, we recognize the value of exploring alternative alignments and have now included a discussion in the Methods section regarding the rationale for our design choices.

      (7) The authors' dismissal of amplitude-based metrics as ineffective is inadequately substantiated. A clearer demonstration (e.g., event-related waveforms averaged by conditions, presented both spatially and temporally) would support this claim.

      Thank you for your constructive comment. In response, we have added a more detailed analysis of event-related waveforms, averaged across conditions (light vs. dark, smooth vs. rough textures), and presented them spatially and temporally aligned to forelimb contact (Figure 4A). These waveforms did not show clear, distinct features that could differentiate conditions, which highlights the limitations of traditional amplitude-based metrics in detecting subtle neural activity changes related to visual deprivation.

      We further performed channel-wise correlation analyses (Figure 4B), revealing stronger texture correlations in the light condition, indicating that averaged waveforms do not capture the nuanced differences in neural dynamics. Additionally, time-frequency spectrograms and channel–channel correlation matrices (Figures 4C and 4D) did not show distinct condition differences, reinforcing the limitations of amplitude-based metrics.

      These findings, along with the superior performance of machine learning-based decoding methods (e.g., CNN), support our claim that amplitude-based approaches are insufficient for fully capturing the complexity of the neural data.

      (8) Wording ambiguity regarding "attribution score" versus "activation amplitude" (Figure 5) complicates the interpretation of key findings. This distinction must be clarified for proper assessment of the results.

      Thank you for pointing out the ambiguity between "attribution score" and "activation amplitude." To address this, we have revised the manuscript to use "attribution score" only.

      (9) Generalization across animals remains unaddressed. The current within-subject decoding setup limits conclusions regarding shared neural representations across individuals. Adopting cross-validation strategies and exploring between-animal analyses would add significant value to the manuscript.

      Thank you for highlighting the importance of generalization across animals. While our study focused on within-subject decoding, we acknowledge that this limits conclusions about shared neural representations across individuals. We expect that inter-animal generalization would be challenging, as models trained on data from a single rat may not perform well on data from others due to differences in electrode placement, brain anatomy, and neural representations. We recognize the value of cross-validation strategies and between-animal analyses and will consider them in future work to address this limitation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would strongly recommend that the authors refine their introduction to be more concise. Many concepts and study aims are repeated many times and, therefore, present as highly redundant text. The introduction may be half the length and still contain the important concepts to set up the justification for the study. I would also suggest refining to be less about sensory deprivation (e.g., with blindness) and more in relation to context, as the acute nature of the study allows one to conclude more about the latter than the former.

      Thank you for your feedback on the introduction. We have revised the section to reduce redundancy and present the key concepts more concisely. We also streamlined the study aims and focused more on the context of the acute nature of the study, as you suggested, rather than emphasizing sensory deprivation. This revision better aligns with the main focus of the research and improves clarity. We believe the updated introduction provides a more direct justification for the study.

      (2) I am not sure if Figures 1-3 are meant to be in grey-scale for some reason (perhaps to represent light and dark), but I would encourage the authors to examine if this is necessary, as the use of color generally helps one more easily follow Figures.

      Thank you for this suggestion. Upon review, we agree that the use of color would enhance the clarity and readability of our figures. We have revised the figures including the newly added supplementary figures to incorporate color.

      (3) Figure 5, Figure legend title - check wording.

      Thank you for pointing this out. The title has been adjusted for consistency with the other figure legends.

      Reviewer #2 (Recommendations for the authors):

      (1) Analyses that would strengthen the main claims (major):

      (a) Identify the features exploited by the CNN.

      (i) Provide grand-average LFP waveforms for each texture × lighting condition (fore- and hind-limb channels shown separately, spatially arranged as in Figure 3C) and try to relate them to the decoding strategy learned by the CNN.

      Thank you for your helpful suggestion. We have calculated the grand-average LFP waveforms for each texture × lighting condition and included them in Figure 4A, with fore- and hind-limb channels shown separately and spatially arranged as in Figure 3C. Upon visual inspection, the mean waveforms did not reveal clear, distinct features. To further investigate, we computed the channel-wise correlation between different textures under both dark and light conditions. By subtracting the correlation coefficients for the dark environment from those in the light, we observed that the correlation between textures was lower in the dark environment (Figure 4B). This suggests that LFPs are more distinct between textures in the dark, supporting the CNN model's output. However, this also indicates that the CNN has captured more complex, nuanced information, as it is able to discriminate between LFPs on a single-trial basis, rather than relying on mean traces.

      To assess how the correlation between average LFP waveforms varied across channels, we also calculated the channel-channel correlation matrix for all 32 channels in each condition. While we found stronger correlations within each S1 subregion, we did not observe clear differences in the correlation matrices between light and dark conditions, nor between different textures (Figure 4C).
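
      The two correlation analyses described in this response can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `channelwise_correlation`, the array shapes (32 channels by 500 time points), and the synthetic waveforms are all assumptions made for the example.

```python
import numpy as np

def channelwise_correlation(avg_a, avg_b):
    """Pearson r per channel between two averaged waveforms, each (channels, time)."""
    a = avg_a - avg_a.mean(axis=1, keepdims=True)
    b = avg_b - avg_b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

# Synthetic grand averages (random numbers, not recorded data).
rng = np.random.default_rng(0)
n_channels, n_time = 32, 500
shared = rng.normal(size=(n_channels, n_time))

def noisy(scale):
    """Shared waveform plus independent noise; more noise means less similar textures."""
    return shared + scale * rng.normal(size=shared.shape)

light_smooth, light_rough = noisy(0.3), noisy(0.3)
dark_smooth, dark_rough = noisy(1.0), noisy(1.0)

r_light = channelwise_correlation(light_smooth, light_rough)
r_dark = channelwise_correlation(dark_smooth, dark_rough)
# Light minus dark: positive values mean the textures are less correlated in the dark.
delta_r = r_light - r_dark

# Channel-by-channel correlation matrix of one condition's average waveforms.
channel_matrix = np.corrcoef(light_smooth)  # shape (32, 32)
```

      The synthetic data are built so that textures share more waveform structure in the "light" arrays, which is why `delta_r` comes out positive here; real recordings would have to be substituted to say anything about the study.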

      (ii) Add channel-wise and time-frequency maps (e.g., wavelet or spectrograms) for each texture × lighting condition and try to relate them to the decoding strategy learned by the CNN.

      Thank you for the valuable suggestion. We calculated wavelet features for each LFP segment and averaged them across trials to assess differences in LFP between light and dark conditions, as well as across textures (Figure 4D). However, no distinct differences were observed in the spectral map. To investigate further, we computed the differences in spectral maps for LFPs in light and dark trials. We then calculated the difference in attribution scores derived from the integrated gradient map (Supplementary Figure 4A). Subsequently, we calculated the correlation coefficients between the differences in integrated gradients and the differences in power across each frequency band in the spectral map (Supplementary Figures 4B and 4C). A negative correlation was found in the 50-60 Hz range, while a positive correlation was observed in the 80-90 Hz range. These findings suggest that frequency-specific patterns of LFP activity in different conditions may be linked to the texture representations captured by the CNN model. We have included a discussion of these findings in [lines 463-468].
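
      The band-by-band correlation between attribution differences and power differences can be sketched as below. This is a hedged illustration only: the function name `bandwise_correlation`, the input shapes (one row of power differences per frequency band, one attribution difference per sample), and the synthetic inputs are assumptions, since the response does not specify how the arrays were organized.

```python
import numpy as np

def bandwise_correlation(power_diff, attribution_diff):
    """Pearson r between each band's power difference and the attribution difference.

    power_diff: (n_bands, n_samples); attribution_diff: (n_samples,).
    """
    p = power_diff - power_diff.mean(axis=1, keepdims=True)
    a = attribution_diff - attribution_diff.mean()
    return (p @ a) / np.sqrt((p ** 2).sum(axis=1) * (a ** 2).sum())

# Synthetic inputs (random numbers, not the study's data).
rng = np.random.default_rng(0)
attribution_diff = rng.normal(size=300)
power_diff = np.stack([
    -attribution_diff + 0.5 * rng.normal(size=300),  # band built to anti-correlate
    rng.normal(size=300),                            # unrelated band
    attribution_diff + 0.5 * rng.normal(size=300),   # band built to correlate
])
r_per_band = bandwise_correlation(power_diff, attribution_diff)
```

      Each entry of `r_per_band` plays the role of one point in a correlation-versus-frequency plot; the sign pattern in this example is produced by construction.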

      (b) Quantify the "enhanced separability in darkness" more rigorously.

      (i) Report cluster-distances (e.g. Euclidean) for each individual animal.

      We thank the reviewer for this helpful comment. When calculating the silhouette score, we used Euclidean distance as the distance metric. The silhouette score is defined for each data point as the difference between the average distance to points within its assigned cluster and the average distance to points in the nearest other cluster, normalized by the larger of the two values. Thus, the silhouette score inherently reflects the relative cluster distances both within and across conditions for each individual animal. Because we report and statistically analyze silhouette scores (Figure 6C), these values already quantify and compare the Euclidean cluster distances across conditions at the animal level. For clarity, we have now added a definition of the silhouette score in the Methods section of the main text [lines 269-278]. We also included the calculated silhouette scores in Supplementary Table 1.
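
      The definition above corresponds to s = (b - a) / max(a, b), where a is a point's mean distance to the other points in its own cluster and b is its mean distance to the points of the nearest other cluster. A minimal sketch with Euclidean distance follows; the function name and the toy points are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def silhouette_scores(points, labels):
    """Per-sample silhouette values using Euclidean distance."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own cluster
        if not same.any():
            continue  # singleton cluster: silhouette left at 0
        a = dist[i, same].mean()
        b = min(dist[i, labels == other].mean()
                for other in np.unique(labels) if other != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores

# Toy example: two tight, well-separated groups.
toy_points = [[0, 0], [0, 1], [10, 0], [10, 1]]
well_separated = silhouette_scores(toy_points, [0, 0, 1, 1])
mislabeled = silhouette_scores(toy_points, [0, 1, 0, 1])
```

      With the correct grouping the scores are close to 1, while the deliberately mismatched labels give negative scores, which is the sense in which a higher silhouette score indicates better-separated clusters.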

      (ii) Run a permutation or bootstrap test (shuffling darkness/light labels within animals) to obtain an empirical null distribution for cluster separability in the network embedding space.

      We thank the reviewer for this important suggestion. In response, we implemented a permutation test to assess the robustness of our cluster separability results. Specifically, we shuffled the darkness/light labels within each animal and recalculated silhouette scores across 1000 resamples to generate an empirical null distribution. The observed separability between light and dark conditions consistently exceeded the 95th percentile of the null distribution (Supplementary Figure 3). This confirms that the enhanced cluster separability in darkness was not attributable to random fluctuations in labeling but instead reflected a systematic reorganization of neural representations.
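
      A label-shuffling permutation test of this kind can be sketched as follows. The sketch is an assumption-laden illustration rather than the authors' procedure: `permutation_null` and `centroid_distance` are hypothetical names, the embeddings are synthetic, and the distance between class centroids is used as a simple stand-in for the silhouette score as the separability statistic.

```python
import numpy as np

def centroid_distance(features, labels):
    """Stand-in separability statistic: distance between the two class means."""
    mean_0 = features[labels == 0].mean(axis=0)
    mean_1 = features[labels == 1].mean(axis=0)
    return float(np.linalg.norm(mean_0 - mean_1))

def permutation_null(features, labels, statistic, n_resamples=1000, seed=0):
    """Return the observed statistic and its null distribution under label shuffling."""
    rng = np.random.default_rng(seed)
    observed = statistic(features, labels)
    null = np.empty(n_resamples)
    for k in range(n_resamples):
        # Shuffling labels breaks any link between condition and embedding.
        null[k] = statistic(features, rng.permutation(labels))
    return observed, null

# Synthetic embeddings for one hypothetical animal: two conditions, 40 trials each.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 1.0, (40, 8)),
                      rng.normal(1.5, 1.0, (40, 8))])
labels = np.repeat([0, 1], 40)

observed, null = permutation_null(features, labels, centroid_distance)
threshold = np.percentile(null, 95)  # 95th percentile of the empirical null
exceeds_null = observed > threshold
```

      Shuffling would be repeated separately for each animal so that labels are only exchanged within an animal; any other statistic with the same `(features, labels)` signature can be passed in place of `centroid_distance`.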

      (c) Control for possible visually-evoked potentials (VEPs).

      (i) Search the LFPs recorded in light for stereotyped VEP components and/or comment on this possible confound (i.e., VEPs in S1?).

      Thank you for raising this point. Although it would be interesting to observe if a VEP is present in the S1 of rats, this experiment did not involve cue-guided visual stimulation. Additionally, there was no environmental visual cue that could serve as an external trigger to align the LFPs for VEP analysis in S1. Furthermore, since even the somatosensory evoked potential was not clearly visible in the S1 LFP without averaging the aligned LFPs, it is unlikely that we would be able to observe VEPs in single trials.

      (d) Address behavioral and arousal confounds.

      (i) Provide example locomotor-speed traces (aligned with corresponding LFPs) and report locomotor-speed variance across the 10-min blocks.

      Thank you for your comment. We had a speedometer installed for the recordings of the last two rats. We have now provided example speed traces and the speed variance across blocks in Supplementary Figure 1. The traces show that locomotor speed was stable in each trial.

      (ii) If available from the camera recordings, include pupil diameter as a proxy for arousal; otherwise, discuss explicitly how arousal changes might affect S1 LFPs.

      Thank you for this suggestion. We strongly agree that measuring pupil diameters should be incorporated into future studies. However, because our camera did not have sufficient resolution to capture pupil diameters, we have addressed this limitation in the discussion section [lines 525-537].

      (e) Address order effects (and motivation/satiety confounds)

      (i) Present at least a subset of sessions in which the dark block precedes the light block; re-analyze the silhouette score/discriminability with block order as a factor.

      Thank you for this helpful suggestion. We conducted additional analyses using sessions from 6 rats in which the dark block preceded the light block (Supplementary Figure 5A). Using the same model architecture, we calculated the silhouette score for each rat (Supplementary Figure 5B). With the order reversed (dark preceding light), the significant difference in discriminability was no longer present: although we observed a trend toward higher scores in the dark condition, the difference in texture discriminability between conditions was not statistically significant.

      If trial order alone accounted for the increase in discriminability, reversing the order would be expected to yield higher silhouette scores in the light condition. Our findings suggest that factors related to order (e.g., thirst or motivation, as you proposed) are not the sole contributors. Furthermore, previous studies in human participants have shown that brief blindfolding can produce lingering increases in tactile sensitivity, indicating a lasting effect of visual deprivation. Thus, the absence of significant differences in texture representation when the dark condition preceded the light condition may reflect such lasting effects. We have included a discussion in [lines 441-452].

      (ii) Discuss explicitly the potential confounding effect of motivational state/thirst.

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we now explicitly address the potential confounding role of motivational state and thirst in shaping our results. Because animals were water-restricted to maintain task engagement, it is possible that increasing thirst or fluctuating motivation over the course of a session could alter arousal or attentional state, thereby influencing neural separability. However, when the trial order was reversed (dark condition preceding light), silhouette scores did not show a significant increase in the second (light) trial. Thus, while we acknowledge that motivational state may contribute to trial-to-trial variability, the systematic increase in separability during darkness cannot be fully explained by thirst or motivational confounds. This addition has been incorporated into the discussion section [lines 441-452].

      (f) Alignment control and the role of forelimb S1.

      (i) Repeat the decoding analysis with LFPs aligned to hind-limb strike; report whether the fore-limb dominance persists.

      Thank you for your thoughtful suggestion. We appreciate the opportunity to clarify. Our study was designed to ask a different question: how the absence of visual input reorganizes tactile encoding for the body part that actually initiates texture contact in our paradigm (the forepaw). Accordingly, all analyses were aligned to forelimb strike and our array intentionally oversampled S1-forelimb relative to S1-hindlimb (18 vs. 14 electrodes; Fig. 1F–G), yielding clear topographic forelimb-locked event-related responses (Fig. 3B–D) and forelimb-channel dominance in the decoding explainability analyses (Fig. 5D–E). Repeating the full decoding locked to hind-limb strike would test a different hypothesis and would be difficult to interpret for three reasons:

      Design/measurement alignment. Our kinematic detection was built to identify forelimb foot strikes. Extending the detector to hindlimb would require new model training/validation and introduces uncertainty in the exact contact timing relative to the LFP segments we analyze.

      Sampling asymmetry. The array and cortical magnification are not balanced across subregions (18 forelimb vs. 14 hindlimb electrodes; Fig. 1G), so a hind-limb–aligned comparison would be confounded by unequal coverage and signal-to-noise across S1 subdivisions rather than reflecting true “dominance.”

      Scope of the claim. We do not claim that the forelimb is globally more informative about texture; we show the intuitive and topographically specific result that “forelimb S1 codes textures touching the forelimb,” and that these representations become more separable in darkness (silhouette increase; Fig. 5C). A hind-limb–locked re-analysis would likely reveal hindlimb contributions when the hindpaw is the alignment event — but that would not change the central conclusion about darkness enhancing tactile representational separability.

      To address the underlying concern about generality without introducing the above confounds, we have clarified these design choices and limitations in the revised Methods [lines 194-197].

      (g) Amplitude-based baseline.

      (i) Show that a simple linear discriminant or logistic-regression model on peak amplitudes (and/or other simple features like trough width/slope) cannot reach the CNN's accuracy. This kind of "baseline" analysis could also be useful to pinpoint the discriminative features learned by the CNN.

      Thank you for your insightful suggestion. We agree that performing a baseline comparison with a simpler model could help highlight the advantage of using a CNN. However, in our dataset, individual LFP traces do not exhibit clear peaks or well-defined features such as peak amplitude, width, or energy, which makes feature extraction using traditional methods like linear discriminants or logistic regression challenging.

      To address this, we performed principal component analysis (PCA) on the raw LFP traces to reduce the dimensionality and applied a support vector machine (SVM) classifier on the reduced features, in line with the approach used for the CNN models (Supplementary Figure 2A). The results of this analysis demonstrate that the SVM model struggles to effectively discriminate between conditions, further reinforcing the necessity of the CNN model. The CNN’s ability to automatically learn complex features from the raw LFP data appears to be a crucial factor in achieving superior classification performance (Supplementary Figure 2B).
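
      A PCA-plus-SVM baseline of the kind described can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the authors' pipeline: the number of components, the RBF kernel, the 5-fold evaluation, the array shapes, and the random data with random labels are all placeholders chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for flattened LFP segments (trials x channels*samples).
rng = np.random.default_rng(0)
n_trials, n_channels, n_samples = 200, 32, 100
X = rng.normal(size=(n_trials, n_channels * n_samples))
y = np.repeat([0, 1], n_trials // 2)  # two balanced, arbitrary class labels
rng.shuffle(y)                        # labels carry no information about X

# PCA reduces each trial to 10 components; the SVM classifies the reduced features.
baseline = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
scores = cross_val_score(baseline, X, y, cv=5)  # accuracy on each held-out fold
mean_accuracy = scores.mean()
```

      Fitting PCA inside the pipeline means the projection is re-estimated on each training fold, so the held-out trials do not leak into the dimensionality reduction. Because the labels here are unrelated to the features, accuracy is expected to sit near chance.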

      (h) Cross-validation and inter-animal generalization.

      (i) Consider replacing the single 80/20 split with k-fold cross-validation within animals.

      Thank you for this suggestion. Instead of using an 80/20 split, we performed 5-fold cross-validation on all rats. The silhouette scores were averaged within each animal across the five folds, and Figure 6C was updated accordingly. After performing a paired t-test, we still observed a significant difference in silhouette scores between the light and dark conditions.
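
      The fold-averaging and paired comparison described here can be sketched as follows. The helper names `kfold_indices` and `paired_t` are hypothetical, and the silhouette values are illustrative placeholders rather than the study's results; the sketch only reports the t statistic and degrees of freedom, leaving the p-value lookup to a statistics library.

```python
import numpy as np

def kfold_indices(n_trials, k=5, seed=0):
    """Split shuffled trial indices into k near-equal held-out folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_trials), k)

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    return t, len(d) - 1

folds = kfold_indices(23)  # e.g. 23 trials from one animal, split five ways

# Placeholder per-fold silhouette scores: (animals, folds, [light, dark]).
fold_scores = np.array([
    [[0.10, 0.15], [0.12, 0.18], [0.09, 0.14], [0.11, 0.17], [0.13, 0.16]],
    [[0.20, 0.22], [0.18, 0.25], [0.21, 0.24], [0.19, 0.23], [0.22, 0.26]],
    [[0.05, 0.11], [0.07, 0.10], [0.06, 0.12], [0.04, 0.09], [0.08, 0.13]],
])
per_animal = fold_scores.mean(axis=1)  # average over the five folds per animal
# Paired comparison across animals: dark versus light.
t_stat, dof = paired_t(per_animal[:, 1], per_animal[:, 0])
```

      Averaging within each animal before the test keeps the animal as the unit of analysis, so the five folds do not inflate the sample size.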

      (ii) Comment on inter-animal generalization.

      Thank you for this valuable feedback. Although we did not explicitly test inter-animal generalization, it is unlikely that a model trained on data from one rat would perform equally well when classifying data recorded from another animal. This limitation arises from two main factors. First, despite careful efforts to implant electrodes in the same brain region and cortical layer across experiments, it is impossible to align all 32 electrodes to identical coordinates. Consequently, the recorded LFPs are obtained from slightly different locations, which may reflect distinct neural processing. Second, even within the same species, individual animals differ in brain size and neural circuit organization. Thus, even if electrodes could be placed at identical anatomical locations, inter-individual variability in brain structure would still lead to differences in the recorded signals. Because deep learning models are often sensitive to small perturbations in their input data, we believe that robust inter-animal generalization is unlikely without fine-tuning the model using data from the target animal. This comment has been inserted in the Discussion [lines 494-507].

      (2) Writing, figure and terminology improvements (minor):

      (a) Figure 5F-G axis label. Decide on either "attribution score" or "activation amplitude" and use that term consistently in panels, legend, and text (currently, I believe it could be confused with raw signal amplitude).

      We have unified the terminology to "attribution score" and applied this consistently across the panels, legend, and text.

      (b) Throughout the manuscript, use "population-level activity" or "average population dynamics" when discussing LFPs (I believe it is more correct to reserve "population code" for multiple single-unit datasets).

      We agree with the reviewer’s point and have adopted the term "population dynamics" to describe LFP information consistently throughout the manuscript.

      (c) Lines 219-221, state down-sampling to 2 kHz, whereas line 289 mentions 10 kHz. Reconcile these numbers.

      We apologize for the confusion and thank the reviewer for thoroughly reading the manuscript. Our original sampling rate was 30 kHz, and all analyses were performed on data resampled to 10 kHz. The reference to 2 kHz was an error, and we have corrected it.

      (d) Specify the tail of each statistical test mentioned in the manuscript and any multiple-comparison correction used.

      We have specified the tail of each statistical test and any multiple-comparison corrections used in the "Data Analysis" section of the Methods.

      (e) Line 244: "variables (He et al., 2015)" → "variables (He et al., 2015)".

      We have corrected this formatting issue and revised it to "variables (He et al., 2015)".

      (f) Line 253: "one-dimentional" → "one-dimensional".

      We have corrected the spelling error and revised it to "one-dimensional".

      (3) Data and code sharing:

      (a) Consider depositing data and code for the analysis in public open repositories.

      Thank you for your suggestion. We have set up a public GitHub repository to share the code. Since the full dataset is quite large (~400GB), we have uploaded a smaller example dataset for the analysis.

    1. eLife Assessment

      The authors test the hypothesis that gonadal steroid signaling influences the transcriptional development of specific neurons in the mPOA during adolescence, and that such adolescent development of the mPOA is necessary for mating behaviors. The valuable findings are supported by convincing evidence. This work contributes new insight into hormone-sensitive transcriptional profiles within genetically defined neuron clusters in the mPOA during adolescence and will be of interest to systems and molecular neuroscientists and those interested in development, sex differences, and/or hormonal regulation.

    2. Reviewer #2 (Public review):

      Summary:

      An abundant literature documents molecular changes in the rodent hypothalamus that occur during the transition from prepubertal to mature reproductive physiology. Equally well documented is the role of sex steroids and their receptors during this important period of reproductive development, as well as the importance of GABAergic and glutamatergic neurons. The medial preoptic area (MPOA) is known to play a central role in expression of sexually dimorphic reproductive function and previously reported sexually dimorphic patterns of gene expression are consistent with this role. The present manuscript extends this knowledge base and reports the results of a detailed evaluation of transcriptional dynamics in the MPOA during the adolescent transition to maturity with a particular focus on the role of the estrogen receptor gene (Esr1). Both single cell RNA sequencing (scRNAseq) and multiplex in situ hybridization methods were employed and the results subjected to detailed computational analyses to demonstrate that the transcriptomic structure of MPOA neurons displays both sex and cell type specific expression profiles. In addition, both hormonal and genetic manipulations of Esr1 signaling during puberty altered the transcriptional profiles of MPOA neurons, and these changes aligned with maturation of hormone-dependent reproductive function. The authors provide this evidence to illustrate Esr1-dependent control of gene regulatory networks required for normal expression of reproductive behaviors expressed during the transition from adolescence to adulthood. The results presented in this manuscript are extensive and represent the most comprehensive evaluation of transcriptomic changes during reproductive maturation to date. The methods appear strong and the results provide a rich data set that will support a good deal of future analysis.

      Strengths:

      (1) The major strength of this manuscript is the extensive set of images and graphs that illustrate molecular changes that occur in MPOA neurons during adolescence, although additional spatial detail as to locations of the source neurons would be welcome in order to place the changes in the proper circuitry context.

      (2) Targeting Esr1 deletion to MPOA GABA neurons is a good choice, given how these cells have been implicated in sexual differentiation of reproductive behavior previously, and the lack of comparable responses in glutamatergic neurons is convincing. The AAV-frtFlex-Cre virus created by the investigators is a most useful tool for such studies. Profiling distinct transcriptomic trajectories in GABA and glutamatergic neurons during reproductive maturation is impressive and leads to some of the best supported conclusions in this paper.

      (3) Cellular and molecular resolution of the transcriptomics data appears excellent; however, because the source tissue for the scRNAseq analysis was obtained by bulk dissection of the MPOA, anatomical resolution is limited. This problem is addressed to some extent by careful comparison of scRNAseq results with previously published spatial transcriptomics data. The HM-HCR-FISH analysis clearly documents spatially restricted changes in gene expression, but it is hard to discern where these changes occur based on the images presented or the descriptions included in the Results. The anatomical schematic included in Figure 4 suggests that investigators are not familiar with components of the MPOA (see Allen Mouse Brain Atlas).

      Weaknesses:

      (1) A major conceptual flaw is that the authors do not distinguish between genetically determined sex differences in patterns of gene expression and differences caused by the fact that MPOA neurons are exposed to different endocrine environments in adolescent males and females, which can cause different transcriptional trajectories independent of genetic sex. This issue does not render their results invalid, but their terminology should address the issue in the discussion and "limitations" section. At the very least the endocrine status of "intact females" should be included.

      (2) A major technical flaw is that the MPOA is treated as a functionally distinct brain region (block dissections) with uniform distribution of cell types (FISH data are not illustrated or reported with sufficient spatial detail). Thus, an enormous amount of molecular data is provided that cannot be mapped to distinct neural circuits, thereby limiting the neurobiological impact. This is also a weakness of the FISH data, which is presented with only small regions illustrated without anatomical detail. In fact, some images are compared that appear to illustrate different MPOA structures, although it is impossible to be certain of this due to the lack of morphological landmarks. The analysis of how Esr1 orchestrates regulatory gene networks is impressive and interesting, but the fact that many of the observed transcriptional events occur in neural circuits that do not overlap confounds interpretation.

      (3) The locations of the AAV injections should be characterized because deleting Esr1 in multiple distinct parts of the MPOA will likely confound interpretation. This is especially problematic given the limited number of mice used for parts of the RNAscope analysis.

      (4) Although the focus of these experiments on adolescence is welcome, neither the Introduction nor the Discussion do a good job of placing these studies in the context of what is already known about brain maturation during puberty. It is true that this is very much a results-focused manuscript, but the scholarship can be improved. Simply stating that your results are consistent with previous reports places an undue burden on the reader to go figure out what is new.

      (5) Throughout the manuscript, the authors utilize obscure abbreviations, which often makes reading their text overly cumbersome. This is certainly justified in certain instances where complex names of analytical methods are used repeatedly, but the authors are encouraged to try and simplify their use of non-standard abbreviations.

      Comments on revisions:

      The authors have considered issues raised during the initial review. Although there do not appear to be significant changes to analyses, figures or conclusions, the authors have added important revisions listing limitations in study design and methodology that impact interpretation.

    3. Reviewer #3 (Public review):

      The paper identifies effects of gonadal hormones within hormone-responsive GABAergic neurons in the MPOA. Although it is not surprising that hormones have effects on neurons that express hormone receptors, the current paper adds insights with higher cellular and spatial resolution than previous work and focuses on the adolescent period. The paper also identifies a major role for Esr1-dependent mechanisms on behavior using an intersectional genetic strategy to ablate Esr1 in GABAergic or glutamatergic neurons in the MPOA.

      The authors have thoughtfully addressed the reviews, in particular by focusing quantitative analyses on Vgat+Esr1+ clusters and adding important technical and conceptual considerations in the limitations section.

      I have one remaining minor concern. I appreciate that the text now defines "transcriptional maturation". However, the term seems inappropriate when describing the "minimal transcriptional changes" in Vgat+hormone RLow clusters, which implies that they are transcriptionally immature. Do the authors mean to imply that transcriptional maturation is observed in Vgat+Esr1+ clusters but not Vgat+hormone RLow clusters? The authors also use the term "hormone-dependent transcriptional dynamics", which I think is more appropriate. For example, hormone-dependent transcriptional dynamics are observed in Vgat+Esr1+ clusters but not Vgat+hormone RLow clusters.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public review:

      Reviewer #1 (Public review):

      Weaknesses:

      Two minor comments

      (1) Fig 4 (hormone treatment): In this experiment, testosterone is given to males, yet in Sup Fig 6 it is argued that Esr1 is more influential in driving transcriptional changes compared to AR. Does DHT treatment have the same outcome as testosterone? Or, does estrogen treatment in males have the same outcome as testosterone?

      We agree that the inability to distinguish AR activation by testosterone from Esr1 activation by converted estrogen is a limitation of our study. We added discussion of this point to the “limitation of the study” section.

      Although HM-HCR experiments showed the bidirectional control of transcriptional progression during adolescence, it is unclear if the facilitation in male by testosterone supplement is via activation of AR or Esr1 or both because testosterone will likely be converted to estrogen in the brain. Future studies using dihydrotestosterone (DHT) and estrogen to males may address this issue.

      (2) Fig 3i: There appears to be an age-dependent transcriptional change in male Vgat HR-low cells. Can the authors comment on age-dependent (hormone-independent) transcriptional changes in males versus females.

      We agree that it is important to distinguish hormone-dependent changes from age-dependent changes. We added pair-wise DE results for the Vgat HR-low population to the main text. Consistent with the trajectory analysis, there were fewer age-dependent genes than hormone-associated genes.

      “Pair-wise DEG analysis consistently showed that larger number of DEGs between P35 and P23 in Vgat+Esr1+ (male: 146 genes; female: 162 genes) than Vgat+ hormone R<sup>Low</sup> (male: 26 genes; female: 1 gene).”

      Reviewer #2 (Public review):

      Weaknesses:

      (1) A major conceptual flaw is that the authors do not distinguish between genetically determined sex differences in patterns of gene expression and differences caused by the fact that MPOA neurons are exposed to different endocrine environments in adolescent males and females, which can cause different transcriptional trajectories independent of genetic sex. This issue does not render their results invalid, but their terminology should address the issue in the discussion and "limitations" section. At the very least the endocrine status of "intact females" should be included.

      We agree that it would be ideal to analyze perinatal and pubertal dynamics within the same study in order to distinguish these two processes. We added discussion of this point to the “limitation section”.

      “2. Although we have identified hormone/Esr1 dependent transcriptional trajectories during adolescence, the relations and interplay with genetically determined perinatal event, which is earlier and robust, are unclear. Some sex differences during adolescence might be an extension of perinatally established sex differences while others might be unique adolescent changes.”

      (2) A major technical flaw is that the MPOA is treated as a functionally distinct brain region (block dissections) with uniform distribution of cell types (FISH data are not illustrated or reported with sufficient spatial detail). Thus, an enormous amount of molecular data is provided that cannot be mapped to distinct neural circuits, thereby limiting the neurobiological impact. This is also a weakness of the FISH data, which is presented with only small regions illustrated without anatomical detail. In fact, some images are compared that appear to illustrate different MPOA structures, although it is impossible to be certain of this due to the lack of morphological landmarks. The analysis of how Esr1 orchestrates regulatory gene networks is impressive and interesting, but the fact that many of the observed transcriptional events occur in neural circuits that do not overlap confounds interpretation.

      We agree that, while the MPOA is defined consistently across samples based on the brain atlas, its boundary is somewhat less obvious than that of other regions (e.g. hippocampus, VMH). To minimize contamination from adjacent areas, we have restricted quantitative analysis mostly to the Vgat+ Esr1+ population, which is densely located within the MPOA but not in immediately adjacent areas, except the posterior BNST, which is readily distinguishable. We added a clarification to the Methods and a technical limitation to the Discussion, as quoted below.

      Method

      “To disambiguate the MPOA and adjacent brain regions, quantitative analysis is restricted to Vgat+ Esr1+ neurons and is devoid of posterior BNST.”

      Discussion

      “3. While we have observed robust effect of Esr1-KO in scRNAseq experiment which was further validated with FISH experiment, it is possible that there are further heterogeneous Vgat-Esr1 populations in the MPOA which might be differentially targeted in each virally injected sample. To mitigate this, 3-4 mice were pooled for each sample in scRNAseq experiment and in HCR-FISH experiment, in addition to confirming recombinase RNA expression within the MPOA, we included samples with robust Esr1 deletion in the MPOA. Interestingly, due to the technical challenge, Esr1 deletion tends to be more robust than weakly detected recombinase RNA expression (data not shown).”

      (3) The locations of the AAV injections should be characterized because deleting Esr1 in multiple distinct parts of the MPOA will likely confound interpretation. This is especially problematic given the limited number of mice used for parts of the RNAscope analysis.

      We agree that, as with point (2), this is an important matter. For the HCR experiment, we only included animals with recombinase RNA (Cre or Flp) expression within the MPOA. Although the recombinase expression was sufficient to determine qualitatively whether an injection hit or missed, the detection was weak and it was challenging to determine the extent of viral spread. Thus, we also used successful Esr1 deletion as an additional inclusion criterion for the AAV-Cre-YFP group. We have added the inclusion criteria to the Methods and the technical consideration to the Discussion.

      Method

      “For HCR2, AAV was injected unilaterally so that successful targeting of the MPOA with AAVCre-YFP (detection of recombinase RNA within the MPOA) and the deletion of Esr1 were confirmed for inclusion of samples.”

      Discussion

      “3. While we have observed robust effect of Esr1-KO in scRNAseq experiment which was further validated with FISH experiment, it is possible that there are further heterogeneous Vgat-Esr1 populations in the MPOA which might be differentially targeted in each virally injected sample. To mitigate this, 3-4 mice were pooled for each sample in scRNAseq experiment and in HCR-FISH experiment, in addition to confirming recombinase RNA expression within the MPOA, we included samples with robust Esr1 deletion in the MPOA. Interestingly, due to the technical challenge, Esr1 deletion tends to be more robust than weakly detected recombinase RNA expression (data not shown).”

      (4) Although the focus of these experiments on adolescence is welcome, neither the Introduction nor the Discussion do a good job of placing these studies in the context of what is already known about brain maturation during puberty. It is true that this is very much a results focused manuscript, but the scholarship can be improved. Simply stating that your results are consistent with previous reports places an undue burden on the reader to go figure out what is new.

      We agree that placing our study in the context of the existing literature will clarify its novelty and its impact for the community. We have updated the Introduction, adding a review that highlights puberty-associated genomic studies in the brain, all of which are bulk (brain-region level), as well as the very first puberty scRNAseq study in the human testis.

      “Despite the well-established role of these hormones in shaping behavior, the molecular mechanisms underlying their influence on brain development during adolescence are still limited to brain-region level (bulk)[8] in humans and model organisms and adolescent transcriptional dynamics at single cell resolution in the brain remain poorly understood (but see a pioneering study in the human testis[9]).”

      (5) Throughout the manuscript the authors utilize obscure abbreviations, which often makes reading their text overly cumbersome. This is certainly justified in certain instances where complex names of analytical methods are used repeatedly, but the authors are encouraged to try and simplify their use of non-standard abbreviations.

      We agree that it is helpful for readers to have a reference list of abbreviations in a single location. We added an “abbreviation” section as a reference for readers.

      Medial preoptic area (MPOA)

      Single-cell RNA sequencing (scRNAseq)

      Estrogen receptor 1 (Esr1)

      GABAergic neurons (Vgat+)

      Glutamatergic neurons (Vglut2+)

      Hybridization chain reaction fluorescent in situ hybridization (HCR-FISH)

      Gonadectomized (GDX)

      Partition-based graph abstraction (PAGA)

      Hormone-associated differentially expressed genes (HA-DEGs)

      Multiplexed error-robust fluorescence in situ hybridization (MERFISH)

      Differential gene expression (DE)

      Differentially expressed genes (DEGs)

      Support vector machine (SVM)

      Manifold Enhancement Latent Dimension (MELD)

      Potential of Heat-diffusion for Affinity-based Trajectory Embedding (PHATE)

      Androgen receptor (AR)

      single-cell regulatory network inference (SCENIC)

      Reviewer #3 (Public review):

      We thank the reviewer for the constructive comments, which helped us improve our manuscript.

      Weaknesses:

      We already know that Esr1 is important within GABAergic but not glutamatergic neurons for mating behavior. However, there is not enough data to support the claim that disrupting Esr1 in glutamatergic MPOA neurons "had no observable effect." The MPOA is involved in many behaviors and physiologies that were not investigated. More assays would be required to report "no observable effect."

      The small number of cells included in the transcriptional studies is a general concern, as noted by the authors. This is a particular concern for conclusions related to the role of adolescence in glutamatergic MPOA neurons. The paper reports 24,627 neurons across all treatment groups, which include 3 time points, 2 sexes, and GDX conditions. It seems likely that not much was detected in the glutamatergic neurons because of insufficient power.

      Esr1 knockout is initiated in adolescence, not restricted to adolescence. Do we know that the effects on mating behavior are due to what is happening in adolescence vs. the function of Esr1 in adults? Are the effects different if Esr1 is knocked out in mature adults? This comparison would be important to demonstrate that adolescence is a critical time window for Esr1 function.

      We agree that (1) the relatively mild effects observed in glutamatergic neurons may be partially due to the scale of the study, and (2) Esr1 deletion is permanent once induced, so it is challenging to distinguish adolescent from adult transcriptional dynamics using existing viral strategies.

      We added discussion in the “limitation” section.

      “4. While we have observed robust transcriptional progression in Vgat<sup>+</sup> Esr1<sup>+</sup> neurons during adolescence, we observed more mild alterations in VgluT2<sup>+</sup> neurons. Although the scale of our study is comparable or exceeds prior scRNAseq studies in MPOA[22,29], future larger studies may have more sensitivity to detect adolescent transcriptional dynamics in VgluT2<sup>+</sup> neurons.”

      “5. Although we demonstrated adolescent transcriptional changes were observed as early as P35, and either hormonal deprivation or Esr1 KO in prior to adolescence prevented the transcriptional progression (arrested transcriptional state even at adult), given the viral incubation time and permanent deletion of Esr1 after viral injection, it is challenging to disambiguate the role of Esr1 during adolescence and adult. Future studies injecting the virus at adult may provide additional insights on the similarity and difference between transcriptional changes during puberty and maintained transcriptional states at adult.”

    1. eLife Assessment

      Using the clownfish model, this study examines how growth, feeding, and agonistic behavior result in socially dominant or subordinate states in size- and age-matched individuals of the clownfish, Amphiprion percula. The authors complement this work with whole-body transcriptomics and find significant variation in genes and gene co-expression modules related to growth and satiety-related pathways, as well as ossification-related genes. They provide solid evidence that emerging dominants grow more, eat more, and behave more aggressively than subordinate or solitary individuals; these phenotypic differences are accompanied by distinct gene expression profiles, including variation in growth- and satiety-related pathways. The work is valuable in advancing our understanding of how the social environment regulates phenotypic change; however, claims regarding the mechanistic role of gene expression are only partially supported by the current analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting and well-written manuscript on a fascinating question in a "charismatic" model system.

      Strengths:

      (1) The Introduction is concise, though it might be helpful to the non-specialist reader to learn a bit more about what is known about the social control of somatic growth across diverse species (including humans), which would help to make this work more generally interesting.

      (2) The experiment is well-designed.

      (3) The data collected are comprehensive.

      (4) The complementary analysis of both feeding and aggression/submission data with and without known social roles is a neat idea and compelling!

      Weaknesses:

      (1) I was surprised that the HPA/stress axis was not considered here at all. Wouldn't we expect that subordinates have increased stress axis activation, which in turn could inhibit their growth and aggressive behavior?

      (2) To what extent are growth, food intake, agonistic behavior, and/or gene expression patterns coordinated across P1 vs P2 pairs? The lack of such an analysis seems like a missed opportunity.

      (3) What was the rationale for using whole bodies for the transcriptome analysis? Given the hypotheses, the forebrain or hypothalamus and certain other organ systems (e.g., liver, gonads, skin, etc.) would have been obvious candidate tissues here. I realize that cost is always a consideration, but maybe a focus on the fore-/midbrain could have been prioritized.

      (4) Given the preceding point, why was a fold-change threshold used for assessing DEGs (supplementary Figure 3)? There is no biological justification to ever use a fold-change threshold, especially in bulk RNA-seq analysis. This is particularly true here, where whole bodies were used for RNA-seq analysis, which is a bit unusual. Relatively small cell populations (such as hypothalamic neurons that regulate growth or food intake) may show substantial gene expression variation across social types, yet will be masked by the masses of other cells in the whole body sample. However, gene expression may still vary significantly, albeit the fold-difference may be small. I therefore suggest a reanalysis that omits any fold-change threshold.

      (5) Why is the analysis of color (hue, saturation) buried in the supplementary materials? Based on the hypotheses that motivated the study, color seems just as relevant as food intake, growth, and agonistic behavior, so even if the results are negative, they should be presented in the main paper.

      (6) The Discussion is sometimes difficult to follow. The authors may want to consider including a conceptual graphic that integrates the different aspects of growth and satiety regulation, etc., into a work-in-progress model of sorts, which would also facilitate clearer hypotheses for future research.
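      The dilution argument in weakness (4) above can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the manuscript: it assumes that every cell shares the same baseline expression of a gene and that only a small fraction of cells changes expression. The function name, the 1% fraction, the fourfold change, and the |log2FC| >= 1 cutoff are all hypothetical choices.

```python
import math


def bulk_fold_change(fraction: float, cell_fold_change: float) -> float:
    """Fold change seen in a bulk sample when only `fraction` of the cells
    change expression by `cell_fold_change`.

    Assumption: all cells share the same baseline expression, so the
    unchanged cells contribute (1 - fraction) and the responding cells
    contribute fraction * cell_fold_change.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return (1.0 - fraction) + fraction * cell_fold_change


# Hypothetical numbers: 1% of cells increase expression fourfold.
fc = bulk_fold_change(0.01, 4.0)
log2_fc = math.log2(fc)

# Hypothetical cutoff of |log2FC| >= 1 (a twofold change).
passes_cutoff = abs(log2_fc) >= 1.0

print(f"bulk fold change: {fc:.3f}, log2: {log2_fc:.3f}, passes: {passes_cutoff}")
```

      Under these assumptions a fourfold change confined to 1% of cells appears as roughly a 3% change in the bulk sample, so it would be dropped by a twofold cutoff even if it were statistically detectable.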

    3. Reviewer #2 (Public review):

      In this manuscript, the authors test growth, behavior, and gene expression in pairs of clownfish as they establish social dominance hierarchies, examining patterns of gene expression in these pairs after dominance has been established. The authors show solid evidence that emerging dominant clownfish show increased growth, aggression, and food consumption compared to their submissive or solitary counterparts, eventually adopting distinct gene expression profiles.

      Major Comments:

      (1) The Introduction is comprehensive, but it could be condensed. Likewise, the discussion could be condensed. There is considerable redundancy between the methods, the results, and the legend in Figure 1. The authors should consolidate and remove the redundancy.

      (2) For Figure 3, the authors are showing PC2 and PC3; why is PC1 not shown? There is so much overlap between the three groups in PC2 vs PC3; it seems unlikely that researchers could conclusively identify any individual as belonging to a group based on the expression profile. The ovals shown do not capture all the points within each of the groups, and particularly the grey S oval seems misaligned with the datapoints shown.

      (3) The authors indicate that the 15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling. Does this mean that each of the P1 and P2 were pairs with each other? Have the authors tried examining the gene expression patterns in a paired manner? E.g., for the pairs that showed the greatest size differences, do they also show the greatest differences in gene expression? Do the P1s show the most extreme differences from P2s that also show the most extreme P2 differences? Perhaps lines on Figure 3A connecting datapoints from the P1 and P2 pairs would be informative.

      (4) For the specific target pathways that are up- and downregulated in the different backgrounds, I recommend that the authors include boxplots (or heatmaps) showing the actual expression values for these targets. Figure 6 shows a heatmap for appetite-related genes, and it would be great to see a similar graph for the metabolism and glycolysis genes; it would also be informative to see similar graphs for hormonal and sexual maturation pathways as well.

      (5) Particularly given that there is a relatively small number of genes enriched in the different rank conditions, I did not understand the need to do the WGCNA module analysis. I thought that an analysis of GO terms across the dataset would have been more meaningful than the GO term analysis shown in Figure 4, which considers only genes assigned to the "brown WGCNA module". This should be simplified or clarified.

      (6) The authors say that they have identified coordinated changes in behaviors and the "underlying gene expression, leading to the emergence" of social roles. This is a little bit misleading, since the gene expression analysis occurred well after the behavioral and phenotypic differences emerged. Presumably, the hormonal and genetic shifts that actually caused the behavioral and phenotypic difference occurred during the weeks during which the experiment was underway, and earlier capture of the transcriptome would presumably reveal different patterns, and ones that would be considered more causative. The authors acknowledge this in lines 434-435, but it could be emphasized further.

      (7) The authors have measured a number of differences between the different dominance classes of fish. All these differences were measured relative to the other classes, but in my view, the Solitary group was the closest to a baseline control. So I'm not sure that it is fair to say that "P2 and S individuals showed consistent downregulation of these genes and pathways" (line 401). I encourage the authors to emphasize the differences in gene expression from the "perspective" of the P1 individuals compared to the baseline of P2 and S individuals. Line 474 says that "P2 fish showed significant upregulation" of a number of pathways. It should be very clear what that is compared to (compared to P1, presumably?)

      (8) Along the same lines, the authors say in line 514 that subordinates and solitaries strategically downregulate their growth. I'm not convinced that this is the case: I would consider this growth trajectory to be the default and the baseline. I would interpret that under certain social conditions, a P1 dominant pattern of growth, behavior, and gene expression is allowed to emerge.
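      The paired comparison proposed in comment (3) above could be approached along the following lines. This is a minimal sketch, not the authors' analysis: the data are made-up toy values, the two-gene expression vectors and the size differences (in arbitrary units) are hypothetical, and the helper names `euclidean` and `pearson` are introduced here purely for illustration. It asks whether pairs with larger P1-P2 size differences also show larger P1-P2 expression distances.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy data (hypothetical): one tuple per pair, holding the P1-P2 size
# difference and the expression vectors of the P1 and P2 individuals.
pairs = [
    (1.0, [5.0, 2.0], [5.0, 1.0]),
    (2.0, [6.0, 2.0], [4.0, 2.0]),
    (3.0, [7.0, 3.0], [4.0, 3.0]),
]

size_diffs = [size for size, _, _ in pairs]
expr_dists = [euclidean(p1, p2) for _, p1, p2 in pairs]

# A positive correlation would mean that more divergent pairs in size are
# also more divergent in expression.
r = pearson(size_diffs, expr_dists)
print(f"within-pair expression distances: {expr_dists}, r = {r:.2f}")
```

      With real data, a rank-based correlation and a permutation test over pair labels might be more appropriate than Pearson's r, given the small number of pairs.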

    4. Reviewer #3 (Public review):

      Summary:

      The authors tested the hypothesis that interactions among size- and age-matched rivals will lead to the emergence of social roles, accompanied by divergence in four aspects of individual phenotypes: growth, feeding behavior, fighting behaviors, and gene expression in clownfish.

      Strengths:

      The data on growth, feeding rate, and fighting behaviors support the authors' claims.

      Weaknesses:

      Gene analysis conducted in this study is not sufficient to clarify how the relevant genes actually regulate growth and behavior.

      The information obtained from whole-body gene expression analysis is very limited. Various gene expression is associated with the regulation of fighting behaviors, food intake, growth, and metabolism, and these genes are regulated differently across tissues, even within a single individual. Gene expression analysis should be performed separately for each tissue.

      Clownfish undergo sex change depending on social status and body size, as the authors mention in the manuscript. Numerous gene expressions are affected by sex change. It is unclear how this issue was addressed.

    5. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Overall, this is an interesting and well-written manuscript on a fascinating question in a "charismatic" model system.

      Strengths:

      (1) The Introduction is concise, though it might be helpful to the non-specialist reader to learn a bit more about what is known about the social control of somatic growth across diverse species (including humans), which would help to make this work more generally interesting.

      (2) The experiment is well-designed.

      (3) The data collected are comprehensive.

      (4) The complementary analysis of both feeding and aggression/submission data with and without known social roles is a neat idea and compelling!

      Thank you for the positive feedback!

      Here, we investigate phenotypic plasticity associated with the adoption of social roles in the clown anemonefish, with strategic growth being just one aspect of that plasticity. Strategic growth, also known as social control of growth, is a fascinating form of adaptive phenotypic plasticity, whereby individuals modify their growth and size in response to fine-scale changes in social conditions (Buston & Clutton-Brock, 2022). In cooperative breeding systems with high reproductive skew, particularly fishes and mammals (possibly including humans), individuals have been shown to i) increase growth/size on the acquisition of dominant status (Dengler-Crish & Catania, 2007; Johnston et al., 2021; Thorley et al., 2018; Van Schaik & Van Hooff, 1996; Walker & McCormick, 2009), ii) increase growth/size when paired with size matched reproductive rivals (Huchard et al., 2016; Reed et al., 2019; this study), and iii) decrease growth/size to avoid conflict (Buston, 2003; Heg et al., 2004; Wong et al., 2007). While strategic growth is fascinating and clearly occurring in this study, we show coordinated changes of multiple aspects of the phenotype as fish adopt social roles. Therefore, we deliberately framed the Introduction broadly to avoid biasing the reader toward viewing growth as the sole or main driver.

      Weaknesses:

      (1) I was surprised that the HPA/stress axis was not considered here at all. Wouldn't we expect that subordinates have increased stress axis activation, which in turn could inhibit their growth and aggressive behavior?

      We also expected to see the HPA/stress axis activated in subordinates, which is why we carried out a targeted exploration of genes known to play a role in this axis. We did not find any genes that were significantly differentially expressed. We believe that there could be two explanations for this. First, from a methodological perspective, it could be due to our use of whole-body RNA-seq, which may have masked this signal. Alternatively, the stress axis might play a more complex role than just acting as a simple on/off switch for reduced growth. Its activation may peak when competition over size is at its highest (during week one) or, conversely, it may peak later and help maintain reduced growth once hierarchies are firmly established (particularly after the dominant individual reaches its maximum size). To understand the role of the stress axis, future studies should observe how its activation varies over time. We acknowledge that the absence of a stress‑axis signal and its potential explanations were not clearly discussed in the original manuscript; we will address this issue in the revised version.

      (2) To what extent are growth, food intake, agonistic behavior, and/or gene expression patterns coordinated across P1 vs P2 pairs? The lack of such an analysis seems like a missed opportunity.

      We had a similar thought. Specifically, we were interested in testing the hypothesis that the final size ratio of pairs, which is indicative of the amount of conflict remaining, would predict gene expression. We examined gene expression within pairs to test for coordinated changes and repeated the analysis, accounting for the pair size ratio. In both cases, we found no clear or consistent pattern within pairs. We will consider including these figures in the Supplementary Materials document.

      (3) What was the rationale for using whole bodies for the transcriptome analysis? Given the hypotheses, the forebrain or hypothalamus and certain other organ systems (e.g., liver, gonads, skin, etc.) would have been obvious candidate tissues here. I realize that cost is always a consideration, but maybe a focus on the fore-/midbrain could have been prioritized.

      We decided to use whole-body samples for this initial transcriptomic analysis to capture a broad view of gene-expression differences while keeping sequencing costs and sample requirements manageable. We agree with the reviewer that future work should explore specific tissues sampled from individuals at multiple time points to disentangle transcriptomic differences across tissue types.

      (4) Given the preceding point, why was a fold-change threshold used for assessing DEGs (supplementary Figure 3)? There is no biological justification to ever use a fold-change threshold, especially in bulk RNA-seq analysis. This is particularly true here, where whole bodies were used for RNA-seq analysis, which is a bit unusual. Relatively small cell populations (such as hypothalamic neurons that regulate growth or food intake) may show substantial gene expression variation across social types, yet will be masked by the masses of other cells in the whole body sample. However, gene expression may still vary significantly, albeit the fold-difference may be small. I therefore suggest a reanalysis that omits any fold-change threshold.

      We thank the reviewer for this important point, and agree that an arbitrary fold-change cutoff is inappropriate and unnecessary. It should be noted that this fold-change cutoff was used only in this single figure, and all other analyses used p-values from the entire dataset. We will remove the fold-change threshold and correct Supplementary Figure 3 and any corresponding text.
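
      As an illustration only (this is not the authors' pipeline), the effect of omitting a fold-change cutoff can be sketched in Python. The function name `call_degs`, the gene names, and all numeric values are hypothetical and chosen purely to show the logic: genes with small but statistically supported expression differences are lost when an effect-size filter is layered on top of the adjusted p-value.

```python
# Toy illustration only: gene names, fold changes, and adjusted p-values are invented.
def call_degs(results, alpha=0.05, min_abs_log2fc=None):
    """Return names of genes that pass the adjusted p-value cutoff.

    results: iterable of (gene, log2_fold_change, adjusted_p_value) tuples.
    min_abs_log2fc: optional effect-size filter; None means no fold-change threshold.
    """
    selected = []
    for gene, log2fc, padj in results:
        if padj >= alpha:
            continue  # not statistically significant
        if min_abs_log2fc is not None and abs(log2fc) < min_abs_log2fc:
            continue  # significant, but dropped by the fold-change filter
        selected.append(gene)
    return selected


toy_results = [
    ("gene_a", 2.10, 0.001),   # large, significant change
    ("gene_b", 0.30, 0.004),   # small, significant change
    ("gene_c", -0.45, 0.020),  # small, significant change
    ("gene_d", 1.50, 0.300),   # large change, not significant
]

with_cutoff = call_degs(toy_results, min_abs_log2fc=1.0)
without_cutoff = call_degs(toy_results)
print(with_cutoff)
print(without_cutoff)
```

      In this toy case, the two small-effect genes are retained only when the fold-change filter is omitted, which is the situation the reviewer describes for signals diluted in whole-body samples.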

      (5) Why is the analysis of color (hue, saturation) buried in the supplementary materials? Based on the hypotheses that motivated the study, color seems just as relevant as food intake, growth, and agonistic behavior, so even if the results are negative, they should be presented in the main paper.

      We agree that color can be an important social signal, so we included color measurements in our experimental design. However, after careful consideration of the color results, we decided that our experimental timing and husbandry changes introduced multiple confounding factors, preventing us from drawing confident conclusions. Specifically, our fish were ≈1 month old at the transfer from larval to experimental tanks and had already begun to deepen their orange hue before our experiment (in the wild, they would settle at two weeks of age, prior to the deepening of the orange hue). Once individuals attain a certain hue, it seems that color development can be halted, but not reversed. The transfer also involved changes in lighting, tank background, and diet, factors known to strongly affect coloration. Our results show a uniform shift in orange hue and saturation across social groups, suggesting that these confounding factors might have dominated changes in hue.

      For transparency, we report the color data in the Supplementary Materials, but we caution against drawing any strong conclusions. In the revised manuscript, we will recommend that future work include a targeted experiment to robustly test for the effect of the adoption of social roles on coloration or the effect of coloration on the adoption of social roles.

      (6) The Discussion is sometimes difficult to follow. The authors may want to consider including a conceptual graphic that integrates the different aspects of growth and satiety regulation, etc., into a work-in-progress model of sorts, which would also facilitate clearer hypotheses for future research.

      Thank you for flagging that parts of the Discussion are a bit difficult to follow. In the revised manuscript, we will work to improve readability of the Discussion. We also appreciate the suggestion of including a conceptual schematic. We will consider whether adding such a graphic will add value to this manuscript or future manuscripts.

      Reviewer #2 (Public review):

      In this manuscript, the authors test growth, behavior, and gene expression in pairs of clownfish as they establish social dominance hierarchies, examining patterns of gene expression in these pairs after dominance has been established. The authors show solid evidence that emerging dominant clownfish show increased growth, aggression, and food consumption compared to their submissive or solitary counterparts, eventually adopting distinct gene expression profiles.

      Major Comments:

      (1) The Introduction is comprehensive, but it could be condensed. Likewise, the discussion could be condensed. There is considerable redundancy between the methods, the results, and the legend in Figure 1. The authors should consolidate and remove the redundancy.

      Thank you for flagging that parts of the manuscript could be condensed; we will work on this as we revise the manuscript.

      (2) For Figure 3, the authors are showing PC2 and PC3; why is PC1 not shown? There is so much overlap between the three groups in PC2 vs PC3; it seems unlikely that researchers could conclusively identify any individual as belonging to a group based on the expression profile. The ovals shown do not capture all the points within each of the groups, and particularly the grey S oval seems misaligned with the datapoints shown.

      We understand the concern raised by the reviewer about the overlap among points in the PCA. We have explored PC1-PC3 and found that PC2 and PC3 showed the clearest, statistically significant clustering by social position, while PC1 did not capture any variation due to social position. We have explored whether other factors might be masking differences, such as genetic relatedness, tank effects, and total read count per sample, and found that none of these factors explained sample clustering. The ellipses shown around the points were not intended to capture all points; rather, they show the estimated 95% multivariate t-distribution for each social group. We will make sure this is clearly explained in the figure legend and Methods section. In addition, in the revised version, we will show PC1 and PC2, and PC1 and PC3, in the Supplements for transparency.

      (3) The authors indicate that the 15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling. Does this mean that each of the P1 and P2 were pairs with each other? Have the authors tried examining the gene expression patterns in a paired manner? E.g., for the pairs that showed the greatest size differences, do they also show the greatest differences in gene expression? Do the P1s show the most extreme differences from P2s that also show the most extreme P2 differences? Perhaps lines on Figure 3A connecting datapoints from the P1 and P2 pairs would be informative.

      Yes, “15 replicates exhibiting the greatest size difference between P1 and P2 were selected for gene profiling” refers to pairs of P1 and P2; we will make sure this is clearly stated in the revised Methods. We have also explored the gene expression data in light of the size difference between pairs, and found no clear differences in gene expression patterns (see earlier response to Reviewer #1). We will consider including these figures in the Supplementary Materials document, as well as adding a version of Figure 3A that clearly shows information on pairs, as suggested by the reviewer.

      (4) For the specific target pathways that are up- and downregulated in the different backgrounds, I recommend that the authors include boxplots (or heatmaps) showing the actual expression values for these targets. Figure 6 shows a heatmap for appetite-related genes, and it would be great to see a similar graph for the metabolism and glycolytic genes; it would also be informative to see similar graphs for hormonal and sexual maturation pathways as well.

      We have explored genes across a broad set of metabolic pathways (glycolysis, TCA cycle, lactic fermentation, PDH complex, cholesterol biosynthesis, fatty-acid synthesis, and beta-oxidation) and show all metabolic genes that showed significant differential expression between P1, P2, and S in Figure 6. Overall, very few metabolism-associated genes were significantly differentially expressed, which is why we decided to combine appetite-regulation and metabolism-associated genes into a single figure (Figure 6). In the revised version, we will ensure that Figure 6 clearly shows the gene sets associated with appetite and metabolism.

      We also examined hormonal pathways (glucocorticoid and thyroid signaling), but did not find genes in these pathways that were significantly differentially expressed. Finally, we would like to clarify that our samples consist of two-month-old juvenile individuals that are sexually immature, which is why we did not observe distinct molecular signatures of sexual maturation. Under ideal conditions, clown anemonefish can mature in one to two years, but they can also remain sexually immature for a decade or more (Buston & García, 2007). We recognize that the sentence at line 520 may be misleading, as we did not identify any gene expression signature that we could confidently associate with signs of sexual maturation. We will make sure that this is clearly stated in the revised version of the manuscript.

      (5) Particularly given that there is a relatively small number of genes enriched in the different rank conditions, I did not understand the need to do the WGCNA module analysis. I thought that an analysis of GO terms across the dataset would have been more meaningful than the GO term analysis shown in Figure 4, which considers only genes assigned to the "brown WGCNA module". This should be simplified or clarified.

      To clarify, GO enrichment analysis does not establish correlations with traits; it only describes which functions or pathways are over-represented in a given gene set. That is why we began by using WGCNA to define gene sets (modules) that are correlated with phenotypes. Our primary rationale for WGCNA was to identify modules of co-expressed genes that show significant statistical correlation with the phenotypes of interest (social role: P1, P2, S; growth; and food intake). Pairwise differential expression analysis (Figure 3B) identified a few hundred significantly differentially expressed genes, but those tests treat genes independently and cannot link coordinated changes of co-expressed genes to phenotypes of interest. Because WGCNA is blind to traits, it first identifies groups of co-expressed genes, which can help resolve gene expression patterns.

      We therefore ran WGCNA on the rlog-transformed dataset to identify modules of co-expressed genes that show significant correlation with phenotypes of interest. For every module that showed such a correlation, we performed GO enrichment and carefully evaluated the resulting GO enrichment trees (see Supplementary Figs. 4–5). The brown module was highlighted in the main text because it was one of the modules with a significant correlation to growth, and its associated GO enrichment showed clear growth-related signals that were not identified in the pairwise differential expression analysis results.
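
      The module-to-trait step described above can be illustrated with a minimal Python sketch. This is not the authors' analysis and not the WGCNA R package: WGCNA summarizes each module by its eigengene (the first principal component of the module's expression), whereas this sketch substitutes the mean z-scored expression as a simpler stand-in. The function names and all toy values are hypothetical.

```python
import math


def zscore(values):
    """Center and scale a list of numbers using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def module_profile(expression_rows):
    """Average the z-scored expression of all genes in a module, per sample.

    This is a simple stand-in for a module eigengene, used here as an assumption.
    """
    z_rows = [zscore(row) for row in expression_rows]
    n_samples = len(z_rows[0])
    return [sum(row[i] for row in z_rows) / len(z_rows) for i in range(n_samples)]


# Invented toy values: three co-expressed genes measured in six samples.
toy_module = [
    [5.0, 5.5, 6.1, 6.8, 7.2, 8.0],
    [2.0, 2.4, 2.9, 3.1, 3.8, 4.1],
    [9.0, 9.2, 9.9, 10.5, 10.9, 11.6],
]
toy_growth = [0.8, 1.0, 1.3, 1.5, 1.9, 2.2]  # hypothetical growth value per sample

profile = module_profile(toy_module)
r = pearson(profile, toy_growth)
print(round(r, 3))
```

      A module whose summary profile correlates strongly with a trait, as in this toy case, is the kind of gene set that would then be passed to GO enrichment.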

      (6) The authors say that they have identified coordinated changes in behaviors and the "underlying gene expression, leading to the emergence" of social roles. This is a little bit misleading, since the gene expression analysis occurred well after the behavioral and phenotypic differences emerged. Presumably, the hormonal and genetic shifts that actually caused the behavioral and phenotypic difference occurred during the weeks during which the experiment was underway, and earlier capture of the transcriptome would presumably reveal different patterns, and ones that would be considered more causative. The authors acknowledge this in lines 434-435, but it could be emphasized further.

      We appreciate the reviewer raising this point. In the updated version of the manuscript, we will revise wording to convey that food intake, agonistic behavior, size and growth, and gene expression are all changing continuously, in response to each other and in response to social feedback. An underappreciated aspect of this system (and likely many other systems) is that phenotype (including transcriptome) influences the outcome of social interactions, and the outcome of social interactions influences the phenotype (including the transcriptome). Earlier capture of the transcriptome would reveal different levels of gene expression, reflecting the state of the system at that moment in time.

      (7) The authors have measured a number of differences between the different dominance classes of fish. All these differences were measured relative to the other classes, but in my view, the Solitary group was the closest to a baseline control. So I'm not sure that it is fair to say that "P2 and S individuals showed consistent downregulation of these genes and pathways" (line 401). I encourage the authors to emphasize the differences in gene expression from the "perspective" of the P1 individuals compared to the baseline of P2 and S individuals. Line 474 says that "P2 fish showed significant upregulation" of a number of pathways. It should be very clear what that is compared to (compared to P1, presumably?)

      We agree with the reviewer that solitary individuals are the most intuitive baseline. Indeed, the experimental design included solitary fish because we expected they would serve as a useful control. Without social restraint, we anticipated they would show unrestricted growth, feeding, behavior, and associated gene‑expression patterns, similar to dominants.

      We initially ran analyses using solitaries as the baseline, but after examining the results, which showed subordinate‑like characteristics for the solitary individuals, we concluded that solitary individuals are not an ecologically appropriate control for this context. Removing juveniles from a social context and housing them in isolation may be stressful and can affect physiology and behavior in ways that do not reflect a natural baseline. From a life‑history standpoint, solitary living is not the typical state for A. percula.

      For these reasons, we reanalysed the dataset using the dominant (P1) as the reference to enable more ecologically meaningful comparisons (this choice was somewhat arbitrary; subordinates could also have been used as the reference). Given that gene expression is relative, we interpret results from both the dominant (P1) and subordinate (P2) perspectives in the Discussion to provide a complete view. We will clarify wording throughout the manuscript to make it clear that all expression changes are stated relative to a named reference group (e.g., revising Line 474).

      (8) Along the same lines, the authors say in line 514 that subordinates and solitaries strategically downregulate their growth. I'm not convinced that this is the case: I would consider this growth trajectory to be the default and the baseline. I would interpret that under certain social conditions, a P1 dominant pattern of growth, behavior, and gene expression is allowed to emerge.

      We respectfully disagree with the idea that a single baseline/reference growth trajectory exists for any individual of this species. Growth of individuals depends entirely on social context: neither fast nor slow growth represents an inherent baseline. When two size-matched juveniles meet and compete to establish dominance, accelerated growth is the expected trajectory. By contrast, juveniles joining an existing hierarchy are expected to exhibit reduced growth, which minimizes conflict and facilitates their social integration. Unlike species whose growth trajectories are not socially mediated, clown anemonefish do not have a context-independent growth rate; rather, individuals constantly readjust their growth according to their immediate social environment.

      Therefore, growth trajectories must be considered from the perspective of all group members, because they emerge from interactions among individuals rather than reflecting an intrinsic baseline. In this study, we were interested in the establishment of the dominance hierarchy and how individuals adjust their phenotypes during this process. When size-matched rivals are experimentally paired, both individuals are initially expected to pursue the dominant trajectory, and thus neither individual represents a default state. Instead, the outcome reflects a social decision, after which both individuals reinforce their emerging social roles through coordinated changes.

      Reviewer #3 (Public review):

      Summary:

      The authors tested the hypothesis that interactions among size- and age-matched rivals will lead to the emergence of social roles, accompanied by divergence in four aspects of individual phenotypes: growth, feeding behavior, fighting behaviors, and gene expression in clownfish.

      Strengths:

      The data on growth, feeding rate, and fighting behaviors support the authors' claims.

      Thank you for the positive feedback!

      Weaknesses:

      Gene analysis conducted in this study is not sufficient to clarify how the relevant genes actually regulate growth and behavior.

      The information obtained from whole-body gene expression analysis is very limited. The expression of various genes is associated with the regulation of fighting behaviors, food intake, growth, and metabolism, and these genes are regulated differently across tissues, even within a single individual. Gene expression analysis should be performed separately for each tissue.

      We understand the reviewer’s concern about whole-body transcriptomes and agree that tissue-specific sampling would provide greater resolution of the mechanisms linking gene expression to growth, agonistic behaviors, and food intake. For this initial study, however, we deliberately chose whole-body samples to capture a broad, unbiased view of gene expression differences while keeping sequencing costs and sample requirements manageable. We explicitly acknowledge the resulting interpretational limits in the Discussion (lines 464; 529–533), and suggest in the last paragraph that the patterns reported here should serve as a foundation for future studies exploring targeted, tissue-specific hypotheses.

      Clownfish undergo sex change depending on social status and body size, as the authors mention in the manuscript. The expression of numerous genes is affected by sex change. It is unclear how this issue was addressed.

      We thank the reviewer for raising this point. Sex change and sexual maturation can indeed drive major transcriptional shifts in clown anemonefish, but our experiment did not encompass such a life‑history transition. All individuals in this experiment were juveniles (≈1 month old at the start, ≈2 months old at the end) and were sexually immature at these ages. Clown anemonefish reach sexual maturation around one to two years under ideal conditions, can delay sexual maturation for years under normal conditions (Buston & García, 2007), and sex change in the genus Amphiprion is known to take over ~5 months (Moyer & Nakazono, 1978). Accordingly, individuals in this study were not sexually mature, and sex change was not biologically plausible over the five-week experimental period of our study. We recognize that the sentence at line 520 may be misleading, as we did not identify any gene expression signature that we could confidently associate with signs of sexual maturation. We will make sure that it is clearly stated that the fish in this study were sexually immature in the revised version.

      References:

      Buston, P. (2003). Forcible eviction and prevention of recruitment in the clown anemonefish. Behavioral Ecology, 14(4), 576–582. https://doi.org/10.1093/beheco/arg036

      Buston, P. M., & García, M. B. (2007). An extraordinary life span estimate for the clown anemonefish Amphiprion percula. Journal of Fish Biology, 70(6), 1710–1719. https://doi.org/10.1111/j.1095-8649.2007.01445.x

      Buston, P., & Clutton-Brock, T. (2022). Strategic growth in social vertebrates. Trends in Ecology & Evolution, 37(8), 694–705. https://doi.org/10.1016/j.tree.2022.03.010

      Dengler-Crish, C. M., & Catania, K. C. (2007). Phenotypic plasticity in female naked mole-rats after removal from reproductive suppression. The Journal of Experimental Biology.

      Heg, D., Bender, N., & Hamilton, I. (2004). Strategic growth decisions in helper cichlids. Proceedings of the Royal Society of London. Series B: Biological Sciences, 271(suppl_6). https://doi.org/10.1098/rsbl.2004.0232

      Huchard, E., English, S., Bell, M. B. V., Thavarajah, N., & Clutton-Brock, T. (2016). Competitive growth in a cooperative mammal. Nature, 533(7604), 532–534. https://doi.org/10.1038/nature17986

      Johnston, R. A., Vullioud, P., Thorley, J., Kirveslahti, H., Shen, L., Mukherjee, S., Karner, C. M., Clutton-Brock, T., & Tung, J. (2021). Morphological and genomic shifts in mole-rat ‘queens’ increase fecundity but reduce skeletal integrity. eLife, 10, e65760. https://doi.org/10.7554/eLife.65760

      Moyer, J. T., & Nakazono, A. (1978). Protandrous Hermaphroditism in Six Species of the Anemonefish Genus Amphiprion in Japan (No. 2). The Ichthyological Society of Japan. https://doi.org/10.11369/jji1950.25.101

      Reed, C., Branconi, R., Majoris, J., Johnson, C., & Buston, P. (2019). Competitive growth in a social fish. Biology Letters, 15(2), 20180737. https://doi.org/10.1098/rsbl.2018.0737

      Thorley, J., Katlein, N., Goddard, K., Zöttl, M., & Clutton-Brock, T. (2018). Reproduction triggers adaptive increases in body size in female mole-rats. Proceedings of the Royal Society B: Biological Sciences, 285(1880), 20180897. https://doi.org/10.1098/rspb.2018.0897

      Van Schaik, C. P., & Van Hooff, J. A. R. A. M. (1996). Toward an understanding of the orangutan’s social system. In L. F. Marchant, T. Nishida, & W. C. McGrew (Eds.), Great Ape Societies (pp. 3–15). Cambridge University Press. https://doi.org/10.1017/CBO9780511752414.003

      Walker, S. P. W., & McCormick, M. I. (2009). Sexual selection explains sex-specific growth plasticity and positive allometry for sexual size dimorphism in a reef fish. Proceedings of the Royal Society B: Biological Sciences, 276(1671), 3335–3343. https://doi.org/10.1098/rspb.2009.0767

      Wong, M. Y. L., Buston, P. M., Munday, P. L., & Jones, G. P. (2007). The threat of punishment enforces peaceful cooperation and stabilizes queues in a coral-reef fish. Proceedings of the Royal Society B: Biological Sciences, 274(1613), 1093–1099. https://doi.org/10.1098/rspb.2006.0284

    1. eLife Assessment

      In this important study, Bready et al. investigate how a highly conserved long-range enhancer mediates neural-specific SOX2 regulation during neural differentiation using human neural stem cells. This study has broad appeal to developmental neuroscience; however, the data remain incomplete given the need for homozygous enhancer knockouts and biological replicates in the scRNAseq assays.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors examine how a developmentally regulated cis-regulatory element controls SOX2 expression during neural differentiation of human stem cells. The results suggest that this highly conserved long-range enhancer mediates neural-specific SOX2 regulation and offer insight into the role of promoter-enhancer contacts in this process. Although the findings are interesting, several limitations need to be addressed.

      Strengths:

      A central question in developmental biology is how genes are regulated in a context-dependent manner. SOX2, a major pluripotency factor, is expressed in diverse tissues during development, and therefore understanding the mechanisms that control its spatiotemporal expression is critical. This study addresses this important question by examining the functional relevance of a neural-specific, developmentally regulated SOX2 enhancer and its associated promoter-enhancer contacts in driving gene expression during human neural development. Using multiple model systems and techniques, the authors test the requirement of this enhancer by analyzing SOX2 expression in mutant lines, providing evidence for its role in this process.

      Weaknesses:

      A key limitation of the study is the absence of data from homozygous SOX2 enhancer deletion, which leaves the analysis incomplete and tempers the conclusions that can be drawn. Furthermore, the suitability of teratomas as a model system is questionable, given their limited capacity to recapitulate the spatial patterning, regional specification, and organized developmental processes characteristic of the human forebrain. Finally, the manuscript remains largely descriptive with little mechanistic insight.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use a combination of genomics, genome conformation assays, and CRISPR-mediated deletion to study the transcriptional regulation of the SOX2 gene in human neural stem cells (hNSCs).

      Strengths:

      The authors show that two distal elements, located ~550kb downstream of the SOX2 gene, are important for SOX2 transcription in hNSC. They investigate both the deletion of these elements in established hNSCs and in hNSCs generated by differentiation of human pluripotent stem cells, suggesting these elements are important in both the establishment and maintenance of SOX2 expression in hNSCs.

      Weaknesses:

      Homologous elements have been studied in the mouse genome and have conserved function in mouse NSCs, yet these findings are not mentioned. Inclusion of biological replicates for the scRNA-seq and replicate CRISPR-deleted clones would strengthen the study.

    4. Author Response:

      eLife Assessment

      In this important study, Bready et al. investigate how a highly conserved long-range enhancer mediates neural-specific SOX2 regulation during neural differentiation using human neural stem cells. This study has broad appeal to developmental neuroscience; however, the data remain incomplete given the need for homozygous enhancer knockouts and biological replicates in the scRNAseq assays.

      We thank the expert reviewers and eLife editors Drs. Eade and White for complimenting our work and deeming it an “important study” of “broad appeal to developmental neuroscience”. We also acknowledge some of the limitations of our work, including the lack of homozygous deletion of the enhancer element. As we detail below, we tried tirelessly to identify human embryonic stem cell (hESC) clones with homozygous deletions but were unable to do so. As we speculate in the discussion, this failure may represent a biological property of the enhancer element (possibly an essentiality manifested even in hESCs), or a technical limitation related to the large size (2.7 kb) of the genomic element targeted for deletion. We also clarify that every scRNAseq assay included cells from multiple teratomas.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors examine how a developmentally regulated cis-regulatory element controls SOX2 expression during neural differentiation of human stem cells. The results suggest that this highly conserved long-range enhancer mediates neural-specific SOX2 regulation and offer insight into the role of promoter-enhancer contacts in this process. Although the findings are interesting, several limitations need to be addressed.

      Strengths:

      A central question in developmental biology is how genes are regulated in a context-dependent manner. SOX2, a major pluripotency factor, is expressed in diverse tissues during development, and therefore understanding the mechanisms that control its spatiotemporal expression is critical. This study addresses this important question by examining the functional relevance of a neural-specific, developmentally regulated SOX2 enhancer and its associated promoter-enhancer contacts in driving gene expression during human neural development. Using multiple model systems and techniques, the authors test the requirement of this enhancer by analyzing SOX2 expression in mutant lines, providing evidence for its role in this process.

      We thank the reviewer for highlighting the significance of our work in the field of developmental biology.

      Weaknesses:

      A key limitation of the study is the absence of data from homozygous SOX2 enhancer deletion, which leaves the analysis incomplete and tempers the conclusions that can be drawn. Furthermore, the suitability of teratomas as a model system is questionable, given their limited capacity to recapitulate the spatial patterning, regional specification, and organized developmental processes characteristic of the human forebrain. Finally, the manuscript remains largely descriptive with little mechanistic insight.

      We appreciate the reviewer’s disappointment with the lack of data from a homozygous SOX2 enhancer deletion. We too felt disappointed when we started genotyping our hESC clones. In fact, we spent a year screening multiple hESC clones for a homozygous deletion but were unable to find one. We performed several assays to better characterize the heterozygous clones, including Sanger sequencing, whole-genome sequencing (WGS) and fluorescent in situ hybridization (FISH). All assays pointed in the direction of hemizygous deletion. We do not understand the reasons for the absence of homozygous deletion clones. One possibility is that homozygous deletion of the enhancer is selected against in hESCs, thus preventing growth of colonies. Another possibility is the technical challenge of achieving a large deletion (2.7 kb) in hESCs. We also entertained the possibility that the enhancer was excised from the genome but retained as extrachromosomal (ec) DNA, thus producing the hemizygous genotype. However, several assays, such as FISH and PCR diagnostics, argued against this possibility.

      The teratoma assay was chosen as an in vivo metric of spontaneous differentiation of hESCs into the three germ layers, because our overarching hypothesis was that perturbing the enhancer element and 3D chromatin loop regulating SOX2 transcription would impair specification of neuroectodermal precursors. We believe that teratomas allow pluripotent cells to reveal any predilections toward particular germ layers in an unbiased fashion. Importantly, we did not rely solely on teratomas to assess effects of our genomic perturbations on specification of neuroectoderm, but also pursued cerebral organoids as an orthogonal approach focused on the tissue of interest, the central nervous system.

      Our work not only describes an important mechanism for regulation of SOX2 transcription in the transition from pluripotency to neuroectodermal specification, but also provides mechanistic insight into the question of whether the developmentally co-regulated activation of the enhancer and formation of the 3D chromatin loop are dependent on each other. Our findings indicate that the two processes occur independently of each other, as evidenced by the fact that the enhancer is uncoupled from chromatin folding, as occurs when the adjacent CTCF motif is deleted. This finding raises the possibility that enhancer activation occurs through yet-to-be-determined transcriptional events, and that establishment of the local 3D chromatin architecture helps fine-tune its influences in the Topologically Associating Domain (TAD) of interest.

      We are further pursuing mechanisms that regulate activation of the enhancer within neuroectodermal lineages and that may explain its actions on genomic elements other than the SOX2 locus within the relevant TAD. We are also investigating why hemizygous enhancer deletion produces stronger phenotypes than deletion of the CTCF motif that helps stabilize the 3D chromatin loop.

      Reviewer #2 (Public review):

      Summary:

      The authors use a combination of genomics, genome conformation assays, and CRISPR-mediated deletion to study the transcriptional regulation of the SOX2 gene in human neural stem cells (hNSCs).

      Strengths:

      The authors show that two distal elements, located ~550kb downstream of the SOX2 gene, are important for SOX2 transcription in hNSC. They investigate both the deletion of these elements in established hNSCs and in hNSCs generated by differentiation of human pluripotent stem cells, suggesting these elements are important in both the establishment and maintenance of SOX2 expression in hNSCs.

      We thank the reviewer for appreciating the importance of this regulatory mechanism in the establishment and maintenance of SOX2 expression in the human neural lineage.

      Weaknesses:

      Homologous elements have been studied in the mouse genome and have conserved function in mouse NSCs, yet these findings are not mentioned. Inclusion of biological replicates for the scRNA-seq and replicate CRISPR-deleted clones would strengthen the study.

      We appreciate the recommendation of the reviewer to better acknowledge prior work in mouse neural development. We will ensure full acknowledgment of these studies in the revised manuscript.

      We also appreciate the suggestion for biological replicates in our scRNA-seq assays. We clarify that each scRNA-seq sample was generated by combining multiple teratomas from each experimental group, thus ensuring that the findings reflect reproducible biology rather than isolated observations from single teratomas. This clarification will be emphasized in the revised manuscript.

      Finally, we absolutely agree with the reviewer that more CRISPR-deleted clones would have strengthened the study. Unfortunately, we realized that characterization of each clone takes multiple years and addition of more clones would have made the study too lengthy.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of short-term plasticity mechanisms by providing evidence for release-independent low-frequency synaptic depression that reflects a redistribution of vesicles within the readily releasable pool, via a reduction in docking site occupancy due to vesicle undocking. The evidence supporting this model is convincing, with rigorous electrophysiological and computational analysis. The work will be of broad interest to cellular neuroscientists and synaptic physiologists.

    2. Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the mechanisms of low-frequency synaptic depression at cerebellar parallel fiber to interneuron synapses using unitary recordings that allow direct quantification of synaptic vesicle release. They show that sparse stimulation can induce robust synaptic depression even in the absence of substantial vesicle consumption, and that this depressed state is rapidly reversed when stimulation frequency is increased. To account for these observations, the authors propose a model in which low-frequency depression reflects a redistribution of vesicles within the readily releasable pool, in particular, a reduction in docking site occupancy due to vesicle undocking.

      Strengths:

      I found the experimental work to be of high quality throughout. The use of simple synapse recordings to count individual vesicle release events is particularly powerful in this context and allows questions to be addressed that are difficult to approach with more conventional approaches. The demonstration that low-frequency depression can occur independently of prior vesicle release, together with the rapid recovery observed during high-frequency stimulation, places strong constraints on possible underlying mechanisms and represents a clear strength of the study.

      The modeling framework is clearly laid out and helps organize a broad set of observations across stimulation frequencies. Several of the experimental tests appear well-motivated by the model, including the recovery train experiments, the analysis of failures, and the use of doublet stimulation. Taken together, the data provide a coherent phenomenological description of low-frequency depression and its relationship to vesicle availability within the readily releasable pool.

      Weaknesses:

      While the experimental results are strong, the manuscript would benefit from rebalancing the strength of the mechanistic conclusions drawn from the modeling in light of its limitations. The framework is clearly useful and provides a coherent interpretation of the data, but it is not uniquely constrained by the experimental observations, and alternative models or interpretations could plausibly account for the findings. The use of different model regimes concatenated across time, with substantially different parameter values, highlights the abstract nature of the approach. For these reasons, the model seems best presented as one plausible explanatory framework rather than a definitive biological mechanism. Clarifying the distinction between data-driven observations and model-based inferences would help readers assess which conclusions are strongly supported and which remain more speculative.

      The interpretation of the Ca2+-related experiments would benefit from more cautious wording. The absence of detectable changes in presynaptic Ca2+ signals does not exclude more localized or subtle Ca2+-dependent mechanisms, and conclusions regarding Ca2+ independence should therefore be framed accordingly. In addition, while low-frequency depression is still observed at reduced extracellular Ca2+, these experiments appear less diagnostic of the specific model-derived mechanism emphasized elsewhere in the manuscript - namely, a selective reduction in docking-site occupancy - and should be discussed with appropriate qualification in the text.

      Major points:

      (1) Clarify and qualify mechanistic claims derived from the model.

      Throughout the manuscript, changes in model parameters are at times described as if they directly reflected underlying physiological mechanisms. As a result, the conceptual distinction between experimentally observed phenomena, model-derived variables, and biological interpretation is not always clear. Several conclusions in the Results and Discussion are phrased as mechanistic statements, although they rest on assumptions intrinsic to the modeling framework. The authors should systematically review the text and explicitly distinguish between (i) experimentally observed changes in synaptic responses and (ii) inferences about vesicle docking states or transitions within the model.

      In particular, statements implying that vesicle undocking is the mechanism underlying low-frequency depression should be rephrased to reflect that this is an interpretation within the proposed framework rather than a uniquely demonstrated biological process. For example, statements such as "Low-frequency depression is caused by synaptic vesicle undocking" should be replaced with formulations such as "Within the framework of our model, low-frequency depression is accounted for by a redistribution of synaptic vesicles away from docking sites" or "Our results are consistent with a model in which changes in vesicle docking-state occupancy contribute to low-frequency depression."

      A particularly problematic example is the statement that "these experiments further confirm that LFD only involves a decrease in δ, without accompanying changes in ρ or IP size." Here, an experimentally defined phenomenon (LFD) is directly equated with changes in model-derived variables. Such statements should be revised to make clear that δ, ρ, and IP size are inferred quantities within the model, and that the experimental data are interpreted through this framework rather than directly confirming changes in these parameters. Similarly, over-generalizing statements such as "Undocking therefore represents the key mechanism controlling short-term depression across stimulation frequencies" should be softened to reflect that this conclusion emerges from the model rather than from direct experimental evidence.

      (2) Address the biological interpretation of time-dependent model regimes.

      The model relies on distinct parameter regimes applied at different time points, with some transitions effectively suppressed in certain regimes. While this approach captures the data well, its biological interpretation remains unclear. The authors should either (i) expand the discussion to outline plausible biological processes that could give rise to such regime changes (for example, calcium-dependent modulation of transition rates or activity-dependent changes in vesicle state stability), or (ii) more explicitly frame this aspect of the model as a descriptive abstraction rather than a mechanistic proposal. This further underscores the need to clearly separate the descriptive role of the model from claims about underlying biological mechanisms.

      (3) Reframe conclusions drawn from calcium-related experiments.

      The calcium imaging data demonstrate no detectable changes in the measured presynaptic calcium signals under the tested conditions, but they do not rule out that calcium signals contribute in ways undetectable by the assay. Conclusions should therefore be revised to reflect this limitation, avoiding statements that exclude a role for calcium-dependent mechanisms. Wording such as "we did not detect evidence for..." would be more appropriate than conclusions implying the absence of an effect.

      Similarly, while low-frequency depression is still observed at reduced extracellular calcium (1.5 mM Ca²⁺), the specific mechanistic signature emphasized elsewhere in the manuscript - namely a selectively reduced first response during a high-frequency recovery train - is no longer apparent. These experiments should therefore be discussed as consistent with the proposed framework, but not as providing independent support for a selective reduction in docking-site occupancy. Explicitly acknowledging this limitation would improve clarity and avoid over-interpreting these data.

      (4) Soften interpretations based on non-significant comparisons.

      In several places, comparisons that do not reach statistical significance are used to argue for equivalence between conditions (for example, comparisons involving failure versus non-failure trials or different LFD conditions). These conclusions should be revised to emphasize the limits of statistical power and framed as a lack of evidence for a difference rather than evidence of independence.

    3. Reviewer #2 (Public review):

      Summary:

      Silva and co-workers exploit their previously established methods of analyzing release events at single parallel fiber to molecular layer interneuron synapses. They observed synaptic depression at low transmission frequencies (< 5 Hz), which rapidly recovers during high-frequency transmission. Analysis of the time course of low-frequency depression revealed an initial rapid phase and a slow, linearly increasing phase. Strikingly, the initial depression occurred even in the absence of preceding release, arguing against vesicle depletion as the underlying mechanism.

      Strengths:

      The main strength of the study is the careful demonstration of an interesting synaptic phenomenon challenging the classical vesicle-centered interpretation of synaptic depression.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      The finding of release-independent synaptic depression is important and would have widespread implications. Therefore, some more analyses to increase the confidence in these findings could be performed.

      My concern is whether rundown could explain the findings. If the rate of failures in s1 increases and at the same time the amplitude decreases during the experiments, an apparent depression in s2 could arise. The Supplementary Figure 5A addresses run-down, but the figure is not easy to understand, and, as far as I understood, it does not address the question of whether the release-independent depression could be caused by a rundown. To address this, the analysis of Figure 5 could be repeated by investigating the failure rate and amplitude separately or by analyzing the 1st and 2nd half of the recordings separately.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript builds on the observation that, at some synapses, low-frequency stimulation causes synaptic depression, which can be reversed by subsequent high-frequency stimulation. Such low-frequency depression (LFD) cannot be easily explained by the depletion of a single vesicle pool. Here, Silva and colleagues propose a model of activity-dependent vesicle trafficking to explain LFD at synapses between cerebellar granule cells and molecular layer interneurons.

      Strengths:

      Overall, LFD is interesting and worthy of examination, and the authors provide new experimental results that are of the high quality expected from this group.

      Weaknesses:

      The study proposes a novel model of vesicle trafficking that is not explained by known biological mechanisms, and the manuscript does not adequately compare or discuss alternative models.

      I have several concerns about how the authors interpret the data. First, the manuscript's primary conceptual advance is the idea that LFD involves vesicle undocking, rather than depletion. However, most experiments were performed under conditions that promote vesicle depletion (3 mM extracellular Ca2+). When experiments were repeated in physiological Ca2+, there appeared to be little or no LFD (stats are not provided). Second, the RS/DS/DU/undocking model, though not outside the realm of possibility, is not readily explained by known mechanisms and is only loosely supported by experimental findings. Third, when simulating LFD, the authors do not compare alternative models and use inappropriate language to imply that a model fit represents the truth (e.g., "the finding of identical experimental and simulated values confirms that the undocking mechanism accounts for LFD"). Finally, the model is presented in an overly complicated manner. The sheer amount of terms and nomenclature makes the manuscript confusing and difficult to read. Overall, the manuscript would benefit from added experiments and more statistics, a better justification and evaluation of the model, and more nuanced language.

      Major concerns:

      (1) Most experiments were performed under conditions that exacerbate depletion

      In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal. As mentioned above, the authors only report significant LFD in elevated extracellular Ca2+. In a small number of experiments performed in more physiological Ca2+ (1.5 mM), there is no depression after a single stimulus, and it is not clear that there was statistically significant depression during a low-frequency train. Several studies cited in support of LFD share this problem:

      • Abrahamsson et al. (2007) recorded from Schaffer collaterals in 4 mM Ca2+, 3-4X physiological Ca2+.

      • Doussau et al. (2010) recorded from Aplysia synapses in 3X Ca2+ compared to seawater.

      • Rudolph et al. (2011) is cited as an example of LFD. However, this study performed experiments at high-release-probability cerebellar climbing fibers and reported depression that increased monotonically with stimulation frequency, so it does not resemble the phenomenon studied in this paper. Lin et al. (2022) also largely describe monotonic depression at the calyx.

      The authors note that their results differ from those of Atluri and Regehr, but do not mention that a possible reason for the difference is the increased release probability in their experiments.

      The authors should provide statistics for the data obtained in 1.5 mM Ca, and discuss why LFD is increased in conditions that also elevate vesicle release probability.

      (2) Lack of biological mechanisms supporting the model

      The model is presented without compelling biological support. The evidence in support of vesicle undocking comes from experiments by the Watanabe lab, which showed fewer-than-expected docked vesicles under EM when cultured synapses were stimulated immediately prior to high-pressure freezing. Kusick et al were careful to note that these vesicles may have been lost to fusion.

      The putative undocking Kusick describes is immediate (< 5 ms after stimulation), and was not shown to be Ca2+ sensitive. This manuscript describes "calcium-dependent undocking" that proceeds from 10 ms - 200 ms. Multiple studies from the Watanabe lab show that a single stimulus lowers the number of docked vesicles, and subsequently, there is a transient redocking of vesicles that can be blocked by EGTA or Syt7 knockout.

      I also question the rationale for the authors' model that 2 vesicles are coupled in series to a single release site. Previous papers from this lab cited EM studies from frog and neuromuscular that showed filamentous connections between vesicles (do these synapses show LFD?). Here, the authors primarily cite their previous models to support their arguments. I encourage them to continue searching for ultrastructural evidence for 2-vesicle-docking-units and to cite such studies.

      (3) Comparison to other vesicle models

      The authors use overly assertive language to suggest that the model proves a mechanism. "Altogether, these results indicate that the slow phase of LFD ... reflects a δ decrease without significant changes in pr, in ρ or in IP size". Simulating data does not conclusively "indicate" the underlying mechanism, but the authors could state their data can be "explained by a model where..".

      However, LFD does not require activity-dependent undocking. Instead, the phenomenon has been explained by high-release probability, paired with an activity-dependent increase in either docking or release probability (Chiu and Carter, 2024; Doussau et al., 2017). Does the new model do a better job of replicating some facet of the data? If multiple models can explain the same data, how can we determine which model is correct? The "Alternative Presynaptic Depression Mechanisms" should be expanded to discuss these issues.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors investigate the mechanisms of low-frequency synaptic depression at cerebellar parallel fiber to interneuron synapses using unitary recordings that allow direct quantification of synaptic vesicle release. They show that sparse stimulation can induce robust synaptic depression even in the absence of substantial vesicle consumption, and that this depressed state is rapidly reversed when stimulation frequency is increased. To account for these observations, the authors propose a model in which low-frequency depression reflects a redistribution of vesicles within the readily releasable pool, in particular, a reduction in docking site occupancy due to vesicle undocking.

      Strengths:

      I found the experimental work to be of high quality throughout. The use of simple synapse recordings to count individual vesicle release events is particularly powerful in this context and allows questions to be addressed that are difficult to approach with more conventional approaches. The demonstration that low-frequency depression can occur independently of prior vesicle release, together with the rapid recovery observed during high-frequency stimulation, places strong constraints on possible underlying mechanisms and represents a clear strength of the study.

      The modelling framework is clearly laid out and helps organize a broad set of observations across stimulation frequencies. Several of the experimental tests appear well-motivated by the model, including the recovery train experiments, the analysis of failures, and the use of doublet stimulation. Taken together, the data provide a coherent phenomenological description of low-frequency depression and its relationship to vesicle availability within the readily releasable pool.

      We thank the Reviewer for his positive assessment of our work.

      Weaknesses:

      While the experimental results are strong, the manuscript would benefit from rebalancing the strength of the mechanistic conclusions drawn from the modelling in light of its limitations. The framework is clearly useful and provides a coherent interpretation of the data, but it is not uniquely constrained by the experimental observations, and alternative models or interpretations could plausibly account for the findings. The use of different model regimes concatenated across time, with substantially different parameter values, highlights the abstract nature of the approach. For these reasons, the model seems best presented as one plausible explanatory framework rather than a definitive biological mechanism. Clarifying the distinction between data-driven observations and model-based inferences would help readers assess which conclusions are strongly supported and which remain more speculative.

      The interpretation of the Ca<sup>2+</sup>-related experiments would benefit from more cautious wording. The absence of detectable changes in presynaptic Ca<sup>2+</sup> signals does not exclude more localized or subtle Ca<sup>2+</sup>-dependent mechanisms, and conclusions regarding Ca<sup>2+</sup> independence should therefore be framed accordingly. In addition, while low-frequency depression is still observed at reduced extracellular Ca<sup>2+</sup>, these experiments appear less diagnostic of the specific model-derived mechanism emphasized elsewhere in the manuscript - namely, a selective reduction in docking-site occupancy - and should be discussed with appropriate qualification in the text.

      Concerning Ca<sup>2+</sup> signals, the Reviewer is right. While we found no change in Ca<sup>2+</sup> signalling apart from a slow Ca<sup>2+</sup> accumulation during long trains at 1 Hz, the possibility of an undetected change cannot be excluded. We have added a word of caution to this effect on p. 11. Concerning the 1.5 mM Ca<sup>2+</sup> experiments, the Reviewer presumably alludes to the first point of the recovery train (yellow) in Supplementary Fig. 2C. This is also the last point (s11) of the slow train at 0.5 Hz, because no delay was interposed between the slow train and the recovery train. We have now included one more experiment (bringing the total to n = 6), and we have corrected Supplementary Fig. 2C accordingly. In the new version, the depression measured for s4-s10 vs s1 during the 0.5 Hz trains is 0.69 ± 0.05 (p = 0.00058, paired one-tailed t-test). The ratio of the s1 value of the recovery train to the control s1 is 0.83 ± 0.08 (p = 0.028, paired one-tailed t-test).
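
      For readers who wish to run this kind of paired, one-sided comparison on their own recordings, a minimal sketch follows. It uses an exact sign-flip permutation test rather than the paired t-test reported above, because that test needs only the Python standard library. The function name and the example differences are hypothetical and are not the data of this study.

```python
from itertools import product


def sign_flip_p_value(differences):
    """Exact one-sided p-value for a mean paired difference below zero.

    Each difference is (depressed response - control response) for one cell.
    Under the null hypothesis the sign of each difference is arbitrary, so
    every combination of sign flips is enumerated.
    """
    observed = sum(differences)
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        flipped_sum = sum(s * d for s, d in zip(signs, differences))
        # Small tolerance guards against floating-point rounding in the sums.
        if flipped_sum <= observed + 1e-12:
            as_extreme += 1
    return as_extreme / total


# Hypothetical paired differences for n = 6 cells (illustration only).
example_differences = [-0.35, -0.20, -0.40, -0.25, -0.30, -0.36]
p_value = sign_flip_p_value(example_differences)
print(f"one-sided exact p = {p_value:.4f}")
```

      With n = 6 the smallest p-value this test can return is 1/64, so it is coarser than a t-test at small sample sizes; it is shown only as an assumption-light cross-check.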

      Major points:

      (1) Clarify and qualify mechanistic claims derived from the model.

      Throughout the manuscript, changes in model parameters are at times described as if they directly reflected underlying physiological mechanisms. As a result, the conceptual distinction between experimentally observed phenomena, model-derived variables, and biological interpretation is not always clear. Several conclusions in the Results and Discussion are phrased as mechanistic statements, although they rest on assumptions intrinsic to the modelling framework. The authors should systematically review the text and explicitly distinguish between (i) experimentally observed changes in synaptic responses and (ii) inferences about vesicle docking states or transitions within the model.

      In particular, statements implying that vesicle undocking is the mechanism underlying low-frequency depression should be rephrased to reflect that this is an interpretation within the proposed framework rather than a uniquely demonstrated biological process. For example, statements such as "Low-frequency depression is caused by synaptic vesicle undocking" should be replaced with formulations such as "Within the framework of our model, low-frequency depression is accounted for by a redistribution of synaptic vesicles away from docking sites" or "Our results are consistent with a model in which changes in vesicle docking-state occupancy contribute to low-frequency depression."

      A particularly problematic example is the statement that "these experiments further confirm that LFD only involves a decrease in δ, without accompanying changes in ρ or IP size." Here, an experimentally defined phenomenon (LFD) is directly equated with changes in model-derived variables. Such statements should be revised to make clear that δ, ρ, and IP size are inferred quantities within the model, and that the experimental data are interpreted through this framework rather than directly confirming changes in these parameters. Similarly, overgeneralizing statements such as "Undocking therefore represents the key mechanism controlling short-term depression across stimulation frequencies" should be softened to reflect that this conclusion emerges from the model rather than from direct experimental evidence.

      As suggested, we clarify the distinction in the revised version between experimental data and modelling, and we refrain from making definitive statements on underlying cellular mechanisms.

      (2) Address the biological interpretation of time-dependent model regimes.

      The model relies on distinct parameter regimes applied at different time points, with some transitions effectively suppressed in certain regimes. While this approach captures the data well, its biological interpretation remains unclear. The authors should either (i) expand the discussion to outline plausible biological processes that could give rise to such regime changes (for example, calcium-dependent modulation of transition rates or activity-dependent changes in vesicle state stability), or (ii) more explicitly frame this aspect of the model as a descriptive abstraction rather than a mechanistic proposal. This further underscores the need to clearly separate the descriptive role of the model from claims about underlying biological mechanisms.

      We thank the Reviewer for drawing our attention to this important point. Below 10 ms, rate constants are largely determined by the large-amplitude, fast-decaying Ca<sup>2+</sup> signal occurring near voltage-dependent Ca<sup>2+</sup> channels (‘Ca<sup>2+</sup> nanodomain’). After 10 ms, the rate constants depend on the low-amplitude, slowly decaying Ca<sup>2+</sup> signals averaged over the entire varicosity (‘volume-averaged Ca<sup>2+</sup>’). We explain this better in the revised version (Materials and Methods, p. 21).

      (3) Reframe conclusions drawn from calcium-related experiments.

      The calcium imaging data demonstrate no detectable changes in the measured presynaptic calcium signals under the tested conditions, but they do not rule out that calcium signals contribute in ways undetectable by the assay. Conclusions should therefore be revised to reflect this limitation, avoiding statements that exclude a role for calcium-dependent mechanisms. Wording such as "we did not detect evidence for..." would be more appropriate than conclusions implying the absence of an effect.

      Similarly, while low-frequency depression is still observed at reduced extracellular calcium (1.5 mM Ca<sup>2+</sup>), the specific mechanistic signature emphasized elsewhere in the manuscript - namely a selectively reduced first response during a high-frequency recovery train - is no longer apparent. These experiments should therefore be discussed as consistent with the proposed framework, but not as providing independent support for a selective reduction in docking-site occupancy. Explicitly acknowledging this limitation would improve clarity and avoid overinterpreting these data.

      This has been discussed above (‘weaknesses’).

      (4) Soften interpretations based on non-significant comparisons.

      In several places, comparisons that do not reach statistical significance are used to argue for equivalence between conditions (for example, comparisons involving failure versus non-failure trials or different LFD conditions). These conclusions should be revised to emphasize the limits of statistical power and framed as a lack of evidence for a difference rather than evidence of independence.

      We have addressed this point in the revised version.

      Reviewer #2 (Public review):

      Summary:

      Silva and co-workers exploit their previously established methods of analyzing release events at single parallel fiber to molecular layer interneuron synapses. They observed synaptic depression at low transmission frequencies (< 5 Hz), which rapidly recovers during high-frequency transmission. Analysis of the time course of low-frequency depression revealed an initial rapid phase and a slow, linearly increasing phase. Strikingly, the initial depression occurred even in the absence of preceding release, arguing against vesicle depletion as the underlying mechanism.

      Strengths:

      The main strength of the study is the careful demonstration of an interesting synaptic phenomenon challenging the classical vesicle-centered interpretation of synaptic depression.

      We thank the Reviewer for his positive assessment of our work.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      The finding of release-independent synaptic depression is important and would have widespread implications. Therefore, some more analyses to increase the confidence in these findings could be performed.

      My concern is whether rundown could explain the findings. If the rate of failures in s1 increases and at the same time the amplitude decreases during the experiments, an apparent depression in s2 could arise. The Supplementary Figure 5A addresses run-down, but the figure is not easy to understand, and, as far as I understood, it does not address the question of whether the release-independent depression could be caused by a rundown. To address this, the analysis of Figure 5 could be repeated by investigating the failure rate and amplitude separately or by analyzing the 1st and 2nd half of the recordings separately.

      The Reviewer makes a very important point that had escaped our attention. If the responses were declining over the course of an experiment, then near the end of the recordings a high proportion of failures would be associated with a weak response to the second AP. This could distort the relation between initial failures and amount of LFD, perhaps to the point of indicating LFD after failures when there was none. As suggested by the Reviewer, we tested this possibility by examining the stability of the synaptic responses during experiments. We found a mean s<sub>1</sub> value of 0.87 ± 0.13 for the first half of the experiments used in Fig. 5, and of 1.10 ± 0.17 for the second half (p > 0.05, n = 10). This analysis shows that there was no rundown during these experiments. We show in Author response image 1 a plot of s1 as a function of train number. These plots do not suggest any artefactual correlation between failures, mean s1, and rundown.

      Author response image 1.

      Plot of s1 as a function of train number for the experiments of Fig. 5. In response to a request of Reviewer 2, this figure illustrates the evolution of s1 values as a function of train number for the experiments used to produce Figure 5. In each experiment, about 20 s1 values were obtained at two ISIs (either 10 ms and 500 ms, or 800 ms and 1600 ms). The figure shows two examples of s1 values as a function of train number (these values fluctuate widely between 0 and 3), and the average across cells and ISI values. There is no indication of a rundown of s1 values as a function of train number.
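      A split-half stability check of the kind described in the response above could be sketched as follows. The response does not state which test produced the reported p-value, so the paired t statistic here is an assumption, and the function names and example values are hypothetical.

```python
import math
import statistics


def split_half_means(s1_values):
    """Mean s1 of the first and second half of one recording."""
    mid = len(s1_values) // 2
    return statistics.mean(s1_values[:mid]), statistics.mean(s1_values[mid:])


def paired_t(first, second):
    """Paired t statistic for (second - first); degrees of freedom = n - 1."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Hypothetical per-train s1 counts for three cells (placeholders, not study data)
recordings = [[1, 0, 2, 1], [0, 1, 1, 2], [2, 1, 0, 1]]
halves = [split_half_means(r) for r in recordings]
first = [h[0] for h in halves]
second = [h[1] for h in halves]
print(paired_t(first, second))
```

      The resulting statistic would then be compared with the critical value of the t distribution for n - 1 degrees of freedom; a systematic rundown would appear as a consistently negative difference between second and first halves.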

      Reviewer #3 (Public review):

      Summary:

      The manuscript builds on the observation that, at some synapses, low-frequency stimulation causes synaptic depression, which can be reversed by subsequent high-frequency stimulation. Such low-frequency depression (LFD) cannot be easily explained by the depletion of a single vesicle pool. Here, Silva and colleagues propose a model of activity-dependent vesicle trafficking to explain LFD at synapses between cerebellar granule cells and molecular layer interneurons.

      Strengths:

      Overall, LFD is interesting and worthy of examination, and the authors provide new experimental results that are of the high quality expected from this group.

      Weaknesses:

      The study proposes a novel model of vesicle trafficking that is not explained by known biological mechanisms, and the manuscript does not adequately compare or discuss alternative models.

      I have several concerns about how the authors interpret the data. First, the manuscript's primary conceptual advance is the idea that LFD involves vesicle undocking, rather than depletion. However, most experiments were performed under conditions that promote vesicle depletion (3 mM extracellular Ca<sup>2+</sup>). When experiments were repeated in physiological Ca<sup>2+</sup>, there appeared to be little or no LFD (stats are not provided). Second, the RS/DS/DU/undocking model, though not outside the realm of possibility, is not readily explained by known mechanisms and is only loosely supported by experimental findings. Third, when simulating LFD, the authors do not compare alternative models and use inappropriate language to imply that a model fit represents the truth (e.g., "the finding of identical experimental and simulated values confirms that the undocking mechanism accounts for LFD"). Finally, the model is presented in an overly complicated manner. The sheer amount of terms and nomenclature makes the manuscript confusing and difficult to read. Overall, the manuscript would benefit from added experiments and more statistics, a better justification and evaluation of the model, and more nuanced language.

      We respectfully disagree with these sweeping criticisms, as described in more detail below.

      Major concerns:

      (1) Most experiments were performed under conditions that exacerbate depletion

      In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal. As mentioned above, the authors only report significant LFD in elevated extracellular Ca<sup>2+</sup>. In a small number of experiments performed in more physiological Ca<sup>2+</sup> (1.5 mM), there is no depression after a single stimulus, and it is not clear that there was statistically significant depression during a low-frequency train. Several studies cited in support of LFD share this problem:

      - Abrahamsson et al., (2007) recorded from Schaffer collaterals in 4 mM Ca, 3-4X physiological Ca<sup>2+</sup>.

      - Doussau et al., (2010) recorded from Aplysia synapses in 3X Ca compared to seawater.

      - Rudolph et al., (2011) is cited as an example of LFD. However, this study performed experiments at high release probability cerebellar climbing fibers, and reported depression that increased monotonically with stimulation frequency, so it does not resemble the phenomenon studied in this paper. Lin et al., (2022) also largely describe monotonic depression at the calyx.

      The Reviewer suggests that LFD may only occur under non-physiological conditions, if the release probability has been increased by artificially elevating the extracellular Ca<sup>2+</sup>. The implication is that LFD is at best a curiosity with little or no significance for brain signalling. We disagree with this point of view for several reasons.

      Concerning the statement ‘In order to attribute LFD to vesicle undocking rather than depletion, it is important to show LFD under conditions where depletion is minimal’: This is the purpose of the analysis shown in Fig. 5.

      The statement ‘the authors only report significant LFD in elevated extracellular Ca<sup>2+</sup>’ is inaccurate. Fig. S2C shows a clear LFD in 1.5 mM Ca<sup>2+</sup>, as acknowledged by Reviewer 1 (‘low-frequency depression is still observed at reduced extracellular Ca<sup>2+</sup>’). However, we failed to provide a p-value for the depression in the initial version of the paper (p = 0.004, n = 5, with this data set; paired t-test, one-tailed). In the revised version, we document the 1.5 mM results more extensively, including the incorporation of the results of an additional experiment, and an explicit statistical analysis of the data (p = 0.00058, n = 6; paired t-test, one-tailed).

      Concerning the statement ‘there is no depression after a single stimulus’: We find that the onset kinetics of LFD is slower in 1.5 Ca<sup>2+</sup> than in 3 Ca<sup>2+</sup> (respectively 1.8 ISI and 0.51 ISI, Fig. 2C and Fig. S2C). This explains why the PPR is not significantly <1 in 1.5 Ca<sup>2+</sup>, without implying any weakening of the extent of LFD at steady state.
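      One way to read the onset values quoted above is sketched below. It assumes, which the response does not state explicitly, that LFD develops exponentially with stimulus number and that 1.8 and 0.51 are time constants expressed in units of ISI. The function name is hypothetical.

```python
import math


def fraction_developed(n_intervals, tau_isi):
    """Fraction of steady-state depression reached after n inter-stimulus
    intervals, assuming an exponential onset with time constant tau_isi
    (in units of ISI). This functional form is an assumption."""
    return 1.0 - math.exp(-n_intervals / tau_isi)


# Share of steady-state LFD already present at the second stimulus
print(fraction_developed(1, 1.8))   # onset value quoted for 1.5 mM Ca2+
print(fraction_developed(1, 0.51))  # onset value quoted for 3 mM Ca2+
```

      Under these assumptions, roughly 0.43 of the steady-state depression would be present at the second stimulus in 1.5 mM Ca<sup>2+</sup>, against roughly 0.86 in 3 mM Ca<sup>2+</sup>, so a paired-pulse measurement would underestimate steady-state LFD more strongly in the lower Ca<sup>2+</sup> condition.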

      As explained in the manuscript (p. 5), in a previous work, we developed a method to associate changes in SV pools, within the RS/DS model, with specific modifications of s1, s2 and s5-s8 during test 100 Hz trains (Tran et al., 2022). This method was developed in 3 mM Ca<sup>2+</sup> conditions, and for this reason, we performed most experiments for the present work in 3 mM Ca<sup>2+</sup>.

      Chiu and Carter (2024) demonstrated LFD in neocortical synapses; they performed their study in 1.2 mM Ca<sup>2+</sup>, not in elevated Ca<sup>2+</sup>.

      Rudolph et al. (2011) showed low-frequency depression not only in elevated external Ca<sup>2+</sup>, but also in 0.5 mM Ca<sup>2+</sup>. While Rudolph et al. (2011) did not make an explicit link between their observations and LFD, there is no reason to doubt that these observations are an example of LFD. They showed a biphasic depression when switching the stimulation frequency from 0.05 Hz to 2 Hz. In one of the founding papers of LFD, Doussau et al. (2010) describe a biphasic depression when switching the stimulation frequency from 0.025 Hz to 1 Hz; Fig. 1 of the two papers (Rudolph 2011 and Doussau 2010) is strikingly similar.

      Lin et al. (2022) would probably not agree with the statement that the depression at the calyx is ‘largely monotonic’, as they stress the finding of quasi-constant depression between 5 and 50 Hz.

      The authors note that their results differ from those of Atluri and Regehr, but do not mention that a possible reason for the difference is the increased release probability in their experiments.

      In fact, we clearly listed the difference in external Ca<sup>2+</sup> as a likely source of the discrepancy by saying ‘This discrepancy presumably stems from differences in experimental conditions (room temperature, stimulation of multiple presynaptic PFs and 2 mM external Ca<sup>2+</sup> concentration in the previous work, vs. near-physiological temperature, single presynaptic stimulation and 3 mM external Ca<sup>2+</sup> here)’.

      The authors should provide statistics for the data obtained in 1.5 mM Ca, and discuss why LFD is increased in conditions that also elevate vesicle release probability.

      See our comments above: the revised version includes the requested statistics. On p. 6 of the manuscript, we do provide an explanation for the apparent lack of LFD at 1.5 Ca<sup>2+</sup> and 2 Hz, namely a superimposition of LFD with facilitation. At 1.5 Ca<sup>2+</sup> and 0.5 Hz, our LFD numbers are not weaker than at 3 mM Ca<sup>2+</sup> and 0.5 Hz or 1 Hz.

      Altogether, it is correct that many LFD experiments have been carried out in high release probability synapses, and/or under conditions of elevated Ca<sup>2+</sup>. However, the reasons underlying these choices are diverse (in our case, to build on the previous SV pool analysis developed in Tran et al. 2022 in 3 Ca<sup>2+</sup> conditions) and do not imply a limitation to the phenomenon. LFD is present in physiological conditions for low-to-moderate release probability synapses (as shown in our work), and altogether, there is no reason to dismiss LFD as nonphysiological.

      (2) Lack of biological mechanisms supporting the model

      The model is presented without compelling biological support. The evidence in support of vesicle undocking comes from experiments by the Watanabe lab, which showed fewer-than-expected docked vesicles under EM when cultured synapses were stimulated immediately prior to high-pressure freezing. Kusick et al. were careful to note that these vesicles may have been lost to fusion.

      The Watanabe lab showed an SV deficit at docking sites at times ranging from about 100 ms to several seconds (Kusick et al., 2020, their Fig. 5E). This corresponds to the ISI values where we see paired-pulse depression. In their Summary, Kusick et al. raise the possibility of SV fusion as an alternative to undocking at the 100 ms time point. But, the same issue had previously been considered in Miki et al., 2018 with other techniques (their Fig. 2d), where it was shown that the SV deficit seen in paired-pulse experiments could not be explained by fusion. This leaves undocking as the most likely explanation, at least in our preparation. We have added a new paragraph on p. 14 to clarify this point.

      The putative undocking Kusick describes is immediate (< 5 ms after stimulation), and it was not shown to be Ca<sup>2+</sup> sensitive. This manuscript describes "calcium-dependent undocking" that proceeds from 10 ms - 200 ms. Multiple studies from the Watanabe lab show that a single stimulus lowers the number of docked vesicles, and subsequently, there is a transient redocking of vesicles that can be blocked by EGTA or Syt7 knockout.

      This is not an accurate description of the Kusick results or of our results. In the Kusick paper, the SV deficit seen at <5 ms after stimulation is attributed to exocytosis, not to undocking. Clearly, it is Ca<sup>2+</sup> dependent. Our manuscript describes potential calcium-dependent undocking not during the 10 ms - 150 ms interval, during which our undocking rate is assumed to be calcium-independent, but starting at 150 ms, and lasting a few hundred ms thereafter.

      I also question the rationale for the authors' model that 2 vesicles are coupled in series to a single release site. Previous papers from this lab cited EM studies from frog and mouse neuromuscular junctions that showed filamentous connections between vesicles (do these synapses show LFD?). Here, the authors primarily cite their previous models to support their arguments. I encourage them to continue searching for ultrastructural evidence for 2-vesicle-docking-units and to cite such studies.

      It is important to remember that our sequential two-step model was not based on EM data, but on a series of functional data including variance-mean analysis of summed SV release numbers; covariance analysis among subsequent SV release numbers; analysis of release latencies as a function of stimulus number during an AP train; and analysis of SV release numbers under conditions of very high release probability. We note that the phenomenon of Ca<sup>2+</sup>-dependent docking that we proposed based on these observations has been consistent with flash-and-freeze or zap-and-freeze results from several laboratories. Concerning potential filamentous connections between SVs and the AZ plasma membrane at a distance of several tens of nm, these have been seen not only in frog or mouse neuromuscular junctions, but also at brain synapses (e.g., Siksou et al., Journal of Neuroscience 2007; Cole et al., Journal of Neuroscience 2016; Fernandez-Busnadiego, Journal of Cell Biology 2010; 2013).

      (3) Comparison to other vesicle models

      The authors use overly assertive language to suggest that the model proves a mechanism. "Altogether, these results indicate that the slow phase of LFD ... reflects a δ decrease without significant changes in pr, in ρ or in IP size". Simulating data does not conclusively "indicate" the underlying mechanism, but the authors could state their data can be "explained by a model where..".

      Please see our response above to a similar point by Reviewer 1.

      However, LFD does not require activity-dependent undocking. Instead, the phenomenon has been explained by high-release probability, paired with an activity-dependent increase in either docking or release probability (Chiu and Carter, 2024; Doussau et al., 2017). Does the new model do a better job of replicating some facet of the data? If multiple models can explain the same data, how can we determine which model is correct? The "Alternative Presynaptic Depression Mechanisms" should be expanded to discuss these issues.

      We could not find statements in the Chiu and Carter paper or in the Doussau et al. paper explaining LFD ‘by high-release probability, paired with an activity-dependent increase in either docking or release probability’. As far as we can see, Chiu and Carter do not propose any specific mechanism for LFD, beyond saying that depression and facilitation must be separate. Doussau et al. (their Fig. 6) clearly frame their interpretation in a sequential two-step model. As in the preceding Miki et al. paper (which they cite extensively), they assume a rapid (a few ms), Ca-dependent transition between their ‘reluctant pool’ and their ‘fully-releasable pool’, respectively homologous to RS and DS. Thus, the Doussau et al. interpretation is close to that presented in our present work, even though significant differences exist. An important difference is that Doussau et al. did not use simple synapses, so that they did not have access to key synaptic parameters such as the number of docking sites or the release probability per docking site. Consequently, the model in Doussau et al. does not have the same level of detail as ours. The revised version better explains the differences and similarities between the model of Doussau et al. and that presented in our work (new paragraph on p. 14).

    1. eLife Assessment

      Mechanical transduction channels of sensory hair cells possess lipid scramblase activity. Membrane lipid disruption resulting from mechanical transduction is thought to be restored by flippase activities. This fundamental study provides compelling evidence that ATP8B1, a P4-ATP flippase and its subunit TMEM30B, are key in mediating this restorative function in outer hair cells of the mammalian cochlea.

    2. Reviewer #1 (Public review):

      Sensory hair cells of the inner ear convert mechanical sound vibrations into electrical signals through mechano-electrical transduction (MET), a process critically dependent on the specialized organization and lipid composition of their plasma membrane. Although the protein components of the MET complex are relatively well characterized, the role of the lipid environment remains poorly understood and often overlooked. Recent discoveries that core MET proteins TMC1 and TMC2 function as lipid scramblases, disrupting membrane lipid asymmetry, expose a significant gap in our understanding of how lipid homeostasis is regulated in hair cells and how membrane dynamics influence MET function.

      In this study, the authors address this gap by identifying the P4-ATPase ATP8B1 and its chaperone TMEM30B as essential regulators of membrane lipid asymmetry in outer hair cells. They also generated HA-tagged knock-in mice to precisely localize the P4-ATPase ATP8B1 and its chaperone TMEM30B within outer hair cells, demonstrating their enrichment in stereocilia, and convincingly demonstrate that loss of these proteins causes phosphatidylserine externalization, hair cell degeneration, and hearing loss in mouse models, phenocopying defects observed in TMC1 mutant mice with constitutive scrambling activity. While these findings establish lipid flippase pathways as critical for hair cell survival and auditory function, they also raise important questions about the precise mechanisms linking lipid asymmetry disruption to MET dysfunction and hair cell pathology.

      Overall, the data convincingly support the conclusion that ATP8B1-TMEM30B flippase activity is required to maintain stereocilia lipid asymmetry and auditory function. The study substantially advances understanding of how lipid homeostasis intersects with MET. However, several points require clarification to ensure that localization claims and mechanistic interpretations are fully supported by the presented data.

      Revisions considered essential by this reviewer are:

      (1) Figure 1D.

      The authors should clarify how the qPCR data were normalized and specify the reference (housekeeping) genes used. This information is necessary to evaluate the robustness and comparability of the gene expression data.

      (2) Figure 1F.

      The lack of F-actin staining at the hair cell base raises the possibility that the permeabilization conditions may have limited antibody access to certain membrane regions. This is especially important given that the authors used a gentle permeabilization agent such as saponin to preserve membrane integrity. Because the authors conclude that ATP8B1 and TMEM30B are localized "almost exclusively to OHC bundles and the apical membrane, with minimal staining in the remaining plasma membrane" (line 128), including co-labeling with a plasma membrane marker or more comprehensive F-actin visualization of lateral and basal regions would help ensure that the restricted localization is biological rather than technical. In the absence of such controls, the localization claim may be somewhat overstated and should be tempered accordingly.

      (3) Figure 7B.

      Although quantification of ATP8B1-HA intensity at the bundle appears similar between WT and Cib2 KO samples, the representative image suggests that some bundles lack detectable labeling. To better capture phenotype variability, it would be helpful to include an additional quantification showing the fraction or number of bundles with detectable ATP8B1-HA signal in Cib2 KO mice.

      (4) Lines 346-349.

      The manuscript suggests that IHCs lack stereocilia-enriched P4-ATPases. However, this conclusion is not directly supported by the presented data. The authors should either provide supporting localization or expression data for other P4-ATPases or soften the statement to indicate that no stereocilia-enriched P4-ATPases were detected under the conditions examined.

      Recommendations:

      (5) The authors convincingly demonstrate that TMEM30B loss results in ATP8B1 mislocalization. While not essential to the central conclusions, examining TMEM30B localization in ATP8B1 KO hair cells would clarify whether this interdependence is reciprocal, as described for other P4-ATPase-CDC50 complexes.

      (6) Lines 359-374.

      The discussion of Annexin V labeling is careful and balanced. This paragraph would benefit from referencing other studies that showed minimal Annexin V labeling in healthy P6 organ of Corti, reinforcing that robust PS externalization in the present study is pathological rather than developmental.

      (7) Lines 392-399.

      The proposed feedback model linking MET activity and ATP8B1-TMEM30B localization is compelling. The discussion could be strengthened by noting that in TMC1/2 double knockout hair cells, PS externalization is not observed, consistent with the idea that flippase activity becomes critical specifically when scrambling occurs. The mislocalization observed in Cib2 KO hair cells further supports the coupling between TMC-mediated scrambling and flippase-mediated membrane restoration.

    3. Reviewer #2 (Public review):

      Summary:

      Prior work identified TMEM30B (knockout mice) as well as ATP8B1 (human genetics and mouse model), ATP8A2 (knockout mice), and ATP11A (human genetics) as relevant for hearing. The authors also reasoned that, given the recent discovery of TMC1 and TMC2's dual function as mechanotransduction channels of the inner ear and as lipid scramblases, a counterpart flippase should be in the sensory hair-cell stereocilia bundle where mechanotransduction happens. They use CRISPR/Cas to modify the endogenous mouse genes and add an HA tag at the N-terminus of the ATP8B1, ATP8A1, ATP8A2, and ATP11A proteins. Their experiments with these mice unambiguously localized ATP8B1 at the base of outer hair cell stereocilia bundles. Knockout of ATP8B1 results in loss of outer hair cells, deficient auditory function (ABR), and degeneration of outer hair cell stereocilia bundles. Similarly, hair cells from genetically modified mice with endogenous HA-tagged TMEM30B proteins show localization of this protein to outer hair cell stereocilia bundles. TMEM30B knock-out mice phenocopy the ATP8B1 knock-out model. Interestingly, the authors show that Annexin V staining precedes hair cell loss in ATP8B1 and TMEM30B knockout mice and that proper localization of these proteins is lost in mice that lack CIB2, a protein essential for hair cell mechanotransduction.

      Strengths:

      (1) Use of knock-in HA-tagged proteins, rather than antibody staining, to unambiguously localize ATP8B1 and TMEM30B.

      (2) Systematic characterization of auditory function (ABR), hair cell loss, and hair-cell stereocilia bundle morphology.

      (3) Advances our understanding of the role played by lipid homeostasis in auditory function.

      (4) Reports on mouse models that will be helpful to further understand the mechanistic role played by ATP8B1 and TMEM30B in normal hearing and hereditary deafness.

      Weaknesses:

      (1) Are the HA tags causing any functional issues? Function and localization of tagged proteins can sometimes be compromised. It would be good to know, for each knock-in model (TMEM30B, ATP8B1, ATP8A1, ATP8A2, and ATP11A), whether the HA-tagged protein is causing any issues with the mice and particularly with hearing (ABRs). Are these mice normal? Can they hear? These data are missing.

      (2) Following on the point above, is it possible that ATP8B1-HA is well localized, but localization for the other three flippases (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA) is compromised by the tag? Is this potential mislocalization causing any functional phenotypes? (ABRs of point 1). I find it surprising that there are flippases only in outer hair cells, and only formed by ATP8B1. A possible explanation is that the tag is interfering with trafficking. If so, there should be a phenotype (ABRs), although this might be masked by redundancy among these flippases or caused by systemic issues (admittedly difficult to sort out). Given that this manuscript will likely become foundational, and that there is evidence that at least two of the other flippases are involved in hearing loss, it would be good to provide more information about the mice and HA-tagged proteins in the other knock-ins (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA). Depending on the data available for the knock-ins, the authors may want to discuss these scenarios and soften the statement indicating that inner-hair cells may lack flippase activity altogether.

      (3) Expression of ATP8B1 at P0 (Figure 1D), when there should not be protein in outer hair cells yet, seems high. Does this mean that other cells in the cochlea also express ATP8B1? Is this a concern?

      (4) Fluorescence scales in Figure 6 B and D and Figure 7 B and D are very different. So are the values for WT. One would expect that the WT would be similar in all cases (at least within the same compartments), given that the methods section indicates that "All images were collected using identical acquisition parameters, including zoom and laser power, across genotypes". If WT shows such variability, how can we compare?

    4. Author Response:

      Summary of Planned Revisions:

      We will clarify the qPCR methodology and interpretation to address potential misunderstandings.

      We will assess hearing in the generated HA-tagged mouse lines and, where appropriate, include a properly powered ABR analysis in the revised manuscript.

      We will address concerns regarding the z-stack in Figure 1F.

      We will include additional quantification for Figure 7B to strengthen the analysis.

      We will revise the relevant statement to read: “No IHC stereocilia-enriched P4-ATPases were detected under the conditions examined.”

      While we appreciate the suggestion to examine TMEM30B localization on the ATP8B1 KO background, this is not feasible within a reasonable timeframe; we will clarify this limitation in the manuscript.

      We will incorporate relevant prior work (e.g., George and Ricci, 2026) demonstrating minimal Annexin V labeling prior to P6 and lack of PS externalization in TMC1/2 double knockout models.

      We will clarify that hearing thresholds for TMEM30B-HA and ATP8B1-HA lines will be addressed in this study, while additional HA-tagged flippase lines (ATP8A1, ATP8A2, ATP11A) are part of ongoing work to be reported separately.

      We will soften statements regarding HA-tag insertion and clarify that, to our knowledge, localization and function are not disrupted, while acknowledging this as a potential limitation.

      We will revise the Methods section to clarify differences in fluorescence measurements across experiments.

      In addition to the experiments performed in response to the reviewers’ suggestions, we will add the following data that we generated while the paper was in review:

      Distortion product otoacoustic emissions (DPOAEs) of the Atp8b1 KO and Tmem30b KO mice. Consistent with impaired OHC function, their DPOAE thresholds were elevated.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Figure 1D.

      The authors should clarify how the qPCR data were normalized and specify the reference (housekeeping) genes used. This information is necessary to evaluate the robustness and comparability of the gene expression data.

      We thank the reviewer for this comment. qPCR data were normalized to GAPDH as the reference (housekeeping) gene. We will clarify this in the Methods section to ensure transparency and reproducibility.
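      The response states only that expression was normalized to GAPDH. For readers unfamiliar with reference-gene normalization, a minimal sketch using the widely used 2^-ΔCt and 2^-ΔΔCt calculations is given below. Whether the authors used this exact calculation is an assumption, and the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of a target gene relative to a reference gene (e.g. GAPDH),
    computed as 2^-(Ct_target - Ct_reference). Assumes ~100% PCR efficiency."""
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_a, ct_ref_a, ct_target_b, ct_ref_b):
    """Fold change of sample A over sample B using the 2^-ddCt calculation."""
    ddct = (ct_target_a - ct_ref_a) - (ct_target_b - ct_ref_b)
    return 2.0 ** -ddct


# Hypothetical Ct values: target at 24 cycles, reference at 20 cycles
print(relative_expression(24.0, 20.0))
# Hypothetical comparison of two time points sharing the same reference Ct
print(fold_change(22.0, 20.0, 24.0, 20.0))
```

      Reporting the reference gene matters because every value of this kind is a ratio to it; if the reference itself changes across ages or tissues, the apparent target expression shifts accordingly.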

      (2) Figure 1F.

      The lack of F-actin staining at the hair cell base raises the possibility that the permeabilization conditions may have limited antibody access to certain membrane regions. This is especially important given that the authors used a gentle permeabilization agent such as saponin to preserve membrane integrity. Because the authors conclude that ATP8B1 and TMEM30B are localized "almost exclusively to OHC bundles and the apical membrane, with minimal staining in the remaining plasma membrane" (line 128), including co-labeling with a plasma membrane marker or more comprehensive F-actin visualization of lateral and basal regions would help ensure that the restricted localization is biological rather than technical. In the absence of such controls, the localization claim may be somewhat overstated and should be tempered accordingly.

      We appreciate this important point. The image shown represents a single z-slice from a larger stack, and the hair cell body lies outside the plane of this section. To clarify this, we will revise the figure presentation. Specifically, we can provide the full z-stack (already available via OSF) and/or replace the image with a resliced whole-mount view to better visualize the full cellular context.

      Regarding the possibility that the lack of staining in the hair cell’s plasma membrane might be due to insufficient antibody penetration, we routinely perform Prestin (located in the OHC plasma membrane) staining after saponin-mediated permeabilization and have never experienced antibody accessibility issues. Nevertheless, we will perform co-labeling for Prestin and include it in the revised submission.

      (3) Figure 7B.

      Although quantification of ATP8B1-HA intensity at the bundle appears similar between WT and Cib2 KO samples, the representative image suggests that some bundles lack detectable labeling. To better capture phenotype variability, it would be helpful to include an additional quantification showing the fraction or number of bundles with detectable ATP8B1-HA signal in Cib2 KO mice.

      We thank the reviewer for this suggestion. To better capture variability, we will include an additional quantification measuring the fraction of hair cell bundles with detectable ATP8B1-HA and TMEM30B-HA signal per field of view. This analysis will complement the existing intensity-based quantification.
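      The per-field quantification proposed above could take a form like the following sketch. The detection criterion (intensity above a fixed threshold) and all names and numbers are assumptions for illustration; the response does not specify how "detectable" will be defined.

```python
def fraction_detectable(bundle_intensities, threshold):
    """Fraction of bundles in one field of view whose HA signal exceeds a
    detection threshold (hypothetical criterion, e.g. set from background)."""
    if not bundle_intensities:
        raise ValueError("field of view contains no bundles")
    detected = sum(1 for value in bundle_intensities if value > threshold)
    return detected / len(bundle_intensities)


# Hypothetical background-subtracted intensities for two fields of view
fields = {
    "fov_1": [10.0, 50.0, 80.0, 5.0],
    "fov_2": [60.0, 70.0, 15.0],
}
for name, intensities in fields.items():
    print(name, fraction_detectable(intensities, threshold=20.0))
```

      Unlike a mean intensity, this measure distinguishes a population in which every bundle is moderately labeled from one in which some bundles are strongly labeled and others not at all.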

      (4) Lines 346-349

      The manuscript suggests that IHCs lack stereocilia-enriched P4-ATPases. However, this conclusion is not directly supported by the presented data. The authors should either provide supporting localization or expression data for other P4-ATPases or soften the statement to indicate that no stereocilia-enriched P4-ATPases were detected under the conditions examined.

      We agree with the reviewer and will revise this statement to read: “No IHC stereocilia-enriched P4-ATPases were detected under the conditions examined.”

      Recommendations:

      (5) The authors convincingly demonstrate that TMEM30B loss results in ATP8B1 mislocalization. While not essential to the central conclusions, examining TMEM30B localization in ATP8B1 KO hair cells would clarify whether this interdependence is reciprocal, as described for other P4-ATPase-CDC50 complexes.

      We appreciate this insightful suggestion. However, performing this experiment would require generating a compound mouse line (crossing TMEM30B-HA into the ATP8B1 knockout background), which is not feasible within the revision timeframe. Additionally, the lack of a robust commercial antibody for TMEM30B further complicates this approach. We will note this as a future direction in the revised manuscript.

      (6) Lines 359-374.

      The discussion of Annexin V labeling is careful and balanced. This paragraph would benefit from referencing other studies that showed minimal Annexin V labeling in healthy P6 organ of Corti, reinforcing that robust PS externalization in the present study is pathological rather than developmental.

      We thank the reviewer for this suggestion and will incorporate relevant prior work, including George and Ricci (2026), which demonstrates minimal Annexin V labeling prior to P6, and further supports our interpretation.

      (7) Lines 392-399.

      The proposed feedback model linking MET activity and ATP8B1-TMEM30B localization is compelling. The discussion could be strengthened by noting that in TMC1/2 double knockout hair cells, PS externalization is not observed, consistent with the idea that flippase activity becomes critical specifically when scrambling occurs. The mislocalization observed in Cib2 KO hair cells further supports the coupling between TMC-mediated scrambling and flippase-mediated membrane restoration.

      We agree and will expand the discussion to include that TMC1/2 double knockout hair cells do not exhibit phosphatidylserine externalization, supporting the idea that flippase activity becomes critical in the context of scrambling.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Are the HA tags causing any functional issues? Function and localization of tagged proteins can sometimes be compromised. It would be good to know, for each knock-in model (TMEM30B, ATP8B1, ATP8A1, ATP8A2, and ATP11A), whether the HA-tagged protein is causing any issues with the mice and particularly with hearing (ABRs). Are these mice normal? Can they hear? These data are missing.

      We thank the reviewer for raising this important point. In this study, we will focus on TMEM30B-HA and ATP8B1-HA mouse lines, while additional HA-tagged flippase lines (ATP8A1, ATP8A2, ATP11A) are part of ongoing work to be reported separately.

      Both TMEM30B-HA and ATP8B1-HA mice are viable and exhibit normal breeding and aging. Preliminary (pilot) ABR measurements indicate wild-type–like hearing thresholds. We agree that this is important and will attempt to raise sufficient mouse numbers (in the time given) for a properly powered ABR analysis in the revised manuscript.

      (2) Following on the point above, is it possible that ATP8B1-HA is well localized, but localization for the other three flippases (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA) is compromised by the tag? Is this potential mislocalization causing any functional phenotypes? (ABRs of point 1). I find it surprising that there are flippases only in outer hair cells and only formed by ATP8B1. A possible explanation is that the tag is interfering with trafficking. If so, there should be a phenotype (ABRs), although this might be masked by redundancy among these flippases or caused by systemic issues (admittedly difficult to sort out). Given that this manuscript will likely become foundational, and that there is evidence that at least two of the other flippases are involved in hearing loss, it would be good to provide more information about the mice and HA-tagged proteins in the other knock-ins (ATP8A1-HA, ATP8A2-HA, and ATP11A-HA). Depending on the data available for the knock-ins, the authors may want to discuss these scenarios and soften the statement indicating that inner-hair cells may lack flippase activity altogether.

      We appreciate this concern. To our knowledge, the HA tag does not appear to disrupt localization or function of the tagged proteins. However, we agree that this cannot be fully excluded. We will therefore soften our conclusions about IHC flippases and clarify that additional flippases (ATP8A1, ATP8A2, ATP11A) are under investigation and will be described in a separate study.

      (3) Expression of ATP8B1 at P0 (Figure 1D), when there should not be protein in outer hair cells yet, seems high. Does this mean that other cells in the cochlea also express ATP8B1? Is this a concern?

      We thank the reviewer for this observation. We interpret the elevated signal at P0 as reflecting transcription preceding detectable protein expression. While expression in other cochlear cell types is possible, we have not observed detectable ATP8B1 localization outside hair cells using the HA-tagged model. We will clarify this point in the manuscript.

      (4) Fluorescence scales in Figure 6 B and D and Figure 7 B and D are very different. So are the values for WT. One would expect that the WT would be similar in all cases (at least within the same compartments), given that the methods section indicates that "All images were collected using identical acquisition parameters, including zoom and laser power, across genotypes". If WT shows such variability, how can we compare?

      We appreciate the need for clarification. Identical acquisition parameters were maintained within each experiment used for direct comparison (e.g., within a given panel). However, different panels (e.g., Figures 6B vs. 6D) were acquired on different days using different imaging settings.

      We will revise the Methods section to explicitly state this and clarify that comparisons are intended only within panels, not across experiments.

    1. eLife Assessment

      This important study examines the stability and compensatory plasticity in the retinotopic mapping in patients with congenital achromatopsia. It provides convincing evidence for a stable mapping of the visual field in V1, alongside changes of the readout from V1 into V3, which shows revised receptive field location and size. This paper would be of interest to scientists studying the visual system, brain plasticity, and development.

    2. Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher retinotopic areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study of Molz et al., but I believe, given anatomical variability, the larger n in this study, and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3-V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work.

      *Effects of eye-movements

      The authors have carried out the eye-movement analyses I asked of them. Unfortunately, in 4 individuals they couldn't calibrate the eyetracker (it's impressive they managed in 10). I think this means that 4 of 13 (since a different participant was excluded for head motion) individuals weren't included in correlation analyses. Limiting the correlation analysis to individuals with better fixation has obvious issues. I'd recommend redoing (or additionally including) stats using non-parametric measures while classifying these 4 as having fixation instability of 3 (i.e. greater instability than the participant with the worst fixation who was successfully calibrated).
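
      The recommendation above can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names, the use of `None` to mark uncalibrated participants, and the choice of "observed maximum plus one" as the imputed value are all assumptions made for the example. Because Spearman correlation depends only on ranks, any imputed value above the observed maximum gives the same result, with the imputed participants sharing a tied top rank.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def impute_worst(instability):
    """Replace None (hypothetical marker for failed calibration) with a
    value above the largest successfully measured instability."""
    worst = max(v for v in instability if v is not None) + 1.0
    return [worst if v is None else v for v in instability]
```

      With this, `spearman_rho(impute_worst(instability), prf_size)` would include all participants, where `instability` and `prf_size` are per-participant lists (hypothetical names).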

      *Interpreting pRFs

      The paper would be strengthened by a little more explicit clarity about what pRFs represent and how that affects their interpretation of their findings as plasticity vs. non-plasticity (I know the authors are aware of this, but I think it would be helpful for readers who are less experienced in pRFs). In the introduction it would be helpful to point out that pRFs represent the collective response of a large population of neurons, and as a result pRF estimates can vary depending on which population of neurons that stimulus drives.

      For example, imagine for the sake of argument that rods only project to V1 neurons with larger receptive fields. If one measured pRFs in a control observer under photopic vs. scotopic conditions one would see smaller pRFs in the photopic conditions. This wouldn't represent 'plasticity' - it would represent the fact that the firing neurons contributing to the pRF signal are a slightly different population because of a change in the stimulus content. This is of course exactly what you see in 2C. And indeed, the authors make this identical point: "In the non-selective condition, the smaller pRFs in controls are in line with the higher spatial resolution of the cone system, which is not active in the achromat group." But this point would be clearer if more of the conceptual underpinnings were made explicit in the introduction (or at this point in the paper).

      Shifts in which population of neurons drive your pRFs can explain many of the more puzzling results in the paper without detracting from your main conclusions. For example, in 2D, I don't think it's differences in S/N driving your results (pRFs are at least theoretically meant to be robust to S/N). If smaller RFs 'drop out' under low luminance and these smaller RFs also tend to be more central, then one would expect the control results of 1D. And I think a similar argument might even be made for the smaller difference in the rod monochromats.

      It would be possible to make the point of Figure 4B more simply if Figure 4B was replaced by additional Panels in Figure 2 simply showing V3 pRF sizes/eccentricity distributions. That would make the point that you don't see the same expansion in pRF sizes in V3 in a way that is just as clear, and is closer to the data.

      *Interpreting cRFs

      Similarly, I think the paper would be improved with more clarity about the underlying signal in CF modeling. Once again, I appreciate that the authors are familiar with this, but it will help the reader in interpretation. (And I do believe thinking carefully about this may alter your interpretations). CF receptive fields 'find' the region in V1 that best predicts the V3 signal in a given voxel. In resting state this likely represents a combination of:

      (1) visually driven signal - correlations that may or may not reflect connectivity but represent the fact that regions that represent the same region of visual space will be active at the same time.

      (2) global bilaterally symmetrical signal consisting of enhanced correlations between iso-eccentric regions (Raemaekers et al., 2014), which may arise from vasculature that symmetrically stems from the posterior cerebral artery (Tong et al., 2013; Tong and Frederick, 2014).

      (3) intrinsic neural fluctuations that are more strongly correlated between connected neurons. These are likely quite weak compared to the other contributions.

      I think if you ignore 2, (which is not likely to differ between rod mono and controls) and model 1 and 3, you might well see shifts in CFs towards the boundary of the scotoma - essentially the CF's location will be biased towards the region of V1 that has stronger correlations - which = the region which has a visual signal.

      I do find convincing the argument that you don't see the same shift in controls in the rod-selective condition. So I think the results of 4A are fine. But a little more clarity about 'what's under the hood' in CF modeling would be nice.

      *Interpreting the relationship between pRFs and cRFs

      So there's something here that confuses me. We are all agreed that V3 pRF sizes are similar across RM and control. V1 pRFs are larger in RM. It feels intuitive that smaller CFs would compensate but I can't make it make sense to myself when I think it through. Each pRF represents a combination of receptive field location scatter and bandwidth. You want to argue that eccentricity mapping looks pretty normal, so there's no reason to think increased rf scatter, and I can believe that (though I do think this assumption should be discussed explicitly).

      So far I think we agree.

      But let's think about what drives a CF during visual stimulation ... Specifically let's think about 'the pRF of the CF' (the region of visual space represented by the cluster of voxels in the CF). If pRFs for individual voxels in V1 are big, then the pRF for the CF is also going to be large. But we know that pRFs for V3 are normal size. So, the V3 CF will 'find' a smaller number of voxels in V1, in order to try to find the 'correct sized' CF pRF. Note that this explanation is very similar to yours. But doesn't require ANY 'intrinsic' connectivity. It's really just assuming the whole thing is driven by the visual signal and the CF size is determined by the ratio of the pRF sizes in V3 vs. V1.
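
      One simple way to make this argument concrete is a toy Gaussian model. It is an assumption introduced here for illustration and is not taken from the manuscript or the review: if a V3 pRF is treated as a V1 pRF blurred by the CF (expressed in degrees of visual angle), the widths add in quadrature, so for a fixed V3 pRF size a larger V1 pRF leaves room for only a smaller CF. The function name and all numbers are hypothetical, and cortical magnification is ignored.

```python
import math

def implied_cf_sigma(v3_prf_sigma, v1_prf_sigma):
    """Visual-space CF width (deg) implied by the toy relation
    sigma_V3^2 = sigma_V1^2 + sigma_CF^2.
    Returns 0.0 when the V1 pRF is already as large as the V3 pRF."""
    diff = v3_prf_sigma ** 2 - v1_prf_sigma ** 2
    return math.sqrt(diff) if diff > 0 else 0.0

# Hypothetical values: same V3 pRF size in both groups, larger V1 pRFs in one.
cf_small_v1 = implied_cf_sigma(5.0, 3.0)  # smaller V1 pRFs
cf_large_v1 = implied_cf_sigma(5.0, 4.0)  # larger V1 pRFs
```

      In this sketch the group with larger V1 pRFs ends up with the smaller implied CF without any change in intrinsic connectivity, which is the alternative account described above.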

      One possible solution would be to regress out the visual stimulus and redo this analysis based on the residuals.
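
      A minimal sketch of that residual analysis, assuming a single stimulus-driven predictor per voxel (for example, the time course predicted by the fitted pRF model; this choice of regressor is an assumption, as are the function and variable names):

```python
from statistics import mean

def residualize(signal, predictor):
    """Return signal minus its least-squares fit on predictor (with intercept)."""
    mp, ms = mean(predictor), mean(signal)
    sxx = sum((p - mp) ** 2 for p in predictor)
    slope = sum((p - mp) * (s - ms) for p, s in zip(predictor, signal)) / sxx
    intercept = ms - slope * mp
    # What remains is the part of the signal the stimulus model cannot explain.
    return [s - (intercept + slope * p) for s, p in zip(signal, predictor)]
```

      The CF model would then be refit on the residual time courses of V1 and V3, so that any remaining V1-V3 coupling cannot be attributed to the shared stimulus drive captured by the predictor.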

    3. Reviewer #3 (Public review):

      Summary:

      This study addresses a long-standing question in visual neuroscience concerning how the human visual system balances stability and plasticity when sensory input is altered from early in life. Using achromatopsia as a model of lifelong cone deprivation, the authors examine whether early visual cortex undergoes retinotopic reorganization to compensate for the absence of foveal cone input, or whether canonical retinotopic organization is largely preserved. By combining fMRI-based population receptive field (pRF) mapping with connective field (CF) modelling, the authors characterize changes across multiple hierarchical stages of visual processing.

      The main findings indicate that primary visual cortex (V1) shows no systematic remapping of the foveal projection zone, whereas extrastriate cortex, particularly V3, exhibits altered patterns of sampling from V1. The authors interpret these results as evidence for hierarchical adaptation, whereby downstream readout mechanisms adjust to make more efficient use of degraded rod-mediated input while preserving early-stage retinotopic organization.

      Strengths:

      A major strength of this work is the use of silent substitution to generate rod-selective stimuli. This approach enables a principled comparison between achromats and typically sighted controls by isolating rod-driven responses in both groups. In doing so, the study overcomes a key limitation of prior work, where differences in cortical organization could often be confounded by differences in photoreceptor class rather than reflecting neural reorganization per se. The inclusion of a rod-driven baseline in controls provides an important reference for distinguishing long-term adaptation from transient or stimulus-driven effects.

      Another notable strength is the integration of CF modelling alongside conventional pRF mapping. While pRF analyses alone suggest enlarged receptive fields in V1, consistent with reduced spatial resolution, the CF analysis offers a more mechanistic account by revealing changes in how V3 samples information from the V1 surface. This multi-level modelling approach moves beyond descriptive accounts of cortical map structure and provides a framework for interpreting how downstream areas may adjust their integration strategies under conditions of altered input.

      Weaknesses:

      Although the study is methodologically strong, the central claims regarding stability and compensatory plasticity require clearer conceptual framing and stronger empirical support. Stability is primarily defined as the absence of large-scale retinotopic remapping in V1, yet the presence of significantly enlarged V1 pRFs indicates substantial tuning-level plasticity at the input stage; distinguishing topographic stability from functional reorganization would therefore strengthen the interpretation. Moreover, the proposed compensatory mechanism raises a signal-processing concern, as reduced downstream sampling (smaller CFs in V3) cannot restore spatial information lost due to coarse upstream representations, and may instead limit integration. The mechanistic link between altered CF properties and normalization of extrastriate pRFs is not directly tested, as group differences are not shown to covary across individuals or visual field locations. Finally, the interpretation of these changes as compensatory implies functional benefit, yet no behavioral or performance measures are provided to establish that the observed reorganization preserves or enhances visual function, leaving open whether these effects reflect adaptive optimization or passive downstream consequences of altered input.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines plasticity in early cortical (V1-V3) areas in an impressively large number of rod monochromats (individuals with achromatopsia). The paper examines three things:

      (1) Cortical thickness. It is now well established that early complete blindness leads to increases in cortical thickness. This paper shows increased thickness confined to the foveal projection zone within achromats. This paper replicates the work by Molz (2022) and Lowndes (2021), but the detailed mapping of cortical thickness as a function of eccentricity and the inclusion of higher visual areas is particularly elegant.

      (2) Failure to show large-scale reorganization of early visual areas using retinotopic mapping. This is a replication of a very recent study by Molz et al., but I believe, given anatomical variability (and the very large n in this study) and how susceptible pRF findings are to small changes in procedure, this replication is also of interest.

      (3) Connective field modelling, examining the connections between V3-V1. The paper finds changes in the pattern of connections, and smaller connective fields in individuals with achromatopsia than normally sighted controls, and suggests that these reflect compensatory plasticity, with V3 compensating for the lower resolution V1 signal in individuals with achromatopsia.

      Strengths:

      This is a carefully done study (both in terms of data collection and analysis) that is an impressive amount of work. I have a number of methodological comments but I hope they will be considered as constructive engagement - this work is highly technical with a large number of factors to consider.

      Weaknesses:

      (1) Effects of eye-movements

      I have some concerns with how the effects of eye-movements are being examined. There are two main reasons the authors give for excluding eye-movements as a factor in their results. Both explanations have limitations.

      (a) The first is that R2 values are similar across groups in the foveal confluence. This is fine as far as it goes, but R2 values are going to be low in that region. So this shows that eye-movements don't affect coverage (the number of voxels that generate a reliable pRF), but doesn't show that eye-movements aren't impacting their other measures.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: (1) as expected, gaze was less stable in achromats than in controls; (2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masked signals in this region; (3) gaze instability was not correlated with pRF size or eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant Manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). These data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then (a) removed blinks, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space.

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data was filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
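
      The two steps quoted above (fitting an affine map from the five median fixation locations, then summarising horizontal gaze variability in 1-second windows) can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names are hypothetical, the sampling rate is not given in the text and is passed in as a parameter, and "standard deviation across 1-second windows" is read here as the per-window SD averaged over windows.

```python
from statistics import mean, pstdev

def solve3(m, v):
    """Solve the 3x3 system m @ w = v by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def fit_affine(recorded, targets):
    """Least-squares affine map from recorded gaze (x, y) to screen (x, y),
    fitted via the normal equations on rows [gx, gy, 1]."""
    rows = [(gx, gy, 1.0) for gx, gy in recorded]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for dim in range(2):  # fit screen x first, then screen y
        atb = [sum(r[i] * t[dim] for r, t in zip(rows, targets)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs

def apply_affine(coeffs, point):
    """Map one recorded gaze sample into screen coordinates."""
    gx, gy = point
    return tuple(c[0] * gx + c[1] * gy + c[2] for c in coeffs)

def fixation_instability(x_positions, sample_rate_hz, window_s=1.0):
    """Mean of the per-window SD of horizontal gaze position
    (non-overlapping windows; an incomplete final window is dropped)."""
    n = int(sample_rate_hz * window_s)
    windows = [x_positions[i:i + n] for i in range(0, len(x_positions) - n + 1, n)]
    return mean(pstdev(w) for w in windows)
```

      Here `recorded` would hold the five median gaze positions and `targets` the five known screen locations; blink and velocity filtering are assumed to have been applied beforehand.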

      We report the resulting new fixation data results as follows:

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees of freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: Spearman r<sub>(8)</sub> = -0.36, p=0.31; non-selective: Spearman r<sub>(8)</sub>=0.07, p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”
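
      The fractional degrees of freedom reported above (e.g. 9.08) are what an unequal-variance (Welch) t-test produces. A minimal sketch of that computation is given below for readers unfamiliar with it; the function name is hypothetical and the inputs are any two lists of per-participant values, not data from the study.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

      When one group is much more variable than the other, as expected for gaze stability in nystagmus, the corrected degrees of freedom fall toward the size of the more variable group minus one.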

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as be driven by larger eye movements. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: Spearman r<sub>(8)</sub> = -0.41, p=0.24; non-selective: Spearman r<sub>(8)</sub>=0.37, p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: Spearman r<sub>(8)</sub> = 0.58, p=0.08; non-selective: Spearman r<sub>(8)</sub>=0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      (b) The authors don't see a clear relationship between coverage and fixation stability. This seems to rest on a few ad hoc examples. (What happens if one plots mean fixation deviation vs. coverage, setting the individuals who could not be calibrated to the highest value of calibrated fixation deviation? Does a relationship then emerge?)

      In any case, I wouldn't expect coverage to be particularly susceptible to eye-movements. If a voxel in the cortex entirely projects to the scotoma then it should be robustly silent. The effects of eye-movements will be to distort the size and eccentricity estimates of voxels that are not entirely silent.

      There are many places in the paper where eye-movements might be playing an important role. 

      Examples include the larger pRF sizes observed in achromats. Are those related to fixation instability?

      We thank the reviewer for their comment. As detailed in our previous response, we have now extracted fixation instability data from additional patients and have expanded our discussion of its potential effects throughout the manuscript.

      Given that fixation instability is expected to increase pRF size by a fixed amount, that would explain why ratios are close to 1 in V3 (Figure 4).

      We agree with the reviewer’s point that the ratio change on its own is not strong evidence of compensation; this analysis was meant to complement the CF result. The plot in Figure 4 is intended to reconcile the connective field (CF) and pRF results. Its purpose is to illustrate that even though larger pRFs in achromats might seem counterintuitive alongside their smaller V3 CF sizes, the pRF data do not contradict the CF findings; the two are in fact consistent with one another. We also agree that there are alternative explanations for the differences in pRF size, such as fixation stability, and we have now added this point to the text.

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”
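
      The ratio comparison described in the quoted text can be illustrated with a short sketch. How the ratio was aggregated in the manuscript (per participant, per eccentricity bin, or on group means) is not stated here, so the use of group means below is an assumption, as are the function names and the log-scale measure of distance from parity.

```python
import math
from statistics import mean

def group_size_ratio(achromat_sizes, control_sizes):
    """Ratio of mean pRF size in achromats to mean pRF size in controls;
    a value of 1 indicates parity between the groups."""
    return mean(achromat_sizes) / mean(control_sizes)

def distance_from_parity(ratio):
    """Absolute log-ratio, so that 2x larger and 2x smaller count equally."""
    return abs(math.log(ratio))
```

      Under the compensatory account, `distance_from_parity` of the V3 ratio would be smaller than that of the V1 ratio. As noted in the response, a fixed size offset from eye movements could produce the same pattern, so this comparison is supportive rather than decisive on its own.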

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      (2) Topography

      The claim of no change in topography is a little confusing given that you do see a change in eccentricity mapping in achromats. 

      Either this result is real, in which case there *is* a change in topography, albeit subtle, or it's an artifact. 

      Perhaps these results need a little bit of additional scrutiny. 

      One reason for concern is that you see different functions relating eccentricity to V1 segments depending on the stimulus. That almost certainly reflects biases in the modelling, not reorganization - the curves of Figure 2D are exactly what Binda et al. predict. 

      Another reason for concern is that I'm very surprised that you see so little effect of including/not including the scotoma - the differences seem more like what I'd expect from simply repeating the same code twice. (The quickest sanity check is just to increase the size of the estimated scotoma to be even bigger?).

      We thank the reviewer for their comment. We have double-checked our scotoma modelling, confirming its correct implementation. The results of the scotoma modelling are not identical to the full one, just similar (see below).

      Previous studies on “artificial scotomas” (such as the one reported by Binda et al.) have shown mixed results. While Binda and colleagues found that modelling artificial scotomas normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas. Moreover, it is unclear whether scotoma modelling is beneficial in clinical populations, as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors. A recent achromatopsia study (Anderson et al. 2024) also found no change in pRF estimates with scotoma modelling.

      In our scotoma analyses, we found meaningful differences only in the non-selective condition in controls, where cones in the rod-free zone are stimulated - which would be the main expected effect of this modelling exercise (see below). In all other conditions (rod-selective in controls, both conditions in achromats), only rods are stimulated, and we found no difference in coverage, eccentricity or pRF size when modelling the scotoma, likely because the foveal signal is weak/absent and did not contribute much to pRF estimates in the unmasked analyses.

      This means we cannot account for the eccentricity shift as an edge effect with this scotoma model – but we remain cautious about interpreting it as real. This is because first, as we mention in the paper, in the non-selective condition, which has a higher signal-to-noise ratio, the eccentricity estimates in achromats match those of the control group's rod system. Second, it is still possible that the observed shift is an artefact of modelling that was not accounted for by the approach of scotoma modelling.

      Our claim of "no change in topography" specifically referred to the absence of "filling-in" as measured by cortical coverage, that is, the percentage of activated tissue regardless of fitted parameters. However, to avoid confusion given the eccentricity and pRF size results, we have now rephrased our claim.

      Abstract:

      “Cortical input stages (V1) exhibited high stability, with input-deprived cortex showing no retinotopic remapping and exhibiting structural hallmarks of deprivation.”

      Results (pRF eccentricity):

      “It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub> = 0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      To better illustrate the effect of scotoma modelling, text has been added to Supplement 3:

      “Studies on artificial scotomas, where part of the visual field is masked, suggest that pRF estimates of eccentricity and size can be biased by fitting scotoma-edge artefacts, and that these can be mitigated by modelling the scotoma in the pRF fitting procedure (e.g., Binda et al. 2013).

      We therefore repeated the pRF modelling procedure with the rod-scotoma being modelled as a black oval mask (1.25°x0.9°) over the stimulus aperture model. As expected, a visible difference between the two models is only apparent in the non-selective condition in controls, where the cones in the rod-free zone are being stimulated. In all the other conditions (rod-selective in controls, and both stimulation conditions in achromats) only the rods are stimulated; the masked stimulus therefore still matches the retinal activation, and no major differences can be observed. Performing the same statistical tests applied to the full model in the main text yields equivalent results: coverage in the rod-selective condition is equivalent across groups (t(47) = 0.78, p=0.43, BF10=0.31), and controls show higher coverage than achromats in the non-selective stimulation condition (Mann-Whitney U(52)=141, p<0.01; unequal variance, reverted to non-parametric).

      This consistency in pRF properties when modelling the rod scotoma is in line with previous results from scotoma modelling: while Binda and colleagues found that this normalised pRF shifts, others found no effect (Haak et al. 2012, Prabhakaran et al. 2020). Notably, the rod-free zone in achromatopsia is considerably smaller (~0.5° radius) than most tested artificial scotomas, and as artificial scotomas (screen-based masking) are not equivalent to retinal scotomas from inactive photoreceptors, it is unclear how artificial scotoma findings generalise to clinical populations. Our results are in line with a recent achromatopsia study (Anderson et al. 2024) which also found no change in pRF estimates with scotoma modelling.”
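
      As a rough illustration of the masking step described above, the sketch below zeroes an elliptical region at the centre of a binary stimulus aperture before it would be passed to pRF fitting. It assumes that 1.25° x 0.9° are the full width and height of the oval, that the mask is centred on fixation, and that the aperture is sampled on a regular grid in degrees of visual angle. The function name `apply_rod_scotoma_mask`, the grid size and the field extent are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def apply_rod_scotoma_mask(aperture, extent_deg, width_deg=1.25, height_deg=0.9):
    """Zero an elliptical central region of a stimulus aperture.

    aperture   : array (n_y, n_x) or (n_t, n_y, n_x), 1 where the stimulus is shown.
    extent_deg : half-width of the (square) modelled field, in degrees.
    width_deg, height_deg : full axes of the oval (assumed to be diameters).
    """
    n_y, n_x = aperture.shape[-2:]
    x = np.linspace(-extent_deg, extent_deg, n_x)
    y = np.linspace(-extent_deg, extent_deg, n_y)
    xx, yy = np.meshgrid(x, y)
    # Points inside the ellipse satisfy (x/a)^2 + (y/b)^2 <= 1 with semi-axes a, b.
    inside = (xx / (width_deg / 2)) ** 2 + (yy / (height_deg / 2)) ** 2 <= 1.0
    masked = aperture.copy()
    masked[..., inside] = 0
    return masked


# Toy example: a full-field aperture over +/-10 deg on a 201 x 201 grid (0.1 deg steps).
full_field = np.ones((201, 201))
masked_field = apply_rod_scotoma_mask(full_field, extent_deg=10.0)
```

      In the rod-only conditions such a mask removes stimulus energy only where no photoreceptors respond, which is why the masked and unmasked models are expected to give near-identical fits there.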

      I'd also look at voxels that pass an R2>0.2 threshold for both the non-selective and selective stimulus. Are the pRF sizes the same for both stimuli? Are the eccentricity estimates? If not, that's another clear warning sign.

      Comparable results were obtained when using higher R2 thresholds. These results are now included in Supplement 6.

      (3) Connective field modelling

      Let's imagine a voxel on the edge of the scotoma. It will tend to have a connective field that borders the scotoma, and will be reduced in size (since it will likely exclude the cortical region of V1 that is solely driven by resting state activity). This predicts your rod monochromat data. The interesting question is why this doesn't happen for controls. One possibility is that there is top-down 'predictive' activity that smooths out the border of the scotoma (there's some hint of that in the data), e.g., Masuda and Wandell.

      One thing that concerns me is that the smaller connective fields don't make sense intuitively. When there is a visual stimulus, connective fields are predominantly driven by the visual signal. In achromats, there is a large swath of cortex (between 1-2.5 degrees) which shows relatively flat tuning as regards eccentricity. The curves for controls are much steeper; see Figure 2b. This predicts that visually driven connective fields should be larger for achromats. So, what's going on?

      The reviewer raises interesting points about the interpretation of our connective field results. The possibility of differential top-down modulation between controls and achromats is intriguing; however, it is not supported by the data. If top-down modulation were activating foveal V1 in controls, we should not see a drop in the number of significant vertices sampling from the fovea in the rod-selective condition compared to the non-selective condition. In fact, we see quite a large drop in the number of significant vertices in that area in the rod-selective condition. Therefore, at present we do not think there is a strong basis to assume that our data could be explained by achromats lacking top-down predictive activity in the scotoma area that is present in controls.

      Regarding the concern about smaller CFs seeming counterintuitive given the flat eccentricity tuning in achromats' V1: we believe there is not a straightforward prediction from pRF properties to CF sizes. The relationship between V1 pRF characteristics and V3 CF sampling is complex and not well-established in the literature, and the two can be decoupled to some degree. For instance, in our data, controls show flat V1 pRF sizes in the rod-selective condition (similar to achromats), yet their V3 CF sizes maintain the typical eccentricity-dependent increase seen in the non-selective condition. This suggests that CF size patterns don't simply mirror V1 pRF properties or visual stimuli responses.

      Importantly, CF modelling fundamentally differs from pRF analysis in how it might be affected by scotomas. Unlike pRF analysis where a scotoma creates a "silent" region in visual space, in CF modelling the deprived cortex remains physically present and continues generating neural signals (albeit not visually-driven ones). If V3-V1 connectivity were anatomically fixed, V3 would continue sampling from deprived V1 regions even if they do not produce visual-driven signals. A change in this sampling pattern, as we see in our data, is therefore evidence for plasticity.

      Our data support this interpretation. First, in achromats, the CF size pattern observed cannot be easily explained by scotoma-edge artefacts. V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The effect is only significant further away from the scotoma (4°-6°).

      Second, to assess how the presence of a scotoma affects CF measures, we can compare the two conditions in the controls, since the rod-selective condition has a scotoma present and the non-selective condition does not. For this purpose, we performed an additional analysis, quantifying on a vertex-by-vertex level the differences in CF fitted parameters between the two stimulation conditions across V1 (see results below). In achromats there are no systematic shifts between the stimulation conditions, as expected given that both are rod-driven. In controls, this analysis reveals only subtle shifts (~0.45° in the rod-selective condition). CF size also changed slightly, although this change did not differ significantly from that observed in achromats. These shifts are much smaller than the CF size and eccentricity differences between controls and achromats, so we consider it unlikely that our findings are driven by scotoma artefacts.

      Author response image 1.

      Results (CF size):

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.

      To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      The beta parameter is not described (and I believe it can alter connective field sizes).

      In Author response image 2, we plot the beta parameter of the pRF modelling in V1 with no R<sup>2</sup> filtering; error bars are 95% CIs:

      Author response image 2.

      The reviewer did not specify how beta might alter connective field sizes. We assume they meant that, as in pRF mapping, the slope of activity from deprived to non-deprived cortex will artefactually create a CF model fit with smaller CF sizes. To test this, we calculated the slope of beta values between 0° and 3° in each participant in the rod-selective condition, as this range includes the scotoma and the area at the edge of the scotoma. We then used the slope as a covariate in an ANCOVA when comparing the CF sizes across groups in each sampled V1 segment. Accounting for the beta slope of V1 did not change the reported results. This analysis still shows smaller CF sizes in V3 in the rod-selective condition between 4° and 6° eccentricity, and these differences remain significant (p<0.001 for 4°-5° and p<0.05 for 5°-6° when comparing achromats vs controls).

      Similarly, it's possible to get very small connective fields, but there wasn't a minimum size described in the thresholding.

      CF sizes were fit with a grid fit. Possible values were [0.5,1,2,3,4,5,7,10]. Therefore, the minimum size is 0.5. Filtering out the smallest connective field sizes does not change the results:
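
      For readers unfamiliar with grid fitting, the sketch below shows the general idea under simplifying assumptions: every V1 vertex is a candidate CF centre, every value in the size grid is a candidate CF size, and the pair whose Gaussian-weighted V1 signal correlates best with the V3 time series is kept. The pairwise cortical distances are assumed to be precomputed and to share units with the size grid. The function name, the toy one-dimensional "cortex" and the correlation-based criterion are illustrative choices and not necessarily those of the actual pipeline.

```python
import numpy as np

CF_SIZE_GRID = [0.5, 1, 2, 3, 4, 5, 7, 10]  # candidate sizes listed in the response


def fit_connective_field(target_ts, source_ts, source_dist, sizes=CF_SIZE_GRID):
    """Grid-search the CF centre and size that best predict one target time series.

    target_ts   : (n_t,) time series of one V3 vertex.
    source_ts   : (n_t, n_src) time series of the V1 source vertices.
    source_dist : (n_src, n_src) pairwise cortical distances between V1 vertices.
    Returns (best_centre_index, best_size, best_r_squared).
    """
    best_centre, best_size, best_r = -1, None, -np.inf
    target = target_ts - target_ts.mean()
    for size in sizes:
        # Column j holds the Gaussian weighting profile centred on vertex j.
        weights = np.exp(-source_dist ** 2 / (2.0 * size ** 2))
        pred = source_ts @ weights
        pred = pred - pred.mean(axis=0)
        denom = np.linalg.norm(pred, axis=0) * np.linalg.norm(target)
        r = np.divide(pred.T @ target, denom, out=np.zeros_like(denom), where=denom > 0)
        centre = int(np.argmax(r))
        if r[centre] > best_r:
            best_centre, best_size, best_r = centre, size, float(r[centre])
    return best_centre, best_size, max(best_r, 0.0) ** 2


# Toy check: a simulated V3 vertex pooling around V1 vertex 12 with size 2.
rng = np.random.default_rng(0)
positions = np.arange(30.0)
dist = np.abs(positions[:, None] - positions[None, :])
v1_ts = rng.standard_normal((200, 30))
v3_ts = v1_ts @ np.exp(-dist[:, 12] ** 2 / (2.0 * 2 ** 2))
centre, size, r2 = fit_connective_field(v3_ts, v1_ts, dist)
```

      Because the size is chosen from a discrete grid, the smallest value in the grid is by construction the smallest CF size a vertex can be assigned.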

      Author response image 3.

      I might be missing something obvious, but I'm just deeply confused as to how the visual maps and the connectome maps can provide contradictory results given that the connectome maps are predominantly determined by the visual signal. Some intuition would be helpful.

      We agree that this appears counterintuitive, and have now added further clarification. The two models (pRF and CF) fundamentally differ in what they measure and how they relate to visual processing. V1 pRF sizes reflect the relationship between neural activity and visual stimuli - essentially how much of a visual stimulus drives a voxel's response - while V3 CF sizes reflect how V3 samples from the V1 cortical surface, indicating how many V1 voxels contribute to a V3 voxel's activity.

      The measures constrain each other, as a V3 voxel's pRF size is expected to match the pooling of its connected V1 inputs. But they can be decoupled: A V3 voxel could sample from a small area of V1 cortex (a small CF in mm) that happens to represent a large area of visual space if those V1 voxels have large pRFs. The aim of Figure 4B is to clarify that the measures are consistent with one another even though they diverge in direction. In achromats, where V1 voxels have larger pRFs (coarser spatial resolution), V3 appears to compensate by sampling more selectively from V1 via smaller CF sizes. Theoretically, this should reduce the pRF size difference between controls and patients in V3, a prediction that our data supports.
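
      One way to build intuition for this decoupling is a simplified Gaussian pooling approximation, offered here as an illustrative assumption rather than the model fitted in the manuscript. If a V3 vertex pools V1 with a Gaussian CF of size σ_CF (assumed to be in mm of cortex), the local cortical magnification is M (mm per degree), and the pooled V1 pRFs have size σ_V1 (degrees), then the V3 pRF size is roughly sqrt(σ_V1² + (σ_CF / M)²). All numbers below are hypothetical and chosen only to show the direction of the effect.

```python
import math


def pooled_prf_size(v1_prf_deg, cf_size_mm, magnification_mm_per_deg):
    """Approximate V3 pRF size (deg) from Gaussian pooling of V1 pRFs.

    Variances add: the V1 pRF variance plus the CF extent converted to
    degrees of visual angle via the local cortical magnification.
    """
    cf_deg = cf_size_mm / magnification_mm_per_deg
    return math.sqrt(v1_prf_deg ** 2 + cf_deg ** 2)


M = 2.0                                  # hypothetical magnification, mm per degree
controls = {"v1_prf": 1.0, "cf": 4.0}    # hypothetical values
achromats = {"v1_prf": 1.5, "cf": 3.0}   # larger V1 pRFs, smaller CFs

v1_ratio = achromats["v1_prf"] / controls["v1_prf"]
v3_ratio = (pooled_prf_size(achromats["v1_prf"], achromats["cf"], M)
            / pooled_prf_size(controls["v1_prf"], controls["cf"], M))
```

      With these made-up values the achromat-to-control pRF size ratio moves closer to 1 in V3 than in V1, which is the qualitative pattern the ratio analysis is designed to test.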

      Results (CF size):

      “To understand how this finer cortical sampling in V3 (smaller connective fields) impacts visual processing, we consider its effect on population receptive fields (pRFs). In V1, pRF sizes in achromats were significantly larger than in controls for both stimulus conditions, indicating coarser spatial tuning at the cortical input stage (Figure 4C, left). By selectively sampling from a smaller area of the V1 surface (smaller CFs), V3 can effectively compensate for this coarser input. If so, this process should result in a relative normalisation of pRF size in V3 compared to V1 (Figure 4C, right).

      To test this prediction, we plotted the ratio of pRF sizes between achromats and controls, where a value of 1 indicates parity between the groups (Figure 4B). As our compensatory connective field hypothesis predicts, the ratio was closer to 1 in V3 than in V1 across both stimulus conditions, confirming the pRF size difference was significantly reduced at the higher cortical stage. Together this shows converging evidence across the two models (pRF and CF) of hierarchical refinement as a possible compensatory mechanism, where V3's altered connectivity helps to normalize the processing of degraded sensory input from V1.”

      Discussion (added paragraph):

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Some analyses might also help provide the reader with insight. For example, doing analyses separately on V3 voxels that project entirely to scotoma regions, project entirely to stimulus-driven regions, and V3 voxels that project to 'mixed' regions.

      We agree that it is important to plot the connective field dynamics across the scotoma region.

      In Figure 4A we split the V3 vertices based on the V1 area they sample from. Therefore, the 0°-1° segment would be considered as mainly sampling from the “scotoma” region, and the higher the eccentricity, the less “scotoma” it includes. The V3 vertices that have a significantly smaller CF size compared to controls are those sampling from mostly, if not entirely, stimulus-driven regions (4°-5° and 5°-6°). We are not sure how further binning the data by within, across and outside scotoma would be more informative.

      However, in Author response image 4, we plot in more detail the distribution of CF sizes sampling from a V1 segment clearly inside and clearly outside the scotoma. The top figure shows the CF size distribution of V3 vertices that sample from the V1 0°-1° segment, where V1 is deprived of input due to the rod scotoma. In achromats, there is a clear drop in vertices with a very small (0.5) CF size. The bottom figure shows the distribution of V3 vertices that sample from the V1 4°-5° segment, which falls outside the scotoma and shows a significant difference in CF size across the groups. Here, in achromats, there is a drop in larger V3 CF sizes sampling from the V1 region and an increase in smaller ones (note that this further addresses a previous concern that connective field differences across groups are solely driven by very small CFs).

      Author response image 4.

      Following the reviewer’s comment we have added the following statement in the results section discussing CF size:

      “The significant CF size differences are unlikely to be a model-fitting bias around a scotoma edge, as V3 vertices sampling from the immediate vicinity of the scotoma (1°-3°) show CF sizes comparable to controls. The significant reduction in CF size occurs only further in the periphery (4°-6°), in regions that are primarily stimulus-driven.”

      The finding that pRF sizes are larger in achromats by a constant factor as a function of eccentricity is what differences in eye-movements would predict. It would be worth examining the relationship between pRF sizes and fixation stability.

      We found no relationship between fixation stability and pRF size in V1, although, as we explain in response to an earlier point, this does not fully exclude the reviewer's alternative explanation, which we have now added to the discussion.

      Discussion:

      “The hierarchical reorganisation observed in V3 is unlikely to be driven by fixation instability. Connective field (CF) estimates are robust to eye movements (Tangtartharakul et al., 2023), because they are anchored to V1 inputs rather than absolute screen position. Considered alone, the pRF results could alternatively be explained by eye movements introducing a fixed size offset that affects smaller V1 pRFs more strongly than those in V3. While we found no evidence for this relationship between pRF size and gaze measures in our patients, we cannot fully rule out the possibility. Nevertheless, the internal consistency between the CF and pRF measures provides a more parsimonious account; that sampling across the hierarchy accounts for coarser tuning at the input stage.”

      Reviewer #2 (Public review):

      Summary:

      The authors inspect the stability and compensatory plasticity in the retinotopic mapping in patients with congenital achromatopsia. They report an increased cortical thickness in central V1 (eccentricities 0-2 deg) and the expansion of this effect to V2 (trend) and V3 in a cohort with an average age in adolescence.

      In analyzing the receptive fields, they show that V1 had increased receptive field sizes in achromats, but there were no clear signs of reorganization filling in the rod-free area. In contrast, V3 showed an altered readout of V1 receptive fields. V3 of achromats oversampled the receptive fields bordering the rod-free zone, presumably to compensate and arrive at similar receptive fields as in the controls.

      These findings support a retention of peripheral-V1 connectivity, but a reorganization of later hierarchical stages of the visual system to compensate for the loss, highlighting a balance between stability and compensation in different stages of the visual hierarchy.

      Strengths:

      The experiment is carefully analyzed, and the data convey a clear and interesting message about the capacities of plasticity. 

      Weaknesses:

      The existence of unstable fixation and nystagmus in the patient group is alluded to, but not quantified or modeled out in the analyses. The authors may want to address this possible confound with a quantitative approach.

      We have responded to this in the “Recommendations for the authors” section of this reviewer, as they included a more detailed description of these points there.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I think the term rod monochromats should be included early in the paper since it's a more intuitive term to describe this population.

      We agree with the reviewer that the term “rod monochromats” is more intuitive as it clarifies the retinal source of the disease but have chosen the term achromats for consistency with a wide literature of published work in this group, including our own and our close collaborators’. To clarify, in the first mention of the group as achromats in the introduction we have now added this term:

      “Achromatopsia (also known as rod monochromacy) causes cone photoreceptors in the retina to be inactive from birth (Aboshiha et al., 2014).”

      (2) The paper essentially contains two definitions of 'eccentricity'. One (atlas/segments) comes from the Benson atlas and the other (functional) comes from pRF mapping. It would be good to make this distinction in terminology clearer earlier in the paper. It would also be good to use more consistent terminology. I assume 'sampled atlas V1 eccentricity' in 3A is the same as 'V1 segment' in 1A?

      For consistency we have now referred to these as V1 segment and sampled V1 segment in the figures when describing the atlas-based definition, and eccentricity for the measured pRF-based eccentricity.

      (3) The 'stability vs. plasticity' framing in the introduction could be tightened slightly.

      We have made the following changes following the reviewer’s comment:

      “In the visual domain, the focal point of the debate on plasticity and stability has hinged on the extent to which retinal input deprivation can drive local reorganisation in early visual cortex, for example, for deprived tissue to take on inputs from spared retinal locations (Adams et al., 2007; Baker et al., 2005, 2008; Baseler et al., 2002, 2011; Calford et al., 2005; Dilks et al., 2009; Dumoulin & Knapen, 2018; Ferreira et al., 2016; Goesaert et al., 2014; Haak et al., 2015; Molz et al., 2023; Ritter et al., 2019; Schumacher et al., 2008). In reality, visual impairment is a more global phenomenon, affecting all levels of visual processing, with complex dynamics beyond constricted local retinocortical projection zones (Carvalho et al., 2019).”

      (4) Figure 1A, define the x axis as degrees.

      We have now added the ° sign to all the tick labels indicating Benson map eccentricity.

      (5) Figure 2B, is there room for pictures of the silent substitution/standard stimulus?

      We have now added images in Supplement 5 to avoid cluttering the main Figure 2B.

      (6) Figure 2

      Panel A has a slightly weird organization. The reader is supposed to compare the square symbols to each other, and the circles to each other, why not organize the figure so they are adjacent in the graph (i.e. non selective control, non-selective achromat, selective control, selective achromat)? That also helps the reader orient that in the non-selective conditions you have almost complete pRF coverage. 

      We have taken on the reviewer’s suggestion and changed the order.

      In the inset, maybe use empty symbols? That's the traditional way to say that the square/circle applies to both red and black.

      We prefer the current format.

      Figure 2C - the symbols change to circles? Why not keep the symbols of A?

      We have now changed the symbols of 2C&D.

      I'd put the non-selective maps above the selective maps?

      We appreciate the feedback but prefer to keep it as it is, as we feel the critical point is conveyed by the rod maps.

      (7) 'We propose a new hierarchical model of neural adaptation'. These ideas are hardly new. There are also other models, that would explain your data (cumulative plasticity) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5953572/

      We thank the reviewer for the reference. We have now cited it in our discussion and removed the word “new” from the mentioned sentence.

      “Therefore, there is theoretically broader scope for experience-dependent reweighting of inputs (Beyeler et al., 2017; Makin & Krakauer, 2023) and to optimise use of inputs that are still available, more reliable, or more relevant in the impaired system. Conversely, higher-order visual areas may appear more plastic simply because they integrate the cumulative effects of learning from multiple lower stages (Beyeler et al., 2017).”

      “We propose a hierarchical model of neural adaptation…” [deleted the word “new”]

      (8) Line 508. No image of the stimulus is contained in the paper

      Corrected

      (9) Line 620. I believe the Figure is 1B, not 1C.

      Corrected

      (10) Figure 4A. CF Size - add mm2 to the axes.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      I am not an expert on pRF mapping, and as such, I am unsure how to relate to pRF mapping performed in patients with unstable fixation (not quantified, but referred to) and nystagmus, such as the achromatic population here. Since the majority of the results hinge on this analysis, I would appreciate more data about the differences between the groups. Supplement 2, which is meant to speak to this, shows only the data from 3 typical participants, and in itself is not evidence for "no correlation between stable fixation and enhanced foveal". Additionally, I'd appreciate a clear methods explanation of how the authors address these confounds; this is too important a concern to be left for the discussion section.

      We agree with the reviewer that eye movements could affect pRF measures. We have now also included data for all participants where we were able to obtain eye tracking measures and directly tested this relationship. Relevant results are copied below.

      Recap of results: 1) as expected, gaze was less stable in achromats than in controls; 2) achromats with more stable gaze did not show more activation in the scotoma projection zone, which we might have observed if fixation instability masked signals in this region; 3) gaze instability was not correlated with pRF size or eccentricity across V1 in achromats. We note that the relationship between nystagmus and visual sampling is complex - patients experience a stable image and may sample only during a specific phase of the eye movement. It is therefore not inherently clear if and how nystagmus affects pRF size.

      Relevant manuscript text incorporating these analyses is copied below.

      To quantify eye movement, we used the following methods added to the manuscript:

      “Fixation stability

      Participants’ gaze was tracked throughout all pRF mapping runs. Collecting reliable gaze data from individuals with nystagmus is a challenge because out-of-the-box calibration procedures mostly fail without stable fixation. To account for this, we implemented a post-hoc custom calibration procedure (Tailor et al., 2021). The eye-tracker was first precalibrated on a typically sighted individual. Then, before every other run, we collected gaze data from a 5-point fixation task (at fixation and above, below, left, and right of fixation at 5° eccentricity). These data allowed us to subsequently map the patient's recorded gaze coordinates to their precise locations on the screen. In 10 out of the 14 achromats we acquired reliable enough data to assess fixation stability.

      Calibration data processing: We first removed the first 0.5 seconds for each fixation location to allow for fixation to arrive on the target. We then performed (a) blink removal, (b) filtered out time points with eye movement velocity outliers (±2SD), and (c) filtered out any positions >3SDs to the left or right of the mean fixation location, and >1SD above or below. We took the median of the remaining gaze measurements as an approximate fixation estimate. The resulting 5 median fixation locations were used to fit an affine transformation that remapped the recorded gaze positions into screen space.

      Quantifying fixation stability: after applying the transformation of the post-hoc calibration, data was filtered for blinks and extreme velocities (<2SD). For each functional run, fixation instability was measured as the standard deviation of gaze x-positions across 1-second windows. Measures were then averaged across the two run repeats.”
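
      The two steps described in the methods text above, fitting an affine remapping from the five median fixation locations and summarising instability in 1-second windows, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the toy distortion and the 100 Hz sampling rate are hypothetical, and the sketch assumes that the standard deviation is computed within each non-overlapping window and then averaged, which is one reading of the description.

```python
import numpy as np


def fit_affine(recorded_xy, target_xy):
    """Least-squares 2D affine map from recorded gaze to known screen targets.

    recorded_xy, target_xy : (n_points, 2) arrays, e.g. the 5 median fixations
    and the 5 target positions. Returns a (3, 2) matrix A such that
    [x, y, 1] @ A approximates the screen position.
    """
    design = np.column_stack([recorded_xy, np.ones(len(recorded_xy))])
    coeffs, *_ = np.linalg.lstsq(design, target_xy, rcond=None)
    return coeffs


def apply_affine(coeffs, xy):
    """Map raw gaze samples (n, 2) into screen space."""
    return np.column_stack([xy, np.ones(len(xy))]) @ coeffs


def fixation_instability(gaze_x, sample_rate_hz, window_s=1.0):
    """Mean within-window SD of horizontal gaze position (NaN samples ignored)."""
    n = int(round(window_s * sample_rate_hz))
    n_windows = len(gaze_x) // n
    windows = np.asarray(gaze_x[: n_windows * n], dtype=float).reshape(n_windows, n)
    return float(np.nanmean(np.nanstd(windows, axis=1)))


# Toy calibration: targets at fixation and 5 deg above, below, left and right.
targets = np.array([[0, 0], [0, 5], [0, -5], [-5, 0], [5, 0]], dtype=float)
raw = targets @ np.array([[0.5, 0.05], [-0.05, 0.5]]) + np.array([1.0, -2.0])
A = fit_affine(raw, targets)
remapped = apply_affine(A, raw)

# Toy gaze trace: 10 s at 100 Hz alternating between -1 and +1 deg.
gaze_x = np.tile([-1.0, 1.0], 500)
instability = fixation_instability(gaze_x, sample_rate_hz=100)
```

      An affine fit needs at least three non-collinear points, so five fixation targets leave some redundancy to absorb noisy fixations.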

      Results (coverage section):

      “Another potential confound in our findings is fixation instability. In pRF mapping, which is usually conducted under photopic (cone-dominant) conditions, unstable fixation can cause a signal drop in the foveal projection zone. As expected due to nystagmus, the achromatopsia group showed higher fixation instability compared to controls (rod-selective: t<sub>(9.08)</sub>=-3.19, p=0.01; non-selective: t<sub>(9.41)</sub>=-4.88, p<0.001; degrees of freedom corrected for unequal variance; see Supplement Figure S2a). However, several lines of evidence suggest this instability cannot fully account for the lack of "filling in" in achromats. First, within the achromat group, we found no correlation between fixation stability and coverage (rod-selective: spearman-r<sub>(8)</sub> = -0.36, p=0.31; non-selective: spearman-r<sub>(8)</sub> = 0.07, p=0.85); individuals with more stable, control-like fixation did not show more signal inside the scotoma (see Supplement 2). Second, in adults with achromatopsia, typically with less severe nystagmus (Kohl et al., 1993), two recent studies also found absence of filling in (Anderson et al., 2024; Molz et al., 2023).

      So, while we cannot fully exclude nystagmus masking foveal signals in the cortex of some patients, this converging evidence from structural and functional MRI measures across different studies and groups, strongly suggests that the deprived cortex does not substantially ‘fill in’ with peripheral rod inputs in achromatopsia.”

      Results (pRF size + eccentricity):

      “Larger pRFs indicate that neuronal populations in achromats’ V1 cortex combine information across larger areas in visual space than in typically sighted controls. This could reflect true neural tuning differences as well as being driven by larger eye movements. However, fixation instability in achromats does not significantly correlate with pRF size in our sample (rod-selective: spearman-r<sub>(8)</sub> = -0.41, p=0.24; non-selective: spearman-r<sub>(8)</sub> = 0.37, p=0.29).

      It has been shown that fitting artefacts around scotoma edges can give rise to similar outward eccentricity shifts (Binda et al., 2013). However, when accounting for fitting artefacts around the foveal scotoma edge by modelling the rod-free zone during pRF fitting, pRF size and eccentricity differences remain unchanged (see Supplement 3). Finally, we found no significant correlations between gaze stability and the eccentricity shift (rod-selective: spearman-r<sub>(8)</sub> = 0.58, p=0.08; non-selective: spearman-r<sub>(8)</sub> = 0.09, p=0.8; Supplement 4D).

      Together, these analyses reveal subtle differences in how V1 of achromats responds to rod signals outside the foveal zone, which are consistent with results from other studies (Molz et al. 2023, Anderson et al. 2024). While we found no direct evidence that these are being driven by confounding factors such as eye-movements or fitting artefacts, more work is needed to understand the underlying processes that give rise to these shifts.”

      The following text has been added to Supplement 2

      “As expected, achromats showed significantly higher fixation instability compared to controls (as reported in the main text). We found no significant correlation between fixation instability and coverage, pRF size, or eccentricity in achromats. Results of Spearman R correlations in both rod- and non-selective conditions are reported in the figure. We note that the relationship between nystagmus and visual sampling is complex: patients experience a stable image and may sample only during specific eye-movement phases. It is therefore not fully clear if and how nystagmus should give rise to altered pRFs.”

      The field connectivity analysis similarly seems to be used only on task data from the same design; if it was replicated from resting-state data, that would be a good way to show consistency which is independent of measures requiring fixation. 

      We agree that resting-state data would be valuable; however, we did not collect such data in these individuals due to time limitations. Instead, we demonstrate the consistency and reliability of our results by replicating our findings across two different stimulation conditions (rod-selective and non-selective), which differ in luminance, contrast and signal amplitude in both groups and for controls also in the photoreceptors involved. The convergence of results across these distinct visual conditions strengthens our confidence in the reliability of the observed effects. Also, notably, CF estimates have been shown to be robust to large eye movements, and therefore also to differences in fixation stability across groups (Tangtartharakul et al., 2023).

      The authors may want to contextualize their findings in relation to what reorganization exists in cases of late-onset loss of part of the visual field on one hand (stroke recovery), and in the case of complete blindness from early life on the other, as both speak to different levels of plasticity the visual system is capable of.

      We thank the reviewer for their comment and have added a new paragraph discussing this topic.

      Discussion:

      “Our findings on hierarchical adaptation have broader implications for other visual disorders, depending on their timing and nature. For instance, a central scotoma acquired in adulthood, as in macular degeneration, may not trigger the same V3 sampling shifts (Haak et al., 2016), suggesting a sensitive window for this form of plasticity, after which connective fields remain more stable. This also raises questions about congenital blindness, where the absence of any driving input could lead to weakening or repurposing of hierarchical connections (Saccone et al., 2024). Moreover, principles may differ between a deprived but structurally intact cortex, as in retinal dystrophies, and a physically damaged cortex, as in stroke. In the latter, more extensive reorganisation may be required to sample effectively from surviving, and potentially disparate, regions of V1. Perceptual training effects in stroke rehabilitation may reflect such dynamics (Cavanaugh et al., 2025; Elshout et al., 2021).”

      A more minor point: Can the authors clarify what the dark adaptation is used for, and provide the supplementary analysis showing that the duration difference for some of the participants didn't impact the results (stated but not shown).

      The dark adaptation period before the rod-selective condition allowed rod photoreceptors to recover from bleaching caused by prior mesopic light exposure, ensuring optimal rod sensitivity under scotopic conditions. To verify that our 15-minute adaptation period was sufficient, we tested 10 control participants with an extended 45-minute adaptation period. As we found no differences in the resulting rod maps between standard and extended adaptation protocols, these participants were combined with the main control group for all analyses. Author response image 5 shows the plots for the two dark adaptation periods.

      Author response image 5.

    1. eLife Assessment

      This valuable study presents a hierarchical computational model that integrates locomotion, navigation, and learning in Drosophila larvae. The evidence supporting the model is convincing, as it qualitatively replicates empirical behavioral data. While some simplifications in neuromechanical representation and sensory-motor integration are limiting factors, the reported modular framework will be of interest for computational modeling of biological movement and adaptive behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The paper presents a three-layered hierarchical model for simulating Drosophila larva locomotion, navigation, and learning. The model consists of a basic locomotory layer that generates crawling and turning using a coupled-oscillator framework, incorporating intermittency in movement through alternating runs and pauses. The intermediate layer enables navigation by allowing larvae to actively sense and respond to odor gradients, facilitating chemotaxis. The adaptive learning layer integrates a spiking neural network model of the Mushroom Body, simulating associative learning where larvae modify their behavior based on past experiences. The model is validated through simulations of free exploration, chemotaxis, and odor preference learning, demonstrating close agreement with empirical behavioral data. This modular framework provides a valuable advance for modeling of larva behavior.

      Strengths:

      Every modeling paper requires certain assumptions and abstractions. The main strength of this paper lies in its modular and hierarchical approach to modeling behavior, making connections to influential theories of motor control in the brain. The authors also provide a convincing discussion of the experimental evidence supporting their layered behavioral architecture. This abstraction is valuable, offering researchers a useful conceptual framework and marking a significant step forward in the field. Connections to empirical larval movement are another major strength.

      Weaknesses:

      While the model represents a conceptual advance in the field, some of its assumptions and choices fall behind state-of-the-art approaches. One limitation is the paper's simplified representation of larval neuromechanics, in which the body is reduced to a two-segment structure with basic neural control. Another limitation is the absence of an explicit neuromuscular control system, which would better capture the role of segmental central pattern generators (CPGs) and neuronal circuits in regulating peristalsis and turning in Drosophila larvae. Many detailed neuromechanical models, as cited by the authors, have already been published. These abstractions overlook valuable experimental studies that detail segmental dynamics during crawling and the larval connectome.

      The strength of the model could also be its weakness. The model follows a subsumption architecture, where low-level behaviors operate autonomously while higher layers modulate them. However, this approach may underestimate the complexity of real neural circuits, which likely exhibit more intricate feedback mechanisms between sensory input and motor execution.

    3. Reviewer #2 (Public review):

      The paper proposes a hierarchically layered approach to larval locomotion, chemotaxis and learning. The model consists of a basic locomotor layer with two coupled oscillators, one for crawls and one for turns. The intermediate layer modulates the frequency and amplitude of turning to enable chemotaxis. The higher layer integrates a spiking neural network model of the Mushroom Body to modify the odor valence in response to experience, as during learning.

      The model is compared to experimental data with a good degree of agreement. This modular framework provides a valuable advance for modeling larva behavior.

      Strengths:

      A novel multilayer model that reflects current thinking of the neuronal organisation of motor control. The model is very useful to investigate the neuronal architecture of central pattern generators and higher-order motor control circuits that could be linked to larval connectome data.

      Weaknesses:

      All the limitations of the model are discussed and therefore the paper perfectly fits its purpose.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We are happy to read that this reviewer considers the proposed behavioral architecture ‘a significant step forward in the field’, and that she/he recognizes the strengths of our work in the modular and hierarchical approach that provides connections to influential theories of motor control in the brain, in the experimental evidence it is based on, and in the valuable abstractions that we have chosen for the larval behavioral modeling.

      The reviewer raises important points about the simplifications we have made, both conceptually and in the specific implementation of larval behaviors. Our main goal in this study is to introduce a conceptual framework that integrates agent-based modeling with systems neuroscience models in a modular fashion. To serve this purpose, we aimed for a minimal yet representative implementation at the motor layer of the architecture, calibrated to larval locomotion kinematics. This choice enables efficient simulation while allowing us to test top-down modulation and adaptive mechanisms in higher layers without the computational overhead of a full neuromechanical model. In addition to chemotaxis, we have recently used this simplified approach to model thermotaxis in larvae (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The reviewer notes the absence of explicit segmental neuromuscular control or central pattern generators (CPGs). We deliberately abstracted from these mechanisms, representing the larval body as two segments with basic kinematic control, to focus on reproducing overall locomotor patterns. This bisegmental simplification, which we illustrate in Supplemental Video “Bisegmental larva-body simplification”, retains the behavioral features relevant to our current aims. However, the modular structure of the framework means that more detailed neuromechanical models—incorporating CPG dynamics or connectome-derived circuit models—can be integrated in future work without altering the architecture as a whole.

      We fully agree that real neural circuits are more complex than a strict subsumption architecture implies. In the Drosophila larva, there is clear evidence for ascending sensory feedback from the motor periphery to premotor and higher brain circuits, as well as neuromodulatory influences. These add layers of complexity beyond the predominantly descending control in our present model. At the same time, both larval and adult connectome data show that across-level descending and ascending connections are sparse compared to the dense within-layer connectivity. We see value in casting our model as a hierarchical control system precisely to make the strengths and limitations of such an abstraction explicit. The revised manuscript will include further discussion of these points.

      In summary, our design choices reflect a trade-off: by limiting the biological detail in the lower layers, we gain computational efficiency and maintain a clear modular structure that can host models at different levels of abstraction. This ensures that the architecture remains both a tool for immediate behavioral simulation and a scaffold for integrating richer neural and biomechanical models as they become available.

      Reviewer #2 (Public review):

      We thank the reviewer for recognizing the novelty of our locomotory model, particularly the implementation of peristaltic strides based on our new analyses of empirical larval tracks, and for providing constructive feedback that will help us improve the manuscript.

      The reviewer highlights the need for clearer explanations of the chemotaxis and odor preference modules. We expand these sections in the revised manuscript with more explicit descriptions of model structure, parameterization, and calibration. As mentioned above, we have also prepared a separate preprint dedicated to the larvaworld Python package, which contains detailed implementation notes and hands-on tutorials that allow users to adapt or extend individual modules.

      Regarding the comparison to empirical behavior in chemotaxis, our present analysis is indeed primarily qualitative. However, we would like to emphasize that the temporal profile of odor concentration at the larval head in our simulations matches that measured in Gomez-Marin et al. (Nature Comm., 2011, DOI: https://doi.org/10.1038/ncomms1455) using only one additional free parameter, while all parameters of the basic locomotory model had been fitted to a separate exploration dataset before and were kept fixed in the chemotaxis experiments. In addition to the simulation of chemotaxis in the present paper, we recently used larvaworld in a practical model application to estimate a species-specific parameter of thermotaxis from experiments across different drosophilids (Kafle et al., 2025, iScience, DOI: https://doi.org/10.1016/j.isci.2025.112809).

      The preference index in our simulations was computed using the same definition as in the established experimental group assay for larval memory retention, enabling a direct quantitative comparison between simulated and empirical results. Variability in the simulated outcomes arose naturally from inter-individual differences in body length and locomotory parameters, derived from real larval measurements, as well as from the random initial orientation of each individual in the arena. These factors contributed to variation in individual tracks and ultimately produced preference index values that closely matched those observed experimentally. In the revised manuscript, we also discuss handedness, as highlighted by the reviewer, as another meaningful expression of inter-individual variability in Drosophila larvae and insects more generally.
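
      For readers unfamiliar with the group assay, a minimal sketch of a preference index is given here. The formula is an assumption based on the common convention in larval odor-preference assays (animals on the odor side minus animals on the other side, divided by all animals counted); the exact definition used in the manuscript is not restated in this response, and the function and argument names are hypothetical.

```python
def preference_index(n_odor_side, n_other_side, n_middle=0):
    """Preference index in [-1, 1] from animal counts at the end of a test.

    Assumed convention: (odor side - other side) / total, where the total
    includes animals remaining in a neutral middle zone.
    """
    total = n_odor_side + n_other_side + n_middle
    if total == 0:
        raise ValueError("no animals counted")
    return (n_odor_side - n_other_side) / total


# Hypothetical counts for one simulated group of 30 larvae.
print(preference_index(n_odor_side=18, n_other_side=8, n_middle=4))
```

      Applying the same function to simulated and experimental counts is what makes the two sets of values directly comparable.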

      Finally, we acknowledge the reviewer’s concern about the scalability and broader applicability of the model. While the present paper focuses on three specific behavioral paradigms (exploration, chemotaxis, odor preference), the modular structure of the architecture is designed for flexibility: modules at any layer can be exchanged for more detailed or alternative implementations, and new sensory modalities or behaviors can be integrated without redesigning the system. The larvaworld package, associated codebase, and documentation are openly available to encourage adoption and adaptation by the larval research community.

      Reviewer #3 (Public review):

      This public review provides an excellent account of our central aim to build an easily configurable, well-documented platform for organism-scale behavioral simulation and we are happy to read that the reviewer considers this an excellent goal.

      We thank the reviewer for her/his account of our well-organized code using contemporary Python tooling. We are currently further improving code readability and code documentation, and we will release a new version of the larvaworld Python package. We further agree with the reviewer’s assessment that understanding the model calibration currently requires reading of the appendix. For the revised manuscript we thus aim at improving our description of all calibration and modeling steps along the way. We will also make sure to improve the description of the experimental datasets used for calibration.

      We recognize that our description of the paper’s scientific contribution could be clearer. In revision, we will sharpen the Introduction and Discussion to highlight our main contributions:

      (1) Promoting a shift from isolated neural circuit modeling to integrated agent-based simulations in realistic environments.

      (2) Proposing the layered behavioral architecture, adopting the subsumption paradigm for modular integration.

      (3) Providing the larvaworld software as a ready-to-use, extensible modeling platform.

      (4) Implementing an empirically calibrated locomotory model and demonstrating its integration with navigation and learning modules in replicated behavioral paradigms.

      We agree with the reviewer that the next challenge is to integrate the empirically based behavioral simulations presented here with functional brain models capable of reproducing or predicting experimental findings at the level of cellular neurophysiology, including the effects of cell-type-specific manipulations such as gene knock-down or optogenetic activation/inhibition. However, based on our experience with systems-level modeling, we deliberately invested in behavioral simulation because functional models of the nervous system, including our own, often lack translation into simulated agent behavior. In many cases, model output is limited to one or more variables that can at best be interpreted as a behavioral bias, and most often represents an “average animal” that fails to capture inter-individual differences. By linking our spiking mushroom body model to behavioral simulations in a group of individual agents during memory retention tests (Figure 6C,D), we were able to achieve a first successful direct comparison between simulated and experimental behavior metrics, in this case the behavioral preference index reported in Jürgensen et al. (iScience, 2024, DOI: https://doi.org/10.1016/j.isci.2023.108640).

      Finally, we reiterate that the layered behavioral architecture is designed to promote a modular modeling paradigm. Our adoption of a subsumption architecture does not conflict with the concept of behavioral primitives; on the contrary, the notion that such primitives follow (semi-)autonomous motor programs and can be combined into more complex behaviors was the starting point for our implementation of the architecture in the fly larva. In our view, a genuinely contradictory paradigm for neural control of behavior would require a non-modular, strictly non-hierarchical organization of the nervous system and, by extension, of behavioral control.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      See public review for main points. To summarize, I find the conceptual framework of the paper very valuable and an important advance. However, in this age of data, I would have expected that the authors would make an effort to build more realistic models that could relate directly to neural data (including connectome and activity) and muscular dynamics at the segmental level.

      This point is addressed in detail in our public review response. In brief, we agree that a segmental neuromechanical model informed by connectome data would provide richer mechanistic insight. However, such an approach would greatly increase complexity and reduce accessibility. Our aim here is to present a coarse-grained, kinematic-level framework that is modular, extensible, and designed to accommodate models at different levels of abstraction. Importantly, extensions that incorporate realistic neuromechanics or connectome-derived circuits can be readily integrated, provided they conform to the modular principles of the proposed behavioral architecture.

      The authors do not cite figures in order of appearance, which makes the paper hard to read.

      This has been corrected. Figures are now cited in the correct order throughout the revised manuscript.

      I would explain the model in more detail in the main text. Currently, the model is introduced through Figure 1 in an abstract way. It is really hard to make the connection between this figure to the nuts-and-bolts of neuromechanics. And, I believe, for this paper, the details of the modeling matter and are not just technical points to be hidden in the appendix. The video (video 1) is not helpful.

      We have restructured the Model section to provide more detail directly in the main text, moving explanations that were previously confined to the Appendix. This includes explicit description of the locomotory oscillator model, the intermittency module, and their empirical calibration. At the same time, we retained mathematical and implementation details in Materials & Methods to keep the reading flow accessible. Additionally, we expanded the caption of Video 1 and clarified in the text what it illustrates, making the video more informative.

      Modeling choices lead to further weaknesses. While the model can replicate observed locomotory patterns, it does not fully explain the underlying neurobiological mechanisms that govern behavioral intermittency. For example, the crawl-bend interference mechanism, while capturing observed phase-dependent attenuation of turning, is implemented in a simplified, statistical manner rather than being derived from detailed neuromuscular dynamics. The intermittent locomotion model, which generates alternating runs and pauses, relies on log-normal distributed stridechains but does not explicitly model neural mechanisms responsible for switching between movement states.

      We agree with this point. A fully mechanistic implementation of crawl-bend interference would require a detailed segmental neuromechanical model, which we deliberately refrained from integrating in order to keep the current study tractable and focused on a coarse-grained, kinematic-level description. Likewise, the intermittency module is currently based on data-fitted distributions of stridechains and pause durations, without explicit modeling of the neural mechanisms responsible for switching between these states. To our knowledge, these mechanisms remain unresolved, though alternative approaches have been suggested, for example, an artificial neural network model of intermittency (Sakagiannis et al., 2020). To ensure this limitation is transparent to the reader, we now explicitly state it in a newly added “Limitations of the study” subsection in the Discussion.

      We also highlight that the behavioral architecture is designed to be extensible, so that future work may incorporate such mechanistic models when available, while preserving the modular framework.
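
      To illustrate what a purely statistical intermittency module of this kind looks like, the following is a minimal sketch rather than the larvaworld implementation. It alternates runs, whose length in strides is drawn from a log-normal distribution, with pauses. All parameter values, the fixed stride period, and the choice of a log-normal distribution for pause durations are assumptions made for the example.

```python
import random


def simulate_intermittency(duration, stride_period=1.0,
                           chain_mu=1.0, chain_sigma=0.8,
                           pause_mu=-0.5, pause_sigma=0.6, seed=0):
    """Return a list of (state, start, end) epochs covering [0, duration].

    Runs consist of a whole number of strides (a "stridechain") sampled from
    a log-normal distribution; pauses are sampled in seconds. No neural
    switching mechanism is modeled: state changes are drawn at random.
    """
    rng = random.Random(seed)
    t, epochs = 0.0, []
    while t < duration:
        n_strides = max(1, round(rng.lognormvariate(chain_mu, chain_sigma)))
        run = n_strides * stride_period
        epochs.append(("run", t, min(t + run, duration)))
        t += run
        if t >= duration:
            break
        pause = rng.lognormvariate(pause_mu, pause_sigma)
        epochs.append(("pause", t, min(t + pause, duration)))
        t += pause
    return epochs


for state, start, end in simulate_intermittency(duration=20.0):
    print(f"{state:5s} {start:6.2f} -> {end:6.2f} s")
```

      A mechanistic alternative would replace the two random draws with the output of a circuit model while leaving the surrounding run/pause loop unchanged.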

      I am curious about why the authors chose to model the mushroom body with much more realism than other modules.

      We clarified that this choice was not due to a bias in modeling depth, but to demonstrate the modularity and flexibility of the architecture. The mushroom body (MB) model we integrated was developed in our previous work as a biologically realistic spiking neural network. By incorporating it into the current framework, we show that models of very different abstraction levels – from simple statistical oscillators to detailed spiking networks – can coexist and interact under the same architecture. This rationale is now explicitly stated in the Discussion.

      Reviewer #2 (Recommendations for the authors):

      The manuscript from Sakagiannis et al. proposes a novel model for locomotion and foraging in Drosophila. Their ambition is to make a unified model that will incorporate distinct layers of complexity to describe and predict the locomotor behaviour of a larva, during exploration, chemotaxis and even learning. The paper fails in doing so, starting with a rather interesting exploratory model and becoming less and less convincing as it progresses, with thinner (chemotaxis) and thinner (learning) experimental and theoretical support. The model for chemotaxis is extremely simplified compared to the work of other laboratories. The associative learning paradigm is taken from another paper from the same research group and is not sufficiently explained. In its current form, the paper is of very limited theoretical and practical value. The analysis is insufficient to judge the overall quality and scalability of the model. It is hard to know if the model could be adopted by others in the larval community more widely in other animals. Would it be flexible and robust enough to be used to model other behavioural conditions?

      We appreciate this critical perspective. Our aim is not to present a final, fully parameterized model of all larval behaviors, but to introduce a flexible, modular behavioral architecture that integrates models at different levels of abstraction and can be expanded by the community. To support adoption, we have revised the manuscript to highlight the availability of the framework as a Python package (larvaworld), supplemented with documentation, tutorials, and code examples. This makes it easier for other researchers to reuse, extend, and test the architecture under additional behavioral conditions. We also explicitly refer to modeling studies that have adopted the proposed framework and the locomotory model itself.

      Below, we address the reviewer’s points layer by layer.

      (1) Exploratory behaviour. The strongest part of the paper. The authors propose a new method to analyse locomotion. They take into consideration the instantaneous linear and angular velocity. They assume the existence of two oscillators, which is really interesting. They incorporate the distribution of pause durations and the number of strides. The incorporation of the strides is very exciting. They do not include handedness, which has already been studied and incorporated in a model for exploration they seem to have missed (Wosniack et al 2022). Figure 4 shows the dispersion. At first glance, it is very obvious that the model larvae do not behave like the animal. The distance they move from the centre is wider (Figure 4A). What is measured in dispersion (Figure 4B)? Just the distance travelled during 40s? A better measure of the similarities or differences between the model and real larvae would be interesting, such as analysing the Mean Square Displacement. Would the model be good if compared to the long-term exploratory behaviour from Sims et al. 2020, that the authors previously used?

      The authors should convince the readers that their model is better than, or at least as good as, the ones already available.

      We thank the reviewer for these constructive suggestions. In the revised manuscript we now reference and discuss handedness, citing Wosniack et al. (2022, eLife), and highlight its potential role as an additional axis of individual variability. We also clarified the distance metrics used in Figure 4: dispersal denotes the Euclidean distance from the origin at the end of the trajectory, while pathlength denotes the cumulative distance travelled. Since larvae typically encounter the arena boundary within the first 40 seconds of exploration, dispersal is shown only over this interval.

      With respect to the reviewer’s suggestion of using mean-squared displacement (MSD), we now explicitly describe the relation between dispersal and MSD. Dispersal is an individual-level displacement measure from which population-level metrics such as MSD can be directly derived.
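
      The relation between the three quantities can be sketched as follows. This is an illustrative example with hypothetical function names and toy tracks, assuming each track is a list of (x, y) positions sampled at the same time steps; the MSD computed here is the ensemble average relative to each track's starting point, not a time-averaged MSD.

```python
import math


def dispersal(track):
    """Euclidean distance from the starting point at every time step."""
    x0, y0 = track[0]
    return [math.hypot(x - x0, y - y0) for x, y in track]


def pathlength(track):
    """Cumulative distance travelled along the whole track."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))


def msd(tracks):
    """Population MSD per time step: mean squared dispersal over individuals."""
    per_track = [dispersal(tr) for tr in tracks]
    n_steps = min(len(d) for d in per_track)
    return [sum(d[i] ** 2 for d in per_track) / len(per_track)
            for i in range(n_steps)]


# Two toy tracks (arbitrary units), not experimental data.
tracks = [[(0, 0), (3, 4), (3, 0)], [(0, 0), (0, 2), (0, 4)]]
print([dispersal(tr)[-1] for tr in tracks])   # final dispersal per individual
print([pathlength(tr) for tr in tracks])      # pathlength per individual
print(msd(tracks))                            # population-level MSD
```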

      Regarding long-term exploration, we agree that extended trajectories—as reported by Sims et al. (2020) over timescales of up to one hour—constitute a valuable complementary regime. Our experimental dataset is limited to 3-minute recordings in a bounded Petri dish, which constrains the accessible timescales of dispersal analysis. We now explicitly note in the Results that comparison to long-horizon datasets such as Sims et al. (2020) represents an important future direction that will require larger or unbounded arenas.

      Together, these revisions strengthen the presentation of the exploration results and clarify how our model relates to established statistical measures of larval foraging behaviour.

      (2) Chemotaxis. The chemotaxis model is so briefly explained in the Results section that it is hard to understand. A modulation of the frequency and amplitude of the lateral oscillator as a function of the concentration? The authors cannot differentiate between weathervaning and turning in this model (at least I can't understand how). What happened with the distribution of pauses and the directions of turns in Figure 5? The authors do not use real behavioural data to constrain their model. How do we know that the parameters they have used reflect the larval behaviour? For example: what is the success rate for larvae to reach the area of high concentration? How close do they get? What is the length of the tracks from start to a target area of high concentration? Where are the calibration data for chemotaxis? This information is critical to understand the model; it needs to be shown in the Results section. The authors mention an 8.9 µM peak concentration. Of what?

      The model is oversimplified in comparison with Davies et al. 2015 and it is not clear at all how it reflects the real chemotaxis, which is a rather complex behaviour.

      We thank the reviewer for these detailed comments. In the revised manuscript we substantially expanded the description of the chemotaxis model. We now provide an explicit mathematical formulation of how odor concentration modulates the lateral oscillator through the quantity A<sub>0</sub>, which perturbs both the frequency and amplitude of bending according to the mechanism proposed by Wystrach et al. (2016). We additionally clarify that the motor layer - including the intermittency module and all parameters governing crawling, pausing, and turning - remains fully identical to the configuration calibrated on the exploration dataset; no refitting was performed for the chemotaxis condition.

      To address the reviewer’s question regarding the distinction between weathervaning and head casting, we now explain that both behaviours emerge naturally from the same coupled-oscillator structure via stride-phase–dependent crawl–bend interference. High-amplitude headcasts occur during pauses when crawl-induced attenuation is lifted, whereas low-amplitude weathervaning arises during runs when the interference is active.

      This unified mechanism eliminates the need for separate modules.
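
      One way to picture this unified mechanism is the toy function below. It is a deliberately simplified sketch under stated assumptions, not the model's actual equations: the sinusoidal bend drive, the attenuation factor, and the stride-phase window in which attenuation applies are all hypothetical choices made for illustration.

```python
import math


def bend_drive(lateral_phase, amplitude, crawling,
               stride_phase=0.0, attenuation=0.8,
               window=(0.0, math.pi)):
    """Angular drive from the lateral oscillator after crawl-bend interference.

    During a pause (crawling=False) the drive is unattenuated, giving
    high-amplitude headcasts. During a run, the drive is scaled down while
    the stride phase lies inside `window`, giving low-amplitude weathervaning.
    """
    raw = amplitude * math.sin(lateral_phase)
    if not crawling:
        return raw
    lo, hi = window
    in_window = lo <= stride_phase % (2 * math.pi) < hi
    return raw * (1.0 - attenuation) if in_window else raw


peak = math.pi / 2  # lateral oscillator at maximal deflection
print(bend_drive(peak, 1.0, crawling=False))                   # pause
print(bend_drive(peak, 1.0, crawling=True, stride_phase=1.0))  # run, attenuated
```

      Both behaviours come from the same oscillator output; only the crawling state and stride phase decide how much of it reaches the body.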

      The chemotaxis experiments were implemented to qualitatively replicate the behavioural patterns described in Gómez-Marín et al. (2011, Fig. 1A–1F), and we now include explicit figure references in the captions. Because the present implementation is a proof of concept rather than a quantitatively calibrated chemotaxis model, we do not report success rates, approach distances, or track-length statistics, as these depend strongly on odorscape geometry and calibration against quantitative single-animal datasets that were not available for the current work. This clarification has been added to the text and is stated explicitly again in the Limitations section.

      Finally, we now specify that the reported odor concentrations (e.g. 8.9 µM) follow the values used in Gómez-Marín et al. (2011), and we added the precise Gaussian function used to generate the odorscape in the Materials & Methods. Together, these revisions provide a clear and transparent account of the chemotaxis model and its scope.
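
      As an illustration of what a Gaussian odorscape of this kind looks like, the sketch below evaluates a two-dimensional Gaussian with a peak of 8.9 µM at the source. The functional form is a generic assumption, and the source position and spread values are hypothetical; the exact function and parameters are those given in the revised Materials & Methods, which are not reproduced here.

```python
import math


def odor_concentration(x, y, source=(0.0, 0.0), peak=8.9,
                       spread=(0.01, 0.01)):
    """Odor concentration (same unit as `peak`, here µM) at position (x, y).

    Axis-aligned 2D Gaussian centred on `source`; `spread` holds the
    standard deviations along x and y in arena units (assumed metres).
    """
    dx, dy = x - source[0], y - source[1]
    sx, sy = spread
    return peak * math.exp(-0.5 * ((dx / sx) ** 2 + (dy / sy) ** 2))


print(odor_concentration(0.0, 0.0))    # concentration at the source
print(odor_concentration(0.01, 0.0))   # one standard deviation away along x
```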

      (3) Associative learning paradigm. I assume that the authors intended to incorporate a bias in chemotaxis behaviour towards a particular odorant (CS) that would have been associated with a reward food (US). However the model works slightly differently, it is represented by an aversive and an appetitive gradient.

      Theoretically, this is already an assumption (unless there is evidence for it, that should be referenced). It would be more conservative to have one neutral side and one appetitive (attractive) side. Second, the use of a mushroom body model, (even though it has already been published) to decide on the valence adds a layer of complexity that seems unnecessary. The learning process is different from the output process. Finally, the model intends to show us a "realist simulation of Drosophila locomotion" and we do not know how the larvae reach the right side during the test. It would be useful to have some comparison of the larval and model behaviour towards the preferred side.

      In this last section, the objective of the research unweaves and falls short of its ambition.

      We thank the reviewer for these helpful comments. In the revised manuscript we clarified that our implementation follows the standard larval conditioning protocol in which a rewarded odor (CS+) is tested against a neutral odor, not against an aversive one. The previously contradictory phrasing has been corrected, and the text now consistently reflects the established experimental procedure.

      We further explain that the mushroom body (MB) model is included not in order to increase biological complexity in this section, but to demonstrate the flexibility of the proposed behavioral architecture: detailed circuit models and more abstract motor modules can coexist under the same framework. The MB model implements associative plasticity independently of any behavioral simulation, and its output - a scalar odor valence - is transformed linearly into an odor-gain parameter that modulates turning during the test phase. This separation between learning and behavioral output mirrors the logic of the biological system while keeping the overall architecture modular.
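      The linear step from valence to odor gain can be pictured as follows. This is a hedged sketch only: the function name, the `scale` and `baseline` parameters, and their default values are hypothetical and do not come from the manuscript.

```python
def odor_gain_from_valence(valence, scale=1.0, baseline=0.0):
    """Map the scalar valence produced by the MB model to an odor gain.

    valence  : learned odor valence (positive = appetitive, negative = aversive)
    scale    : slope of the linear map (hypothetical free parameter)
    baseline : gain of an untrained, neutral odor (hypothetical, 0 by default)
    """
    return baseline + scale * valence
```

      In such a scheme the learning model and the behavioural simulation only share this one number, which is what keeps the two stages modular.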

      Regarding the reviewer’s request for insight into “how larvae reach the right side,” we note that standard group assays used in larval olfactory learning provide only population-level preference indices rather than detailed individual trajectories. Our comparison to empirical data therefore relies on these established preference indices, which the model successfully reproduces across training trials, including the characteristic saturation reported in Jürgensen et al. (2024). We now state explicitly that although the behavioral simulation does generate full trajectories for each virtual larva, the lack of corresponding experimental single-animal tracks precludes a direct trajectory-level comparison. This clarification has been added to the revised text.
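      A population-level preference index of the kind referred to above is commonly computed from counts of animals on each side of the dish. The sketch below shows one such convention; whether animals in a middle zone enter the denominator differs between protocols and is an assumption here, as are the argument names.

```python
def preference_index(n_cs_side, n_other_side, n_middle=0):
    """Preference index in [-1, 1] from end-of-test animal counts.

    n_cs_side    : animals on the side of the trained odor (CS+)
    n_other_side : animals on the opposite side
    n_middle     : animals in a neutral middle zone (assumed to count in the total)
    """
    total = n_cs_side + n_other_side + n_middle
    if total == 0:
        raise ValueError("no animals counted")
    return (n_cs_side - n_other_side) / total
```

      Because the index discards everything except final positions, it can be computed identically for real and virtual larvae, but it cannot distinguish between different routes to the same side.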

      Together, we believe that these revisions improve clarity and better situate the learning simulations within both the behavioral architecture framework and the constraints of available experimental data.

      Reviewer #3 (Recommendations for the authors):

      Figure 1a is very dense and I am struggling with the terms "reactive" and "basic" due to a general lack of clarity about the details of the model organization. For example, why do all of the sensory inputs point to turning proprioception? Why is proprioception two different things for turning and crawling? Why are some senses in light green while olfaction is in dark green? Why is feedback only from feeding, when crawling, head casting, and turning will change the sensory environment as well? Why is head casting not a behavioral module here? Why focus on following/being constrained by the "subsumption architecture paradigm" over a focus on the known literature and neuroanatomy?

      We thank the reviewer for this careful inspection of Figure 1. In the revised version we improved both the figure and its caption, as well as the corresponding description in the text.

      Specifically:

      - The “basic” layer has been renamed the “motor” layer for clarity, and the caption has been expanded to better describe each component.

      - The sensory inputs are now shown to target the motor layer as a whole, rather than just the proprioceptive component of turning.

      - Each motor module is conceptualized as a sensorimotor loop (green-red), which explains why proprioception appears in both crawling and turning.

      - The color coding has also been clarified: modules used in the current simulations are shown in darker shades, while others are faded.

      - Sensory perturbations caused by body locomotion – as in the case of crawling and turning – are not depicted in the figure as feedback between modules. We make this more explicit in the caption. The signal from feeding to the above layers is neuromodulatory – as indicated by the purple arrowhead.

      Finally, we explain that head casting and weathervaning are not modeled as separate modules, since both behaviors emerge from the coupled oscillator mechanism through crawl-bend interference. Our adherence to the subsumption architecture paradigm is motivated by its success in robotics and its conceptual alignment with hierarchical sensorimotor loops, but we have now made clearer that this is a simplifying framework rather than a rigid constraint.

      "Stimulus free conditions" (line 102) don't really exist. Substrate and temperature will always be present, light will have some intensity, etc. Does this really refer to fictive behaviors?

      We thank the reviewer for raising this point. In the revised manuscript we have removed the term “stimulus-free conditions” entirely to avoid the misleading implication that larvae experience no sensory input. We now explicitly describe these experiments as free exploration in the absence of navigation-guiding gradients, which accurately reflects the laboratory assay while avoiding any suggestion of fictive behavior. This terminology has been updated consistently throughout the text.

      The first results section is closer to an introduction than the intro itself is, owing to its focus on the context of the work the paper actually does rather than a broad review of larval behaviors that are not considered within this work.

      We believe the reviewer is referring to the “Model” section rather than the “Results.” The Model section is deliberately separated to outline the theoretical background of the behavioral architecture and to make explicit the general modeling assumptions, which explains why it cites previous work in detail. By contrast, the Introduction is intended as a brief overview of the broader larval behavioral repertoire, since the larva serves here as the case study for our framework. Presenting this repertoire is important because it defines the behaviors that populate the different layers of the architecture, even if only a subset of them is implemented in the simulations presented in this study.

      While the model components are described in the modeling section, no question is actually discussed. What is the goal of this model?

      This broader question is addressed in the public review section.

      "Crawler" and "turner" are inconsistently described. They are described as "modules" in Figure 1, but they seem more like behavioral primitives.

      The specific terms "crawler" and "turner" refer to the computational modules, but the reviewer correctly points out that these generate the respective “crawling” and “turning” behavioral primitives. This has been made explicit in the Materials & Methods.

      Do larva-larva interactions matter here?

      In the revised manuscript we now state explicitly that larva–larva interactions are not included in the present simulations, as each virtual larva is modeled independently in accordance with the single-animal datasets used for calibration. We also point the reader to the Limitations section, where we note that although social interactions lie outside the scope of this study, the Larvaworld software package already supports tactile sensing and collision handling, enabling such interactions to be incorporated in future work.

      The description of the locomotor system, with coupled oscillators between crawling frequency and bending is very empirical. Is this because of the 2-segment model effectively limiting peristalsis to a single segment? What are the limits of this approach?

      The stride-phase–dependent modulation of bending amplitude was identified through kinematic analysis of full 12-segment larval datasets and is therefore independent of our later decision to implement a two-segment simplification. This means that the empirical relationship we describe should hold for any multisegment model, regardless of the reduced representation used in the present implementation. More generally, we performed our detailed empirical analyses with the goal of uncovering statistical relations, which in turn were used for our data-driven coupled-oscillator model in combination with the stochastic element of stride-chain and pause duration.

      Line 190: The paper starts discussing experimental larva tracks. These experiments need to be described.

      The reviewer probably refers to the dataset analysed in this study. This is a public dataset as described in the Dataset Description section in Materials & Methods, along with a description of the experiment per se.

      The purpose of Figure 2 is not entirely clear. Several panels are not referenced in the text (F,G,H) and all panels are referenced extremely out of order. Figure 3 is similarly hard to follow for the same reasons of being referenced out of order. In fact, this section is largely duplicated by the "Model calibration" appendix, which I find to be much more clearly written and with more directly relevant figure panels.

      In the revised manuscript, all panels of Figures 2 and 3 are now cited in the correct order, and their roles in the narrative have been clarified. Figure 2 is explicitly presented as a summary of the empirical kinematic analyses that motivate the structure of the locomotory model, while Figure 3 illustrates the corresponding model components. To avoid redundancy with the “Model calibration” appendix, we streamlined the main text and replaced duplicated descriptions with cross-references to the appendix, which contains the full methodological detail.

      The data describe larvae behaving with a range of parameters, presumably both as individuals and across time. However, the models described seem to employ a population of larvae that shares a common best-fit parameter and the equations presented in the methods are all ordinary differential equations without noise or stochasticity. Where is the inter-individual variation coming from?

      The reviewer is correct to point out the importance of variability. Our approach is agent-based, and we model populations of non-identical individuals rather than replicates of a single average larva. The simulated larvae retain variability across several parameters, capturing the combined range observed in the data. This was described in the original manuscript, and to avoid possible misunderstandings, we have now expanded the “Inter-individual variability” section in the Materials & Methods and, where appropriate, clarified this point elsewhere in the text.
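      To make the agent-based idea concrete, the sketch below draws a separate parameter set for each virtual larva. It is an assumption-laden illustration: the parameter name, the use of independent Gaussians clipped to a range, and all numbers are hypothetical and are not the sampling scheme described in the Materials & Methods.

```python
import random

def sample_population(n_larvae, param_stats, seed=None):
    """Draw one parameter dictionary per virtual larva.

    param_stats maps a parameter name to (mean, sd, lower, upper); each value is
    drawn from an independent Gaussian and clipped to [lower, upper] (assumed scheme).
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n_larvae):
        larva = {}
        for name, (mean, sd, lower, upper) in param_stats.items():
            value = rng.gauss(mean, sd)
            larva[name] = min(max(value, lower), upper)
        population.append(larva)
    return population

# Hypothetical example: one crawling-related parameter with made-up statistics.
example = sample_population(5, {"crawl_frequency_hz": (1.4, 0.2, 0.8, 2.0)}, seed=0)
```

      Each simulated individual then runs the same deterministic equations with its own parameter values, so variability across the population arises from the sampling step rather than from noise terms in the differential equations.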

      The absolute orientation of trajectories in Figure 4A is not meaningful in your model. I suspect it would be more informative to show aligned trajectories in order to better visually assess the behavioral similarity. Also, the biological experiment needs to be described here. Time crawling seems to not be a great fit, although the peaks are fairly well aligned. Do you have thoughts on why this is?

      In Figure 4A, which is intended as a visual comparison between experimental and simulated trajectories, the experimental tracks were transposed so that all starting points coincide at the center of the arena. As the reviewer notes, they were not rotated to a common axis, since our subsequent analysis focuses on spatial dispersal rather than directional alignment. The description of the experimental dataset has been clarified in the revised text.
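      The alignment described here, shifting each track so that it starts at the arena center without rotating it, can be sketched as below. The data layout (a list of (x, y) tuples per track) and the function name are assumptions for illustration.

```python
def center_tracks(tracks):
    """Shift every track so that its first point lies at the origin (0, 0).

    tracks : list of tracks, each a non-empty list of (x, y) positions.
    Headings are left untouched, so dispersal from the start point is preserved.
    """
    centered = []
    for track in tracks:
        x0, y0 = track[0]
        centered.append([(x - x0, y - y0) for x, y in track])
    return centered
```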

      The reviewer is also correct that the distribution of time spent crawling is narrower in the simulations than in the experimental data. This reflects the fact that in the present study only three crawling-related parameters were sampled to generate inter-individual variability, and time spent crawling was not among them. We deliberately chose to assess how well the model reproduces distributions for behavioral metrics that were not explicitly fitted or parameterized. This point has now been made explicit in the revised manuscript.

      How did you assess the agreement of chemotaxis results with Gómez-Marín et al.? It would be useful for the comparison to be made explicit within this paper, as well. How were the chemotaxis parameters fit?

      The agreement between experimental and simulated chemotaxis was assessed only qualitatively, as we did not perform quantitative locomotor analyses on chemotaxis datasets. For these simulations we used the same motor layer, including all its modules, as calibrated in the free-exploration condition (Fig. 4). The only additional adjustment was a single weighting parameter that translates the appetitive or aversive valence of odor sources into modulatory input for the bending module. This parameter was tuned manually using a visual criterion of performance, to ensure that both attractive and aversive chemotaxis were observable. We now make explicit in the text that for more complex simulations we retain the calibration obtained in simpler conditions and build upon it, rather than re-optimizing the model. Moreover, we now provide references to the exact figure numbers in Gómez-Marín et al. (2011) for direct visual comparison, including of the perceived concentration metrics in our Figure 5E and 5F, where experimental and simulated data show a very good correspondence.

      Similarly, what are the key parameters for the mushroom body model and how did you fit their relationship to behavior? Was there actually feedback between the behavior of the larva and the training or was the SNN only used to generate the odor gain constant?

      The reviewer is correct to highlight this point. In the present study the mushroom body model was simulated independently to generate the odor-specific behavioral bias. This output was then translated into an odor gain constant, which served as input for the subsequent behavioral simulations of odor preference. There was no closed-loop interaction between the larval behavior and the training of the spiking network in this version. Establishing such a closed-loop connection is part of our future goals.

      It is unclear where feeding (as introduced in Figure 1) entered into the work presented, if at all.

      The reviewer is correct that the feeding module does not play a role in the present study. It was included in the behavioral architecture for completeness and because it is already implemented in the larvaworld package (see Sakagiannis et al., 2024). We have clarified this in the revised text.

      "During pauses, the input to the crawler module I_c = 0 and therefore forward..." The equations presented for the crawler module do not contain I_c.

      The inconsistency regarding the crawler module input has also been corrected. The equations now explicitly include the tonic input parameter, making them consistent with the descriptive text and our model implementation.

      Larvae do more than crawl forward; they can also hunch up, head cast with their head in the air, dig, crawl backward, roll, and perform other behaviors. Because the individual modules in this framework have been defined as coupled oscillators, how would you decide to implement such aspects? At what point does the oscillator approach break down? In this model, how does the larva decide whether to bend left or right, and how is that affected by the environment or internal state? Can a larva bend in the same direction twice in a row?

      The intermittent coupled-oscillator model presented here does not attempt to cover the full larval repertoire, such as hunching, digging, backward crawling, or rolling. Nor does it explicitly implement handedness as a directional bias. Nevertheless, the framework already allows for sequences of repeated turns: from a stationary position a larva can execute successive bends of varying amplitude, which may occur in the same direction, mimicking repeated head casts to one side.

      Extending the model to include additional locomotor primitives would require the development of new modules, which could expand the basic locomotor layer either alongside or in place of the current lateral oscillator module. As noted in the manuscript, the modules implemented here are not intended as definitive but as placeholders that demonstrate how the architecture can integrate more elaborate models in the future. In this context, future directions include introducing handedness as part of inter-individual variability and enriching the behavioral repertoire with additional modules to capture the broader range of larval actions.

      I was not able to install `larvaworld` either through pip in a fresh environment on OS X 15 and various Python versions between 3.8 and 3.12. I ran into a range of issues, from `tables` (which is understandable) to issues installing the old NumPy in Python 3.12 where `setuptools` is no longer included. The packaging should be made more robust, or the working environment could be better defined. For example, the version pinning of dependencies seems much more strict than I would expect for a user-focused Python library, particularly with out-of-date versions of core tools like NumPy.

      We thank the reviewer for going to such lengths to test the implementation and for pointing out these issues to us. We have recently updated the package (version 2.0.1, November 2025) to improve installation robustness, relaxed unnecessary dependency pinning, and provided an environment specification to facilitate reproducibility. The revised manuscript directs users to the recently updated installation instructions.

      Automated testing for Python versions 3.10-3.11 on macOS, Windows and Ubuntu is already implemented. Unfortunately, we have not yet tested the package on OS X 15. Please post any issues on the larvaworld GitHub page: https://github.com/nawrotlab/larvaworld.

    1. eLife Assessment

      This important study combines behavioural psychophysics with image-computable modelling to test whether face recognition relies on view-selective or view-tolerant mechanisms. Although the diagnostic orientation content of faces varies with viewpoint (more horizontal for frontal views, more vertical for profiles), human recognition remains predominantly tuned to horizontal information, consistent with the predictions of a view-tolerant model. The evidence for view-tolerant tuning to horizontal orientations is compelling, although questions remain about the plausibility of the computations implemented in the view-tolerant model and how they map onto mechanisms of everyday face recognition.

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I'll start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. I sort of understand the reasoning that this enforces tolerance of viewpoint variability, but I'm not clear on whether or not this is a version of face familiarity and recognition that the authors think has an analog in human visual processing.

      I do think that this model is interesting in terms of the differential tuning it exhibits, but don't find it easy to align with any theoretical perspective on face recognition. Specifically, do the authors think there is a stage of face processing in which tolerance as they've operationalized it in the model is extant? What I'm looking for is a concrete description of the circumstances that the authors are saying lead to this kind of model potentially being a meaningful analog of face recognition. For example, is the idea that one may become familiar with a face in some very limited set of viewpoints and then be presented with that face in other views?

      Alternatively, if the authors prefer to say that they simply thought this was a nice exercise in terms of identifying a different model and that it may not be a meaningful proxy for face recognition, I think that's fine, to be clear! I just still don't see anything in the text that convinces me of the ecological validity of this version of view-tolerance.

    3. Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 deg) but other viewpoints had biases that were slightly off horizontal (e.g. right profile: 80 deg, left profile: 100 deg). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

      Comments on revisions:

      I am happy with the response and changes to the comments in my review. The key findings from this study are: (1) There is a bias toward the use of horizontal information across all viewpoints for face recognition in humans using an old-new recognition task. (2) In contrast, the optimal information for matching faces varies as a function of viewpoint. The view-selective model shows horizontal information is dominant for frontal views and vertical information is dominant for profile views.

      The data from the view-tolerant model is less easy to interpret as it doesn't fit with any theoretically plausible model of face recognition. It might be a useful model for a face matching task in which participants had to match unfamiliar faces across viewpoints. This might be a possible extension of the current work.

      Nonetheless, I still think this is an interesting contribution to the literature.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors describe the results of a single study designed to investigate the extent to which horizontal orientation energy plays a key role in supporting view-invariant face recognition. The authors collected behavioral data from adult observers who were asked to complete an old/new face matching task by learning broad-spectrum faces (not orientation filtered) during a familiarization phase and subsequently trying to label filtered faces as previously seen or novel at test. This data revealed a clear bias favoring the use of horizontal orientation energy across viewpoint changes in the target images. The authors then compared different ideal observer models (cross-correlations between target and probe stimuli) to examine how this profile might be reflected in the image-level appearance of their filtered images. This revealed that a model looking for the best matching face within a viewpoint differed substantially from human data, exhibiting a vertical orientation bias for extreme profiles. However, a model forced to match targets to probes at different viewing angles exhibited a consistent horizontal bias in much the same manner as human observers.

      Strengths:

      I think the question is an important one: The horizontal orientation bias is a great example of a low-level image property being linked to high-level recognition outcomes, and understanding the nature of that connection is important. I found the old/new task to be a straightforward task that was implemented ably and that has the benefit of being simple for participants to carry out and simple to analyze. I particularly appreciated that the authors chose to describe human data via a lower-dimensional model (their Gaussian fits to individual data) for further analysis. This was a nice way to express the nature of the tuning function, favoring horizontal orientation bias in a way that makes key parameters explicit. Broadly speaking, I also thought that the model comparison they include between the view-selective and view-tolerant models was a great next step. This analysis has the potential to reveal some good insights into how this bias emerges and ask fine-grained questions about the parameters in their model fits to the behavioral data.

      Weaknesses:

      I will start with what I think is the biggest difficulty I had with the paper. Much as I liked the model comparison analysis, I also don't quite know what to make of the view-tolerant model. As I understand the authors' description, the key feature of this model is that it does not get to compare the target and probe at the same yaw angle, but must instead pick a best match from candidates that are at different yaws. While it is interesting to see that this leads to a very different orientation profile, it also isn't obvious to me why such a comparison would be reflective of what the visual system is probably doing. I can see that the view-specific model is more or less assuming something like an exemplar representation of each face: You have the opportunity to compare a new image to a whole library of viewpoints, and presumably it isn't hard to start with some kind of first pass that identifies the best matching view first before trying to identify/match the individual in question. What I don't get about the view-tolerant model is that it seems almost like an anti-exemplar model: You specifically lack the best viewpoint in the library but have to make do with the other options. Again, this is sort of interesting and the very different behavior of the model is neat to discuss, but it doesn't seem easy to align with any theoretical perspective on face recognition. My thinking here is that it might be useful to consider an additional alternate model that doesn't specifically exclude the best-matching viewpoint, but perhaps condenses appearance across views into something like a prototype. I could even see an argument for something like the yaw-averages presented earlier in the manuscript as the basis for such a model, but this might be too much of a stretch. Overall, what I'd like to see is some kind of alternate model that incorporates the existence of the best-match viewpoint somehow, but without the explicit exemplar structure of the view-specific model.

      The design of the view-tolerant model aligned with the requirements of tolerant recognition and revealed the stimulus information that enables identity to be abstracted away from variations in face appearance. However, it did not involve the notion that such an ability may depend on a prototype or summary representation of face identity built up through varied encounters (Burton, Jenkins and Schweinberger 2011, Jenkins, White et al. 2011, Burton 2013, Burton, Kramer et al. 2016, Menon, Kemp and White 2018).

      We agree with the Reviewer that the average of the different views of a face is a good proxy of its central tendency (i.e., stable identity properties; Figure 1). We thus followed their suggestion and included an additional model observer that compared specific views to full-spectrum view-averaged identities. The examination of the orientation tuning profile of this so-called view-average model observer confirmed the crucial contribution of horizontal identity cues to view-invariant recognition as the horizontal range best predicted the average summary of full-spectrum face appearances across views. This additional model observer is now presented in the Discussion and Supplementary files 2 and 3.
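      A minimal sketch of how such a view-average observer could be set up is given below. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `view_average_match`), the array layout, and the synthetic data are all hypothetical. Each identity is summarized by the pixelwise average of its views, and the probe is assigned to the identity whose average correlates best with it.

```python
import numpy as np

def pearson(a, b):
    """Pixelwise Pearson correlation between two images of equal shape."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def view_average_match(probe, faces):
    """Return the index of the identity whose view-averaged image best matches the probe.

    faces: array of shape (n_identities, n_views, height, width).
    """
    prototypes = faces.mean(axis=1)  # average across views: one image per identity
    scores = [pearson(probe, proto) for proto in prototypes]
    return int(np.argmax(scores))

# Synthetic stand-in data (hypothetical): 4 identities x 7 views of 16x16 "images".
rng = np.random.default_rng(0)
identity_part = rng.normal(size=(4, 1, 16, 16))    # shared by all views of an identity
view_part = 0.5 * rng.normal(size=(4, 7, 16, 16))  # view-specific variation
faces = identity_part + view_part

probe = faces[2, 5]  # identity 2 seen at view 5
best = view_average_match(probe, faces)
print(best)
```

      Unlike the view-tolerant observer, this sketch does not exclude the probe's own viewpoint: it contributes to the average without being stored as a separate exemplar.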

      Besides this larger issue, I would also like to see some more details about the nature of the cross-correlation that is the basis for this model comparison. I mostly think I get what is happening, but I think the authors could expand more on the nature of their noise model to make more explicit what is happening before these cross-correlations are taken. I infer that there is a noise-addition step to get them off the ceiling, but I felt that I had to read between the lines a bit to determine this.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’
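      The noise-addition step quoted above can be sketched as follows. This is a hedged illustration, assuming that RMS contrast is the standard deviation of pixel intensities and that SNR is the ratio of face to noise RMS contrast (.01 / .08 = .125); the helper names and the random placeholder image are hypothetical.

```python
import numpy as np

def set_rms(img, rms):
    """Zero-mean the image and rescale it to the requested RMS contrast."""
    centered = img - img.mean()
    return centered * (rms / centered.std())

def noisy_correlation(target, probe, rng, face_rms=0.01, noise_rms=0.08):
    """Correlate target and probe after adding an independent noise pattern to each."""
    def degrade(img):
        noise = rng.normal(size=img.shape)
        return set_rms(img, face_rms) + set_rms(noise, noise_rms)
    a, b = degrade(target).ravel(), degrade(probe).ravel()
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(1)
face = rng.random((64, 64))  # placeholder standing in for a face image
snr = 0.01 / 0.08            # face RMS over noise RMS

# Without noise, an exact target-probe match correlates at ceiling.
clean_r = float(np.corrcoef(face.ravel(), face.ravel())[0, 1])
# With noise, the same match is averaged over ten independent noise draws.
noisy_r = float(np.mean([noisy_correlation(face, face, rng) for _ in range(10)]))
print(snr, clean_r, noisy_r)
```

      In this toy setting the noiseless correlation sits at ceiling while the noisy one falls far below it, which is the effect the noise step is meant to produce.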

      We also added a supplementary file that graphically illustrates the d’ distributions of each model and of the human observers: ‘Sensitivity d’ of the view-tolerant model was much lower than view-selective model and human sensitivity (Supplementary File 2), even without noise. The view-tolerant model therefore processed fully visible stimuli (SNR of 1). This decreased sensitivity in the view-tolerant compared to the view-selective model is expected, as none of the probes exactly matched the target at the pixel level due to viewpoint differences. In contrast to humans who rely on internally stored representations to match identity across views, the model observer lacks such internal representations and entirely relies on (less efficient) pixelwise comparisons.’
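      For readers less familiar with the sensitivity index, d’ in an old/new task is conventionally the difference between the z-transformed hit and false-alarm rates. The sketch below is a generic illustration; the log-linear correction for extreme rates and the example counts are assumptions, not details taken from the manuscript.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to counts, 1 to totals) keeps rates away from 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: 40 hits, 10 misses, 10 false alarms, 40 correct rejections.
print(round(d_prime(40, 10, 10, 40), 2))
```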

      Another thing that I think is worth considering and commenting on is the stimuli themselves and the extent to which this may limit the outcomes of their behavioral task. The use of the 3D laser-scanned faces has some obvious advantages, but also (I think) removes the possibility for pigmentation to contribute to recognition, removes the contribution of varying illumination and expression to appearance variability, and perhaps presents observers with more homogeneous faces than one typically has to worry about. I don't think these negate the current results, but I'd like the authors to expand on their discussion of these factors, particularly pigmentation. Naively, surface color and texture seem like they could offer diagnostic cues to identity that don't rely so critically on horizontal orientations, so removing these may mean that horizontal bias is particularly evident when face shape is the critical cue for recognition.

      Our stimuli were originally designed by Troje and Bulthoff (1996). These are 3D laser scans of white individuals aged between 20 and 40 years, posing with a neutral expression. Different views of the faces were shot under a fixed illumination. Ears and a small portion of the neck were visible while the hair region was removed. All face images had a normalized skin color and we further converted them to grayscale.

      While we agree that this stimulus set offers a restricted range of within- and between-identity variations compared to what is experienced in natural settings, we believe that the present findings generalize to more ecological viewing conditions. Indeed, past evidence showed that the recognition of face pictures shot under largely variable pose, age, expression, illumination, and hairstyle is tuned to the horizontal range of the face stimulus (Dakin and Watt 2009, Dumont, Roux-Sibilon and Goffaux 2024). In other words, our finding that view-tolerant identity recognition is mainly driven by horizontal face information would likely replicate with a more ecological stimulus set.

      Moreover, the skin color normalization and grayscale conversion, while limiting the range of face variability, did not eliminate the contribution of surface pigmentation in our study. It is thus unlikely that our findings exclusively reflect the orientation dependence of face shape processing. Pigmentation refers to all surface reflectance properties (Russell, Sinha et al. 2006) and hue (color) is only one among others. The grayscaled 3D laser scanned faces used here contained natural variations in crucial surface cues such as skin albedo (i.e., how light or dark the surface appears) and texture (i.e., spatial variation in how light is reflected); they have actually been used to disentangle the role of shape and surface cues to identity recognition (e.g., Troje and Bulthoff 1996, Vuong, Peissig et al. 2005, Russell, Sinha et al. 2006, Russell, Biederman et al. 2007, Jiang, Dricot et al. 2009). Moreover, a past study of ours demonstrated that the diagnosticity of the horizontal range of face information is not restricted to face shape cues; the specialized processing of face shape and surface both selectively rely on horizontal information (Dumont, Roux-Sibilon and Goffaux 2024).

      For these reasons, the present findings are unlikely to be fully determined by shape processing, and we expect them to generalize to more ecological stimulus sets. We discuss these aspects in the revised manuscript.

      Reviewer #2 (Public review):

      This study investigates the visual information that is used for the recognition of faces. This is an important question in vision research and is critical for social interactions more generally. The authors ask whether our ability to recognise faces, across different viewpoints, varies as a function of the orientation information available in the image. Consistent with previous findings from this group and others, they find that horizontally filtered faces were recognised better than vertically filtered faces. Next, they probe the mechanism underlying this pattern of data by designing two model observers. The first was optimised for faces at a specific viewpoint (view-selective). The second was generalised across viewpoints (view-tolerant). In contrast to the human data, the view-specific model shows that the information that is useful for identity judgements varies according to viewpoint. For example, frontal face identities are again optimally discriminated with horizontal orientation information, but profiles are optimally discriminated with more vertical orientation information. These findings show human face recognition is biased toward horizontal orientation information, even though this may be suboptimal for the recognition of profile views of the face.

      One issue in the design of this study was the lowering of the signal-to-noise ratio in the view-selective observer. This decision was taken to avoid ceiling effects. However, it is not clear how this affects the similarity with the human observers.

      In the Methods section, we now provide detailed information about the addition of noise to model observer cross-correlations: ‘In a pilot phase, we measured the overall identification performance of each model. Initially, the view-selective model performed at ceiling, yielding a correlation of 1 since there was an exact target-probe match across all trials. To avoid ceiling effects and to keep model performance close to human levels (Supplementary File 2), we thus decreased the signal-to-noise ratio (SNR) of the target and probe images to .125 by combining each with distinct noise patterns (face RMS contrast: .01; noise RMS contrast: .08). Each trial (i.e. target-probe pairing) was iterated ten times with different random noise patterns.’

      We also added a supplementary file that graphically illustrates the d’ distributions of each model and of the human observers.

      Another issue is the decision to normalise image energy across orientations and viewpoints. I can see the logic in wanting to control for these effects, but this does reflect natural variation in image properties. So, again, I wonder what the results would look like without this step.

      All stimuli were matched for luminance and contrast. It is crucial to normalize image energy across orientations as natural image energy is disproportionately distributed across orientations (e.g., Hansen, Essock et al. 2003). Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Keil 2008, Keil 2009, Goffaux and Greenwood 2016). If not normalized after orientation filtering, such uneven distribution of energy would boost recognition performance in the horizontal range across views. Normalization was performed across our experimental conditions merely to avoid energy from explaining the influence of viewpoint on the orientation tuning profile.

      We were not aware of any systematic natural variations of energy across face views. To address this, we measured face average energy (i.e., RMS contrast) in the original stimulus set, i.e., before the application of any image processing or manipulation. Background pixels were excluded from these image analyses. Across yaws, we found energy to range between .11 and .14 on a 0 to 1 grayscale. This is moderate compared to the range of energy variations we measured across identities (from .08 to .18). This suggests that variations in energy across viewpoints are moderate compared to variations related to identity. It is unclear whether these observations are specific to our stimulus set or whether they are generalizable to faces we encounter in everyday life. They, however, indicate that RMS contrast did not substantially vary across views in the present study and suggest that RMS normalization is unlikely to have affected the influence of viewpoint on recognition performance.

      In the revised Methods section, we explicitly motivate energy normalization: ‘Images of faces cropped from their background as used here contain most of their energy in the horizontal range (Goffaux, 2019; Goffaux & Greenwood, 2016; Keil, 2009). Across yaws, we found face energy to range between .11 and .14 on a 0 to 1 grayscale, which is moderate compared to the range of face energy variations we measured across identities (from .08 to .18). To prevent energy from explaining our results, in all images, the luminance and RMS contrast of the face pixels were fixed to 0.55 and 0.15, respectively, and background pixels were uniformly set to 0.55. The percentage of clipped pixel values (below 0 or above 1) per image did not exceed 3%.’
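      The normalization described in this passage can be sketched as below. This is an illustrative reading, assuming that luminance is the mean and RMS contrast the standard deviation of face-pixel intensities on a 0 to 1 scale; the function name, the circular mask, and the random placeholder image are hypothetical.

```python
import numpy as np

def normalize_face(img, face_mask, mean_lum=0.55, rms=0.15):
    """Fix luminance and RMS contrast of face pixels; set the background to mean_lum.

    Returns the normalized image (clipped to [0, 1]) and the percentage of clipped pixels.
    """
    out = np.full(img.shape, mean_lum, dtype=float)
    face = img[face_mask].astype(float)
    face = (face - face.mean()) / face.std()  # zero mean, unit standard deviation
    out[face_mask] = mean_lum + rms * face    # impose target luminance and contrast
    clipped = (out < 0) | (out > 1)
    pct_clipped = 100.0 * clipped.mean()
    return np.clip(out, 0.0, 1.0), pct_clipped

rng = np.random.default_rng(2)
img = rng.random((64, 64))                        # placeholder image on a 0-1 scale
yy, xx = np.mgrid[:64, :64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 24 ** 2  # hypothetical circular face region

norm_img, pct = normalize_face(img, mask)
print(norm_img[mask].mean(), norm_img[mask].std(), pct)
```

      Reporting the clipped percentage alongside the normalized image makes it easy to check a criterion such as the 3% limit quoted above.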

      Despite the bias toward horizontal orientations in human observers, there were some differences in the orientation preference at each viewpoint. For example, frontal faces were biased to horizontal (90 degrees), but other viewpoints had biases that were slightly off horizontal (e.g., right profile: 80 degrees, left profile: 100 degrees). This does seem to show that differences in statistical information at different viewpoints (more horizontal information for frontal and more vertical information for profile) do influence human perception. It would be good to reflect on this nuance in the data.

      Indeed, human performance data indicate that while identity recognition remains tuned to horizontal information, the peak of horizontal tuning shows some variation across viewpoints. We primarily focused on the first aspect because of its direct relevance to our research objective, but also discussed the second aspect: with yaw rotation, certain non-horizontal morphological features such as the jaw line or nose bridge may increasingly contribute to identity recognition, whereas at frontal or near-frontal views, features are mostly horizontally oriented (e.g., Keil 2008, Keil 2009). In the revised Discussion, we directly relate the modest fluctuations of peak location to yaw differences in face feature appearance.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Based on a discussion with the reviewers, we integrated the recommendations and reached a consensus on the eLife assessment. To move from a "solid" to a "compelling/convincing" strength-of-evidence rating, please address the reviewers' comments. Key points are to clarify and test the plausibility of the models (e.g., effects of different noise-addition steps, inclusion/exclusion of specific orientation channels in the view-dependent comparison, and alternative decision criteria), and to address or discuss the limitations of the stimulus set in capturing recognition under more naturalistic scenarios, for example, including texture cues.

      Reviewer #1 (Recommendations for the authors):

      I generally found the paper to be very well-written, so I have only a few minor comments here.

      (1) I didn't really follow why the estimation of the Gaussian functions described in the text was preferred over a simpler ML framework. Do these approaches differ that much? I see references to prior studies in which these were applied, so I can certainly go check these out, but I could see value in adding just a bit of text to briefly make the case that this is important.

      Employing a simpler linear framework, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze. The interaction term would almost certainly reach significance but its interpretation would be limited. We would either have to rely on numerous local comparisons, which are not particularly informative for our research objectives (e.g., knowing whether d’ differs significantly between two adjacent orientations at a given viewpoint is of little relevance), or to use a polynomial contrast approach (testing the linear, quadratic, … up to the 7th order trends), which would also be difficult to interpret. For such complex, approximately Gaussian-shaped data, the highest-order polynomial trend would likely provide the best fit, but without offering meaningful insight.

      In contrast, a nonlinear approach appears more appropriate. The Gaussian model we used allows us to characterize the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation (or bandwidth) and base amplitude. These parameters are not merely statistical parameters. Rather, they are directly interpretable in cognitive/functional terms. The peak location corresponds to the orientation at which the Gaussian curve is centred, i.e. the preferred orientation band for identity recognition. The standard deviation represents the width of the curve, reflecting the strength or selectivity of the tuning. The base amplitude is the height of the Gaussian curve base, indicating the minimum level of sensitivity, typically found near vertical orientation. Finally, the peak amplitude refers to the height of the Gaussian curve relative to its baseline, that is, it captures the advantage of horizontal over vertical orientations.
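      To make the four parameters concrete, the sketch below fits a Gaussian tuning curve to a synthetic orientation profile. It is a simplified stand-in and not the authors' fitting procedure: the 180° wrapping, the 22.5° spacing of the eight orientation bands, the grid-search ranges, and the example parameter values are all assumptions, with 90° taken as horizontal as in the reviews above.

```python
import numpy as np

def wrapped_gaussian(theta, peak_loc, sd, peak_amp, base_amp):
    """Gaussian tuning curve over orientation, wrapped with a 180-degree period."""
    delta = (theta - peak_loc + 90.0) % 180.0 - 90.0  # signed distance in [-90, 90)
    return base_amp + peak_amp * np.exp(-0.5 * (delta / sd) ** 2)

def fit_tuning(theta, d):
    """Grid-search peak location and SD; solve base and peak amplitude by least squares."""
    best = None
    for peak_loc in np.arange(0.0, 180.0, 1.0):
        for sd in np.arange(5.0, 60.0, 1.0):
            g = wrapped_gaussian(theta, peak_loc, sd, 1.0, 0.0)
            X = np.column_stack([np.ones_like(g), g])  # columns: base, unit Gaussian
            coef, *_ = np.linalg.lstsq(X, d, rcond=None)
            sse = float(np.sum((X @ coef - d) ** 2))
            if best is None or sse < best[0]:
                best = (sse, peak_loc, sd, coef[1], coef[0])
    _, peak_loc, sd, peak_amp, base_amp = best
    return peak_loc, sd, peak_amp, base_amp

theta = np.arange(0.0, 180.0, 22.5)                    # 8 orientation bands
d_obs = wrapped_gaussian(theta, 90.0, 25.0, 1.2, 0.4)  # noiseless synthetic d' values
peak_loc, sd, peak_amp, base_amp = fit_tuning(theta, d_obs)
print(peak_loc, sd, peak_amp, base_amp)
```

      Because base and peak amplitude enter the model linearly, only peak location and standard deviation need a nonlinear search, which keeps the sketch free of an optimizer dependency.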

      Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin and Watt 2009, Goffaux and Greenwood 2016). Orientation selectivity at primary stages of visual processing has also been modelled using Gaussian (or Difference of Gaussians; Ringach, Hawken and Shapley 2003).

      We revised the data analysis section to include a justification for our use of a Gaussian model: ‘Therefore, fitting the human sensitivity data using a simple Gaussian model seemed most appropriate as it allows characterizing the parameters of the tuning profile, namely, peak location, peak amplitude, standard deviation and base amplitude, which are directly interpretable in cognitive/functional terms. Moreover, the use of a nonlinear, Gaussian model is motivated by past work that showed that the Gaussian function fits the evolution of recognition performance as a function of orientation (Dakin & Watt, 2009; Goffaux & Greenwood, 2016). Simpler frameworks, i.e. a linear model predicting d’ from the interaction between orientation and viewpoint, would result in an 8 (orientation) * 7 (viewpoint) design that is difficult to analyze and interpret.’

      (2) When reporting the luminance and contrast of your stimuli, please make clear what these units and measures are. This was a case where I had to take a second to assure myself that I knew what the values meant.

      We clarified that the luminance and contrast values reported in the manuscript are on a grey scale ranging from 0 to 1.

      (3) In your Procedure section, I think describing the familiarization task right away would help the text flow more clearly. At present, you began talking about the old/new task, and I was immediately wondering how familiarization worked!

      The procedure section now starts with the description of the familiarization task.

      (4) p. 3 - "Culminates" doesn't seem like the right word here.

      We agree and rephrased this way: ‘The tolerance of face identity recognition is stronger for familiar than unfamiliar faces’.

      (5) p. 5 - I think "with the multiple" shouldn't have "the".

      Indeed, we removed the “the”.

      Reviewer #2 (Recommendations for the authors):

      I enjoyed reading the manuscript, but thought the Introduction was a bit long. I wasn't sure about the relevance of the section on temporal contiguity. I think this might have been more relevant if this had been a manipulation in the design. So, I wonder if this might be shortened or removed to focus on the key questions. On the other hand, I found the overview of the view-selective and view-tolerant to be a bit brief. There is plenty of detail here, but I found it difficult to break down what was done when I first read it. It might be good to provide an overview in the Discussion too.

      While past research on the contribution of temporal contiguity to face identity recognition brings interesting insights into the nature of the visual experience leading to view-tolerant performance, we agree with the Reviewer that this aspect is not directly at stake here. We reduced the review of this literature in the Introduction. We clarified the description of the model observers as suggested by the reviewer and made sure to provide an overview of the model observers in the Discussion as well.

      References.

      Burton, A. M., R. Jenkins and S. R. Schweinberger (2011). "Mental representations of familiar faces." Br J Psychol 102(4): 943-958.

      Burton, A. M., R. S. Kramer, K. L. Ritchie and R. Jenkins (2016). "Identity From Variation: Representations of Faces Derived From Multiple Instances." Cogn Sci 40(1): 202-223.

      Dakin, S. C. and R. J. Watt (2009). "Biological "bar codes" in human faces." J Vis 9(4): 2 1-10.

      Dumont, H., A. Roux-Sibilon and V. Goffaux (2024). "Horizontal face information is the main gateway to the shape and surface cues to familiar face identity." PLoS One 19(10): e0311225.

      Goffaux, V. and J. A. Greenwood (2016). "The orientation selectivity of face identification." Scientific Reports 6(34204): 34204.

      Hansen, B. C., E. A. Essock, Y. Zheng and J. K. DeFord (2003). "Perceptual anisotropies in visual processing and their relation to natural image statistics." Network 14(3): 501-526.

      Jenkins, R., D. White, X. Van Montfort and A. M. Burton (2011). "Variability in photos of the same face." Cognition 121(3): 313-323.

      Jiang, F., L. Dricot, V. Blanz, R. Goebel and B. Rossion (2009). "Neural correlates of shape and surface reflectance information in individual faces." Neuroscience 163(4): 1078-1091.

      Keil, M. S. (2008). "Does face image statistics predict a preferred spatial frequency for human face processing?" Proc Biol Sci 275(1647): 2095-2100.

      Keil, M. S. (2009). ""I look in your eyes, honey": internal face features induce spatial frequency preference for human face processing." PLoS Comput Biol 5(3): e1000329.

      Menon, N., R. I. Kemp and D. White (2018). "More than a sum of parts: robust face recognition by integrating variation." R Soc Open Sci 5(5): 172381.

      Burton, A. M. (2013). "Why has research in face recognition progressed so slowly? The importance of variability." Q J Exp Psychol (Hove) 66(8): 1467-1485.

      Ringach, D. L., M. J. Hawken and R. Shapley (2003). "Dynamics of orientation tuning in macaque V1: the role of global and tuned suppression." Journal of neurophysiology 90(1): 342-352.

      Russell, R., I. Biederman, M. Nederhouser and P. Sinha (2007). "The utility of surface reflectance for the recognition of upright and inverted faces." Vision Res 47(2): 157-165.

      Russell, R., P. Sinha, I. Biederman and M. Nederhouser (2006). "Is pigmentation important for face recognition? Evidence from contrast negation." Perception 35(6): 749-759.

      Troje, N. F. and H. H. Bulthoff (1996). "Face recognition under varying poses: the role of texture and shape." Vision Res 36(12): 1761-1771.

      Vuong, Q. C., J. J. Peissig, M. C. Harrison and M. J. Tarr (2005). "The role of surface pigmentation for recognition revealed by contrast reversal in faces and Greebles." Vision Res 45(10): 1213-1223.

    1. eLife Assessment

      This Review Article provides a compendium of advice for MD-PhD students to consider when deciding which, if any, clinical field they will select for residency training. It is grounded in published data and effectively considers factors including the potential for clinical disciplines to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

    2. Reviewer #1 (Public review):

      Summary:

      This brief piece by Swartz and colleagues outlines the complexities surrounding the choice of clinical specialty for physician-scientists. It is, in general, clear and well-written, and it will be useful to research-oriented medical students choosing a path and to the mentors who are guiding them.

      Strengths:

      The writing is clear. The points made are not profound, but they are important and will be of use to the intended audience.

      Weaknesses:

      I have only minor suggestions for improvement. There are some areas of redundancy where the article could be tightened up by consolidating.

    3. Reviewer #2 (Public review):

      Summary:

      This article is a useful compendium of advice for MD/PhD students (and research-focused MD students) to consider when it is time to decide on a clinical field for residency training. The authors are a distinguished group of physician-scientists and program directors who are drawing on published data and their own experience as mentors to provide advice and resources to students about to make what can be a career-defining choice. It makes an effective argument for considering important differences between clinical fields in their ability to sustain research integration, provide mentorship, meet lifestyle expectations, and foster a long-term career as a research-focused physician-scientist.

      Strengths:

      (1) A lot has been written about physician-scientists as an endangered species. Given the important role that physician-scientists can play if they engage in research that is informed by experience in patient care, not nearly enough has been written about the choices that students make during training that can keep them on track or throw them off.

      (2) The article provides not only general advice, but specific information in the 2 tables that can help trainees to weigh their priorities and consider their options.

      (3) Among the best advice is to weigh clinical demands, maintenance of procedural skills, recognition of the impact of research time on salary, and the impact of high salaries on the tension between research effort and clinical effort in clinical departments, which is where most physician-scientists in academia are employed.

      Areas for potential improvement:

      (1) Some of the most useful pieces of advice are scattered through the text when they might be more impactful if focused. For example, what are the 4 or 5 most essential factors that someone in an MD/PhD or an MD program should weigh when they are deciding between clinical disciplines? There are also published data on the experience of past graduates in achieving a research-focused career in each clinical discipline. How should that data be applied by trainees? What are the factors that should be weighed in deciding where to work as a research-focused physician once training has been completed?

      (2) Some clinical fields at academic institutions have proved to be much more hospitable to careers as research-focused physicians than others. Published data highlight the challenges. I believe the authors have tried very hard to present a balanced perspective, but in the process, they have, I believe, missed an opportunity to guide trainees and make them aware of what they should look for to avoid making a decision that may prove incompatible with their long-term goals.

      (3) An issue that hasn't been raised: Where will be the jobs for physician-scientists who have an MD ± PhD and want to do research and discovery? How many openings will there be for physician-scientists in academia 5-10 years from now? In industry? How are recent events in Washington affecting the continuation of those jobs? Unfortunately, I am not aware of labor statistics for physician-scientists, but perhaps the authors can find them.

      (4) Additional questions that can be raised and addressed in the article: Should one of the "smart choices" in the article's title be where you do the residency, and not just which residency you do? How important is it to be at a successful, research-intensive medical center/university, both during and after residency and fellowship training? If being in an institution where there are numerous very successful physician-scientists and scientists improves the likelihood of being able to sustain a physician-scientist career, how should graduating students improve their chances of being at one of those institutions?

      (5) In every clinical discipline, there are departments that value physician-scientists more than other departments and invest accordingly. What advice would the authors give to help graduating students identify those departments?

    4. Author response:

      Thank you for the valuable feedback. We will be updating the manuscript to incorporate the reviewers' terrific suggestions. We specifically have:

      • Reduced redundancy and streamlined overlapping sections (especially around research alignment, protected time, and clinical demands)

      • Made the core decision-making framework more explicit and easier to extract (in a new Table 1, with clearer synthesis in the text)

      • Strengthened the emphasis on institutional/program context as a key determinant of success—arguably as important as specialty choice

      • Added more actionable guidance for trainees on how to evaluate departments (e.g., NIH Reporter, T32 presence, R01 density, K→R track record)

      • Included a slightly more explicit statement acknowledging that while all specialties can support physician-scientist careers, the structural ease varies and may require different levels of negotiation/support

      We did not address the broader workforce/job market question, since it feels outside the scope.

    1. eLife Assessment

      This valuable paper provides convincing evidence that humans can navigate better through maps whose local transitions were learned in an intermixed order than maps whose local transitions were learned in neighboring groups. The authors put forward a potential mechanism in which the grouped learning resulted in mental fragmentation, though evidence for this mechanism is incomplete. The work will be of interest to researchers studying cognitive maps and curriculum learning.

    2. Reviewer #1 (Public review):

      This paper investigates how different learning curricula influence the way that humans piece together directly experienced transitions into a broader cognitive map. When adjacent learning trials were grouped within rows or columns of the map, subsequent navigation through the map was weaker than when adjacent learning trials came from disjoint spaces in the map. The authors speculate that the grouped curriculum resulted in mental fragmentation that made navigation across space more difficult later on.

      This is an interesting paradigm that contributes useful new findings in the domain of map learning to the growing literature on curriculum learning. The evidence for a difference between conditions is highly compelling, but, as the authors are very transparent in acknowledging in the Discussion, the evidence for their proposed mechanism - mental fragmentation under grouped learning - is somewhat weak. The study thus presents an intriguing empirical result but not an ironclad mechanistic account.

      An alternative - by their account, "less interesting" - explanation is that grouped learning was easier because trials in close succession had overlapping elements, and so participants were not trying as hard or as engaged. There is a literature on spaced (as opposed to massed) learning being better for subsequent memory because it increases retrieval effort. It seems very plausible that this could be going on here, and the control experiment reported in the supplement would not help to rule this out. This literature deserves some discussion.

      The Introduction focuses entirely on literature showing advantages in grouped over intermixed learning, setting that up as the most well-motivated expectation from the literature. Upon finding the opposite, the Discussion then mentions that interleaving has been found to be useful in "applied domains", but then returns to how surprising this is in light of recent findings in the category learning literature. But there is a substantial earlier literature on interleaved vs blocked curricula in category learning, very often finding advantages for interleaving. See, e.g., Carvalho & Goldstone, 2015, for a review. There is also a paper showing interleaving advantages in associative inference, Zhou et al., 2023, JEP:G, which is very relevant to several of the discussion section paragraphs. Thus, the treatment of the prior curriculum learning literature is currently sparse.

    3. Reviewer #2 (Public review):

      I think this paper is an excellent and timely contribution. It clearly shows that learning overlapping relationships in a disjoint training schedule (where the overlaps are not encountered close together in time) appears to aid the formation of an integrated associative memory structure (a cognitive map) and supports generalisation. I believe the methods are sound and the results are clear. I only have a couple of methodological questions that may not warrant any changes to the paper (or only very minor changes/additions):

      (1) The mixed effects models did not include random slopes for the within-subject factors ("spatial manipulation" and "block"), and so the corresponding fixed effect inferences may be unsafe. Having said that, including these slopes may not be warranted given their contribution to the model's fit. I recommend that the authors check this.

      (2) The mixed effects models for accuracy appear to model average performance across trials rather than using a generalised linear model with a (e.g.) logit link function and the binomial distribution to characterise performance. I think this is a little sub-optimal, as the latter is often more sensitive. Nonetheless, it is not in any way wrong; the results are clear enough as is, and there may be a good reason to avoid a non-linear link function, which can alter the interpretation of effects close to the ceiling and floor.
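
      To illustrate the point about ceiling and floor effects, here is a minimal sketch (not the authors' analysis) of how a fixed effect on the log-odds scale translates into different changes in raw accuracy depending on baseline performance. The function names and the shift of 0.5 log-odds are hypothetical, chosen only for illustration.

```python
import math


def logistic(x):
    """Inverse of the logit link: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Logit link: map a probability (0 < p < 1) to log-odds."""
    return math.log(p / (1.0 - p))


def accuracy_gain(baseline_accuracy, log_odds_shift):
    """Change in accuracy produced by a fixed shift on the log-odds scale.

    Both arguments are illustrative assumptions, not values from the paper.
    """
    shifted = logistic(logit(baseline_accuracy) + log_odds_shift)
    return shifted - baseline_accuracy


# The same log-odds effect yields a smaller accuracy change near the ceiling.
for baseline in (0.50, 0.80, 0.95):
    gain = accuracy_gain(baseline, 0.5)
    print(f"baseline {baseline:.2f}: accuracy gain {gain:.3f}")
```

      Under these assumptions, an effect that is constant on the logit scale shrinks on the accuracy scale as baseline performance approaches 1, which is why the two modelling choices can lead to different readings of effects near the ceiling or floor.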

      I think the introduction and/or discussion would benefit from contrasting their results with Berens & Bird (2022, PLOS Comp Bio). In this paper, it is shown that blocking the training of discriminations in a linear hierarchy (what we call progressive training) substantially benefited transitive inference performance. This seems at odds with the authors' finding that "participants struggle to integrate information across rows and columns, i.e. across groups of transitions that were trained separately in time".

      I would really like to know what the authors think about this discrepancy (or, indeed, whether they think there is one at all). Is it possibly because "progressive" learning is some combination of "grouping", "blocking" and "chaining" (where there is a structured overlap between adjacently trained relationships)? Or is it something else, e.g., that there is a fundamental difference between learning associations and discriminations (personally, I lean towards this explanation)?

      Relevant to this, the authors note that their "findings do contradict recent reports from the category learning literature, where blocking seems to help learning and generalisation (Dekker et al., 2022; Flesch et al., 2018; Noh et al., 2016). It may be that where the goal is not to learn a complex knowledge structure - like a map - but simply to compress exemplars by mapping them onto a smaller number of labels - the benefits of blocking emerge." However, the benefit of progressive (blocked) training in my own work was observed in a task that required learning a complex/relational structure in the form of a transitive hierarchy, which theoretical accounts suggest depends on learning map-like representations (Whittington et al., 2020).

    4. Reviewer #3 (Public review):

      Summary:

      This study examines how training regimes influence the formation of cognitive maps. Participants learned two relational maps over three days through pairwise transitions: one map was trained with grouped sequences that followed rows or columns, while the other was trained with disjoint transitions sampled randomly across the map. In addition, the study manipulated the temporal spacing of training blocks (blocked vs. semi-blocked) and tested whether the results generalized across two map geometries (a 5×5 grid and a 4×4 torus).

      Furthermore, they ran a follow-up experiment (or condition) in which rows and columns were shuffled in the grouped condition.

      While grouped training produced better performance during learning, the authors report that disjoint training led to superior performance at test on tasks probing the global map knowledge.

      Summarising experimental design:

      (1) Map geometry (between-subjects): 5×5 grid vs 4×4 torus

      (2) Training block schedule (between-subjects): Blocked vs Semi-blocked

      (3) Training regime/transition sampling (within-subject): Grouped or Disjoint (Day 1 and Day 2)

      Strengths:

      The study addresses a clear and timely theoretical question about how the training regime affects the formation of cognitive maps. A further strength is the well-controlled experimental design, allowing the authors to test their hypotheses in a systematic and informative way.

      Weaknesses:

      (1) If I understood correctly, participants learned one map on the first day and the other on the second day, with the training regime (grouped vs. disjoint) counterbalanced across maps. This raises the possibility that experience with one training regime on day one could influence performance on the second day. For example, it would be interesting to examine whether participants who experienced the disjoint regime first showed any differences when learning the grouped regime on the following day. While it may be difficult to fully disentangle such transfer effects from the main training regime effects, it would be informative to test whether performance on the second day depends on the regime experienced on the first day (e.g., whether prior exposure to the disjoint regime predicts performance on the subsequent grouped training, but not vice versa).

      (2) The authors mention a control experiment. Did the participants in the control experiment complete only the training phase or also the testing tasks used in the main experiment? If testing was included, it would be informative to report whether performance at test was comparable to that observed in the main experiment. Given that this condition appears to involve blocked transitions while moving across both rows and columns, I would expect performance to fall somewhere between the grouped and disjoint conditions.

      (3) Participants' performance did not differ between conditions in the map reconstruction task, suggesting that participants in both the grouped and disjoint regimes were ultimately able to form a cognitive map. Was this task always administered last during the testing session? I wonder whether the explicit request of the reconstruction task could have influenced participants' awareness of the map structure.

      (4) The manuscript describes the study as consisting of four experiments (two groups per map shape, differing in the blocked versus semi-blocked schedule). However, based on the design described in the Methods, this appears more accurately characterized as a single experiment with two between-participant factors, map geometry (grid vs. torus) and blocking schedule (blocked vs. semi-blocked), and one within-participant factor, training regime (grouped vs. disjoint).

      (5) It is not entirely clear to me from the Results section whether performance at test differed between the two map geometries (grid and torus), or whether the reported effects of training regime were consistent across them.

    1. eLife Assessment

      The authors combined human assembloids, fetal brain tissue, bulk and single-cell RNA sequencing, and live imaging to understand the molecular mechanisms affected by hypoxia during cortical development. The findings are very important to the neurodevelopmental field. They reveal new insights into how migration of cortical interneurons can be affected in hypoxic conditions, and provide exciting models to probe broad neurodevelopmental processes in health and disease. The evidence is compelling. The data and analyses are very rigorous and go beyond the state-of-the-art.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions that cause reduced cortical interneuron migration. The authors use human assembloids as a migration assay of subpallial interneurons into cortical organoids and show substantially reduced migration upon 24 hours of hypoxia. Bulk and scRNA-seq show up-regulation of adrenomedullin (ADM), as well as of its receptor RAMP2, confirmed at the protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype and that blocking RAMP2 also interferes with the rescue. The authors are also to be applauded for using 4 different cell lines and for using human fetal cortex slices as an independent method to explore the migration of DLXi1/2GFP-labelled iPSC-derived interneurons in this substrate with and without ADM addition (after confirming that ADM is also up-regulated in this system). Finally, the authors demonstrate that PKA-CREB signalling mediates the effect of ADM addition and also leads to up-regulation of GABA receptors. Taken together, this is a very carefully done study on an important subject: how hypoxia affects cortical interneuron migration. In my view, it would be of great interest to the readers of eLife.

      Strengths:

      Its strengths are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that we don't know which interneuron subtypes are most affected by hypoxia and which may be rescued in their migration by ADM.

      A further weakness is that the few genes confirmed to be regulated after hypoxia do not help determine which statistical cut-off can be considered reliable, given that the authors didn't compare strongly regulated versus weakly regulated genes.

      Comments on revisions:

      Unfortunately, the authors did not address my suggestions. While they show example stainings of interneuron subtypes, they do not show whether calretinin+, calbindin+ or somatostatin+ interneurons are differentially affected by hypoxia or by the rescue with ADM. I still consider this an important piece of information to add.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia: cortical forebrain assembloids and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits, which can be rescued using exogenous ADM. The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate as well as establishing exciting models for future studies. The authors use sufficient iPSC lines, including both XX and XY, so the analysis is robust. In addition, the RNA-seq data with re-oxygenation is a nice control to see which genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and of the involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript. I have a few comments and suggestions for the authors.

      Strengths/Weaknesses:

      (1) Can they comment on the possibility of inflammatory response pathways being activated by hypoxia - has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the hypoxic response.

      (2) Can they comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Fig 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      (4) Can the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might already be known but was not discussed.

      (6) In the Discussion section, it might be worth detailing for the readers what delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      Comments on revisions:

      The authors have addressed my comments thoroughly. I have no further comments or suggestions.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform-particularly the combination of assembloids and live imaging-will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Comments on revisions:

      The authors have fully addressed my concerns by incorporating the relevant discussion into the manuscript, especially regarding how well the migration observed in hSO-hCO assembloids reflects in vivo conditions. I have no further comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public review): 

      Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer for reviewing our manuscript and for the important comment about inflammation. Indeed, hypoxia has been shown to activate inflammatory response pathways. In various studies, it was found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Fig. 2D), and to the reviewer's point, we identified enrichment of the inflammatory signaling response following hypoxic exposure. Since hSOs at the time of analysis do contain some astrocytes, we think these contribute to the observed pro-inflammatory changes and emphasize the feasibility of capturing this response in organoids in vitro. This is also important because ADM is known to have anti-inflammatory properties and should be investigated as such in future studies focused on hypoxia-induced inflammation.

      In the manuscript, we included a few sentences in the discussion to address the lack of in-depth analyses of inflammation as a limitation of our study.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, and increased expression of RAMP2 receptors on neurons, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the current experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about possible different mechanisms in ventral versus dorsal areas, in the revision we plotted and included in the figures the data on the cell-type expression of ADM and its receptors in hCOs (Fig. S3).

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer's insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. Our study provides the first data on the cell-type specific expression of ADM in baseline and hypoxic conditions, which is one of its key novelties.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data, along with our CREB-targeted findings in hypoxic interneurons, suggest that ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy Syndrome, thus providing further evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might already be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. The manuscript now discusses previous studies showing that failure of interneurons to migrate and reach their designated targets within the appropriate developmental window leads to their elimination through apoptosis. Decreased numbers (or abnormal development) of interneurons are associated with neurodevelopmental impairments and abnormal functional connectivity in the brain.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, in the manuscript we used the single-cell RNA sequencing data and immunostainings to provide this information. As expected based on our previous reports, most cortical interneurons present in organoids are of the calretinin (CALB2), somatostatin (SST) and calbindin (CALB1) subtypes. These data are now presented in Fig. S3.

      Separately, we used available scRNA-seq data from developing human brain and showed that at ~20 PCW, the developing human brain has similar types of cortical interneurons. These data are now included in Fig. S5.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer's thoughtful suggestion. In addition to the bulk RNA-seq analysis, we did validate several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP, and VEGFA (Fig. S1).

      We do agree that an in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing on hSOs was sufficient to capture essential changes like ADM upregulation. An in-depth exploration of the differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and opportunity to clarify. Since we manually separated the assembloids before the analyses, we processed these samples separately. That is why they separate like this. In the revision, we added data about ADM expression and its receptors’ expression in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing the direct investigation of migration patterns of hypoxic interneurons, we agree that it does not fully recapitulate the in vivo environment. While there are multiple aspects that cannot be recapitulated in vitro at this time (e.g. cellular complexity, vasculature, immune response), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We expanded our discussion to include more details and the need to validate these findings using in vivo models.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq was performed during the acute stages of hypoxia/reoxygenation, and we think a maturation phenotype might be difficult to capture at this point and would require analysis at later stages of in vitro assembloid maturation.

      Our speculation about a possible maturation defect is based on data from previous developmental biology studies showing that failure of interneurons to reach their final cortical location within a specified developmental window impairs their integration within the neuronal network, and thus leads to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we speculate these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge this experimental design is not technically feasible at this time, and thus this hypothesis remains speculative and only included in the discussion.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other interneuron developmental processes during cortical development. In the manuscript, we included text in the discussion about the likely effects of hypoxia on interneuron proliferation, maturation and circuit integration.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia).

      Thank you for pointing out this error; we corrected it in our revision.

    1. eLife Assessment

      This valuable study proposes a novel rapid-entry mechanism for Staphylococcus aureus, involving the rapid release of calcium from lysosomes. The paper's strength lies in its very interesting hypothesis. The methods used are solid and adequately support the conclusions.

    2. Reviewer #2 (Public review):

      [Editors' note: This version was assessed by the editors. The authors have addressed a point raised by Reviewer #2, who thought the authors compared cells grown in low-serum and high serum conditions. This has been clarified in the latest version.]

      In the manuscript, Ruhling et al. propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      A key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype / conditional phenotype for genetic knock out is a major weakness.

      In the previous version, the authors perform experiments with ASM KO cells to provide genetic evidence of the role for ASM in S. aureus entry through lysosomal modulation.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      In the manuscript, Ruhling et al. propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      Key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype / conditional phenotype for genetic knock out is a major weakness. 

      In the revised version, the authors perform experiments with ASM KO cells to provide genetic evidence of the role for ASM in S. aureus entry through lysosomal modulation. The key additional experiment is the phenotype of reduced bacterial uptake in low serum, but not in high serum conditions. The authors suggest this could be due to the SM from serum itself affecting the entry. While this explanation is plausible, prolonged exposure of cells to low serum is well documented to alter several cellular functions, particularly in the context of this manuscript, lysosomal positioning, exocytosis and Ca2+ signaling. A better control here could be WT cells grown in low serum.

      As the reviewer suggested, we did culture both WT control cells and ASM knock-outs under low serum conditions before conducting the invasion assays. Hence, the detected effects on S. aureus invasion must be caused by the lack of functional ASM in the mutant.

      We apologize that this did not become evident from the manuscript’s text. We thus included a change in line 259 which now reads:

      “To test whether FBS confounded our invasion experiments, we cultivated WT as well as ASM K.O. cells in medium with reduced FBS concentration (1%) and determined the S. aureus invasion efficiency (Figure 2I).”

      If SM in serum can interfere, why do they see such pronounced phenotype on bacterial entry in WT cells upon chemical inhibition?

      We explain the differences between inhibitor-treated WT cells and ASM K.O.s by the severe accumulation of SM upon genetic ablation of ASM. We demonstrated this by HPLC-MS/MS measurements in Figure 2L. If cells were cultured in 10% FBS, an ASM K.O. resulted in approx. 4-times higher levels of cellular SM C18:0 when compared to WT cells, while amitriptyline treatment of WT cells had no effect, and ARC39 treatment increased SM C18:0 levels only by 2-fold. This likely results from different durations of SM accumulation in the cell pools which is caused either by complete absence of ASM (in case of the ASM K.O.) or only in the hour-range upon treatment with the inhibitors.

      Under low serum conditions, the severe SM C18:0 accumulation in the ASM K.O. was found decreased (from 4-fold to 2-fold when compared to WT cells; Figure 2M). Here, the WT cells used as reference also were cultured in the same manner as the ASM K.O. A similar pattern was observed for other SM species (Supp. Figure 3). This correlates with the S. aureus invasion phenotype in ASM K.O.: under high serum conditions (and resulting in severe SM accumulation) we did not detect an invasion defect, while under low serum conditions (resulting in only moderate SM accumulation) S. aureus invasion was reduced in the knock-outs when compared to WT cells cultured in the same conditions, respectively.

      While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      Since the comments starting with the line above are identical to the reviewer's previous comments, we assume that these points of criticism still stand for the reviewer. We had previously agreed with the reviewer that we do not show formation of ceramide-enriched platforms, and we had already changed the manuscript accordingly in the previous revision round (see also our comment below).

      The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      We continue to share the reviewer’s desire to discriminate between ASM-dependent and ASM-independent processes, but the simultaneous occurrence of multiple pathways of bacterial uptake is currently the limiting factor and technological challenge in our laboratory, since these events happen rapidly. We do hope that we or others will be able to address these limitations in the future, for instance with the technologies suggested by the reviewer.

      I acknowledge the authors' argument of different ASM inhibitors showing similar phenotypes across different assays as pointing to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The authors' argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of the ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      We here want to elaborate again, since our revision experiments demonstrate the ASM-dependency of the rapid uptake under low serum conditions (see also above). We were convinced that the genetic evidence of an S. aureus invasion phenotype in ASM K.O.s under these conditions would eliminate the reviewer’s concern about the role of ASM during bacterial invasion. Our lipidomics data of ASM K.O.s cultured in 1% and 10% FBS (Figure 2M, Supp. Figure 3) and inhibitor-treated WT cells (Figure 2L, Supp. Figure 3) show a correlation between SM accumulation and the invasion phenotype we observed.

      We agree with the reviewer, however, that it remains elusive why changes in the sphingolipidome increase ASM-independent S. aureus internalization by host cells. One explanation is a dysfunction of the lipid raft-associated protein caveolin-1 upon strong SM accumulation, which was previously shown to appear in ASM-deficient cells (1, 2). A lack of caveolin-1 results in strongly increased host cell entry of S. aureus in certain cell types (3, 4). In other cell types, such as A549 cells, S. aureus invades in an α-toxin- and caveolin-1-dependent fashion (5). It will be interesting to study to what extent such processes as described by Goldmann and colleagues depend on ASM. However, a characterization of the mechanism behind these observations requires further experimentation and is beyond the scope of the current manuscript.

      As to the centrality of the pathway: we cannot and do not make any assumptions on the centrality of the pathway and its importance in vivo. As scientists we were intrigued by our finding of an ASM-dependent uptake pathway for S. aureus, especially its speed. In different, as yet unidentified host cell types or cell lines, such a pathway may pose a major entry point for pathogens. Alternatively, we may have identified an ASM-dependent mode of receptor uptake, with which the bacteria “piggyback” into the cells.

      The authors allude to lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret.

      We again want to add that we measured phagosomal escape of S. aureus in WT and ASM K.O. cells cultured in 1% FBS (low serum conditions) and compared it to escape rates obtained with host cells cultured in 10% FBS. Again, we infected cells for 10 or 30 min and determined the escape rates 3h p.i. However, the results are similar to escape rates determined with 10% FBS (see Author response image 1). This was addressed already during the manuscript’s first revision. We found that escape rates of S. aureus were significantly decreased in absence of ASM regardless of the FBS concentration in the medium.

      Author response image 1.

      We therefore think that prolonged absence of ASM has additional side effects. For instance, certain endocytic pathways could be up- or down-regulated to adapt for the absence of ASM or could be affected by other changes in the lipidome (that can be minimized but not completely prevented by culturing cells in 1% FBS). This could, for instance, affect maturation of S. aureus-containing phagosomes and hence phagosomal escape.

      As it is currently unclear to what extent the prolonged absence of ASM activity affects cellular processes, we think other experiments investigating the role of ASM-dependent invasion for phagosomal escape are more reliable. Most importantly, bacteria that enter host cells early during infection (and thus predominantly via the “rapid” ASM-dependent pathway) possess lower phagosomal escape rates than bacteria that entered host cells later during infection (Figure 5, D and E). This is confirmed by higher escape rates upon blocking ASM-dependent invasion with Vacuolin-1 (Figure 4E) and three different ASM inhibitors (Figure 4C and D). We further demonstrate that sphingomyelin on the plasma membrane during invasion influences phagosomal escape, while sphingomyelin levels in the phagosomal membrane did not change phagosomal escape (Figure 5A and B). This is summarized in Figure 5F.

      Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape phagosome in macrophages, could it provide a system to potentially decouple the uptake and intracellular trafficking effects by ASM (or its inhibitor treatment)?

      Knock-downs in our laboratory are based on the vector pLVTHM (6). Inducible knock-downs in the cells would require the introduction of an inducible Tet<sup>on</sup> system, which the cells currently do not harbor.

      However, for optimal gene knock-down, this system has to be induced by doxycycline supplementation of the medium for 7 days. During these several days of growth, the cells can adapt their lipid metabolism, which reproduces the situation that we encounter with the K.O.s.

      ASM-dependent uptake of S. aureus in macrophages has been demonstrated before (7). However, the course of infection in macrophages differs from that in non-professional phagocytes (8). For example, in macrophages S. aureus replicates within phagosomes, whereas in non-professional phagocytes it replicates in the host cytosol. Absence of ASM therefore may influence the intracellular infection of macrophages with S. aureus in a distinct manner.

      The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      We agree with the reviewer that we do not show generation of ceramide-enriched platforms (see also above). We thus already had changed Figure 6F in the revised manuscript to make clear that it remains elusive whether ceramide-enriched platforms are formed. We also had added a sentence to the discussion (line 615) to emphasize that the existence of these microdomains is still debated in lipid research.

      We think that the following observations support SM-dependent effects of ASM during S. aureus invasion:

      (i) Reduced invasion upon removing SM from the plasma membrane (Figure 2N, Supp. Figure 2M)

      (ii) Increased invasion in TPC1 and Syt7 K.O. (Figure 2P) in presence of exogenously added SMase

      However, we agree with the reviewer that we do not directly demonstrate ASM-mediated SM cleavage during S. aureus invasion. Hence, we had added a sentence to the discussion that mentions a possible SM-independent role of ASM for invasion (line 556) that reads:

      “Since it remains elusive to which extent ASM processes SM on the plasma membrane during S. aureus invasion, one may speculate that ASM could also have functions other than SM metabolization during host cell entry of the pathogen. However, we did not detect a direct interaction between S. aureus and ASM in an S. aureus-host interactome screen (9).”

      The reviewer acknowledges technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. Genetically encoded lysosomal Ca2+ sensors such as GCaMP3-ML1 might provide better ways to directly visualize this during inhibitor treatment or S. aureus infection.

      We again thank the reviewer for this suggestion. We already had included the following section in our discussion (then: line 593): “Since fluorescent calcium reporters allow to monitor this process microscopically, future experiments may visualize this process in more detail and contribute to our understanding of the underlying signaling mechanisms.”

      References for the purpose of this response letter:

      (1) Rappaport, J., C. Garnacho, and S. Muro, Clathrin-mediated endocytosis is impaired in type AB Niemann-Pick disease model cells and can be restored by ICAM-1-mediated enzyme replacement. Mol Pharm, 2014. 11(8): p. 2887-95.

      (2) Rappaport, J., et al., Altered Clathrin-Independent Endocytosis in Type A Niemann-Pick Disease Cells and Rescue by ICAM-1-Targeted Enzyme Delivery. Mol Pharm, 2015. 12(5): p. 1366-76.

      (3) Hoffmann, C., et al., Caveolin limits membrane microdomain mobility and integrin-mediated uptake of fibronectin-binding pathogens. J Cell Sci, 2010. 123(Pt 24): p. 4280-91.

      (4) Tricou, L.-P., et al., Staphylococcus aureus can use an alternative pathway to be internalized by osteoblasts in absence of β1 integrins. Scientific Reports, 2024. 14(1): p. 28643.

      (5) Goldmann, O., et al., Alpha-hemolysin promotes internalization of Staphylococcus aureus into human lung epithelial cells via caveolin-1- and cholesterol-rich lipid rafts. Cell Mol Life Sci, 2024. 81(1): p. 435.

      (6) Wiznerowicz, M. and D. Trono, Conditional suppression of cellular genes: lentivirus vector-mediated drug-inducible RNA interference. J Virol, 2003. 77(16): p. 8957-61.

      (7) Li, C., et al., Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal, 2018. 28(10): p. 916-934.

      (8) Moldovan, A. and M.J. Fraunholz, In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol, 2019. 21(3): p. e12997.

      (9) Rühling, M., et al., Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio, 2025. 0(0): p. e03654-24.

    1. eLife Assessment

      This important study shows that the Nora virus, a natural Drosophila pathogen that also persistently infects many laboratory fly stocks, infects intestinal stem cells (ISCs), leading to a shorter life span and increased sensitivity to intestinal infection with the bacterium Pseudomonas. The authors provide convincing data to support their conclusions. The paper provides new insights into virus-host interactions in the Drosophila gut and serves as a warning for scientists who use the fruit fly as a model to study gut physiology.

    2. Reviewer #1 (Public review):

      [Editors' note: The article has been improved and several points raised by the reviewers have now been addressed. The authors should ideally further improve the clarity of the figures and the description of the experimental methods. This is particularly important for an article discussing potential confounding factors.]

      Summary:

      This important article reveals that the Nora virus can colonize the intestinal cells of Drosophila melanogaster, where it persists with minimal immediate impact on its host. However, upon aging, infection, or exposure to toxicants, stem cell activation induces Nora virus proliferation, enabling it to colonize enterocytes. This colonization disrupts enterocyte function, leading to increased gut permeability and a significant reduction in lifespan. Results are convincing and hold significant import for the Drosophila community.

      Strengths:

      (1) Building on previous studies by Habayeb et al. (2009) and Hanson et al. (2023), this study highlights cryptic Nora virus infection as a crucial factor in aging and gut homeostasis in Drosophila melanogaster.

      (2) Consistent with the oral route of Nora virus transmission, the study demonstrates that the virus resides in intestinal stem cells, with its replication directly linked to stem cell proliferation. This process facilitates the colonization of enterocytes, ultimately disrupting intestinal function.

      (3) The study establishes a clear connection between stem cell proliferation and virus replication, suggesting that various factors - such as microbiota, aging, diet, and injury - can influence Nora virus dynamics and associated pathology.

      (4) The experimental design is robust, comparing infected flies with virus-cured controls to validate findings.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors report that Nora virus, a natural Drosophila pathogen that also persistently infects many laboratory fly stocks, infects intestinal stem cells (ISCs), leading to a shorter life span and increased sensitivity to intestinal infection with the Pseudomonas bacterium. Nora virus infection was associated with an increased proliferation of ISC and disrupted gut barrier function. Genetically, the authors show that increased ISC division in Nora virus and Pseudomonas coinfected flies is driven by signaling through the JAK-STAT pathway and apoptosis.

      Accordingly, blocking apoptosis and JAK-STAT signaling reduces viral load, suggesting that in this context the JAK-STAT pathway is proviral in contrast to other previous observations in systemically infected flies. This work adds to the findings of another recent paper showing that another persistent fruit fly virus, Drosophila A virus, also increases ISC proliferation and decreases gut barrier function. Intestinal viruses should therefore be considered confounders in studies of fly intestinal physiology.

      Strengths:

      Overall, the data are convincing and robust, starting with two wildtype fly stocks (Ore-R strain) that differ in their Nora virus infection status, followed by experiments in which cleared stocks are reinfected with a purified Nora virus stock preparation. The conclusions of the paper will be of interest to scientists working on insect physiology, virology, and immunology, but should also serve as a warning for scientists that use the fly as a model to study gut physiology.

    4. Reviewer #3 (Public review):

      Summary:

      Franchet et al. sought to characterize the impact of Nora virus on host lifespan and sensitivity to a variety of infectious or stressful treatments. Through careful and rigorous analyses, they provide evidence that the Nora virus greatly impacts fly survival to infection, overall lifespan, and intestinal integrity. The authors have been thorough and rigorous, and the experimental evidence including proper isolation of the virus and Koch's Postulate reinoculation of the organism is excellent. The additional work is valuable and to the gold standard of the field, characterizing the pathology of the gut, including data showing gut leakage, the presence of the virus in the intestinal stem cells, and the importance of stem cell proliferation for virus replication and spread using elegant genetic tools to block stem cell proliferation or enterocyte death.

      Strengths:

      The authors have been rigorous and careful. The initial finding is presented through the lens of two related strains differing in virus infection. From there, the authors characterized the virus and isolated a purified culture, which they used to reinoculate a cleared strain to demonstrate proper Koch's Postulate satisfaction. The authors have also probed various parameters in terms of dietary importance in relevant conditions for many experiments. The additional work to characterize the pathology of the gut is compelling, using genetic tools to block or allow intestinal stem cell proliferation and enterocyte death through JAK-STAT and JNK signalling alongside the tracing of virus presence using a Nora virus antibody. JAK-STAT and JNK are previously described as regulators of these processes, making these tools appropriate and convincing. It is also interesting to see good evidence that the virus itself is damaging, rather than simply permitting coinfection by gut microbes (which does happen).

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The study does not explore or discuss how oral ingestion of Nora virus leads to the colonization of stem cells, which are located basally in the gut. This mechanism should be discussed.

      We have added an additional paragraph (4th) in the Discussion dealing with this issue and are further discussing the consequences of RNAi potentially not being functional in progenitor cells in the paragraph on antiviral responses.

      (2) The authors fail to detect Dicer-GFP fusion protein expression in stem cells, a finding that could explain why the virus persists in these cells. Further investigation is needed to determine whether RNAi functions are effective in stem cells compared to enterocytes. For clarification, the authors could cross esg-Gal4 UAS-GFP and Myo-Gal4 UAS-GFP with UAS GFP-RNAi and/or express a Dicer-GFP construct under a stem cell-specific driver.

      Actually, it is well-known in the Drosophila literature on the intestinal epithelium that RNAi functions well in progenitor cells as the technique has been widely used to understand the control of stem cell division and differentiation in tens of articles. We provide here just a few examples: Jiang et al., Nat Commun (2025) https://doi.org/10.1038/s41467-024-55255-1; Zhai et al., PLoS Genetics (2017) https://doi.org/10.1371/journal.pgen.1006854; Wu et al., https://doi.org/10.1371/journal.pgen.1009649.

      (3) The presentation of experimental parameters (e.g., pathogen type, temperature, time points) should be improved in the results section and at the top of the figures to enhance clarity. Additionally, details regarding the mode of oral infection (continuous exposure vs. single feeding on a filter) should be specified. Given that fly stock flipping frequency influences microbiota load (as noted in Broderick et al.), this should be reported, especially for lifespan studies.

      P. aeruginosa oral infection was always by continuous exposure, as detailed in the Mat. & Meth. section. Nora infection was done by exposure to the viral solution for 24 h, as also detailed in Mat. & Meth. The flipping frequency had also been reported in that section.

      (4) To confirm that enterocyte colonization requires stem cell proliferation and differentiation, the authors should analyze Nora virus localization in JAK-STAT-deficient flies infected with bacteria or toxicants. This would help determine whether the virus can infect enterocytes in the absence of enterocyte differentiation, but stimulation of stem cells.

      We now provide these data (pictures and quantification) in Fig.7 G-H and discuss them in the main text.

      (5) The study does not discuss the spatial distribution of Nora virus infection along the gut. Specifically, it remains unclear whether viral colonization is higher in gut regions R2 and R3, which contain proliferative stem cells. Addressing this could provide valuable insights into the virus's infection dynamics.

      We have now specified that Nora virus was detected only in the posterior midgut; we are now also providing a schematic illustration in Fig. S5J.

      Recommendations for the authors:

      Major Suggestion

      See weaknesses section for key areas requiring improvement.

      Minor Suggestions

      (1) Line 79: Mention Nox in the text. Key references on Nox include Jones (2013), Iatsenko (2018), and Patel (2016).

      Done.

      (2) Line 92: The long list of publications is unnecessary and can be shortened.

      We are not sure that many investigators are aware of the scope of our studies on host-pathogen relationships and this is the adequate place for a reminder.

      (3) Line 196: Cite Choi et al. (Aging Cell, 2008; 7:318-334. doi: 10.1111/j.1474-9726.2008.00380.x) for the initial work on gut dysplasia during aging. However, note that dysbiosis in aging is demonstrated in Buchon et al. (2009, Genes and Development) and other studies.

      Done.

      (4) Line 265: It would be interesting to clarify whether the shortened lifespan of Nora-infected flies after a clean injury is dependent on the microbiota.

      The shortened life span of Nora-infected flies is not due to the injury as demonstrated in Fig. S4F. Hence, the shortened lifespan is differentially affected by the microbiota according to nutrition conditions as documented in Fig. 3D-E.

      (5) Line 285: Clarify what is meant by "polyubiquitin promoter": do the authors mean a ubiquitous Gal4 driver? Specify the Gal4 lines used in the result section.

      Done. The construct is a direct fusion of the ubiquitin p63E promoter to the Dicer-fluorescent protein sequences as described in Girardi et al., Sci Rep, 2015.

      (6) Line 347: Indicate the references aligning with the most recent studies on this topic.

      Done.

      (7) Line 373 and elsewhere: Mention studies that have shown the microbiota influence on lifespan, in relation to dietary richness.

      Done.

      (8) Line 588: Provide details on the method used for hemolymph collection.

      Done.

      (9) Line 964: Clarify the phrase "as previously shown": where in this paper was it demonstrated?

      The legends have been rewritten and the phrase has been deleted.

      (10) Line 987: In "survival of non-infested with PA14," explicitly mention Nora to distinguish between different infections.

      Done.

      Figures & Experimental Details

      (11) Figures: Improve figure legends or add information at the top of figures, specifying:

      Number of flies used to monitor Nora virus titer.

      Temperature conditions.

      Age of flies used in experiments.

      Done.

      (12) Figure 2E: The lifespan of Nora-negative flies appears very short. Was this lifespan assay conducted at 29°C? What was the fly stock flipping rate?

      Correct, it was 29°C. As described in the Material and Methods section, the flies were flipped every two (29°C) to four days (25°C).

      (13) Figure 4C: Improve labeling on the plate for better clarity.

      Done.

      (14) Figure 6C: The figure legend on the right is difficult to interpret. Clarify what "+" indicates and explicitly write out the genotype. Is NP identical to NPG4G80?

      Done. NP is the NP1 driver. We usually use it in a version that also includes a Gal80<sup>ts</sup> transgene to express the gene of interest only at the adult stage.

      (15) Dissection Details: Clearly state which part of the gut was dissected: midgut, entire gut, ± Malpighian tubules. This should be specified in the results section.

      Done for RT-qPCR analyses (neither Malpighian tubules nor crop were included).

      (16) Clean Injury: Provide more details in the results section regarding the injury site and needle size.

      Done.

      (17) Use "Abx" instead of "AntiB," as the former is more commonly recognized.

      Done.

      Reviewer #2 (Public review):

      The title does not seem to be fully supported by the data. While the authors convincingly show the increased sensitivity to Pseudomonas infection, effects on another tested bacterium, Serratia marcescens, were not significantly different between Nora-virus-infected and noninfected flies. Thus, effects of 'intestinal infection' seem to be too broad a claim.

      We agree with the reviewer and have accordingly modified the title, which now explicitly refers to P. aeruginosa.

      Also, whether the Nora virus increases sensitivity to oxidative stress is not so clear to me: the figure that supports this claim is the survival assay of Figure 5F. However, the difference in survival between control and paraquat-treated Nora (-) flies seems to be in the same order as between control and paraquat-treated Nora (+) flies. Rather, cause and effect seem to be the reverse: paraquat increases ISC proliferation, higher viral loads, and consequently shorter survival. I suggest rephrasing the title and conclusions accordingly.

      While we usually just directly compare Nora (+) vs. Nora (-) flies under the same conditions, we note that the difference in survival between control and paraquat-treated Nora (-) flies is about 9 days, based on LT50 values, whereas it is 8 days for Nora (+) flies. The difference is about two days when comparing Nora (+) to Nora (-) flies exposed to paraquat. Thus, Nora does contribute to an increased sensitivity to oxidative stress, likely through the process highlighted by the reviewer and also through its own detrimental action on the homeostasis of the intestinal epithelium and the associated disruption of its barrier function.
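
      For readers less familiar with LT50 comparisons, the arithmetic in this reply can be sketched as follows. This is only an illustration and not the authors' analysis: the survival curves below are hypothetical values chosen so that the LT50 differences match the approximate figures quoted in the response, and the linear-interpolation estimator `lt50` is an assumption, since the response does not state how LT50 was computed.

```python
from typing import Sequence


def lt50(days: Sequence[float], survival: Sequence[float]) -> float:
    """Estimate LT50: the time at which percent survival falls to 50%.

    Uses linear interpolation between the two observations that bracket 50%.
    (Assumed estimator; the source does not specify one.)
    """
    if len(days) != len(survival):
        raise ValueError("days and survival must have the same length")
    for i in range(1, len(days)):
        s0, s1 = survival[i - 1], survival[i]
        # Look for the first interval in which survival drops through 50%.
        if s0 >= 50.0 >= s1 and s0 > s1:
            t0, t1 = days[i - 1], days[i]
            return t0 + (s0 - 50.0) * (t1 - t0) / (s0 - s1)
    raise ValueError("survival never reaches 50% in the observed window")


# HYPOTHETICAL percent-survival curves (not data from the manuscript).
days = [0, 5, 10, 15, 20, 25]
curves = {
    "Nora(-) control":  [100, 100, 90, 70, 50, 20],
    "Nora(-) paraquat": [100, 80, 60, 10, 0, 0],
    "Nora(+) control":  [100, 100, 80, 60, 35, 10],
    "Nora(+) paraquat": [100, 70, 45, 5, 0, 0],
}
lt50s = {name: lt50(days, curve) for name, curve in curves.items()}

# Effect of paraquat within each infection status.
paraquat_effect_neg = lt50s["Nora(-) control"] - lt50s["Nora(-) paraquat"]
paraquat_effect_pos = lt50s["Nora(+) control"] - lt50s["Nora(+) paraquat"]
# Effect of Nora virus among paraquat-treated flies.
nora_effect_on_paraquat = lt50s["Nora(-) paraquat"] - lt50s["Nora(+) paraquat"]

print(paraquat_effect_neg, paraquat_effect_pos, nora_effect_on_paraquat)
```

      The sketch makes the two comparisons explicit: the paraquat effect is computed within each infection status, while the Nora effect is computed between infection statuses under the same treatment, which is the comparison the authors say they normally use.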

      Quantification of immunofluorescence microscopy is missing, rendering the images somewhat anecdotal. Quantification should be provided. It will then also be of interest to quantify the number of Nora (+) cells, and the Nora virus levels per infected cell (e.g. Figure 5H). Also, the claim that the Nora virus initially infects ISC and later (upon stress) infects enterocytes requires quantification.

      Missing quantifications of pictures have been added: Figs. S5E and 7H. We are not sure we understand the reviewer's comment on “Nora virus levels per infected cell”: the signal we are seeing may correspond to aggregates of the virus and would be impossible to quantify reliably, e.g., in the right-most panel of Fig. 5H. Fig. 5I clearly shows that no Nora is detected in enterocytes of young 5-day-old flies in the absence of infectious or xenobiotic challenge.

      Genetic support for the role of the JAK-STAT pathway in driving ISC proliferation and supporting Nora virus replication is convincing. It would also be of interest to analyze other pathways implicated in ISC proliferation (e.g. JNK, EGFR), especially given the observations of Nigg et al, showing an involvement of STING/NF-kB and EGFR pathway in driving intestinal phenotypes of Drosophila A virus-infected flies (doi: 10.1016/j.cub.2024.05.009).

      We agree with the reviewer that these would be interesting experiments to perform, especially in the light of one hypothesis that antiviral defenses may prevent the initial infection of enterocytes as discussed at length in our updated discussion on host antiviral defenses. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions. In this work, we used the interference with the JAK-STAT pathway as a second tool to block the division of ISCs.

      Figure 5E: An intriguing observation is that GFP:Dicer2 seems to be unstable in Nora virus-infected cells. Here, GFP control driven by the same driver line would be required to confidently conclude that this is due to an effect on Dicer-2 specifically.

      Actually, this experiment was not performed using the Gal4-UAS system but a direct fusion. We do know that GFP is stable when expressed in enterocytes, e.g., Lee et al., Cell Host & Microbe (2016) DOI: 10.1016/j.chom.2016.10.010.

      Legends are mostly conclusive, and essential information about the experimental setup is missing in the captions of multiple figures, making the interpretation of the data difficult. See my private recommendations for suggestions to improve the data presentation.

      Done.

      Recommendations for the authors:

      Suggestions for the presentation of the data:

      (1) I found the names Ore-R(SC) and Ore-R(SM) for noninfected vs infected Ore-R flies not very intuitive. I suggest renaming them into something that makes the infection status clear.

      These notations refer to two distinct sub-strains that may reflect different origins, with some likely genetic drift accounting for their distinct properties. As the Ore-R(SM) flies can have different infection statuses (infected, cleared, re-infected), we fear that renaming would not clarify the matter. Of note, Ore-R(SC) flies are refractory to Nora virus infection (Fig. S1I).

      (2) Please define the number of flies analyzed for survival assays in the legends.

      Done.

      (3) The authors provide conclusions in most of the figure legends, without providing an explanation of the experiment that was done. Conclusions should be used sparingly, if at all, in legends. Also, relevant information is often missing in the legends (time points after infection, Figure 2E food source, etc.). I suggest the authors carefully double-check their legends and rephrase the conclusive legends with descriptive ones.

      Done. The figure legends have been rewritten.

      (4) Several of the legends indicate that 'data represent the mean of biological triplicates' however some panels do not represent triplicates (e.g. Figure 1C-E). Please correct.

      Done.

      (5) Legends: which multiple comparison test was used for ANOVA?

      Done. Tukey’s post-hoc test was used for direct comparisons.

      (6) Line 888: black arrows are not shown in the figure.

      Corrected.

      (7) Figure 1F: legend on the figure seems incorrect (all are labeled Nora (+)); likewise for Figure 2C (all labeled Nora (-)).

      Corrected.

      (8) Materials and methods: please describe how the Nora virus antibody was raised (and specify on line 271 what viral protein is recognized).

      Done. As the whole virus was used for immunization, we cannot state which specific viral proteins are detected by the antibody.

      (9) Please define what is presented in the box plots (mean, range, whiskers, individual data points).

      Done.

      (10) Figure 4 and associated text (line 221): a brief explanation of the Smurf assay would be useful.

      Done.

      (11) Figure 4C: I did not find the picture of the agar plate informative, as similar information is conveyed in Figure 4D. Also, the labelling cannot be clearly read.

      Figure 4D provides a quantification of panel C. The readability has been improved.

      (12) Figure 4C: It is suggested that Nora-positive, smurf-negative flies were analyzed, but from Figure 4B it seems that these do not exist. Please explain.

      The data in Fig. 4B do not represent absolute numbers but percentages. Thus, there were at most 50% of SMURF-positive flies at the time of the assay, the rest being Smurf-negative yet Nora-positive.

      (13) The abbreviations PA14 and Db11 are used in several figures. I would suggest defining the abbreviation in the legend to facilitate interpretation.

      Done.

      (14) Figure 5A/5G: the Nora virus RNA levels in this figure are dramatically lower than the levels in other figure panels. Please check/correct.

      Done. The reviewer is indeed correct: we have forgotten to write that for these two panels, the loads are relative and not absolute as is the case in other panels. 5A: the load in whole flies was taken to be 1; 5G: untreated Nora-positive flies were taken to be 1.

      (15) Figure 6A: total number of ApopTag positive cells are reported. Were the same number of total cells analyzed? Please define.

      We have not counted all of the cells in each midgut but provide the number of ApopTag positive cells per midgut. We thus make the assumption that the overall number of midgut cells is not varying much from one midgut to the other. Visual inspection of DAPI-stained nuclei did not reveal any obvious change in the density of enterocyte nuclei as illustrated in Fig. S6 (we guess that everyone in the field is making the same assumption when counting mitotic ISCs with PHH3 staining).

      (16) Figure 6C: I find the shades of blue difficult to distinguish and suggest to use other colors.

      Done.

      (17) There seems to be a large mismatch between the percentage of Nora virus-positive cells in Figures 5C, 6H and the images of Figures 5G and 5H. Why?

      We think there might be a mistake with the Figure numbers cited by the referee. We guess the point the referee was trying to raise is the difference of perceived Nora virus burden between Fig. 5H and Fig. 6G, a quite valid point. For Fig. 5H, we had measured the Nora-virus load by RTqPCR (Fig. 5G, relative burden) but had not quantified the images. This is now done and shown in Fig. 5I. In Fig. 5H, young flies were used and hence there was no Nora virus detected in ECs, as now quantified in Fig. 5I. For Fig. 6G, we had to use 30-day old intestines to be able to observe Nora virus in the enterocytes of the controls. We have now included this important point in the main text and in the Figure legends.

      (18) The Title of the legend in Figure 7 is not supported by the data as 'spread through the intestine' has not been analyzed. Please adjust.

      Done.

      (19) All figures in which ANOVA is used: I assume that anything not labeled with an asterisk was found to be non-significant? If so, this should be indicated in the manuscript.

      Actually, we have not highlighted obvious differences to maintain clarity (e.g., Fig. 1E between uncured Ore-R(SM) and cured Ore-R(SC)). We thus have underlined the biologically relevant differences in the panels. The interested reader can refer to the primary data that are accessible on a data repository.

      (20) Figure 7C: the authors may want to contrast their finding that Upd3 was not upregulated in Nora virus-infected flies (in the absence of PA14) with the findings of Kuyateh et al, who did report upregulation of Upd3 (https://doi.org/10.3390/v15091849).

      We thank the reviewer for pointing out this study we were unaware of. We would like to point out that this article is difficult to follow as it is not 100% clear in which of the analyzed studies the induction of upd3 was observed and which exact experimental conditions were followed, e.g., young or old flies, whole flies or gut… We have looked in more detail at ref. 133 of this article, which refers to an unpublished study from the Hultmark laboratory that is however available online: (https://www.diva-portal.org/smash/record.jsf?aq2=%5B%5B%5D%5D&c=15&af=%5B%5D&searchType=SIMPLE&sortOrder2=title_sort_asc&query=Nora+virus&language=en&pid=diva2%3A1045375&aq=%5B%5B%5D%5D&sf=all&aqe=%5B%5D&sortOrder=author_sort_asc&onlyFullText=false&noOfRows=50&dswid=4587).

      In that study, flies were “infected” with Nora virus by expressing a cDNA clone injected into embryos. The problem is that for some unknown reasons the authors used Relish mutant flies. It is thus difficult to conclude as these flies are defective for the IMD and Sting pathways whereas our flies are wild-type. We were also interested to read that genes involved in midgut stem cells differentiation were expressed in flies harboring Nora virus, which is in keeping with the data of the present study. However, it is difficult to discuss this when we know little on the background of the studies analyzed by Kuyateh et al, in as much as our Discussion is already rather long.

      (21) Figure 7E: are the differences between control and Dome/Stat knockdown flies significantly different for Nora (+) flies (in the absence of Pseudomonas)? This is not clear from the data presentation.

      The answer to the question is positive: the JAK-STAT pathway also contributes to the maintenance of intestinal epithelium homeostasis in the absence of bacterial infection, that is presumably basal conditions. We have modified Fig. 7E to include more comparisons.

      Textual suggestions:

      (22) Line 25 strives > thrives

      Done.

      (23) Lines 150- 152, etc are not very informative. Also, some of the viruses analyzed are not "known contaminating viruses", but viruses used experimentally (VSV, IIV6, CrPV). I suggest adjusting the phrasing.

      Done.

      (24) Line 862: weaker fitness > lower fitness.

      Done.

      (25) Virology terms:

      (a) I suggest not using the term titer for qPCR readouts (which do not involve titration). Viral RNA level or viral RNA load would be more appropriate.

      Done.

      (b) I would propose rephrasing the Y-axis label of Figure 1C, E to Nora RNA load (same for other figures showing viral RNA).

      Done.

      (c) Infested: rather use the more accurate term infected.

      Done.

      (d) Contamination: rather use the term infection.

      We have modified some but not all occurrences of this word. We believe that it is important to use the word contamination when referring to enterocytes: the enterocytes are not infected by Nora; rather, differentiated infected ISCs become contaminated enterocytes. Infection refers to an active process whereas contamination refers to a state.

      (e) Proliferation: rather use the term replication.

      According to our US-English dictionary, proliferation refers to the “rapid reproduction of a cell, part, or organism”, which is the meaning we intend. Replication does not have this notion of speed of reproduction.

      (f) Drosophila should not be italicized in Drosophila A virus, following the ICTV convention that a "virus name should never be italicized, even when it includes the name of a host species or genus" https://ictv.global/faq/names.

      Done.

      (26) Line 873-975: please rephrase the legend of Figure 1F as the current one is not informative.

      Done.

      (27) Line 934: I suggest moving the justification of the time point chosen "= LT50 on the survival test in 935 Fig. 2E" to the main text.

      Done.

      (28) Line 936: with drop > with a drop.

      No longer relevant.

      (29) Line 940-941: the grammar of the sentence does not seem to be correct as it suggests that SDS induces Diptericin expression.

      No longer relevant.

      (30) Line 952-953; line 980: please correct mismatch singular/plural (antibody have, inhibition do).

      Done.

      (31) Line 422: "It will be interesting to determine whether the absence of a Dcr2 fluorescent proteins fusions in progenitor cells that we report in this study rules out a role for the RNAi pathway in intestinal host defense against the Nora virus". It would be of interest to discuss this finding in the context that virus-derived Nora virus siRNAs can be easily detected and that the viruses encode an RNAi antagonist (doi: 10.1371/journal.ppat.1002872).

      Done. We have updated the Discussion and propose a model whereby RNAi would prevent primary infection of enterocytes and then virus replication in proliferating progenitor cells would allow the virus to effectively inhibit the RNAi machinery when the infected progenitor cells become enterocytes.

      (32) Line 159: Nora virus phenotypes differ between laboratories. I would be interested to read the authors' speculations on why this would be the case.

      Our work shows that the effects of Nora virus depend significantly on several parameters we have identified: nutrition quality, age, exposure to abiotic or biotic stresses, and fly genotypes with the existence of Nora-refractory strains. These parameters as well as potential differences between laboratories are actually discussed in the second paragraph of the Discussion.

      (32) Line 175: capitalization of ORE-R vs Ore-R at other places in the manuscript.

      Done.

      (33) Line 185-194: PA14 and Pseudomonas are used interchangeably. Perhaps it is clearer to stick to a single term for consistency.

      PA14 is one clinical strain used to study P. aeruginosa. There are many others such as PAO1, which is also widely used. We have decided to write P. aeruginosa PA14 the first time we are using it in each figure legend, and use only PA14 afterwards.

      Reviewer #3 (Public review):

      The claim that Dcr2 is not abundant in ISCs because the protein is not stable is logically consistent and reasonable. Perhaps I missed this, but the authors could additionally knock down or use somatic CRISPR to delete Dcr2 in ISCs to test whether a lack of Dcr2 underlies sensitivity. In this experiment, the expectation would be that depleting Dcr2 in ISCs genetically would make little difference to susceptibility overall compared to controls. This is not an essential experiment request.

      We agree with the reviewer that these would be interesting experiments to perform. However, we are currently unable to perform additional experiments and leave it to other interested investigators studying antiviral innate immunity to address these questions dealing with the specific steps of RNA interference that may be missing in progenitor cells.

      Recommendations for the authors:

      (1) Line 206-207 and 214-216: the order of ideas presented here is unintuitive. In Lines 206-207, it is said that ABX treatment had no effect, which is counterintuitive to the nature of infection susceptibility. But this is resolved in Lines 214-216 when the reader realizes that S3G is fed on a sucrose solution, and so likely microbiota-depleted. Perhaps more could be said to clarify this in the main text, and/or swap the order of these observations so a casual reader is not confused about the nature and extent of the microbiota contributing to the sensitivity of Nora-infected flies.

      As suggested by the reviewer, we have clarified the text with respect to the food source and microbiota load; we emphasize that the microbiota plays a protective role in Nora-negative flies fed on sucrose solution even though the microbiota load is very low under these conditions. Of note, the microbiota is not depleted in sucrose-fed Nora-positive flies: we suspect that delaminating enterocytes may actually provide directly or more likely indirectly (peritrophic matrix) nutrients for the microbiota.

      (2) Line 262-265: the text may be a bit exaggerated given only 3 pathogens tested, one of which was a fungal natural infection breaching the cuticle and largely bypassing the gut. This could be re-phrased.

      The important point is that uninfected Nora-positive flies die with a LT50 of about 10 days even when noninfected; it has nothing to do with the number of pathogens tested. Thus, any infection that causes death with kinetics in this range may be misinterpreted in the absence of a relevant uninjured or clean injury control.

      (3) Line 379-382: I don't know if citing Schissel et al. is needed here. This paper's methods and data are highly problematic, as mentioned by the authors. This is not a highly cited paper, nor does it add value to the present discussion to cite it only to discredit it. Perhaps this can be left out and the field can move on quietly - naturally, this choice is the present authors', and this is just my view.

      We have actually cited this article at two other places and thus had not cited it “only to discredit it”. We have nevertheless removed the lines as suggested by the reviewer.

      (4) Line 404: perhaps clarify "Interestingly, mammalian stem cells..."

      Done.

      (5) Line 455: my understanding of digital PCR is that it is highly useful for detecting rare variants but not particularly better than qPCR for estimating loads/titres? This is not to say dPCR is worse, just that dPCR and primer-specific RT + qPCR are comparable if load/titre is desired. For instance, Qiagen actually recommends qPCR over dPCR specifically (and pretty much exclusively) for gene expression: https://www.qiagen.com/us/applications/digitalpcr/beginners/dpcr-vs-qpcr.

      (6) Perhaps Line 455 could drop the advocacy for digital PCR? I agree using dissected guts, or seemingly aged individuals per Figure 3B(?), is a valuable thing to point out. Maybe the aged individuals point could be added here? I guess the idea behind dissected guts is to have samples enriched in Nora virus.

      Cleaning Nora-positive strains is really difficult and we suspect that as long as there is one viral particle left, it may be sufficient to re-ignite the contamination of the strain. Our own experience with digital PCR on the expression of AMP-like molecules in the head of flies is that we found the approach to be more sensitive than classical RTqPCR (Xu et al., EMBO Rep, 2023).

    1. eLife Assessment

      This valuable study identifies and characterizes probe binding errors in a widely used commercial platform for spatial transcriptomics, discovering that at least 21 out of 280 genes in a human breast cancer panel are not accurately detected. The authors provide convincing evidence for their findings through validation against multiple independent sequencing technologies and reference datasets, and they introduce a computational tool to help predict potential off-target probe binding. Given the broad adoption of this platform in biomedical research, this work provides an essential quality control resource that will improve data interpretation across numerous studies.

    2. Reviewer #2 (Public review):

      This paper describes an analysis of a commercially available panel for a spatial transcriptomic approach and introduces a computational tool to predict potential off-target binding sites for the type of probe used in the aforementioned panel. The performance of the prediction tool was validated by examining a dataset that profiled the same cancer tissue with multiple modalities. Finally, a detailed analysis of the potential pitfalls in a published study communicated by the company that commercialized the spatial transcriptomic platform in question is provided, along with best practice guidelines for future studies to follow.

      Strengths:

      - The manuscript is clearly written and easy to follow.
      - The authors provide clean, organized, and well-documented code in the associated GitHub repository.

      Comments on revision:

      My impressions from the first round of review haven't really changed. I don't think the software tool is well developed, and failing to incorporate thermodynamics or consider the impact of alignment settings is a major weakness.

      I do think the topical area is relevant. The inclusion of the Xenium /Hubmap data modestly strengthens the manuscript relative to the original submission.

    3. Reviewer #3 (Public review):

      Summary:

      The authors present a new computational method (OPT) for predicting off-target probe binding in the commercial 10x Xenium spatial transcriptomics platform. They identified 28 genes in the 10x Xenium human breast cancer gene panel (280 genes) that are not accurately detected at the single-molecule level. They validated the predicted off-target binding using reference data from single-cell RNA-seq and 3'-sequencing-based Visium RNA-seq. This work provides a practical resource and will serve as a valuable reference for future data interpretation.

      Strengths:

      (1) Provides a toolbox for the community to identify off-target probes.

      (2) Validates the predictions using single-cell RNA-seq and sequencing-based Visium RNA-seq datasets.

      Comments on revision:

      The authors state that OPT is a new software tool and have posted example code on GitHub. However, the Jupyter notebook does not display any figures or workflows that would allow the process to be replicated. Please provide documentation and code that can reproduce the results/figures presented in the paper.

    4. Author Response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      We thank the editors and the reviewers for their constructive feedback in helping us strengthen this manuscript.

      During the revision process, new information was shared with us by the 10x Genomics team regarding the Xenium probe sequences evaluated in our original paper. Briefly, the Xenium probe sequences we evaluated represented an earlier iteration of the probes used to generate the data in Janesick et al. Further, we were made aware that the probe sequences used in Janesick et al. represented an earlier iteration of the commercially available Xenium v1 Human Breast Gene Expression Panel. We now elaborate further in a new Supplementary Note. We have therefore updated the paper throughout to reflect this new understanding, though we emphasize that our conclusions do not change. Rather, this newfound understanding provides stronger evidence of off-target probe binding with imperfect sequence matching, which we support with new supplementary figures.

      (1) Limited evaluation of tissues and gene panels

      “The results were only tested with one tissue (human breast). However, this is not a major weakness, as one can easily extrapolate that this should be the case for any other tissue.”

      “Does not apply the OPT method to the most widely used Xenium gene panels (e.g., pan-Human, pan-Mouse panels with ~5,000 genes each).”

      “The authors claim that OPT is a generalizable method for identifying off-target probes. To support this claim, they should provide similar predictions for the Xenium Pan-Human or Pan-Mouse gene panels, which are more widely used than the breast cancer panel.”

      “While I understand that conducting new experimental studies is likely beyond the authors' intended scope of the manuscript, the narrow reliance on Janesick et al. for all of the validation makes it difficult to assess the broad usability of OPT. In the absence of designing and then validating novel padlock probe designs with OPT, are there other publicly available datasets that authors could perform secondary analysis on using OPT?”

      Our primary focus on breast cancer was driven by data availability rather than tissue specificity. For this probe panel, matched Xenium, Visium, and scRNA-seq datasets are publicly available, enabling direct cross-platform comparisons of gene expression and allowing us to evaluate the impact of off-target probe binding in Xenium.

      OPT is tissue-agnostic and can be applied to any probe panel regardless of tissue type. To demonstrate this generalizability, we have now applied OPT on all publicly available 10x Genomics probe sets beyond the breast panel, including the Xenium pan-Human and pan-Mouse gene panels. The complete results of these analyses have been generated and are provided as a compressed zip file accompanying the revised manuscript.

      Beyond pre-designed panels, in this revision, we have now also applied OPT to custom Xenium gene panels from the Human BioMolecular Atlas Program (HUBMAP) and further demonstrate integration of HUBMAP RNA-seq data to evaluate the impact of potential predicted off-targets in a new section “Bulk RNA-seq reference atlases suggest off-target binding can variably impact results in Xenium custom probe panels.”

      Overall, in these newly evaluated panels, we identify many cases of off-target probe binding with non-negligible expression of off-target genes in the target tissue, underscoring that our findings are not specific to human breast tissue. Therefore, in the revision, we have broadened the title to “Evidence of off-target probe binding affecting 10x Genomics Xenium Gene Panels compromise accuracy of spatial transcriptomic profiling”

      (2) Limited quantifications

      “Lacks clarity on how the confidence level of off-target predictions is calculated.”

      “How can the confidence level of these off-target predictions be quantitatively assessed? Please provide benchmarks or validation metrics if available.”

      We thank the reviewer for raising this important point. To strengthen our claim that predicted off-targets can contribute to observed Xenium expression patterns, we incorporated a quantitative assessment in addition to the qualitative comparisons presented previously. Specifically, we leveraged Visium and scRNA-seq data to compare spot- and cluster-level expression of target genes alone versus expression aggregated with their predicted off-target genes. Across all examples shown, inclusion of predicted off-targets consistently resulted in stronger agreement with the Xenium results, as reflected by decreased RMSE and increased Pearson correlation relative to using the target gene alone.
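      The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy expression vectors, and the choice to aggregate by simple summation are assumptions made here for illustration, and real data would first need to be normalized to a common scale across platforms.

```python
import math

def rmse(x, y):
    """Root-mean-square error between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def pearson(x, y):
    """Pearson correlation; returns NaN if either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def compare_concordance(xenium, target, off_targets):
    """Compare Xenium values against the reference target gene alone and
    against the target summed with its predicted off-target genes.
    Summation is an assumed aggregation rule for this sketch."""
    aggregated = [t + sum(o[i] for o in off_targets) for i, t in enumerate(target)]
    return {
        "target_only": {"rmse": rmse(xenium, target),
                        "pearson": pearson(xenium, target)},
        "with_off_targets": {"rmse": rmse(xenium, aggregated),
                             "pearson": pearson(xenium, aggregated)},
    }

# Hypothetical per-spot (or per-cluster) expression values, invented for illustration.
xenium_signal = [5.0, 1.0, 6.0, 2.0]       # Xenium signal attributed to the target gene
reference_target = [1.0, 1.0, 2.0, 2.0]    # target gene in the reference platform
reference_off = [[4.0, 0.0, 4.0, 0.0]]     # one predicted off-target gene

result = compare_concordance(xenium_signal, reference_target, reference_off)
```

      In this toy case the aggregated profile has a lower RMSE and a higher Pearson correlation than the target gene alone, which is the pattern the response reports for its examples.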

      We emphasize, however, that OPT does not assign a formal confidence score to off-target predictions based on sequencing data alone. Importantly, identification of a potential off-target by OPT does not imply that it will necessarily affect Xenium results. As we’ve noted, if the off-target gene is not expressed, then it will not affect the observed gene expression magnitudes of the target gene. To help users assess whether predicted off-target genes will affect observed gene expression magnitudes of the target gene for a tissue of interest, we now provide a complementary analysis, including heat-map visualizations comparing the expression of target genes and their predicted off-targets in matched bulk RNA-seq or scRNA-seq datasets from the same tissue (Supplementary Figures 9, 10, 11). We hope this evaluation pipeline will clarify to researchers how they can evaluate whether predicted off-targets will appreciably affect results in their tissue of interest.

      (3) Under-developed and non-essential software

      “The manuscript section on the software tool feels underdeveloped.”

      “Once the 10X Genomics corrects their gene panels according to this finding, the tool (OPT) will not be useful for most people. Still, it can be used by those who want to design de novo probes from scratch.”

      “Since the authors claim that OPT is intended for community use, the paper should provide a clear, step-by-step user guide, such as Jupyter tutorial, ideally as supplementary material.”

      We agree with the reviewers that the description of the software tool itself is relatively concise. This is intentional, as the primary goal of this manuscript is not to introduce a standalone software framework, but rather to use the tool as a means to characterize and quantify off-target probe binding and its potential downstream impact on spatial gene expression analyses. Accordingly, our emphasis is placed on the biological and analytical insights enabled by this approach, rather than on extensive software tool details. To support potential users, we have now included additional software documentation with an example Python notebook demonstrating how it can be applied to any probe panels in the GitHub repository: https://github.com/JEFworks-Lab/off-target-probe-tracker/blob/main/example.ipynb

      Likewise, the primary goal of this manuscript is not to suggest that a specific vendor’s probe panels are flawed, but rather to demonstrate that off-target probe binding is a general and underappreciated phenomenon that can occur in some probe-based spatial transcriptomics platforms to meaningfully impact downstream analyses and biological interpretation.

      OPT was developed as a framework to identify potential off-target probe interactions based on sequence homology. In practice, OPT can serve as a post hoc tool that allows researchers to assess whether predicted off-target interactions may exist in a given panel and to account for these possibilities when interpreting spatial expression patterns, even when panels have been developed by the many probe designing methods now highlighted in the revised manuscript. Given the complexity of probe design and hybridization behavior, we believe that explicitly identifying and reporting potential off-targets remains valuable for downstream data interpretation, cross platform comparisons, and reproducibility. Thus, OPT is intended to complement existing probe design strategies and vendor efforts, rather than replace them, by providing researchers with additional context to interpret their data more accurately.

      In our revision, we have therefore elaborated on this in the discussion, reiterated here for convenience: “Although we focus here on the 10x Genomics Xenium technology, we do not exclude the possibility that off-target binding may similarly affect other probe-based gene detection approaches from other commercial vendors. Any technology that relies on hybridization-based detection is inherently susceptible to off-target probe binding when sequence similarity exists. Further, hybridization-based detection often inherently involves a trade-off between sensitivity and specificity. Given these inherent technological limitations, we therefore emphasize the importance of transparency through sharing probe sequences. However, many companies do not release the probe sequences used in their assays, limiting the consumer’s ability to fully interpret their results as well as the community’s ability to effectively characterize and benchmark performance variation across platforms. Therefore, we strongly recommend that companies publish probe sequences for pre-designed panels and likewise that researchers using these technologies should obtain and publish probe sequences used in their studies to support transparent and reproducible science.”

      Recommendations for the authors:

      “The paper only describes evidence of the off-target effect based on perfect sequence homology, although the tool (OPT) provides an option to find additional "potential" off-targets that allow mismatches. It would be very nice if the authors could additionally provide at least one example of off-target binding with at least one mismatch.”

      We thank the reviewer for the opportunity to clarify this point. In addition to analyses based on perfect sequence homology, we examined predicted off-target binding when allowing mismatches at the terminal ends of probe sequences. This analysis is presented in the Results section titled “OPT results when allowing mismatches at the terminal ends of the probe sequences identifies additional off-target candidates.”

      In this revision, we now allowed a 10bp padding on either end of the 40bp probe sequence, permitting imperfect sequence matching at the terminal regions. Under these conditions, OPT identified additional off-target candidates, including TUBB2B and ACTG2, which we highlight as representative examples (Supplementary 7,8). We further demonstrate how these predicted off-target interactions impact gene expression concordance by comparing Xenium measurements with both Visium and scRNA-seq data, showing measurable changes in cross-platform agreement. Together, these results illustrate that allowing mismatches reveals biologically relevant off-target effects beyond those captured by perfect sequence homology alone.

      “Clarifications and updates for Figure 2A-B

      Xenium offers a resolution of up to 200 nanometers with continuous readout, without pixel gaps. However, the figures shown in Figure 2A-B appear pixelated - why is this the case? Could the authors clarify this discrepancy and, if possible, provide the raw feature intensity data for Xenium in the supplementary materials?

      Additionally, there appear to be no visible gaps in the Visium graphs. Could the authors update the figure panels to represent the true spot locations for Visium, to more accurately reflect the underlying data structure?”

      We thank the reviewer for the opportunity to clarify these points. The goal of Figure 2A-B is to facilitate a direct visual comparison of gene expression patterns between the Visium and Xenium platforms. To enable this comparison, we aggregated the single-cell Xenium data into spatial patches matching the effective resolution of Visium spots (55x55µm). Similarly, Visium spots were rendered as patches to produce a more continuous visual representation. As a result of this aggregation and visualization choice, the Xenium expression plots appear pixelated despite Xenium’s native subcellular resolution (up to ~200 nm with continuous readout). We have clarified this processing and visualization step in the Methods to avoid confusion.

      With respect to the Visium expression plots, the lack of gaps is also a consequence of rendering each spot as a filled patch rather than plotting traditional Visium spots. This was done intentionally to maintain visual consistency with the aggregated Xenium data and to emphasize spatial concordance rather than the underlying sampling geometry. We have now explicitly stated this design choice to improve clarity.
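      The aggregation of single-cell Xenium data into Visium-sized patches described above can be sketched as follows. This is only an illustration under stated assumptions: the square grid anchored at the coordinate origin, the function name, and the toy cell coordinates are hypothetical, and the authors' actual procedure is the one described in their Methods.

```python
from collections import defaultdict

PATCH_UM = 55.0  # patch edge length in µm, the Visium spot size quoted in the response

def aggregate_to_patches(cells, patch_um=PATCH_UM):
    """Sum per-cell counts of one gene into square spatial patches.

    cells: iterable of (x_um, y_um, count) tuples, one per cell.
    Returns a dict mapping (column, row) patch indices to summed counts.
    """
    patches = defaultdict(float)
    for x, y, count in cells:
        key = (int(x // patch_um), int(y // patch_um))
        patches[key] += count
    return dict(patches)

# Hypothetical cell centroids (µm) and counts for a single gene.
toy_cells = [
    (10.0, 10.0, 3),
    (50.0, 20.0, 2),   # falls in the same patch as the first cell
    (60.0, 10.0, 1),   # next patch along x
    (10.0, 120.0, 4),  # third patch row along y
]

patch_counts = aggregate_to_patches(toy_cells)
```

      Rendering each patch as a filled square produces the blocky appearance the reviewer noticed, independently of the native resolution of the underlying measurements.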

      “I found the format of the manuscript to be at times confusing and perhaps a bit of an odd fit for a general interest journal. A significant portion of the manuscript is spent critiquing a specific publication, "High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis" published by Janesick et al. (of 10x Genomics, Inc.) in Nature Communications in 2023. This content would seem more appropriate as a Comment submitted to Nature Communications, potentially to be accompanied by a response from the authors of Janesick et al. at 10x.”

      I would like to address this important point as the corresponding author, who takes primary responsibility for the unconventional decision to submit this manuscript to eLife rather than as a commentary, as the reviewer suggests.

      Consistent with the reviewer's suggestion, I did initially consider submitting this as a Matters Arising to Nature Communications. However, after consultation with other senior colleagues and co-authors, I decided to forgo this route on the basis that the information provided in a Matters Arising must be kept confidential. I was concerned that this would lead to long, drawn-out private exchanges. As we note in the manuscript, the Xenium platform's widespread use and high cost imposed a certain urgency that I believed warranted open and rapid dissemination.

      Therefore, we submitted to eLife with the hope that eLife’s unique continuous post-publication public peer review process will enable the rapid dissemination of these important financially-sensitive insights while permitting constructive criticisms from both industry and academic expert reviewers to be openly considered by all readers.

    1. eLife Assessment

      This important study developed a novel paradigm combined with EEG recordings to examine the neural mechanisms underlying temporal integration in perception and its modulation by prior history (i.e., the serial dependence effect). The results provide solid evidence that two key EEG features, namely the individual alpha frequency and the aperiodic slope, jointly and independently shape perceptual integration and its reliance on prior information. While additional control analyses would further strengthen the main conclusions, the findings will be of broad interest to researchers studying perception, decision-making, inter-individual differences, and brain rhythms.

    2. Reviewer #1 (Public review):

      Summary

      Alpha oscillations have been previously proposed to shape the temporal resolution of visual perception, with a higher alpha frequency providing a finer resolution. This study goes beyond this by investigating three additional processes that could jointly influence visual temporal perception: the aperiodic neural signal, the integration of recent perceptual experience (serial dependence), and subjective confidence. To address their question, the authors developed a novel task where two Gabor patches oriented in opposite directions are presented in a continuous stream. This allows for testing for robust perceptual integration while avoiding bias from suboptimal perception. Behavioral analyses revealed an association between confidence and individual temporal integration thresholds, and demonstrated that serial dependence biases visual temporal integration as well as its associated confidence. EEG analyses first replicated the previous findings showing that a faster individual alpha frequency (IAF) provides higher temporal resolution. Interestingly, the aperiodic neural signal was associated with both perceptual and temporal precision. Finally, the authors show that serial dependence is reduced in individuals with faster IAF and enhanced in participants exhibiting a stronger aperiodic component. Together, these findings highlight that visual temporal integration arises from an interplay between alpha oscillations, the aperiodic signal, serial dependence, and subjective confidence.

      Strengths:

      (1) The novel task proposed in the study represents a substantial improvement over the two-flash fusion task previously used to investigate the role of alpha oscillations in visual temporal perception.

      (2) Serial dependence has attracted increasing interest in vision research in recent years. Testing whether recent visual inputs also influence temporal resolution is, therefore, a valuable and timely approach. In this regard, the authors provide evidence for a serial dependence effect.

      (3) Although the functional role of brain oscillations has been extensively studied over the past decade, the role of the aperiodic neural signal has long been overlooked. This study revealed that the aperiodic component plays a role in perceptual precision and temporal resolution, thus providing evidence for an important role of the aperiodic neural signal.

      (4) The mediation analysis demonstrates that the aperiodic and oscillatory neural components act independently, providing important insights for future studies aimed at understanding their respective role.

      Weaknesses

      It would have been valuable to record EEG continuously during the experiment to investigate how spontaneous alpha oscillations and the aperiodic signal dynamically influence temporal integration, serial dependence, and confidence on a trial-by-trial basis.

      Appraisal

      The authors employed a novel and thoughtfully designed task, combined with appropriate analyses, to address their research question. Their results are convincing and provide strong support for their conclusions.

      Impact

      This study provides valuable insights into the role of the aperiodic neural signal in visual temporal integration. This is important because its contribution has likely been underestimated, and future research will likely uncover increasing evidence of its impact across multiple cognitive functions.

      It was also very interesting to observe how alpha oscillations are associated with serial dependence and confidence, extending beyond their well-known role in visual temporal resolution. This opens intriguing avenues for future research on the functional role of alpha oscillations.

    3. Reviewer #2 (Public review):

      Summary:

      This paper examines resting-state electroencephalography (EEG), the electrophysiological underpinnings of the temporal integration window in perception, and its modulation by priors (serial dependence) as measured through the perceptual fusion point of two continuous alternating stimuli. The study also includes a measure of perceptual confidence. Separating periodic from aperiodic EEG activity, the results show that the faster the individual alpha frequency at rest and the steeper the aperiodic slope (previously linked to higher sampling/lower noise), the lower the perceptual fusion point (corresponding to narrower integration windows), with independent contributions of the periodic and aperiodic activity to the integration window. The data also reveal that the point of fusion depends on prior history, and that the strength of this effect depends on individual alpha frequency and aperiodic slope: the lower the individual alpha frequency and the aperiodic slope, the stronger the serial dependence, with the two contributions being again independent. Higher alpha frequency also led to higher confidence. The results are interpreted to suggest that speed of alpha oscillations and aperiodic slope of the power spectrum (presumably reflecting rate/fidelity of visual sampling and the level of background noise) jointly shape the perceptual measure under study: high rate/fidelity and low noise promote temporal precision in integration, while lower rate/fidelity and higher noise lead to a higher reliance on prior history. It is concluded that it is the interaction between two EEG features that shapes temporal integration and hence perceptual fusion.

      Strengths:

      The strength lies in the use of a continuous visual stream of two alternating stimuli whose timing shapes fusion or separation of the two stimulus percepts, avoiding some of the pitfalls of previous fusion probes through discrete (not continuous) stimulus pairs (missed detection of one stimulus of the pair may be misinterpreted as fusion). The results seem robust (based on n=83 participants), the results are interesting, and the interpretations are sound.

      Weaknesses:

      The main weakness lies in the reliance on resting-state EEG for correlation with the behavioural measures. This captures trait-based relationships, but misses the brain activity dynamics within/across trials, which could be used for a direct readout of evidence accumulation to a decision, for capturing spontaneous fluctuations of the processes under study, etc. Also, in terms of resting-state EEG, both eyes-closed (EC) and eyes-open (EO) data have been recorded, but their links to perceptual fusion point/confidence seem somewhat inconsistent across the results. This is a bit confusing. Are the EO and EC signals in any way related/correlated, and if not, what are they supposed to represent? Would an analysis of these EEG measures during task performance (e.g., in a pre-stimulus = baseline time window) provide more consistent results? These points could be resolved by additional analyses and/or more elaborate discussions.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors seek to explain what influences the temporal resolution of visual perception and its associated metacognitive monitoring, interindividual differences in such processes, and the neural mechanisms associated with these interindividual differences. More specifically, they investigated the factors influencing the perception of a rapid alternating stream of visual patterns as a single fused percept versus two segregated stimuli, and how these factors relate to stable features of ongoing brain activity. They introduce a novel sustained-stream temporal integration paradigm designed to address limitations of traditional two-flash tasks, and combine this with resting-state electroencephalography (EEG) to examine how individual alpha peak frequency and the aperiodic component of the power spectrum relate to temporal integration thresholds, perceptual history effects, and subjective confidence. Their overarching aim is to move beyond a purely oscillatory account of temporal sampling and to test whether periodic (alpha) and non-periodic (aperiodic) neural dynamics jointly shape perceptual decisions.

      Strengths:

      The study has several notable strengths. First, the experimental paradigm represents a thoughtful and innovative refinement of earlier approaches. By presenting alternating gratings within a continuous stream and varying the duration of each element rather than introducing discrete blank intervals, the authors mitigate well-known confounds of classical two-flash paradigms, particularly the possibility that "fusion" reports reflect missed detections rather than genuine temporal integration. The psychometric functions are well characterized, and the sample size is large for an individual-differences EEG study, with an a priori power analysis supporting the adequacy of the sample. Second, the use of spectral parameterization to separate oscillatory alpha peak frequency from the aperiodic component of the spectrum is methodologically rigorous and timely, as this distinction is increasingly recognized as important to avoid confounds in oscillatory activity estimation and the measurement of neural noise/excitatory-inhibitory balance (i.e., the aperiodic component of the power spectrum). The present work contributes to this emerging direction by relating both to behavioral indices within the same dataset. Third, the integration of perceptual thresholds, serial dependence, and subjective confidence within a unified framework provides a richer account of temporal perception than studies focusing on a single measure. In particular, the demonstration that resting alpha frequency predicts integration thresholds and that the aperiodic exponent relates to variability of the psychometric function is broadly consistent with the authors' central claims.

      Weaknesses:

      (1) At the same time, several aspects of the interpretation require caution. One conceptual issue concerns the interpretation of the psychometric slope parameter as an index of "temporal precision." The manuscript consistently equates steeper slopes with higher perceptual precision or lower internal noise. However, the slope of a binary psychometric function does not uniquely index sensory temporal resolution. It reflects the steepness of the transition between response categories and can arise from multiple sources, including variability in sensory encoding, instability of decision criteria, lapse rates, or other decisional processes. Even in the literature cited by the authors, slope is often described more generally as reflecting perceptual variability or sensory and/or decision noise rather than a pure measure of perceptual precision. An abrupt transition from "fused" to "segregated" responses, therefore, does not necessarily imply finer temporal resolution at the sensory level; it may instead reflect more consistent categorization or reduced decisional variability. The present data convincingly demonstrate relationships between spectral measures and the steepness of behavioral transitions, but they do not by themselves establish that this steepness reflects perceptual temporal precision rather than broader sources of behavioral variability.

      (2) A related concern involves the causal language used to describe the relationship between neural measures and behavior. The EEG metrics are derived from resting-state recordings and therefore reflect stable, trait-like individual differences. Nonetheless, the Discussion sometimes adopts mechanistic phrasing suggesting that slower alpha rhythms or flatter spectra lead the brain to compensate by weighting prior information more heavily, or that neural noise is being "regulated." Such formulations imply within-task adaptive processes that are not directly measured. The results demonstrate robust between-participant associations, but further research is needed to establish whether individuals regulate neural noise or adjust prior weighting dynamically.

      (3) Another point that merits clarification concerns the control analyses. The authors appropriately use spectral parameterization to dissociate oscillatory alpha peak frequency from the aperiodic component in the main analyses; however, their subsequent control analyses examining other frequency bands appear to rely on conventional band-power measures. Because band power can be influenced by the aperiodic background, null effects in other bands are difficult to interpret without similarly accounting for aperiodic structure.

      (4) In addition, the temporal structure of the stimulus stream introduces an interpretational nuance. Varying the duration of each Gabor in a continuous alternation produces quasi-periodic stimulation rates, and several of these ISIs fall within the alpha frequency range. Rhythmic visual stimulation at alpha-range frequencies is known to produce strong stimulus-locked responses and can interact with intrinsic alpha rhythms in a frequency-dependent manner (Keitel et al., 2019; Gulbinaite et al., 2017). Although the present study does not record EEG during task performance and therefore cannot directly assess stimulus-driven steady-state responses, this aspect of the design complicates a purely intrinsic sampling interpretation. The observed relationship between resting alpha frequency and integration thresholds may reflect intrinsic sampling speed, but it could also be influenced by how closely an individual's alpha rhythm aligns with alpha-range temporal structure in the stimulus.

      Conclusion:

      Despite these limitations, the study achieves many of its primary aims. The sustained-stream paradigm reliably elicits graded temporal integration behavior and robust serial dependence effects. Individual alpha frequency is convincingly associated with integration thresholds, and the aperiodic exponent relates to behavioral variability measures. These findings support the broader conclusion that temporal perception reflects an interaction between rhythmic neural dynamics and the background spectral structure of ongoing activity. The work is likely to have a meaningful impact for researchers studying perceptual timing, perceptual history, individual differences in brain rhythms, and the functional role of aperiodic neural activity.

      References:

      Keitel, C., Keitel, A., Benwell, C. S., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-driven brain rhythms within the alpha band: The attentional-modulation conundrum. Journal of Neuroscience, 39(16), 3119-3129.

      Gulbinaite, R., Van Viegen, T., Wieling, M., Cohen, M. X., & VanRullen, R. (2017). Individual alpha peak frequency predicts 10 Hz flicker effects on selective attention. Journal of Neuroscience, 37(42), 10173-10184.

    5. Author Response:

      (1) Clarification of the distinction between resting-state trait measures and ongoing neural dynamics

      All the Reviewers commented that this study provides a useful characterization of the relationship between trait-based resting-state neural dynamics and behavioral measures. At the same time, we agree that including ongoing EEG dynamics during task performance would have added important complementary information. In particular, task-related EEG would allow a more direct characterization of the relationship between ongoing neural activity and behavioral indices at the single trial level, thereby helping to clarify the role of ongoing neural dynamics in evidence accumulation and perceptual decision-making. It would also enable testing how pre-stimulus alpha oscillations and aperiodic activity dynamically influence temporal integration, serial dependence, and confidence on a trial-by-trial basis.

      However, we would like to emphasize that the primary aim of the present study was to investigate trait-level resting-state neural dynamics, which are known to be relatively stable and consistent within individuals, such as individual alpha frequency (e.g., Grandy et al., 2013; Wiesman & Wilson, 2019; Gray & Emmanouil, 2020) and aperiodic neural dynamics (Demuru and Fraschini, 2020; Pathania et al., 2022; Euler et al., 2024), and to examine whether these stable neural characteristics predict behavioral measures indexing temporal perception. Accordingly, the present study was designed to address how stable individual differences in resting-state neural dynamics shape temporal performance, rather than within-task neural fluctuations during the temporal task. We agree that combining resting-state and task-related EEG would be a valuable direction for future work, but this lies beyond the scope of the current dataset, as EEG was not recorded during task performance. Furthermore, we agree with the Reviewers that some of the wording in the Discussion can be clarified to emphasize the trait-level, rather than trial-level, nature of the task and potential interpretations.

      Additionally, we agree that the relationship between eyes-open (EO) and eyes-closed (EC) resting-state EEG, and their differential associations with behavior, warrants further discussion. In our data, EO resting-state activity emerged as a stronger predictor of behavioral performance than EC. Conceptually, resting-state EO and EC should not be considered interchangeable measures of the same underlying neural activity, but rather as related yet distinct brain states, with overlapping neural generators expressed under different state constraints. EC is typically associated with stronger posterior alpha activity and a more internally oriented mode, whereas EO reflects a more visually engaged and vigilant state, closer to the conditions under which perceptual judgments are formed. This may explain why, in our findings, brain–behavior associations are more evident in EO, consistent with the greater similarity between the EO condition and the task context. In this sense, EO may emphasize exteroceptive processing and visual readiness, whereas EC reflects a more internally oriented configuration. This difference in functional weighting could account for the stronger behavioral correlations observed in EO in the present study. The distinction between these resting states has been emphasized in previous EEG and neuroimaging work showing differences in power, topography, and large-scale network organization (e.g., Marx et al., 2004). Additionally, these state-related differences may reflect physiological changes related to sensory processing (El Boustani et al., 2009) and arousal (Lendner et al., 2020). Accordingly, the present dissociation may arise because EO provides a resting-state measure that is more proximal to the sensory and excitability conditions engaged during task performance (for similar findings, see also Deodato and Melcher, 2024). However, we agree with the reviewers that further clarification of these state-related differences is warranted. 
      In the revised manuscript, we will (i) expand the Discussion to more clearly articulate the conceptual distinction between EO and EC and their expected links to perceptual and confidence measures, (ii) systematically describe EO–EC differences across all EEG measures analyzed, and (iii) quantify the relationship between EO and EC indices to directly assess the extent to which they share trait-like variance across individuals.
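
      One simple way to quantify point (iii) is a between-participant correlation of the same index measured in the two states. The sketch below assumes a Pearson correlation and uses made-up individual alpha frequency values; the response does not specify which statistic the authors will use.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-participant alpha peak frequencies in Hz (not real data).
iaf_eyes_open = [9.8, 10.4, 10.1, 11.0, 9.5]
iaf_eyes_closed = [9.9, 10.6, 10.0, 11.2, 9.4]

# r squared approximates the share of between-participant variance
# common to the two resting states.
r = pearson_r(iaf_eyes_open, iaf_eyes_closed)
shared_variance = r ** 2
```

      A high correlation would suggest that EO and EC capture largely the same trait, whereas a modest one would support treating them as related but distinct states.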

      In the revised manuscript, we will clarify these points by adjusting the text, strengthening the conceptual framing, and expanding the Discussion, including a more detailed outline of future research directions.

      (2) Functional interpretation of psychometric measures

      The Reviewers raised an important point regarding the interpretation of the psychometric parameters investigated in our study. In particular, we agree that the slope of a binary psychometric function does not provide a direct measure of sensory temporal resolution or perceptual sensitivity, and that our original wording may have overstated this interpretation. Rather, the slope reflects the steepness of the transition between response categories and indexes overall behavioural variability, which can arise from multiple sources, including variability in sensory encoding, decision criteria, and occasional response errors (e.g., Wichmann and Hill 2001; Prins 2012).

      We therefore agree that interpreting steeper slopes as necessarily reflecting “temporal precision” may be overly specific, and that there are other possible interpretations. In the revised manuscript, we will adopt more cautious terminology and describe the slope more generally as indexing behavioral variability in the transition between perceptual reports, which may reflect a combination of sensory and decisional factors. Importantly, our results demonstrate robust relationships between neural measures and the consistency or sharpness of perceptual categorization, rather than uniquely isolating sensory temporal resolution. While, in standard psychophysical frameworks, the slope is related to internal variability in the sensory representation, this relationship depends on model assumptions and does not uniquely isolate sensory precision (e.g., Prins, 2016). Following the reviewers’ suggestion, we will also refine our psychometric modeling by incorporating a lapse parameter. We agree with the Reviewer that accounting for occasional stimulus-independent errors (e.g., lapses) can improve parameter estimation and prevent biases in slope and threshold estimates when lapse rates are implicitly fixed to zero (Wichmann & Hill, 2001). In the revised manuscript, we will therefore (i) clarify the terminology used to describe psychometric parameters and (ii) report additional analyses including lapse rates.
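
      The role of the lapse parameter can be illustrated with a minimal sketch of the general form discussed by Wichmann and Hill (2001), in which the response probability is guess + (1 - guess - lapse) * F(x). The logistic choice for F and all parameter names and values here are assumptions for illustration, not the authors' fitted model.

```python
import math


def psychometric(x, threshold, slope, lapse=0.0, guess=0.0):
    """Probability of a 'segregated' report at stimulus duration x.

    threshold: duration at which the logistic core equals 0.5.
    slope: steepness of the transition between response categories.
    lapse: rate of stimulus-independent errors (lowers the upper asymptote).
    guess: lower asymptote (assumed 0 by default in this sketch).
    """
    core = 1.0 / (1.0 + math.exp(-slope * (x - threshold)))
    return guess + (1.0 - guess - lapse) * core


# With lapse fixed at zero the curve saturates at 1; with a 5% lapse rate
# it saturates at 0.95, so a fit that ignores lapses must flatten the slope
# to accommodate errors at long durations.
p_no_lapse = psychometric(1000.0, threshold=50.0, slope=0.2)
p_with_lapse = psychometric(1000.0, threshold=50.0, slope=0.2, lapse=0.05)
```

      This is why implicitly fixing the lapse rate to zero can bias slope and threshold estimates, as the response notes.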

      In addition, we agree that complementary modeling approaches could help disentangle perceptual and decisional contributions to the observed effects by providing access to latent parameters of perceptual decision-making. For example, within a signal detection framework, one could test whether EEG measures relate to perceptual sensitivity versus decision criterion, while sequential sampling models such as the diffusion model (e.g., Ratcliff and McKoon, 2008) could assess whether neural measures are associated with parameters such as drift rate, decision boundary, starting bias, or trial-to-trial variability. However, several characteristics of the present paradigm limit the direct applicability of these approaches. First, the task relies on a continuous manipulation of sensory evidence across stimulus durations (ISIs), and behavioral responses are summarized through psychometric functions rather than modeled at the single-trial level. As a result, the current framework does not provide direct access to trial-by-trial latent decision variables required by these models. Second, reaction times were not collected, which constrains the application of sequential sampling models that rely on joint modeling of accuracy and response times. Finally, while the task involves categorical judgments (integration vs. segregation), it does not include explicit signal-absent or catch trials, which can help constrain sensitivity and criterion estimates within classical signal detection formulations. Despite these limitations, we agree that these approaches could still provide useful insights. In the revised manuscript, we will explore whether alternative modeling approaches (e.g., signal detection-based metrics or Bayesian psychometric modeling) can help further characterize the contributions of perceptual sensitivity, decision criterion, and response variability to our behavioral measures. 
      While these analyses will necessarily remain exploratory given the structure of the current dataset, they may provide initial insights into whether the observed effects reflect perceptual or decisional dynamics. A more definitive dissociation, however, is beyond the scope of the present study and will be an important direction for future work.

      (3) Control analyses and robustness of EEG–behavior relationships

      The Reviewers raised interesting points regarding the interpretation of our control analyses and the potential influence of stimulus structure on the observed EEG–behavior relationships. We agree that these aspects require clarification and additional analyses to strengthen the robustness of our findings.

      First, regarding the control analyses across frequency bands, we acknowledge that while our main analyses appropriately dissociate oscillatory and aperiodic components using spectral parameterization, the control analyses were based on conventional band-power measures. As correctly noted by the reviewers, band-limited power estimates can be influenced by the aperiodic background, which complicates the interpretation of null effects in the other frequency bands. In the revised manuscript, we will address this issue by extending our spectral parameterization approach to these control analyses. Specifically, we will recompute band-specific measures after removing the aperiodic component, allowing a clearer comparison across frequency bands and a more robust assessment of the specificity of alpha-related effects. Preliminary analyses suggest that these updated results are likely to be consistent with our initial findings, thereby reinforcing the robustness of the reported effects.
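
      The idea of recomputing band measures after removing the aperiodic component can be sketched as follows. This sketch assumes a plain least-squares line fit in log-log space as a stand-in for a full spectral parameterization, and a synthetic spectrum; function names are hypothetical and this is not the authors' pipeline.

```python
import math


def fit_aperiodic(freqs, power):
    """Fit log10(power) = offset - exponent * log10(freq) by least squares."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return offset, -slope  # exponent is the negated log-log slope


def flattened_band_power(freqs, power, band):
    """Mean log-power in `band` after subtracting the aperiodic fit."""
    offset, exponent = fit_aperiodic(freqs, power)
    low, high = band
    residuals = [
        math.log10(p) - (offset - exponent * math.log10(f))
        for f, p in zip(freqs, power)
        if low <= f <= high
    ]
    return sum(residuals) / len(residuals)


# Synthetic spectrum with no oscillatory peaks: power = 10 / f**1.5.
freqs = [float(f) for f in range(1, 41)]
power = [10.0 / f ** 1.5 for f in freqs]
offset, exponent = fit_aperiodic(freqs, power)
beta_residual = flattened_band_power(freqs, power, (13.0, 30.0))
```

      On this peak-free spectrum, raw beta power is nonzero and would change with the exponent, while the flattened residual is essentially zero. Note that a line fit over the whole spectrum is biased when real peaks are present, which is one reason dedicated parameterization methods model peaks and the aperiodic component together.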

      Another important point raised by the reviewers concerns the temporal structure of the stimulus stream. We agree that the continuous alternation of Gabor stimuli at varying durations introduces quasi-periodic stimulation rates that may induce entrainment of neural oscillations. Notably, some inter-stimulus intervals correspond to frequencies within the alpha range, which raises the possibility that the observed relationship between resting alpha frequency and integration thresholds may not solely reflect intrinsic sampling speed, but could also be influenced by the degree of alignment between an individual’s alpha rhythm and the temporal structure of the stimulus. As highlighted in prior work (e.g., Gulbinaite et al., 2017; Keitel et al., 2019; Gallina et al., 2023; Duecker et al., 2024), rhythmic stimulation in the alpha range can interact with intrinsic alpha oscillations and modulate both neural and perceptual processing. Although our study does not include EEG recordings during task performance and therefore cannot directly assess stimulus-locked responses or neural entrainment, we agree that this factor should be explicitly considered in the interpretation of our findings. To address this point, in the revised manuscript we will perform additional control analyses to assess the robustness of the observed relationships while accounting for potential rhythmic stimulation confounds. Specifically, we will explore whether the strength of behavioral effects and their relationship with EEG measures depends on the alignment between each participant’s individual alpha frequency and the effective stimulation rate induced by the stimulus presentation. In addition, we will test whether the association between resting-state alpha frequency and behavioral measures is disproportionately driven by stimulus durations corresponding to alpha-range temporal frequencies. 
      These analyses will help determine whether the observed effects primarily reflect intrinsic sampling properties or are modulated by resonance-like interactions between endogenous rhythms and stimulus timing. We will also address all additional recommendations raised by the reviewers in the revised manuscript.

      References

      Demuru, M., & Fraschini, M. (2020). EEG fingerprinting: Subject-specific signature based on the aperiodic component of power spectrum. Computers in Biology and Medicine, 120, 103748.

      Deodato, M., & Melcher, D. (2024). Correlations between visual temporal resolution and individual alpha peak frequency: Evidence that internal and measurement noise drive null findings. Journal of Cognitive Neuroscience, 36(4), 590-601.

      Duecker, K., Doelling, K. B., Breska, A., Coffey, E. B., Sivarao, D. V., & Zoefel, B. (2024). Challenges and Approaches in the Study of Neural Entrainment. Journal of Neuroscience, 44(40).

      El Boustani, S., Marre, O., Béhuret, S., Baudot, P., Yger, P., Bal, T., ... & Frégnac, Y. (2009). Network-state modulation of power-law frequency-scaling in visual cortical neurons. PLoS Computational Biology, 5(9), e1000519.

      Euler, M. J., Vehar, J. V., Guevara, J. E., Geiger, A. R., Deboeck, P. R., & Lohse, K. R. (2024). Associations between the resting EEG aperiodic slope and broad domains of cognitive ability. Psychophysiology, 61(6), e14543.

      Gallina, J., Marsicano, G., Romei, V., & Bertini, C. (2023). Electrophysiological and Behavioral Effects of Alpha-Band Sensory Entrainment: Neural Mechanisms and Clinical Applications. Biomedicines, 11(5), 1399.

      Grandy, T. H., Werkle‐Bergner, M., Chicherio, C., Schmiedek, F., Lövdén, M., & Lindenberger, U. (2013). Peak individual alpha frequency qualifies as a stable neurophysiological trait marker in healthy younger and older adults. Psychophysiology, 50(6), 570-582.

      Gray, M. J., & Emmanouil, T. A. (2020). Individual alpha frequency increases during a task but is unchanged by alpha‐band flicker. Psychophysiology, 57(2), e13480.

      Gulbinaite, R., Van Viegen, T., Wieling, M., Cohen, M. X., & VanRullen, R. (2017). Individual alpha peak frequency predicts 10 Hz flicker effects on selective attention. Journal of Neuroscience, 37(42), 10173-10184.

      Keitel, C., Keitel, A., Benwell, C. S., Daube, C., Thut, G., & Gross, J. (2019). Stimulus-driven brain rhythms within the alpha band: The attentional-modulation conundrum. Journal of Neuroscience, 39(16), 3119-3129.

      Lendner, J. D., Helfrich, R. F., Mander, B. A., Romundstad, L., Lin, J. J., Walker, M. P., ... & Knight, R. T. (2020). An electrophysiological marker of arousal level in humans. elife, 9, e55092.

      Marx, E., Deutschländer, A., Stephan, T., Dieterich, M., Wiesmann, M., & Brandt, T. (2004). Eyes open and eyes closed as rest conditions: impact on brain activation patterns. Neuroimage, 21(4), 1818-1824.

      Pathania, A., Euler, M. J., Clark, M., Cowan, R. L., Duff, K., & Lohse, K. R. (2022). Resting EEG spectral slopes are associated with age-related differences in information processing speed. Biological Psychology, 168, 108261.

      Prins, N. (2012). The psychometric function: The lapse rate revisited. Journal of Vision, 12(6), 25-25.

      Prins, N. (2016). Psychophysics: a practical introduction. Academic Press.

      Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: theory and data for two-choice decision tasks. Neural computation, 20(4), 873-922.

      Wichmann, F. A., & Hill, N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & psychophysics, 63(8), 1293-1313.

      Wiesman, A. I., & Wilson, T. W. (2019). Alpha frequency entrainment reduces the effect of visual distractors. Journal of cognitive neuroscience, 31(9), 1392-1403.

    1. eLife Assessment

      This important study presents convincing evidence that uncovers a novel signaling axis impacting the post-mating response in females of the brown planthopper. The findings open several avenues for testing the molecular and neurobiological mechanisms of mating behavior in insects, and in the revised version the authors provide further evidence supporting their conclusions.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhang and coauthors study the role of the neuropeptide corazonin in modulating the post-mating response of the brown planthopper, with further validation in Drosophila melanogaster. To obtain their results, the authors used several different techniques that orthogonally demonstrate the involvement of corazonin signalling in regulating the female post-mating response in these species.

      They first injected synthetic corazonin peptide into female brown planthoppers, showing altered mating receptivity in virgin females and a higher number of laid eggs after mating. The role of corazonin in controlling these post-mating traits has been further validated by knocking down the expression of the corazonin gene by RNA interference and through CRISPR-Cas9 mutagenesis of the gene. Further proof of the importance of corazonin signaling in regulating the female post-mating response has been achieved by knocking down the expression or mutagenizing the gene coding for the corazonin receptor.

      Similar results have been obtained in the fruit fly Drosophila melanogaster, suggesting that corazonin signaling is involved in controlling the female post-mating response in multiple insect species.

      The study of the signalling pathways controlling the female post-mating response in insects other than Drosophila is scarce, and this limits the ability of biologists to draw conclusions about the evolution of the post-mating response in female insects. This is particularly relevant in the context of understanding how sexual conflict might work at the molecular and genetic levels, and how, ultimately, speciation might occur at this level. Furthermore, the study of the post-mating response could have practical implications, as it can lead to the development of control techniques, such as sterilization agents.

      The study, therefore, expands the knowledge of one of the signalling pathways that control the female post-mating response, the corazonin neuropeptide. This pathway is involved in controlling the post-mating response in both Nilaparvata lugens (the brown planthopper) and Drosophila melanogaster, suggesting its involvement in multiple insect species.

      The study uses multiple molecular approaches to convincingly demonstrate that corazonin controls the female post-mating response. The data supporting the main claim of the manuscript are solid and convincing.

    3. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study presents convincing evidence that uncovers a novel signaling axis impacting the post-mating response in females of the brown planthopper. The findings open several avenues for testing the molecular and neurobiological mechanisms of mating behavior in insects, although broad concerns remain about the relevance of some claims.

      Thank you very much for your letter and the insightful, valuable comments from the reviewers on our manuscript. These suggestions have been instrumental in strengthening the quality and clarity of our work. We have carefully addressed each concern, performed additional experiments, revised the relevant sections thoroughly, and made extensive refinements to the Discussion to clarify future research directions. Below is our detailed point-by-point response.

      Public Reviews:

      Reviewer #1 (Public review):

      In this work, Zhang et al., through a series of well-designed experiments, present a comprehensive study exploring the roles of the neuropeptide Corazonin (CRZ) and its receptor in controlling the female post-mating response (PMR) in the brown planthopper (BPH) Nilaparvata lugens and Drosophila melanogaster. Through a series of behavioural assays, micro-injections, gene knockdowns, CRISPR/Cas gene editing, and immunostaining, the authors show that both CRZ and CrzR play a vital role in the female post-mating response, with impaired expression of either leading to quicker female remating and reduced ovulation in BPH. Notably, the authors find that this signaling is entirely endogenous in BPH females, with immunostaining of male accessory glands (MAGs) showing no evidence of CRZ expression. Further, the authors demonstrate that while CRZ is not expressed in the MAGs, BPH males with Crz knocked out show transcriptional dysregulation of several seminal fluid proteins and functionally link this dysregulation to an impaired PMR in BPH. In relation, the authors also find that in CrzR mutants, the injection of neither MAG extracts nor maccessin peptide triggered the PMR in BPH females. Finally, the authors extend this study to D. melanogaster, albeit on a more limited scale, and show that CRZ plays a vital role in maintaining PMR in D. melanogaster females with impaired CRZ signaling, once again leading to quicker female remating and reduced ovulation. The authors must be commended for their expansive set of complementary experiments. The manuscript is also generally well written. Given the seemingly conserved nature of CRZ, this work is a significant addition to the literature, opening several avenues for testing the molecular and neurobiological mechanisms by which CRZ triggers the PMR.

      However, there are some broad concerns/comments I had with this manuscript. The authors provide clear evidence that CRZ signaling plays a major role in the PMR of D. melanogaster, however, they provide no evidence that CRZ signaling is endogenous, as they did not check for expression in the MAGs of D. melanogaster males. Additionally, while the authors show that manipulating Crz in males leads to dysregulated seminal fluid expression and impaired PMR in BPH, the authors also find that CRZ injection in males in and of itself impairs PMR in BPH. The authors do not really address what this seemingly contradictory result could mean. While a lot of the figures have replicate numbers, the authors do not factor in replicate as an effect into their models, which they ideally should do. Finally, while the discussion is generally well-written, it lacks a broader conclusion about the wider implications of this study and what future work building on this could look like.

      Thank you very much for your insightful and valuable comments on our manuscript. We have carefully addressed each of your concerns, revised the relevant sections thoroughly, and conducted additional experiments to further strengthen our conclusions. To better focus on the core finding of this study, the critical role of Crz/CrzR signaling in regulating the post-mating response (PMR) of female brown planthoppers (BPH), and to eliminate potential confusion associated with the male-related data, we have removed the experiments investigating CRZ function in males from the current version of the manuscript. These observations on male CRZ signaling will be explored in greater depth and presented as a standalone study in a separate manuscript in the future.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Zhang and coauthors study the role of the neuropeptide corazonin in modulating the post-mating response of the brown planthopper, with further validation in Drosophila melanogaster. To obtain their results, the authors used several different techniques that orthogonally demonstrate the involvement of corazonin signalling in regulating the female post-mating response in these species.

      They first injected synthetic corazonin peptide into female brown planthoppers, showing altered mating receptivity in virgin females and a higher number of eggs laid after mating. The role of corazonin in controlling these post-mating traits has been further validated by knocking down the expression of the corazonin gene by RNA interference and through CRISPR-Cas9 mutagenesis of the gene. Further proof of the importance of corazonin signalling in regulating the female post-mating response has been achieved by knocking down the expression or mutagenizing the gene coding for the corazonin receptor.

      Similar results have been obtained in the fruit fly Drosophila melanogaster, suggesting that corazonin signalling is involved in controlling the female post-mating response in multiple insect species.

      Notably, the authors also show that corazonin controls gene expression in the male accessory glands and that disruption of this pathway in males compromises their ability to elicit normal post-mating responses in their mates.

      Strengths:

      The study of the signalling pathways controlling the female post-mating response in insects other than Drosophila is scarce, and this limits the ability of biologists to draw conclusions about the evolution of the post-mating response in female insects. This is particularly relevant in the context of understanding how sexual conflict might work at the molecular and genetic levels, and how, ultimately, speciation might occur at this level. Furthermore, the study of the post-mating response could have practical implications, as it can lead to the development of control techniques, such as sterilization agents.

      The study, therefore, expands the knowledge of one of the signalling pathways that control the female post-mating response, the corazonin neuropeptide. This pathway is involved in controlling the post-mating response in both Nilaparvata lugens (the brown planthopper) and Drosophila melanogaster, suggesting its involvement in multiple insect species.

      The study uses multiple molecular approaches to convincingly demonstrate that corazonin controls the female post-mating response.

      Thank you very much for your valuable and insightful comments on our manuscript. We highly appreciate your recognition of the study’s value, including its focus on non-model insects, the evolutionary implications of corazonin signaling, and the rigorous use of multiple molecular techniques. We have carefully addressed your suggestions and revised the manuscript accordingly to enhance its clarity, accuracy, and depth. Below is our detailed response to your comments.

      Weaknesses:

      The data supporting the main claims of the manuscript are solid and convincing. The statistical analysis of some of the data might be improved, particularly by tailoring the analysis to the type of data that has been collected.

      Thank you for your valuable suggestion regarding statistical analysis. We fully agree that tailoring statistical methods to the specific type of data enhances the rigor and reliability of our findings.

      In response, we have comprehensively re-evaluated and revised the statistical analyses for all datasets in the manuscript:

      (1) For proportion-based data (e.g., female mating receptivity, re-mating rate), we replaced inappropriate tests (e.g., ANOVA) with chi-square tests for contingency tables, which are more suitable for comparing categorical variables.

      (2) For time-series data (e.g., receptivity at different time points post-injection), we adopted generalized linear models (GLM) with logit links followed by pairwise contrasts to address concerns of multiple testing, instead of hour-by-hour Mann-Whitney tests.

      (3) For continuous data (e.g., number of eggs laid, gene expression levels), we retained Student’s t-tests or one-way ANOVA after verifying normality, and used non-parametric tests (Mann-Whitney, Kruskal-Wallis) for non-normally distributed data.

      All revisions have been clearly described in the figure legends and Methods section, ensuring transparency and reproducibility. We believe these adjustments significantly improve the statistical robustness of our conclusions.
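
      As an illustration of the chi-square approach described in point (1), the following is a minimal sketch of a Pearson chi-square test on a 2x2 contingency table (mated vs. not mated, control vs. treated). The counts are hypothetical and are not data from the manuscript; the function name `chi_square_2x2` is likewise an assumption made for this example. The sketch omits the continuity correction, and with small expected counts an exact test may be preferable.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table is [[a, b], [c, d]], e.g. rows = groups, columns = mated / not mated.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts (NOT from the manuscript):
# control females: 18 mated, 2 did not; treated females: 7 mated, 13 did not.
hypothetical_counts = [[18, 2], [7, 13]]
stat, p = chi_square_2x2(hypothetical_counts)
print(f"chi-square = {stat:.3f}, p = {p:.5f}")
```

      Treating each female as one categorical observation, rather than averaging percentages across replicates, is what makes a contingency-table test a natural fit for receptivity and re-mating data.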

      In the case of the corazonin effect in females, all the data are coherent; in the case of CRISPR-Cas9-induced mutagenesis, the analysis of the behavioural trait in heterozygotes might have helped in understanding the haplosufficiency of the gene and would have further proved the authors' point.

      Thank you for this insightful suggestion. We fully agree that analyzing the behavioral traits of heterozygous mutants is crucial for understanding the haplosufficiency of the Crz and CrzR genes, and we regret overlooking this aspect in the initial submission.

      To address this gap, we have conducted additional behavioral assays using heterozygous Crz (+/ΔCrz) and CrzR (+/CrzR<sup>M</sup>) mutant females.

      (1) For re-mating receptivity and oviposition: We found no significant differences in either re-mating rate or egg-laying output between +/ΔCrz females and wild-type females. By contrast, +/CrzR<sup>M</sup> females exhibited re-mating and oviposition phenotypes comparable to those of homozygous CrzR mutants, with no significant differences detected between these two genotypes.

      (2) These results indicate that the Crz loss-of-function phenotype is recessive: a single functional copy of Crz is sufficient to sustain a normal post-mating response (PMR). By contrast, the CrzR loss-of-function phenotype is dominant: a single functional copy of CrzR is insufficient to maintain a normal PMR.

      This supports our core conclusion that CRZ signaling is critical for mediating the female PMR, as even a partial reduction of CrzR gene dosage impairs the response.

      The heterozygote data have been integrated into the revised manuscript, including updated figures (e.g., Figure 1J-K for Crz heterozygotes and Figure 3I-J for CrzR heterozygotes) and corresponding legends. We believe this addition strengthens the rigor of our genetic evidence and provides valuable insights into the gene dosage requirements for CRZ-mediated PMR regulation.

      Less consistency was achieved in males (Figure 5): the authors show that injection of CRZ and RNAi of crz, or mutant crz, has the same effect on male fitness. However, the CRZ injection should activate the pathway, and crz RNAi and mutant crz should inhibit the pathway, yet they have the same effect. A comment about this discrepancy would have improved the clarity of the manuscript, pointing to new points that need to be clarified and opening new scientific discussion.

      Thank you for highlighting this important discrepancy in the male-related CRZ signaling data. We fully acknowledge the inconsistency: CRZ injection (which was intended to activate the pathway) and Crz RNAi/mutagenesis (which was intended to inhibit the pathway) yielded similar effects on male fitness, and we regret not addressing this ambiguity in the initial submission.

      To resolve this confusion and refocus the current manuscript on its core objective, elucidating the role of endogenous CRZ/CrzR signaling in the female post-mating response (PMR), we have removed all experiments, analyses, and discussions related to male CRZ function. This decision ensures that the manuscript maintains a clear, cohesive narrative centered on female reproductive physiology, as recommended by both reviewers and the editorial team.

      Regarding the observed discrepancy in males, we recognize its scientific significance and plan to investigate it thoroughly in a standalone follow-up study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The manuscript would be significantly strengthened by an explanation of the seemingly contradictory results obtained in males, where both CRZ injections and Crz silencing afford the same results. Additionally, Crz expression data in the MAGs of D. melanogaster males is necessary to support your conclusions of endogenous signaling in this species. Besides correcting several imprecisions and inconsistencies in the text and figures, to improve quality and accuracy, the abstract should be restructured and the discussion modified as recommended by reviewers.

      Thank you for your comprehensive letter and valuable guidance. We have carefully addressed all the points raised by the editorial team and reviewers, and the revised manuscript now incorporates substantial improvements to clarity, accuracy, and scientific rigor. Below is our detailed response to your specific requests:

      Contradictory Male-Related Results

      We fully acknowledge the importance of addressing the contradictory findings in male CRZ signaling, where both CRZ injection and Crz silencing/mutagenesis yielded similar effects on male fitness. To resolve this ambiguity and maintain the manuscript’s focus on its core objective, elucidating endogenous CRZ/CrzR signaling in the female post-mating response (PMR), we have removed all male-related experiments, analyses, and discussions from the revised manuscript. This decision ensures that the current work remains cohesive and centered on female reproductive physiology, as recommended by the reviewers.

      We recognize the scientific significance of the male-specific discrepancy and plan to investigate it in a standalone follow-up study in the near future.

      Crz expression data in D. melanogaster Male Accessory Glands (MAGs)

      To support our conclusion of endogenous CRZ signaling in D. melanogaster females, we have supplemented the manuscript with additional experiments verifying the absence of CRZ in male MAGs:

      (1) RT-PCR Analysis: We detected no Crz mRNA in dissected male MAGs, whereas Crz expression was confirmed in the male head (positive control).

      (2) Immunohistochemistry and GAL4 system: Using the GAL4–UAS system (Crz-Gal4/UAS-mCD8-GFP) to label CRZ-producing neurons, combined with anti-CRZ antibody staining, we observed no CRZ-specific signal in male MAGs.

      These results demonstrate that D. melanogaster male MAGs neither synthesize nor contain CRZ peptide, confirming that CRZ acts as an endogenous female signaling factor (rather than a male-transferred seminal fluid component) in this species. The new data are included in Figure 5H-I and described in the Results and Methods sections.

      Correction of Imprecisions and Inconsistencies

      We have systematically revised the manuscript to address text and figure inaccuracies:

      Text Revisions: Corrected typos (e.g., Line 854), standardized species names (replacing “Drosophila” with “D. melanogaster” throughout), removed redundant or inappropriate sentences, and refined terminology (e.g., replacing “expression” with “localization” for protein detection).

      Figure Corrections: Fixed inconsistent Y-axis labels and numerical ranges (e.g., aligning percentages/probabilities with appropriate scales), resolved color scheme confusion, standardized oviposition-related labels to “Per female egg numbers within 3 days,” and added details on sample sizes and replicates to all figure legends.

      Statistical Improvements: Re-evaluated statistical analyses for proportion-based datasets (applying chi-square tests for contingency tables) and time-series data (using generalized linear models to address multiple testing), with revised methods clearly described in the text and figure legends.
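
      To make the logit-link GLM mentioned above concrete, here is a minimal sketch of a binomial GLM with a single binary predictor (treatment group), fitted by Newton-Raphson and followed by a Wald test on the group coefficient. It is a deliberate simplification: the counts are hypothetical, the names `fit_logistic` and `_score_and_information` are assumptions made for this example, and a full time-series analysis would add time point (and possibly replicate) as further terms, typically using a dedicated statistics package.

```python
import math


def _score_and_information(b0, b1, x, y):
    """Gradient of the log-likelihood and the (symmetric) information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # inverse logit link
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_logistic(x, y, iterations=25):
    """Fit logit(P(y = 1)) = b0 + b1 * x by Newton-Raphson.

    Returns (b0, b1, standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Standard error from the inverse information matrix at the final estimate.
    _, _, h00, h01, h11 = _score_and_information(b0, b1, x, y)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, se_b1


# Hypothetical data (NOT from the manuscript): x = 0 for control, 1 for treated;
# y = 1 if the female mated. Control: 12 of 20 mated; treated: 4 of 20 mated.
x = [0] * 20 + [1] * 20
y = [1] * 12 + [0] * 8 + [1] * 4 + [0] * 16

b0, b1, se_b1 = fit_logistic(x, y)
z = b1 / se_b1
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald test
print(f"log-odds ratio = {b1:.3f}, SE = {se_b1:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

      Fitting one model across all time points and then testing a limited set of planned contrasts avoids running an independent test at every hour, which is the multiple-testing concern this revision addresses.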

      Abstract Restructuring and Discussion Modification

      Abstract: We have restructured the abstract to group results thematically (rather than sequentially) for improved readability. The revised abstract emphasizes the core findings: CRZ/CrzR signaling is critical for female PMR in both N. lugens and D. melanogaster, acts endogenously in females, and is required for male seminal fluid factors to induce PMR. Male-related content has been removed because the corresponding experimental data were removed from the rest of the paper.

      Discussion: We have modified the discussion to include the evolutionary conservation of CRZ-mediated female PMR, the molecular and neurobiological implications of CRZ/CrzR signaling, and future research directions (e.g., dissecting downstream pathways in the female reproductive tract and brain). We have also reduced tangential content and clarified how our findings advance understanding of female endogenous signaling in PMR regulation. A new section was added at the end, which discusses outstanding questions related to CRZ and the PMR in both insect species.

      To both the above-mentioned sections and the Introduction we also added new text to emphasize that CRZ is a paralog of the vertebrate peptide gonadotropin-releasing hormone (GnRH), a hormone known to regulate reproduction in vertebrates (including humans), thus suggesting conservation of an ancient role in reproduction.

      All revisions in the manuscript are highlighted in red for easy reference. We believe these changes significantly strengthen the study’s focus, clarity, and scientific impact. Thank you again for your time and consideration.

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract could benefit from some restructuring. Right now, it reads like a sequential reporting of the results, but clumping together results thematically would make it easier to read, in my opinion. Also, see above re: my concerns about no evidence for the signal being endogenous in D. melanogaster.

      Thank you for your constructive suggestions regarding the abstract and the evidence for endogenous CRZ signaling in D. melanogaster. We fully agree with your feedback and have addressed both points thoroughly in the revised manuscript:

      (1) Abstract Restructuring

      We have restructured the abstract to group results thematically, rather than sequentially, to enhance readability and highlight the core findings. The revised abstract now organizes key information into three cohesive sections:

      The context and significance of female post-mating response (PMR) regulation, emphasizing the gap in understanding endogenous female signaling pathways.

      The core findings across both study species (Nilaparvata lugens and D. melanogaster), including the critical role of CRZ/CrzR signaling in suppressing re-mating and promoting oviposition, and its requirement for male seminal fluid factors to induce a PMR.

      The conclusion regarding the evolutionary conservation of endogenous CRZ signaling in female PMR, reinforcing the study’s broader implications.

      We also added new text to emphasize that CRZ is a paralog of the vertebrate peptide gonadotropin-releasing hormone (GnRH), a hormone known to regulate reproduction in vertebrates (including humans), thus suggesting conservation of an ancient role in reproduction.

      This thematic structure eliminates the linear “result-by-result” narrative, making the abstract more concise and impactful while clearly communicating the study’s key contributions.

      (2) Evidence for Endogenous CRZ Signaling in female D. melanogaster

      To address your concern about the lack of evidence for endogenous signaling in female D. melanogaster, we have supplemented the manuscript with two sets of critical experiments confirming that CRZ is not derived from male accessory glands (MAGs) but acts endogenously in females:

      RT-PCR Analysis: We performed RT-PCR on dissected male MAGs, male heads (positive control), and female tissues. Results showed no detectable Crz mRNA in MAGs, confirming that males do not synthesize CRZ in this tissue.

      Immunohistochemical and Genetic Labeling: Using the GAL4–UAS system (Crz-Gal4/UAS-mCD8-GFP) to label Crz-expressing neurons, combined with anti-CRZ antibody labeling, we observed no crz/CRZ signal in male MAGs. This confirms that MAGs neither produce nor sequester mature CRZ peptide.

      These findings demonstrate that CRZ signaling in D. melanogaster females is endogenous, as the peptide cannot be transferred from males during copulation. The new data are presented in Figure 5H-I and described in the Results section, with corresponding methods detailed in the Methods section.

      The revised abstract integrates this new evidence to explicitly state the endogenous nature of CRZ signaling in both BPH and D. melanogaster females, aligning with the thematic structure and addressing your concerns comprehensively. We believe these changes significantly improve the clarity and rigor of the abstract and the manuscript overall.

      (2) The authors use Drosophila as a broad placeholder throughout the manuscript, while they are specifically referring to D. melanogaster in several places. I would go through the manuscript and switch with the appropriate Drosophila species/species'.

      Thank you for pointing out this important detail regarding species-specific terminology. We fully agree with your suggestion to ensure accuracy and consistency in referencing the Drosophila species studied.

      We have systematically reviewed the entire manuscript, including the abstract, introduction, results, discussion, methods, and figure legends, and revised all instances where the general term “Drosophila” was used. All references now explicitly specify “D. melanogaster” to accurately reflect the species utilized in our experiments.

      (3) For the figures, I think the number of replicates is a distracting addition to the plot. This is still useful information, but could instead be added in as a line/table, in my opinion.

      Thank you very much for your suggestion. We have added the information on the number of replicates and sample sizes to the corresponding figure legends, which we hope improves clarity and readability.

      (4) There are typos in the y-axis label of all of the oviposition figures. A better re-wording would be "Per female egg numbers within 3 days".

      Thank you very much for your suggestion. Following your recommendation, we have now standardized the Y-axis label for all oviposition-related figures to “Number of eggs per female within 3 days.”

      (5) In Figure 1B and Figure 1 - Supplement 3a, since the comparisons are solely between control vs treatment, I would not join means across treatments that I am not comparing.

      To address this, we have revised Figure 1B and Figure 1—Supplement 3a by removing the connecting lines between group means. The updated figures now display independent mean ± SEM values for each dose (Figure 1B) and time point (Figure 1—Supplement 3a), with significance markers only applied to the control vs. treatment comparisons we actually tested. This revision eliminates any implied relationships between non-comparative groups and ensures the data visualization aligns with our statistical approach. We appreciate the reviewer’s suggestion, which has improved the clarity of the data presentation.

      (6) The authors mention courtship rate in lines 511, but from a look at the methods, this is not the courtship rate! This is a measure of the number of males engaging in any form of courtship. Also, in Figure 5 Supplement 2A, it appears that under 1% of males are courting. This seems extremely low. Do the authors mean percentages? In that case, I would reformat from 0 to 100/relabel the y-axis.

      Thank you for your observation and valuable feedback on this terminology and figure presentation issue. We fully acknowledge the inaccuracies and have addressed them comprehensively:

      (1) Correction of "Courtship Rate" Terminology

      We agree that the term “courtship rate” in Line 511 was incorrect, as our measurement reflects the proportion of males engaging in any form of courtship (not a rate per unit time). However, since we have removed all male-related data (including this section and associated figures) from the revised manuscript to focus on the core finding of female post-mating response (PMR), this terminology error has been eliminated entirely.

      (2) Revision of Figure 5 Supplement 2A

      Consistent with the removal of all male-related experiments, Figure 5 and its supplementary materials (including Supplement 2A) have been excluded from the revised manuscript. This ensures the current work remains cohesive and centered on female PMR, while also resolving the Y-axis labeling ambiguity you identified.

      We appreciate your careful attention to these details, which helps enhance the accuracy and clarity of the manuscript.

      (7) It appears Figure 5A, 5D, and 5G are mislabeled? Aren't all rematings with wild-type males?

      Thank you for identifying this labeling inconsistency. You are absolutely correct, all re-mating assays in the original figures involved wild-type males, and the mislabeling was an oversight.

      However, we have removed Figure 5 (and its associated subpanels A, D, G) entirely from the revised manuscript, as part of our decision to exclude all male-related data.

      (8) I am not sure I understand why a 30-minute post-injection threshold was chosen and what this table means. Could the authors elaborate on the methodology here on how they quantified premature ejaculation?

      Thank you for your question regarding the 30-minute post-injection observation window and the methodology for quantifying premature ejaculation.

      Because we have removed all male-related data (including the corresponding table and premature ejaculation analyses) from the revised manuscript to focus on our core finding, this analysis is no longer included in the manuscript.

      (9) Line 29 - "distensible" seems an odd choice of word here.

      We have revised Line 29 and removed “distensible”. The sentence now reads: “Peptide injection and knockdown of CRZ expression by RNAi or CRISPR/Cas9-mediated mutagenesis demonstrate that CRZ signaling suppresses mating receptivity.”

      (10) Line 57 - delete "a" from "a post-mating response" and "A PMR" because the authors are referring to a very specific suite of post-mating behaviours.

      We have revised Line 57 (and other relevant instances throughout the manuscript) to delete the article "a" from these phrases.

      (11) Line 352, delete a from "and in a significantly".

      We have revised Line 356 to remove the extraneous "a", correcting the phrase to "and in significantly".

      Reviewer #2 (Recommendations for the authors):

      This manuscript studies the role of the neuropeptide corazonin in modulating the post-mating response of the brown planthopper, with further validation in Drosophila melanogaster. To obtain their results, the authors used several different techniques, including dsRNA injection to induce RNA interference and CRISPR-Cas9-mediated site-specific mutagenesis. The experimental design is appropriate; the results are solid and support the conclusion of the manuscript. Overall, the merit of the manuscript is to present compelling evidence that the female post-mating response is mediated by corazonin, at least in the analysed species. There are multiple reports in multiple insect species, indeed, that male factors, particularly those secreted by male accessory glands, induce post-mating response in females, but the female pathways underlying this phenomenon are poorly understood.

      There are points the authors can consider to improve the manuscript quality.

      Thank you for your generous and insightful assessment of our manuscript. We deeply appreciate your recognition of the study’s strengths, including the appropriate experimental design, solid results, and meaningful contribution to understanding female endogenous pathways in post-mating response (PMR) regulation.

      We have carefully incorporated all your constructive suggestions (e.g., statistical analysis revisions, figure label standardization, text refinements) to further strengthen the manuscript’s rigor and clarity. By focusing on corazonin (CRZ)/corazonin receptor (CrzR) signaling in female brown planthoppers (Nilaparvata lugens) and validating these findings in Drosophila melanogaster, we aim to provide a conserved model for female endogenous PMR regulation across insect species.

      Thank you again for your thoughtful and supportive feedback, which has been instrumental in refining our work. We believe the revised manuscript now more effectively communicates the significance of CRZ-mediated female signaling in bridging the gap between male-derived cues and PMR execution.

      (1) Line 20: "optimal offspring". This is not a zoological parameter. One can use "optimal fitness".

      We have revised Line 20 to replace "optimal offspring" with "optimal fitness" as recommended.

      (2) Line 36-40: I think that the main message of the manuscript is the involvement of the corazonin pathway in controlling the female post-mating response. The involvement of corazonin in the male reproduction is also of note, but out of topic (in my opinion). The male corazonin is not transferred during mating from males to females, and the involvement of corazonin in controlling the gene expression in the MAGs is of note, but it is poorly related to the effect of corazonin in the female. I am not suggesting removing these data from the paper; they are important. But I do not find them that important to include them in the abstract, also because it confounds the reader at first. A similar statement can be made for the discussion (lines 728-745): making this the first piece of data commented on takes the stage, but this is not the main take-home message of the paper.

      Thank you for this suggestion. We fully agree that including male-related CRZ data in the abstract and leading the discussion with these results distracted from the primary focus and risked confounding readers. In fact, we also removed the entire section on the role of CRZ in males. We have addressed this issue comprehensively in the revised manuscript as follows:

      (1) Abstract Revision

      We have completely removed all content related to male CRZ function from the revised abstract. The updated abstract now exclusively emphasizes the core findings:

      The requirement of CRZ/CrzR signaling for mediating key female PMR traits (suppression of remating, promotion of oviposition) in both Nilaparvata lugens and Drosophila melanogaster;

      Experimental evidence confirming that CRZ acts as an endogenous female signaling factor (not a male-transferred molecule);

      The evolutionary conservation of CRZ-mediated female PMR regulation across the two insect species.

      We also added a comment on the evolutionary conservation of CRZ and GnRH signaling in reproduction.

      (2) Discussion Section Restructuring

      We have restructured the Discussion to prioritize the core message of female PMR regulation:

      Lead paragraph adjustment: Lines 728–745 (originally focusing on male CRZ and MAG gene expression) have been deleted.

      Revised opening focus: The Discussion now contains only a synthesis of our key findings on female CRZ signaling, including its molecular mechanisms, cross-species conservation, and implications for understanding endogenous female pathways downstream of male seminal fluid cues.

      We appreciate your suggestions for the narrative focus of the manuscript.

      (3) Line 49: "Reproductive behavior is critical for population sustenance and survival of the species": I find this intro a little teleological evolutionary speaking, and I am not totally sure that this has ever been demonstrated as a concept. I would skip it, simply saying "Reproductive behavior in insects is influenced...".

      Following your suggestion, we have revised Line 49 to streamline the introduction and avoid “teleological language”. The updated sentence now reads: "Reproductive behavior in insects is influenced by a complex interplay of neural, hormonal, and environmental factors."

      (4) Line 58: "A PMR has been documented across diverse insect taxa, including Drosophila melanogaster, Anopheles gambiae, Aedes aegypti, and the brown planthopper (BPH), Nilaparvata lugens". There are many other insect species for which PMR has been shown: crickets, fruit flies, grasshoppers, etc. Therefore, I would say "for example" to underline that it is not a complete list. Being an incomplete list, I suggest that the authors pay attention to the cited literature: the literature cited in the case of Anopheles gambiae demonstrates the synthesis of hormones in the MAGs, but it has nothing to do with PMR; there is nothing cited for Aedes aegypti, even if the authors named the species.

      Thank you for this constructive feedback on the framing of PMR studies across insect taxa and the accuracy of our cited literature. We fully agree with your suggestions and have addressed these issues comprehensively in the revised manuscript:

      (1) Revision of the Sentence Structure

      We have modified Line 58 to explicitly indicate that the listed species are examples rather than a complete inventory of insects with documented PMR. The revised sentence reads:

      "The PMR has been documented across diverse insect taxa, for example, Drosophila melanogaster, Anopheles gambiae, Aedes aegypti, crickets (Gryllodes sigillatus), grasshoppers (Dichromorpha viridis), and the brown planthopper (BPH), Nilaparvata lugens"

      (2) Correction of Literature Citations

      We have thoroughly reviewed the citations associated with the listed species to ensure they directly support the role of PMR:

      For Anopheles gambiae: We have replaced the previously cited study (focused on MAG hormone synthesis) with two relevant references that explicitly characterize PMR traits—including mating-induced oviposition stimulation and remating suppression—in this mosquito species.

      For Aedes aegypti: We have added two newly published studies that document key PMR phenotypes (e.g., post-mating refractoriness and altered feeding behavior) and their underlying molecular mechanisms in this species.

      For crickets (Gryllodes sigillatus): We added a newly published study that documents PMR phenotypes in Gryllodes sigillatus.

      We have also verified that the citations for D. melanogaster and N. lugens remain directly relevant to PMR regulation, with no adjustments needed.

      All revised citations are properly formatted and integrated into the text, with corresponding updates to the reference list.

      (5) Line 111-132: I find this redundant: it is a long summary of the methods and the results. I do not think it is needed here, but I think the authors should point to the main message of their data.

      Thank you for pointing out the redundancy of Lines 111–132. We fully agree that this section disrupted the flow of the introduction to our study.

      To address this, we have completely removed Lines 111–132 from the revised manuscript. In place of this redundant content, we have added a concise, focused paragraph that emphasizes the central hypothesis and key objective of our work: specifically, to identify the endogenous female signaling pathways that mediate the post-mating response (PMR) downstream of male-derived cues, and to validate the conserved role of corazonin (CRZ) signaling in this process across Nilaparvata lugens and Drosophila melanogaster.

      (6) Line 156: This sentence is not needed here.

      We have deleted the sentence in Line 156 from the revised manuscript.

      (7) Figure 1E, J supplementary 3A: The label of the Y axis is the percentage of the mating females (expected 0-100%), but the numbers show the fraction (0-1). On the contrary, in Figure 1 Supplement 4, the label says "probability of survival" and the probability goes from 0 to 1, while the number of the axis goes from 0 to 100 (percentage).

      Thank you very much for pointing out these inconsistencies. We have carefully reviewed all Y-axis labels and corresponding numerical ranges throughout the manuscript and corrected the mismatched axes.

      (8) Figure1B, C, F, K supp 2, 3A: I found this use of colours confounding. Why did the authors use the light blue for sCRZ, but the mean and SE are shown in pink, which is the colour for CRZ? Furthermore, it is not reported anywhere how many individuals have been used per replicate. There is the total number of insects, the number of replicates, but there is no indication about the minimum number of insects per replicate in this and many other subsequent experiments.

      Thank you for identifying these critical inconsistencies in figure color coding and missing details on sample allocation per replicate, and we greatly appreciate your meticulous review of our data presentation.

      We have addressed these issues in the revised manuscript as follows:

      (1) Standardization of Color Coding

      We apologize for the confusing color mismatch between group labels and data points in Figure 1B, C, F, K, and Supplements 2 and 3A. We have unified the color scheme across some figures to ensure consistency:

      The sCRZ (control) group is now consistently represented by light blue for both labels and mean ± SE data points.

      The CRZ (treatment) group is now consistently represented by pink for both labels and mean ± SE data points.

      For Figures 1C, F, K and Supplementary Figure 2, we were concerned that the mean and s.e.m. bars might be visually obscured by the data points. To improve their visibility, we therefore used the opposite color to display the mean and s.e.m.

      All figure legends have been cross-checked and updated to reflect this standardized color coding.

      (2) Addition of Sample Size per Replicate

      We acknowledge that the lack of information on the minimum number of insects per replicate was a key gap in our experimental reporting. We have supplemented this critical detail in this way:

      Figure Legends: For Figure 1B, C, F, K, and Supplements 2 and 3A (as well as all subsequent experiments), we have added explicit statements specifying the minimum number of insects per replicate, alongside the total sample size and number of replicates (e.g., “n = 3 replicates, with a minimum of 10 females per replicate; total N = 35 females”). All revised figures and their corresponding legends have been integrated into the updated manuscript, and we have cross-checked all other figures to avoid similar issues.

      (9) Figure 1C, F, K, Supplementary Figure 3B: Y axis labels - "Eggs numbers of per female...". I suggest changing it to "Number of eggs per female...".

      We have revised the Y-axis labels for Figure 1C, F, K and Supplementary Figure 3B to “Number of eggs per female...” as recommended. Additionally, we cross-checked all other oviposition-related figures in the manuscript to ensure uniform use of this standardized label, eliminating any inconsistent phrasing across the dataset.

      (10) Legend Figure 1B: Mann Whitney test. How did the authors perform the test? Hour by hour? I am not sure this is the best way to analyse the data, because it is a case of multiple testing. Probably a linear model or a glm might be a better fit.

      Thank you very much for pointing out this issue. In Figure 1B, each concentration group was analyzed using data from independent individuals, and therefore the comparisons do not involve repeated measures across time; for this reason, we consider the Mann–Whitney test appropriate for this dataset. For Figure 1—Supplement 3A, however, our original analysis compared treatment and control groups hour by hour, which indeed raises concerns regarding multiple testing. Following your suggestion, we have removed the potentially misleading connecting lines and reanalyzed the dataset using a generalized linear model (GLM). The updated figure and revised legend have been included in the revised manuscript.
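
      To illustrate why hour-by-hour comparisons raise a multiple-testing concern, the following minimal sketch computes the familywise error rate for a set of tests. It assumes the tests are independent, and the number of hourly comparisons is hypothetical rather than taken from the manuscript. The Bonferroni threshold is shown only as a familiar point of comparison; it is not the GLM analysis described in the response.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


# Hypothetical number of hourly treatment-vs-control comparisons.
n_hours = 24
alpha = 0.05

fwer = familywise_error_rate(alpha, n_hours)
per_test = bonferroni_threshold(alpha, n_hours)
print(f"Familywise error rate over {n_hours} tests: {fwer:.3f}")
print(f"Bonferroni per-test threshold: {per_test:.5f}")
```

      Fitting a single model to the whole time course, as the response describes, avoids this inflation because treatment is tested once rather than once per hour.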

      (11) Legend Figure 1E: ANOVA test. These are proportions, not continuous variables of the samples. Tests for proportions might be a better fit (chi-square, etc.).

      To address this issue, we have re-analyzed the proportional data in Figure 1E using Pearson’s chi-square test of independence, which directly evaluates the association between treatment group (sCRZ vs. CRZ) and the binary mating status (mated vs. unmated) of females. This test is statistically robust for proportional data and avoids the assumptions of normality and homogeneity of variances required for ANOVA.
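
      As an illustration of the test named above, here is a minimal sketch of Pearson's chi-square test of independence on a 2x2 table, written with the Python standard library only. The counts are hypothetical and are not data from the manuscript; the function name is also an assumption made for this example, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of treatment and mating status.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = sCRZ, CRZ; columns = mated, unmated.
counts = [[30, 10], [12, 28]]
chi2, p = chi_square_2x2(counts)
print(f"chi2 = {chi2:.3f}, p = {p:.2e}")
```

      When any expected count is small (a common rule of thumb is below 5), Fisher's exact test is usually preferred over the chi-square approximation.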

      (12) Knockout experiments: I agree with the authors that the data are strong enough to sustain the conclusions. However, is the corazonin knockout haplosufficient or is it recessive? What is the behaviour of the heterozygotes?

      Thank you for this insightful question regarding the genetic basis of the corazonin (CRZ) knockout phenotype.

      To address your query, we have supplemented experiments with additional phenotypic analyses of heterozygous CRZ knockout females (+/ΔCrz), and we clarify the genetic nature of the knockout as follows:

      (1) Genetic basis of the CRZ knockout:

      The CRZ knockout line was generated via CRISPR-Cas9-mediated deletion of the Crz coding region, resulting in a recessive loss-of-function mutation. Homozygous knockout females (ΔCrz) exhibited the full phenotypic suite reported in the manuscript (impaired post-mating suppression of remating, reduced oviposition rate, and disrupted CRZ signaling in the reproductive tract).

      (2) Phenotype of heterozygous females:

      Behavioral and physiological assays of +/ΔCrz heterozygotes revealed no significant differences compared to wild-type (+/+) females across all measured post-mating traits. Specifically:

      Remating rates of +/ΔCrz females were indistinguishable from wild-type controls at 48 h post-mating.

      Oviposition output of +/ΔCrz females matched wild-type levels over a 3-day assay period.

      (3) Updates to the manuscript:

      We have added these heterozygote data as Figure 1J and K in the revised manuscript, with corresponding descriptions in the Results and Methods sections. We have also explicitly noted the recessive nature of the Crz mutation in the Genetic Manipulation subsection, ensuring clarity for readers.

      These results confirm that the Crz knockout phenotype is fully recessive and that one functional copy of the Crz gene is sufficient to maintain normal post-mating responses—supporting our conclusion that CRZ signaling is required for mediating female PMR.

      We thank you again for raising this important point, which has strengthened the genetic rigor of our study.

      (13) Figure 1, Supplementary 1: I do not understand why the authors point out the fact that these are Protostomia. These are all Arthropoda, there is not a single species outside this Phylum. Caerostris darvini should be Caerostris darwini.

      Thank you for this feedback regarding Figure 1 and Supplementary Figure 1. We fully agree and have addressed these issues in the revised manuscript:

      (1) Removal of the "Protostomia" designation

      We have deleted all references to Protostomia from the figure legends and associated text.

      (2) Spelling correction of Caerostris darwini

      We apologize for the typographical error in the species epithet. We have corrected the misspelling Caerostris darvini to the taxonomically accurate Caerostris darwini (Darwin's bark spider) across all instances in Figure 1, Supplementary Figure 1, and their corresponding legends. We have also cross-checked all other species names in the manuscript to eliminate similar typographical errors.

      (14) Line 299: CRZ expression: I found this confounding, given that the authors were talking about the expression of the gene. I would use the term localization, referring to the protein/peptide (is it what the authors were pointing at?).

      To resolve this ambiguity, we have revised Line 299 to replace CRZ expression with CRZ peptide localization, which accurately describes the experimental focus (immunofluorescence staining and confocal imaging of the CRZ protein). We have also cross-checked the entire manuscript to standardize this terminology:

      We use Crz gene expression exclusively when referring to transcriptional analyses (e.g., qRT-PCR results).

      We use CRZ peptide localization when describing the spatial distribution of the protein (e.g., immunostaining assays).

      (15) Figure 2C: The expression is relative to...? I would make it explicit on the axis.

      Thank you for this helpful comment. We apologize that the normalization reference was not sufficiently clear in the original version. In the revised manuscript, we now explicitly state that RT–qPCR data were first normalized to the reference genes Actin and 18S rRNA, and then expressed relative to the mean expression level of the tissue showing the highest Crz expression, which was set to 1. We have clarified this information in the figure legend and the Methods section.

      We have revised Figure 2C as follows:

      Updated the Y-axis label to explicitly state the reference: “Relative Crz gene expression”.

      Added a supplementary note in the figure legend to confirm that relative expression values were calculated using the 2<sup>−ΔΔCt</sup> method, with the reference genes serving as internal controls for normalization.

      Additionally, we have cross-checked all other qRT-PCR-related figures in the manuscript to ensure that the reference for relative expression is clearly indicated on the corresponding axes, standardizing this key detail across all gene expression datasets.
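
      The normalization described in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the Ct values and tissue names are hypothetical, the function name is an assumption, and averaging the Ct values of the two reference genes is one common way to combine them.

```python
from statistics import mean

# Hypothetical mean Ct values per tissue: target gene plus two reference genes.
ct_values = {
    "brain": {"target": 22.0, "refs": [16.0, 10.0]},
    "ovary": {"target": 27.0, "refs": [16.5, 10.5]},
    "gut": {"target": 29.0, "refs": [15.5, 9.5]},
}


def relative_expression(ct_by_tissue):
    """Relative expression by the 2^(-ddCt) method, scaled so the top tissue equals 1."""
    # Delta Ct: target Ct minus the mean Ct of the reference genes.
    delta_ct = {
        tissue: v["target"] - mean(v["refs"]) for tissue, v in ct_by_tissue.items()
    }
    # Calibrator: the tissue with the lowest delta Ct, i.e. the highest expression.
    calibrator = min(delta_ct.values())
    # 2^(-delta delta Ct); the calibrator tissue comes out as exactly 1.
    return {tissue: 2.0 ** -(d - calibrator) for tissue, d in delta_ct.items()}


rel = relative_expression(ct_values)
for tissue, value in rel.items():
    print(f"{tissue}: {value:.4f}")
```

      Because every value is scaled to the highest-expressing tissue, the axis runs from 0 to 1, which is why the calibrator should be stated in the axis label or legend.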

      (16) Figures 3B, E, I, L, M, N: Percentage and proportions, as in Figure 1; furthermore, please provide the minimum number of individuals per replicate. Furthermore, as in Figure 1, the data are proportions, and I would use statistical tests that are studied for this kind of data.

      Thank you for this helpful suggestion. We have reviewed and corrected the Y-axis labels and corresponding numerical ranges in these figures, and we have added the number of replicates and the minimum number of individuals per replicate to the figure legends. In addition, following your recommendation, we have reanalyzed these proportion data using chi-square tests for contingency tables.

      (17) Figure 3: As in Figure 1, it would be interesting to know which is the behaviour of the heterozygotes.

      Thank you for suggesting to complement the data in Figure 3 with heterozygote phenotypic analyses.

      To address this, we have conducted additional behavioral and physiological assays of heterozygous CrzR knockout females (+/CrzR<sup>M</sup>) and integrated these data into the revised Figure 3 and its legend:

      Phenotypic characterization of heterozygotes: Across all traits measured in Figure 3 (e.g., remating rate and oviposition efficiency), +/CrzR<sup>M</sup> females exhibited no significant differences compared to homozygotes.

      This confirms that the CrzR knockout phenotype is dominant and that one functional copy of the CrzR gene is not sufficient to maintain a normal post-mating response (PMR).

      Manuscript updates:

      We added heterozygote data in Figure 3I and J. Accordingly, we updated the Results text to reflect the revised panel labeling.

      We supplemented the figure legend with statistical comparisons between heterozygotes and wild-type groups (using chi-square tests for proportional data).

      We included a brief description of heterozygote phenotypes in the Results section to contextualize the genetic basis of the CrzR-mediated PMR regulation.

      (18) Figure 3 Supplement 1: Can the authors indicate which model for maximum likelihood they chose? Did they perform a pre-test to assess which substitution model was the best for their data?

      Thank you for this critical question regarding the model selection for maximum likelihood (ML) phylogenetic analysis in Figure 3 Supplement 1. We fully agree that specifying the substitution model and validation process is essential for ensuring the reproducibility and rigor of phylogenetic inferences.

      To address this, we have supplemented the manuscript with detailed information on the model selection and validation steps, as follows:

      (1) Substitution model selection

      Prior to constructing the ML tree, we performed a model selection pre-test using the ModelFinder tool integrated in IQ-TREE 2, which evaluates the fit of candidate amino-acid substitution models to the CrzR amino-acid sequence alignment via the Bayesian Information Criterion (BIC). The model selection procedure identified the LG+G model as the best-fit substitution model for our dataset. This model uses the Le and Gascuel (LG) amino-acid substitution matrix and incorporates a gamma-distributed rate variation among sites (G) to account for among-site rate heterogeneity.

      (2) Manuscript updates

      We have added this detailed model selection process and the final LG+G model specification to the legend of Figure 3 Supplement 1.

      We have also included information on bootstrap validation (10000 ultrafast bootstrap replicates) to support the node support values reported in the phylogenetic tree.
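
      The BIC-based comparison mentioned above can be illustrated with a short sketch. The log-likelihoods, parameter counts, and alignment length below are hypothetical, chosen only to show how the criterion ranks candidate models. This is not ModelFinder's implementation, and the values are not results from the manuscript.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical candidates: model name -> (log-likelihood, number of free parameters).
# The parameter count would include branch lengths plus any model parameters.
candidates = {
    "LG": (-12450.0, 57),
    "LG+G": (-12210.0, 58),
    "WAG+G": (-12290.0, 58),
    "JTT+G": (-12335.0, 58),
}
n_sites = 400  # hypothetical alignment length

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: BIC = {score:.1f}")
print(f"Best-fit model by BIC: {best}")
```

      The extra gamma shape parameter costs one additional ln(n) of penalty, so a +G model is preferred only when it improves the log-likelihood by more than half that amount.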

      (19) Figure 4 Supplement 1: I would be explicit about what it is relative to (which gene).

      Thank you for this helpful comment. In the revised manuscript, we now explicitly state that RT–qPCR data were first normalized to the reference gene Actin, and then expressed relative to the mean expression level of the tissue showing the highest CrzR expression, which was set to 1. This normalization strategy provides a robust and biologically representative reference. We have clarified this information in the figure legend and the Methods section.

      (20) Line 518 and Line 525 and Figure 5: The authors show that injection of CRZ and RNAi of crz or mutant crz has the same effect on male fitness. How do the authors explain this contradiction? The CRZ injection should activate the pathway, and crz RNAi and mutant crz should inhibit the pathway, but nevertheless, they have the same effect. I would probably test the expression of some of the genes whose expression is altered in crz mutant males (next paragraph) to see if an altered CRZ signalling pathway (both ways) might affect gene expression in the MAGs in the same way.

      Thank you for raising this important point. As explained above, we have removed all data related to CRZ function in male BPHs from the current version.

      (21) Figure 5, Figure 7: As in Figures 1 and 3, please pay attention to the percentages and proportions and the statistical tests.

      Thank you for pointing out these issues. We have carefully reviewed and corrected the percentage/proportion labeling in the relevant figures, including the Y-axis descriptions and numerical ranges, as well as revised the corresponding figure legends. In addition, we have reanalyzed the data using statistical tests appropriate for proportion data. All corresponding revisions have been incorporated into the updated manuscript.

      (22) Line 728-745: As already stated for the abstract, the male effect of crz is, to me, a side product, and I am not sure the male crz signalling has something to do with the female crz signalling. It is interesting, nobody showed that CRZ affects expression in the MAGs, but this is not the main message of the paper, and it confuses the reader. I would reduce the discussion about this aspect and move it to the end, but this is my own take.

      We have removed all data related to CRZ function in males for the reasons outlined above.

      (23) Material and methods/results: as a general suggestion, I would be explicit about the timing of receptivity inhibition in the species. I've seen the authors have established this in precedent work, and I would refer to that work and make the reader aware of how the receptivity works in the species (i.e., that it is not permanent and lasts for a few days after first mating). This allows a better understanding of the experimental design.

      Thank you for this valuable and constructive suggestion. We fully agree that explicitly describing the timing of receptivity inhibition in Nilaparvata lugens, and linking it to our earlier work, will strengthen the rigor and clarity of the manuscript.

      To address this, we have revised the Materials and Methods and Results sections as follows:

      (1) Materials and Methods (Experimental Design subsection)

      We have added a dedicated paragraph that explicitly defines the temporal dynamics of post-mating receptivity inhibition in N. lugens, with direct reference to our prior work[1]. The text clarifies:

      “In N. lugens, mating induces a transient suppression of female receptivity that is not permanent. Females typically start to regain remating willingness 72 h after the first mating, as documented in our previous study[1]. This temporal window guided the design of our remating assays, in which females were paired with naive males at 48 h post-initial mating to capture both the suppressed and recovered phases of receptivity.”

      (2) Results (Post-mating Receptivity section)

      We have incorporated a brief contextual sentence at the start of the section to reinforce this key species-specific trait, ensuring that readers connect our assay timings to the temporal dynamics of receptivity in N. lugens.

      These revisions ensure that the rationale behind our experimental timing is transparent and well-supported, allowing readers to fully grasp how our assays were tailored to the biological characteristics of N. lugens.

      (24) Line 854: There is a typo "CRZ peptide. virgin female", the dot should be a comma.

      We have revised Line 854 to correct the punctuation: the dot has been replaced with a comma, resulting in the phrasing "CRZ peptide, virgin female". In addition, we have changed the wording in this sentence to ensure scientific rigor and to avoid colloquial expressions.

      (1) Zhang, Y.J., Zhang, N., Bu, R.T., Nässel, D.R., Gao, C.F., and Wu, S.F. (2025). A novel male accessory gland peptide reduces female post-mating receptivity in the brown planthopper. Plos Genet 21, e1011699. 10.1371/journal.pgen.1011699.

    1. eLife Assessment

      This study addresses an important question about how large-scale brain networks interact, and specifically how the default mode network exchanges information with the sensory cortex. The analyses are sophisticated, but at present provide incomplete evidence for the claims made in the paper.

    2. Reviewer #1 (Public review):

      Summary:

      This paper leverages 7T fMRI data from the Natural Scenes Dataset to investigate whether retinotopic coding, the position-selective organization of visual responses, structures spontaneous resting-state interactions between the Default Network (DN) and the Dorsal Attention Network (dATN). Using individualized network parcellations and population receptive field (pRF) modeling, the authors show that DN voxels can be split into two subpopulations based on their response to visual stimulation: those with position-specific positive BOLD responses (+pRFs) and those with position-specific negative BOLD responses (-pRFs). Critically, these subpopulations relate differently to the dATN during rest: -pRFs are anticorrelated with the dATN, +pRFs are positively correlated, and non-retinotopic DN voxels show no coupling. The anticorrelation (and positive correlation) is enhanced when DN and dATN voxels share visual field preferences. An event-triggered analysis suggests that retinotopic coding shapes both "top-down" (DN-initiated) and "bottom-up" (dATN-initiated) spontaneous activity transients, supporting the claim that the retinotopic scaffold is intrinsic to the DN. These findings challenge the prevailing view of global DN-dATN antagonism and suggest retinotopic coding as an organizing principle for cross-network communication.

      Strengths:

      The central finding that what looks like network-level independence between DN and dATN decomposes into structured, bivalent interactions organized by voxel-level visual field preferences is a compelling demonstration that macro-scale network descriptions can hide meaningful substructure. The logic of the analysis is clean: pRF properties are estimated from retinotopic mapping data and then used to predict resting-state coupling in completely independent scanning sessions. This cross-session, cross-modality design rules out many circularity concerns.

      The use of individualized multi-session hierarchical Bayesian parcellation (Kong et al.) to define DN and dATN boundaries within each subject is the right methodological choice for this question. Network boundaries in posterior cortex, where DN and dATN interdigitate most closely, vary considerably across individuals, and group-average approaches would introduce exactly the kind of misassignment that would most confound the result.

      The matched-vs-random pRF analysis is well-controlled. The authors demonstrate that cortical distance between matched and randomly-matched dATN pRFs does not differ, effectively ruling out spatial proximity on the cortical surface as a confound. tSNR controls further show that signal quality differences do not drive the effect.

      The event-triggered analysis (Figure 3) is creative and adds genuine value. Showing that retinotopically-specific coupling persists during DN-initiated activity transients, not only dATN-initiated ones, is the key piece of evidence for the claim that the code is intrinsic to the DN rather than passively inherited through bottom-up visual drive.

      The result is observed consistently across all individual participants, which provides strong evidence for the robustness of the qualitative pattern despite the small sample size inherent to densely-sampled designs.

      Weaknesses:

      (1) The nature of negative pRFs requires more scrutiny

      The entire interpretive framework depends on treating negative pRFs in the DN as genuine position-selective neural responses (suppression). However, negative BOLD signals are well known to arise from non-neural sources, specifically, vascular stealing (where activation in nearby tissue diverts blood from adjacent voxels) and macrovascular draining vein effects that produce spatially displaced signal inversions. These concerns are amplified at 7T, where T2*-weighted GE-EPI carries substantial macrovascular weighting. The DN and dATN interdigitate extensively in the posterior cortex, often within millimeters. A negative pRF in a DN voxel adjacent to a positive dATN voxel could, in principle, reflect the hemodynamic shadow of its neighbor rather than an independent neural response.

      The spatial dispersion control (matched vs. random pRFs have similar cortical distribution) is valuable but addresses long-range confounds, not *local* hemodynamic crosstalk. The reliability of sign and center position across runs is reassuring but does not exclude a vascular origin, as vascular architecture is itself stable across sessions. I would encourage the authors to test whether the matched-vs-random effect survives exclusion of voxels near large pial vessels (identifiable from T2* contrast or the venograms available in the NSD). These analyses would not be dispositive, but they would meaningfully strengthen the neural interpretation.

      (2) Amount of retinotopic mapping data and choice of pRF pipeline

      The NSD includes 6 runs of retinotopic mapping (~5 minutes each; 3 bar-aperture, 3 wedge/ring). The authors use only the 3 bar-aperture runs (~15 minutes total per subject) and fit their own pRFs using AFNI's 3dNLfim procedure, rather than using the pRF estimates provided as part of the NSD release (which were fitted using the analyzePRF toolbox with all 6 runs).

      Fifteen minutes of bar data is quite limited for reliable voxel-wise pRF estimation, especially in regions far from the early visual cortex, where signal-to-noise is inherently lower. Standard recommendations for robust pRF mapping in higher-order regions generally suggest substantially more data. The variance-explained threshold is close to the noise floor by design, meaning that a non-trivial number of the "retinotopic" DN voxels may be poorly estimated. Given that the core analyses depend on both the sign and the center position of these pRFs, the limited data is a significant concern.

      The authors do not explain why they chose to re-fit pRFs rather than use the NSD-provided estimates. If the motivation was methodological (e.g., the NSD pRF pipeline does not readily yield signed amplitude, or the bar-only fits were judged more appropriate for detecting negative responses), this should be made explicit. If the NSD-provided pRFs can reproduce the key findings, this would substantially increase confidence in the results. If they cannot, that divergence itself would be important to understand. I would ask the authors to address this choice and, if feasible, to report whether the core results replicate using the NSD-provided pRF estimates and/or whether using all 6 runs of retinotopy data changes the findings.

      (3) pRF model adequacy for the Default Network

      The isotropic Gaussian pRF model was developed for and validated in early and mid-level visual cortex, where it captures the dominant spatial selectivity of neuronal populations. In DN voxels where the model explains comparatively little variance, it is less clear that the model is capturing the right quantity. Specifically, the negative pRFs could conceivably be described by a model with a dominant suppressive surround (e.g., a difference-of-Gaussians model), in which what appears as a "negative pRF" in the standard model is actually the surround component of a center-surround mechanism whose center is poorly resolved. This distinction matters: a genuine inverted code (negative center response) implies a qualitatively different computation than inherited surround suppression from nearby visual cortex.

      The authors should consider discussing why the standard model is sufficient for the questions asked, or ideally, testing whether the sign distinction survives under alternative pRF model specifications.

      (4) Interpreting resting-state transients as top-down vs. bottom-up

      The event-triggered analysis labels high-amplitude DN pRF activations as "top-down events" and dATN activations as "bottom-up events." This is a reasonable inference given experience-sampling studies showing that rest involves alternation between internal and external attention, but it remains an inference. Without concurrent experience sampling, eye-tracking, or physiological monitoring, we cannot establish that a spontaneous DN transient reflects memory retrieval or internally-directed thought rather than a global arousal fluctuation. Similarly, dATN transients during rest could reflect covert shifts of spatial attention to remembered or imagined locations rather than bottom-up processing per se. I would ask the authors to soften this framing or to discuss what additional data would be needed to validate the top-down/bottom-up attribution.

      (5) The "retinotopic code" vs. "visual field bias" distinction

      The paper uses the language of a "retinotopic code" throughout and correctly distinguishes this from a "retinotopic map," noting that DN voxels do not form a continuous topographic representation on the cortical surface. This distinction deserves greater emphasis. In vision science, retinotopic maps carry computational significance through their topographic continuity and relationship to cortical wiring. A distributed collection of voxels with coarse visual field preferences but no cortical topography is a fundamentally different organizational feature. Recent reviews have drawn an explicit distinction between *retinotopic maps* and *visual field biases* (Groen, Dekker, Knapen & Silson, TiCS 2022), and the present findings may be more accurately characterized as the latter. Perhaps the authors think that the distinction is merely a signal-to-noise distinction, in which case I would invite them to clearly speak to this interpretation. In any case, this is not a criticism of the findings themselves, but clarity on this point would prevent conflation of two different organizational principles and would help position the work for both the vision and network neuroscience communities.

    3. Reviewer #2 (Public review):

      Summary:

      Using a public dataset of retinotopic mapping and resting-state data, the authors find that the default mode network has voxels that respond (positively or negatively) to visual stimulation at specific retinotopic positions, and that resting-state activity in these voxels is correlated with activity in more traditional sensory voxels with the same visual-location preference. The retinotopic specificity is bidirectional, such that high activity in default mode voxels drives activity only in voxels with matching receptive fields in sensory cortex, and vice versa. These findings are at odds with traditional views of the default mode network as having abstract (non-retinotopic) representations and competing (rather than cooperating) with external sensory representations.

      Strengths:

      This study continues an intriguing line of research about how default mode regions interact with the sensory cortex. Demonstrating that there are structured interactions between these regions at rest, and that these interactions are in fact organized according to retinotopic location (as opposed to traditional views of representational format in the default mode network), provides a new framework for thinking about large-scale internal and external brain networks. The authors make use of a well-powered public dataset that allows for precise estimates of pRFs and individual-specific resting-state networks, and develop a number of interesting analyses that characterize the relationships between DN and dATN voxels. The findings are exciting and could have a major impact on future studies in cognitive neuroimaging.

      The authors mention that these findings could shed light on internal/external interactions such as "anticipatory saccades or memory-guided attention," which is true, though I would argue that constructing DN representations of external stimuli is in fact even more fundamental than these specific cases (e.g., see Barnett and Bellana, 2025, "Situation models and the default mode network"). The "highways" identified in this study could play a vital role in real-world perceptual processes that are constantly translating external input into internal mental models.

      Weaknesses:

      (1) The criterion used for defining voxels as retinotopic seems very liberal. The authors show that only 5% of voxels have R^2>0.14 in a null analysis, and therefore define voxels with R^2>0.14 as retinotopic. Although all the networks in 1C show voxel distributions that differ from the null, the number of false positives with R^2>0.14 seems problematic, especially for the DN positive pRFs (red distribution) and to a lesser extent the DN negative pRFs (blue distribution). From visual inspection of the plot, the false discovery rate (fraction of voxels labeled as retinotopic that are false positives) looks like it would be greater than 50% for the DN-positive pRFs. The authors do show that the positive pRF voxels have above-chance consistency across runs, again providing evidence that there are true positive voxels in this set, but perhaps a stricter criterion (such as having consistent negative fits across runs) would provide more targeted identification of the DN voxels with true retinotopic sensitivity.

      (2) The claim that "opponency at rest between the DN and dATN appears to be driven by the subset of DN voxels with negative retinotopic tuning" is not well supported. The fraction of DN voxels with negative pRFs is small: 9.42% of DN voxels have pRFs, and 58.77% are negative, so about 6% of DN voxels have negative pRFs. The fact that any DN voxels have negative pRFs is notable, but the authors do not provide evidence that these 6% are driving the overall behavior of the DN. They do show (e.g., in Figure 2B) that negative and positive pRFs have opposing influences, but the overall correlation with dATN does not look similar to the negative pRF connectivity. I'm also unsure whether "opponency" is a reasonable description for two networks that are "independent (i.e., not correlated)" in this analysis.

      (3) The event-triggered analysis is effective at testing the bidirectional relationship between DN and dATN, with high activity in either network triggering a response in the other network. However, it would be helpful to show more validation that these "events" are meaningful windows of time to study. First, is 13 TRs a typical length of time that activity is elevated during one of these events? Second, the top-down and bottom-up terminology is perhaps too loaded and not well-justified; if the negative pRFs in the DN reflect a meaningful coding system, then couldn't low (rather than high) activity indicate a top-down event?

      (4) The framing of this paper relative to the authors' past work, such as Steel et al. 2024 ("A retinotopic code structures the interaction between perception and memory systems"), could be improved. The existence of negative pRFs in the DN and a functional relationship between these pRFs and the sensory pRFs have already been described in prior work. My understanding of the primary novelty here is that this paper examines resting-state data, showing that there are widespread spontaneous interactions between broad internal and external networks, but this distinction is not made explicit in the Introduction.

      (5) The definition of the default mode (DN) in this study aligns with past research, but the definition of the dorsal attention network (dATN) seems at odds with standard terminology. For example, the authors cite Fox et al. 2006, which depicts the dATN as including regions such as IPS, FEF, SMA, and MT+. Here, however, the "dATN" seems to be primarily lateral and ventral visual cortex (e.g., Figure S5). The exact location of these sensory pRFs is not critical to the authors' claims, but this labeling seems incorrect, and the motivation for defining/selecting the sensory network in this way is not described.
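
      The proportions quoted in points (1) and (2) above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the percentages exactly as quoted in the review, and the false-discovery ceiling rests on a deliberately pessimistic assumption (every DN voxel is null, and the 5% null pass rate applies to the pooled positive and negative DN pRFs). Neither assumption comes from the manuscript, and the function names are hypothetical.

```python
# Figures quoted in the review, taken as given (not re-derived from data).
frac_dn_with_prf = 0.0942   # share of DN voxels passing the R^2 > 0.14 threshold
frac_negative = 0.5877      # share of those pRFs with negative amplitude
null_pass_rate = 0.05       # share of null-analysis voxels passing the threshold


def negative_prf_share(frac_with_prf, frac_neg):
    """Share of all DN voxels that carry a negative pRF."""
    return frac_with_prf * frac_neg


def fdr_upper_bound(null_rate, detected_rate):
    """Crude ceiling on the false discovery rate.

    ASSUMPTION (not from the manuscript): every voxel is null, so the
    expected number of false positives is null_rate * N, while the number
    of discoveries is detected_rate * N. The ratio is capped at 1.
    """
    return min(1.0, null_rate / detected_rate)


neg_share = negative_prf_share(frac_dn_with_prf, frac_negative)
fdr_bound = fdr_upper_bound(null_pass_rate, frac_dn_with_prf)
print(f"negative-pRF share of DN voxels: {neg_share:.2%}")
print(f"crude FDR ceiling for DN pRFs:   {fdr_bound:.0%}")
```

      Under these assumptions the negative-pRF share comes out at roughly 5.5% of DN voxels, in line with the "about 6%" quoted above. The ceiling is only a bound: the true false discovery rate depends on how many DN voxels are genuinely retinotopic and on how the null rate splits between positive and negative fits.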

    4. Reviewer #3 (Public review):

      Summary:

      This paper addresses an important question (the relationship between DN and dATN, and the role of retinotopic coding) and uses a set of novel analyses.

      Strengths:

      Important question, novel analytical approaches (pRF-informed functional connectivity analysis).

      Weaknesses:

      Some of the key claims are not fully supported by the data presented. There is also a concern about over-interpretation of the results. Key issues:

      (1) The authors claim that retinotopic coding scaffolds the interaction between DMN and dATN. However, retinotopically tuned voxels account for a mere 9% of DMN voxels. So this appears to be a major overstatement. For instance, the statement that "these findings would position retinotopy as a unifying framework for brain-wide information processing" is not justified given the presented data.

      (2) Given that positive pRF voxels in DMN positively correlate with dATN voxels and negative pRF voxels in DMN negatively correlate with dATN voxels, there is a concern that these results could be contributed to by imprecise brain network parcellations. E.g., could some of the positive pRF voxels in DMN be erroneously assigned to DMN and actually belong to one of the other task-positive networks? There is insufficient validation of network parcellation to put this worry to rest, especially since it depends on ICA, which has a degree of arbitrariness built in.

      (3) The claim that retinotopic coding is intrinsic to the DN network is not supported by rigorous analysis and results. The analysis here has many arbitrary factors, including: the threshold of the 99th percentile of resting-state distribution; the designation of DN as "top-down" and dATN as "bottom-up"; the definition of "anti-matched" voxels instead of using randomly selected voxels; and the statistics being paired between matched and anti-matched voxels instead of using comparisons to baseline. Overall, I do not think that the result supports the conclusion that retinotopic coding in DN is intrinsic instead of being bottom-up-driven, given the very high threshold (99%) used and the fact that many other networks could also send bottom-up input to DN. Furthermore, the idea that bottom-up inputs only occur when the dATN (or any other RSN)'s spontaneous BOLD activity is above a certain threshold is a huge and unvalidated assumption.

    1. eLife Assessment

      This important study addresses a discrepancy between population-level growth laws and single-cell correlations. It shows, for flagellar and synthetic genes in E. coli, that while gene expression of certain genes reduces population-average growth, expression levels positively correlate with growth at the single-cell level. The measurements are mostly convincing, and the proposed mechanism (inheritance of growth factors such as ribosomes during asymmetric division) explains this observation. The theoretical analysis would benefit from clearer explanations and robustness checks.

    2. Reviewer #1 (Public review):

      Summary:

      Garcia-Alcala, Kratz and Cluzel investigate to what extent our understanding of bacterial physiology in bulk experiments can be applied to single-cell observations. They find that intrinsic noise may be powerful enough to even invert the trends found in the bulk. The authors hypothesize that the asymmetric distribution of ribosomes to daughter cells during cell division plays the dominant role in the intrinsic noise and is able to generate the observed phenomenon. They do not show it directly, but the data and its agreement with the model are sufficient to support this claim.

      Strengths:

      The experimental part is convincing: the positive correlation between the elongation rate and the promoter activity of an unnecessary protein is clear, as is the negative correlation between the mean values as the promoter strength is changed. This was demonstrated in both rich and poor media. The causality between the growth rate and the promoter activity was shown using the negative lag time of the cross-correlation function. A simple, reasonable model accounts well for the data. This paper demonstrates an interesting phenomenon and provides a plausible theory for it, advancing our understanding of bacterial physiology on the single-cell level.

      Weaknesses:

      (1) Mean-reversion timescales were assumed to be longer than the simulation time and much longer than the cell cycle time. It is not clear whether the results remain robust if the mean-reversion timescales become of the order of the cell-cycle time or smaller. If not, is there an argument for such practically infinite reversion timescales?

      (2) It is not easy to understand the simulation part unless one reads Ref. [14]. Is k(t) assumed to follow Equation (1) of Ref. [14]? Is it crucial that the ribosome noise appears only at division? Is the ribosome noise strength \sigma_R=0.06 lower or higher than that naively expected from binomial partitioning at division? Also, a more intuitive explanation of Simpson's paradox would help the reader.

      (3) It would be useful for the reader to see the raw data and not only the filtered one to appreciate the measurement noise level.

      (4) Negative lag time of the cross-correlation function is visible, but consider adding a statistical test for it.

      (5) Can you make similar cross-correlation plots using the model? Can you use it to infer whether the data agree better with ribosomal noise appearing only at division or with continuous fluctuations during the cell cycle?
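
      The intuitive picture of Simpson's paradox requested in point (2) above can be illustrated with a toy simulation. This is a minimal sketch and not the authors' model: the promoter strengths, the linear expression cost, and the noise amplitudes are all hypothetical values chosen for illustration. Each simulated strain pays a growth cost proportional to its mean expression, while within a strain a shared per-cell factor (standing in for an inherited growth resource such as ribosomes) raises both expression and growth.

```python
import numpy as np


def simulate(n_cells=2000, seed=0):
    """Toy Simpson's paradox: negative trend across strains, positive within.

    All numbers are hypothetical; this is not the manuscript's model.
    """
    rng = np.random.default_rng(seed)
    mean_expr = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical promoter strengths
    mean_growth = 1.0 - 0.1 * mean_expr          # hypothetical cost of expression

    within_r, pop_expr, pop_growth = [], [], []
    for m, g in zip(mean_expr, mean_growth):
        # Shared per-cell factor that boosts both expression and growth.
        shared = rng.normal(0.0, 0.06, n_cells)
        expr = m * (1.0 + shared + rng.normal(0.0, 0.03, n_cells))
        growth = g * (1.0 + shared + rng.normal(0.0, 0.03, n_cells))
        within_r.append(np.corrcoef(expr, growth)[0, 1])
        pop_expr.append(expr.mean())
        pop_growth.append(growth.mean())

    # Slope of strain-average growth against strain-average expression.
    between_slope = np.polyfit(pop_expr, pop_growth, 1)[0]
    return np.array(within_r), between_slope


within_r, between_slope = simulate()
print("within-strain correlations:", np.round(within_r, 2))
print("between-strain slope:", round(float(between_slope), 3))
```

      In this toy setting the single-cell correlation is positive inside every strain even though the strain averages fall on a line of negative slope, which is the sign reversal the reviewer asks the authors to explain more intuitively.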

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Garcia-Alcala et al. reports an interesting paradox: the cost of gene expression slows the population-average growth rate, whereas at the single-cell level, expression levels from these genes positively correlate with the growth rate. The effect is observed in the expression of flagellar genes and a gene under a synthetic promoter in E. coli. The findings are explained by the inheritance of growth factors, including ribosomes, during asymmetric division.

      Strengths:

      (1) The manuscript adds strength to an emerging body of literature showing that the population-level bacterial growth laws do not match correlations based on single-cell data. The evidence presented here is more striking than in previous works (such as Pavlou et al., Nat. Commun. 2025), as the trends in population-level data and single-cell data are reversed.

      (2) A relatively simple model correctly explains the trends in the data.

      Weaknesses:

      (1) It is not clear whether flagellar proteins are expressed proportionally to the reporter signal. Furthermore, it is questionable if E. coli bacteria in the mother machine channels are flagellated. If they are, they could potentially swim out of the channels, which is not the case when they do not carry the MotA E98K mutation. The authors should provide some evidence that E. coli expresses the actual filament proteins in the channels.

      (2) It is unclear what fraction of the total proteome mVenus represents in different measurements. Some quantification is needed (for example, using Coomassie staining). Using f_U as high as 14.4% in simulations is questionable.

      (3) The data from the MC4100 strain does not directly match the trends of MG1655. The justification for filtering out the low-frequency components of MC4100 is not particularly convincing. It appears unlikely that ribosomes or other growth factors partition significantly differently in the MC4100 strain than in the MG1655 strain. Further discussion and a plot similar to Figure 1 (Left) for this strain are warranted.

      (4) The model needs to be described in more detail. A closed set of equations that has been simulated must be presented, along with all values of the model parameters and their sources. The authors should consider depositing their code on GitHub or another publicly accessible repository.

    1. eLife Assessment

      This important study measures single-unit activity in the middle temporal area (MT) of awake-behaving monkeys to test the idea that sensory adaptation contributes to flexible evidence accumulation during decision-making. Solid evidence is provided, showing that adaptation to different temporal contexts shapes both perceptual judgements and neural responses, but analyses aimed at establishing a direct link between them are less persuasive. This work has the potential to be of interest to a broad range of researchers working on visual perception, plasticity, and decision making.

    2. Reviewer #1 (Public review):

      Summary:

      Effective decision-making in dynamic environments requires the brain to flexibly adjust how sensory evidence is accumulated over time, a process often modeled as an adaptive "leak." McGaughey and Gold propose that this flexibility is not solely a property of downstream integrators but is also supported by stimulus-specific sensory adaptation in the middle temporal area (MT). By recording single-unit activity in rhesus macaques during a motion direction-discrimination task, the authors found that more rapidly changing environments lead to reduced sensory encoding and discriminability in MT, which they argue accounts partially for a "leakier" integration. Furthermore, the study identifies pupil-linked arousal as a parallel, independent mechanism contributing to this adaptive process.

      Strengths:

      The study addresses an important question in cognitive neuroscience by exploring the neural substrates of perceptual flexibility. A major strength is the novel focus on how sensory adaptation, rather than just downstream integration, contributes to behavioral changes in dynamic environments. By shifting the perspective toward the encoding stage, the authors provide a more comprehensive account of how the brain manages evidence accumulation. This conceptual advance is supported by a rigorous experimental approach that combines human-like psychophysics with large-scale single-unit recordings in the middle temporal area (MT) and pupillometry.

      Weaknesses:

      (1) Alternative mechanisms for performance differences

      The authors assume that the difference in performance between the low-switch (LS) and high-switch (HS) frequency conditions is explained by a change in the "leakiness" of integration. However, several other mechanisms could potentially explain this effect:

      (i) Temporal Uncertainty: Integration might start later in the HS condition, leading to lower performance.

      (ii) Reduced Efficiency: Integration could be less efficient in the HS condition (i.e., lower signal-to-noise ratio) without a change in the leak parameter itself.

      (iii) Evidence Contamination: Motion information from the adapting stimulus in the HS condition may be integrated rather than ignored, which might be the case since the transition from the adapting to the test stimulus is not externally cued.

      To distinguish between these alternatives, I suggest two possible analyses. First, a formal model comparison could be performed, though I acknowledge this may be inconclusive in the absence of response-time data. Second, an analysis of motion energy kernels could be revealing; the leak hypothesis makes the specific prediction that for long test stimuli, early samples should contribute more to the choice in the LS condition than in the HS condition, relative to late samples.

      (2) Independence of neural and pupil-linked signals

      The authors take the lack of session-wise correlation between context-dependent contributions from neural and pupil terms as evidence that these two signals provide independent contributions to the behavioral effect. However, could this lack of correlation simply be a result of high variability or noise in these estimates? The data shown in Figure 7B suggests that measurements are very noisy, which might obscure a potential relationship.
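
      The kernel prediction in point (1) above can be made concrete with a small sketch. It assumes a simple discrete leaky accumulator, which is a textbook form and not necessarily the model used in the manuscript. The leak values, time step, and sample count are hypothetical. The point is only that a smaller leak leaves early samples with relatively more weight on the final decision variable.

```python
import numpy as np


def leaky_weights(n_samples, leak, dt=0.01):
    """Weight of each evidence sample on the final decision variable.

    Assumed accumulator: x[t+1] = (1 - leak*dt) * x[t] + e[t].
    Sample i then reaches the end scaled by (1 - leak*dt)**(n_samples - 1 - i).
    """
    decay = 1.0 - leak * dt
    if not 0.0 < decay <= 1.0:
        raise ValueError("leak * dt must lie in [0, 1)")
    exponents = np.arange(n_samples - 1, -1, -1)
    return decay ** exponents


def early_late_ratio(weights):
    """Mean weight of the first half of samples relative to the second half."""
    half = len(weights) // 2
    return weights[:half].mean() / weights[half:].mean()


# Hypothetical leaks (per second) for the low-switch and high-switch contexts.
ratio_ls = early_late_ratio(leaky_weights(100, leak=1.0))
ratio_hs = early_late_ratio(leaky_weights(100, leak=5.0))
print(f"early/late weight ratio, low leak:  {ratio_ls:.2f}")
print(f"early/late weight ratio, high leak: {ratio_hs:.2f}")
```

      A psychophysical kernel estimated from motion energy could be compared against this pattern: under the leak account, the early-to-late ratio should be larger in the LS condition, whereas a pure change in efficiency would scale all weights without changing the ratio.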

    3. Reviewer #2 (Public review):

      McGaughey & Gold trained rhesus macaque monkeys to perform a motion-direction discrimination task in which a behaviorally irrelevant adapting stimulus with either fast or slow direction alternations preceded a variable-duration test stimulus, while simultaneously recording single-unit activity in area MT and pupil diameter. They report that adaptation to the more rapidly changing stimulus was associated with reduced behavioral sensitivity, attenuated test-evoked MT responses, and larger pupil-linked arousal signals. The authors interpret these behavioral changes as evidence for a more "leaky" evidence-accumulation process, and argue that this apparent leak is implemented in part through context-dependent sensory adaptation in MT and in part through arousal-related mechanisms. More broadly, they conclude that flexible evidence accumulation in dynamic environments arises from distributed adjustments across sensory encoding and neuromodulatory systems rather than solely from changes within a downstream accumulator. If correct, this interpretation has significant implications not only for our understanding of the neural mechanisms of perceptual decision-making but also for broader theories concerning the functional role of sensory adaptation.

      The conclusions of the paper are mostly well supported by the data. Evidence for robust adaptation-induced changes in sensory encoding, behavior, and pupil dynamics is convincing, but further clarification and refinement are needed to establish a clear mechanistic link between these effects and decision-making processes.

      Aspects of the behavioral analysis would benefit from a tighter connection between theoretical claims about evidence accumulation and the empirical features of the psychometric functions. For example, the rightward shifts observed across adapting conditions are interpreted as consistent with a reset of accumulation on switch trials, but similar patterns could also arise from failures to detect the test stimulus on a subset of trials, leading responses to default to the final adaptor direction. Likewise, changes in psychometric slope and asymptote are attributed to differences in evidence accumulation without explicit modelling or consideration of alternative explanations. Clarifying how specific features of the psychometric functions map onto distinct components of the decision process will strengthen the link between the theoretical framework and the behavioral data.

      A slight concern is the lack of a consistent analytical approach for relating behavioral changes to neural and pupil-linked measures. Different sections of the manuscript rely on different behavioral metrics, such as differences in accuracy within a selected stimulus-duration range (e.g., Figure 5C) or psychometric slope differences (Figure 6C), without clear justification for these choices. The analytical approach likewise varies between simple correlational analyses (Figure 5C, Figure 6C), pseudo-experimental group comparisons (Figures 5D, E), and the inclusion of neural or pupil terms in the behavioral psychometric regression model (Figure 7B). While each metric and approach may be defensible in isolation, adopting a more consistent framework will help convince readers that the reported effects are robust and not contingent on the selective choice of metric or analysis.

    4. Reviewer #3 (Public review):

      Summary:

      Environments change over time; therefore, optimal decision-making ought to discount older observations of the environment in favor of newer ones in a manner consistent with the amount of temporal instability. Computational models of perceptual decision-making model this temporal discounting with a 'leak' parameter that determines the rate at which older information is discarded. In this study, McGaughey and Gold examine the neurophysiological mechanisms that could underlie adaptation to different degrees of temporal instability. They developed a novel variant of the well-established perceptual decision-making random-dot-motion paradigm, in which the stimulus being evaluated was preceded by an 'adapting' stimulus with either high or low temporal stability. When the test stimulus was preceded by the adapting stimulus with lower temporal stability, NHPs showed reduced psychometric slopes, indicative of increased temporal discounting ('leak'). While the NHPs performed this task, single-unit neural activity was recorded in area MT, along with pupillometric data. The authors use these neural and pupil datasets to investigate two potential sources of adaptive discounting under varying amounts of temporal instability: sensory adaptation (changes in instantaneous evidence encoding), and arousal-related changes in evidence accumulation. MT neurons respond differently to the test stimulus under conditions of high vs low temporal stability of the adapting stimulus - when the adapting stimulus is more stable, MT neurons have larger and more selective responses to the test stimulus. In addition, evoked pupil responses to the test stimulus were modulated by the adapting stimulus. Both the strength of the difference in MT responses across contexts and the difference in pupil diameter across contexts were correlated with context-dependent modulation of the monkeys' behavior over sessions. The paper concludes that both sources appear to independently contribute to adaptive evidence accumulation, likely operating at different processing stages in the brain.

      Strengths:

      (1) While computational models of perceptual decision-making have been very useful for explaining behavior and neural responses in decision-making areas, we are still in search of some of the neural mechanisms that could implement such models. Studies such as this one, which aim to identify neural correlates of simplified model parameters, are quite crucial.

      (2) Analysis is generally careful and well-executed.

      (3) Prompts some interesting follow-up questions that could be answered with simultaneous recordings and causal manipulations, as the authors state in the Discussion - e.g., which areas are affected by arousal-related neuromodulation correlated with evoked pupil size and how.

      Weaknesses:

      (1) The task design may not be optimal. While the amount of time the monkey is exposed to each motion direction during the adapting stimulus is matched, it's hard to know if the reduced MT responses to the test stimulus are truly due to the greater frequency of switches during the HSF adapting stimulus or because the monkeys have been exposed to more repetitions of the stimulus. It's increased sensory adaptation in either case, but it makes it problematic to interpret this as temporal context-dependent adaptation specifically. I think this could be partially addressed by an analysis that is already in the paper but could be emphasized and fleshed out more: the results in Figure 4D seem to show that most of the reduction in neural response for adapting units occurs between the first and second stimuli.

      (2) The pupillometric analysis seems to be an indirect way of assessing whether the accumulator itself might be modulated by temporal context, but the link could be made clearer. The authors show that context-dependent behavior is related to pupil size, which is related to arousal/neuromodulation, but it would be helpful to have some idea of what neural mechanisms underlying adaptive decision-making are actually impacted by this neuromodulation. Lacking neural data to address this question (e.g., from a brain region proposed to be involved in the accumulation process), at least more discussion of this would be helpful. Essentially, I'm unsure of how to interpret the pupil results: the argument that temporal context affects instantaneous evidence encoding in MT that then drives the accumulator is very clear, but I am a bit confused about what, mechanistically, I should think neuromodulation is doing.

    1. eLife Assessment

      This valuable study aims to differentiate between foveal and peripheral attentional mechanisms in visual and frontal brain regions in monkeys engaged in a free-gaze visual search task. The authors interpret differences in responses between target and nontarget conditions as feature-based attention; however, this may not be the correct interpretation. The authors do not provide enough information on how they distinguish foveal and peripheral RFs. Consequently, the study provides only incomplete evidence that does not support the authors' conclusions, and the significance of the findings is not strong.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript aims to differentiate between foveal and peripheral attentional mechanisms in visual and frontal brain regions in monkeys engaged in a free-gaze visual search task.

      Strengths:

      The manuscript is clearly written, the question is important, and the behavioral task is interesting.

      Weaknesses:

      I have two major concerns.

      (1) The authors interpret divergence in neural responses to target vs nontarget as attention. But it is not. The subject has to attend to both target and nontarget stimuli to determine the stimulus category and thereby decide on the next action. Thus, divergence between target and nontarget responses could reflect categorical discrimination, but I am not sure this can be interpreted as attentional modulation. While it may be tempting to suggest that finding a stimulus of a specific category is "feature attention", analogous to, e.g., attending to the red stimulus, I don't believe this is correct. For the former, the animals have to attend to a stimulus, and examine the stimulus to determine the stimulus category, unlike a simpler discrimination, which may pop out. Given this, I am unconvinced that the interpretations in this manuscript are valid.

      (2) Regarding the RF classification of foveal and peripheral RFs for IT and PFC, prior work suggests that neurons in IT cortex (especially AIT) and PFC have RFs that largely include the foveal visual field. So, it would be important to include figures that show the RFs of neurons classified as foveal versus peripheral for all three areas.

    3. Reviewer #2 (Public review):

      Summary:

      In natural visual behavior, such as when one is looking for a face in the crowd, the eyes are moved from site to site, seeking possible matching targets. This involves attention both to the current view at the center of vision (the foveal location) as well as to upcoming views via attention to targets in the periphery. While it has been established that attention generally enhances neuronal response (compared to simple visual activation) at the attended spatial location, this study provides solid evidence that attention during active visual search leads to neuronal response enhancement only when the eye moves towards targets that exhibit the desired feature and category. This study thus moves the field towards understanding the neural encoding of active vision.

      This study examines the neuronal basis of feature-selective attention during active, freely behaving visual search. Traditional electrophysiological studies on visual attention in monkeys commonly used an eye fixation with a covert attention paradigm, but have not sufficiently addressed the roles of both foveal and peripheral attention in play during natural looking behavior. Here, the authors present a novel paradigm in which, during eye-movement mediated search, neuronal receptive fields are recorded in multiple cortical areas (sensory V4, temporal, and prefrontal areas). In this manner, as the eye foveates, items in the array fall into foveal or non-foveal recorded sites. Thus, the experimental paradigm is elegant, offering the opportunity to make multiple types of comparisons: target/distractor, towards/away from fovea, and areal. Specifically, following a category cue (face, house, hand, flower), freely initiated saccades are made to locate a categorically matching 'target' in an array of distractors. Feature attention is assessed by comparing eye saccades made to targets vs to distractors. Spatial attention is assessed by comparing saccades made 'towards' vs 'away' from targets. Statistics are rigorous and nicely designed. The detailed association of simultaneously obtained eye movement sequences and neural parameters is well done. These are valuable data that will contribute to our understanding of attentional modulation in visual search.

      Strengths:

      The significance of these findings is fundamental. Decades of attention research in vision have been based on the paradigm of visual fixation and covert peripheral attention. However, increasingly, the field has moved towards understanding how the visual system works during active vision. Here, the authors use an active visual search paradigm and record from multiple areas (V4, IT, PFC). They find enhancement of attention both in the foveal and peripheral locations, and, furthermore, a high degree of feature and categorical specificity. This provides valuable data for the concept of a foveal-peripheral attentional window in natural vision. The controls (comparisons of neuronal response during looks to targets vs distractors, and looks towards and away from the target) and statistical rigor make these findings quite compelling.

      Weaknesses:

      While the study is generally quite strong, there are a few weaknesses to be addressed.

      (1) Little rationale is provided for recording in the selected areas, V4, IT, and PFC. Given the respective roles in sensory, object recognition, and goal-directed behavior, some rationale for this design should be offered, and commonalities/distinctions between these areas should be discussed.

      (2) Given the reliance of all analyses on saccadic behavior (towards target/distractor, towards/away from target), additional description and summaries of eye movement behavior during single trials and across trials should be provided.

      (3) The dependency of findings on top-down (categorical & feature-specific) task design should be discussed.

    4. Reviewer #3 (Public review):

      In this manuscript, the authors investigate the role of attention in foveal processing during a naturalistic task. They record neural activity from extrastriate visual areas V4 and inferotemporal cortex, as well as from the lateral prefrontal cortex, in macaques performing a free-gaze visual search task. In this task, animals searched for a face or house target among multiple complex stimuli, with no constraints on eye movements. Unlike classic studies of visual attention, which often rely on controlled fixation, this work examines neural activity in both foveal and peripheral receptive fields during naturalistic eye movements.

      The main question addressed by the authors is how feature-based attention is distributed and coordinated across foveal and peripheral visual fields during active search, and how this attentional processing influences saccade behavior. The authors show that foveal units in visual areas exhibit feature-based attentional enhancement, with stronger responses when a fixated stimulus is a target compared to when the same stimulus serves as a distractor. Peripheral units in visual and prefrontal areas show both feature-based and spatial attentional modulation, consistent with prior work. Finally, the authors show that attentional modulation depends primarily on stimulus category rather than response magnitude, with neurons showing similar enhancement for all images within the target category regardless of how strongly individual images drive the cell.

      There are several notable strengths of this paper, including:

      (1) Disentangling feature-based and spatial attention during naturalistic vision remains a central challenge. This paper tackles both simultaneously, parsing neural populations by object selectivity (face-selective, house-selective, non-selective) and RF position (foveal vs. peripheral).

      (2) The unconstrained search task (Figure 1A) moves beyond the dominant fixed-gaze, cued-attention designs (Zhou & Desimone, 2011) to study attention as it operates during natural behavior, with sequential fixations and voluntary saccades.

      (3) The scale of the multi-area recordings is a major strength and is well aligned with current trends in primate and human neuroscience toward large-scale, multi-area recordings. Simultaneous recordings from visual and prefrontal areas, comprising over 4,900 foveal units and more than 1,500 peripheral units, enable meaningful cross-area latency comparisons and area-specific analyses of attentional modulation. This study builds on the authors' previous analyses of this dataset by expanding the scope to show that feature-based attention generalizes across neuronal classes and operates on categorical identity rather than response magnitude.

      (4) The combination of simultaneous multi-area recordings and a rich behavioral paradigm provides a dataset that is well-suited for population decoding, cross-area interaction analyses, and trial-by-trial prediction of saccade choices, which could substantially deepen mechanistic understanding beyond the largely univariate comparisons presented here.

      While the data broadly support the paper's main conclusions, several issues limit the strength of the mechanistic interpretation and should be taken into consideration:

      (1) Receptive field size is not explicitly quantified and may confound foveal-peripheral comparisons. Units are classified as foveal or peripheral based on responsiveness to the cue versus the search array (Methods, p. 17), but the manuscript lacks essential information about receptive field sizes, eccentricities, and the number of search stimuli falling within each receptive field and related proper controls. This is critical because receptive fields in visual area V4 at foveal eccentricities are relatively small (Gattass et al., 1988; Desimone & Schein, 1987), whereas receptive fields in inferotemporal cortex can span several degrees to tens of degrees and often include the fovea (Op de Beeck & Vogels, 2000; DiCarlo & Maunsell, 2003; Zoccolan et al., 2007). Given the 2° × 2° stimulus size, multiple search items could potentially fall simultaneously within peripheral receptive fields. This introduces a potential confound, as attentional modulation is known to be strongest when multiple stimuli appear within a single receptive field (Reynolds et al., 1999). Although the authors acknowledge this issue for visual area V4 (p. 17), it is neither quantified nor controlled for. Without explicit receptive field mapping relative to the search array, comparisons between foveal and peripheral units, as well as between visual areas, are difficult to interpret cleanly.

      (2) Attentional modulation is difficult to dissociate from saccade planning and decision-related signals. The free-gaze paradigm enhances ecological validity but introduces a temporal confound: mean distractor fixation durations are approximately 156 ms (p. 9), while attentional effects emerge between 137 and 170 ms after fixation onset (Figure 2). As a result, the reported attentional modulation coincides with the preparation of the subsequent saccade. Neural activity measured in the primary analysis window (150-225 ms; p. 19), therefore, likely reflects a mixture of visual, attentional, motor planning, target recognition, and behavioral relevance signals, all of which are known to modulate responses in visual areas at similar latencies (e.g., Chelazzi et al., 1998). Moreover, target fixations (~257 ms) and distractor fixations (~156 ms) occur on fundamentally different behavioral timescales, which may inflate apparent foveal attentional effects. While the authors suggest that these timing differences support the idea that foveal feature-based attention facilitates prolonged fixation on target stimuli, this interpretation is not fully supported by the current analyses. That said, the saccade-aligned analyses of peripheral units (Figure S3) partially mitigate this concern by demonstrating that feature-based modulation persists through saccade execution.

      (3) The "attention-out" condition for spatial attention lacks directional control. In the spatial attention analyses (Figures 4D-F), the "attention-out" condition appears to include all fixations followed by saccades directed away from the receptive field, regardless of saccade direction. This differs from classic spatial attention designs, which typically use controlled anti-saccades or saccades to fixed locations opposite the receptive field (e.g., Moore & Armstrong, 2003; Gregoriou et al., 2009). Saccades directed toward locations adjacent to, but outside, the receptive field may still partially engage spatial attention mechanisms near the receptive field via broad attentional fields or motor preparation gradients (Bisley & Goldberg, 2010). In addition, the "attention-out" condition likely contains a heterogeneous mixture of trials in which the stimulus in the receptive field is either a target or a distractor, since feature-based attention effects are derived from this same pool of trials. As a result, spatial and feature attention effects are not fully orthogonal, and variance related to feature attention may already be embedded in the spatial attention baseline.

    1. eLife Assessment

      This valuable study introduces a new framework for improving the automated sorting of extracellular action potentials. However, the evidence is incomplete; the biophysical model used for simulation is based on one simulation that does not necessarily reflect real experimental data, the test datasets are insufficiently diverse, and essential algorithmic details are currently missing. This work will be of interest to neuroscientists using high-density multichannel electrophysiology.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents a flexible spike-sorting framework that allows users to run, swap, and benchmark individual modules commonly used in spike sorting. The paper argues and demonstrates that "opening the black box" is essential for understanding which components drive performance differences and for making progress toward more accurate and transparent spike sorting.

      Using this modular benchmarking pipeline, the work identifies electrode drift as a primary bottleneck for accurate sorting and introduces an end-to-end sorter ("Lupin") that combines the best-performing modules and is reported to outperform existing spike-sorting packages on their benchmark.

      Overall, this is a strong tool/resource contribution with clear potential to accelerate spike-sorting development and enable more rigorous comparisons. However, several claims, particularly around Lupin's or individual modules' superiority, are not yet supported robustly enough for the strength of the conclusions stated.

      Strengths:

      This work has high community value and practical utility. The effort to make benchmarking and spike sorting modules accessible and standardized is substantial and likely to be broadly useful.

      Treating spike sorting as a set of interchangeable modules is a useful approach to some extent, and it enables targeted improvements rather than 'new sorters' popping up, which are difficult to fully understand.

      Implementing this resource within SpikeInterface, an already widely used tool, will facilitate uptake and community contributions.

      Overall, I am positive about this manuscript as a resource paper. The core framework is compelling and timely.

      Weaknesses:

      (1) The main concern is the limited support for the claim that 'Lupin' and individual modules outperform existing spike sorters.

      (2) Evidence is primarily from a single benchmark based on an intentionally simplified simulation. While the authors discuss the trade-offs between simulated and real data, the current evaluation does not provide enough diversity to justify claims of superiority.

      (3) While improving individual modules that run in a serial fashion could aid overall spike sorting performance, acknowledging that some end-to-end sorters work in an iterative fashion across multiple of these modules would be fair. Perhaps the optimal spike sorter is not a serial set of modules.

      (4) There is also a risk of benchmark overfitting. A modular approach makes it easy to select components that excel on specific benchmarks (or a specific project's data characteristics) without generalizing.

      Concrete ways to strengthen this work:

      (1) Evaluate on multiple simulation regimes, consider adding at least one biophysically detailed simulation, benchmark on multiple probe geometries with neurons also clustered in different depth profiles (as this will affect drift solutions), and provide real-data validation. Even without full ground truth, real data can be evaluated with expert curation, functional validation (e.g., refractory violations, quality metrics, unit waveform consistency), agreement across sorters, and consistency across time.

      (2) Related to real-data applicability, it is also important to acknowledge that modular approaches can enable overfitting to the needs of individual projects. Without real-data benchmarking (or benchmark diversity), it is unclear how the framework will guide users towards generalizable 'best practices' rather than optimized configurations that work for their specific conditions.

    3. Reviewer #2 (Public review):

      Summary:

      Spike sorting, that is, assigning events detected in extracellular electrophysiology data to the firing of individual neurons, is an inherently difficult computational problem involving multiple steps. The difficulty arises from low signal-to-noise, instability in signal due to the relative motion of the tissue and recording sites, and large volumes of data. Experimental ground truth data - where the correct assignment of spikes is known - is not available in large enough quantities to test algorithms. This paper describes a tool for creating fully synthetic ground truth data and benchmarking the individual steps of spike sorting to dissect the impact of signal-to-noise, firing rate, and motion correction on each step. This information is used to construct an optimized algorithm for sorting the ground truth data. One result of particular interest is the dominant role of motion correction in degrading accuracy. Another important technical result is that motion correction via interpolation of the voltage traces yields similar accuracy to interpolation of the spike templates.

      Strengths:

      The paper clearly shows the benefits of analyzing the complex process of spike sorting step by step. While this analysis has also been done in papers presenting spike sorters (for example, reference [32]), the tools presented here allow users and developers to do similar studies for their own work. This toolset will be very useful to many labs, especially those working in less studied brain areas or model systems, cases where the tuning of standard spike sorting tools is not a good match to the data.

      Weaknesses:

      The model ground truth data used in the paper does not need to be a perfect match to experimental data to provide useful benchmarking. However, as with all measurements of spike sorting accuracy, extrapolation to experimental data can be complicated. Users of these tools will need to assess how well the simulated data matches their recordings.

    4. Reviewer #3 (Public review):

      Overview:

      In this manuscript, the authors describe two additions to an existing toolbox (SpikeInterface, Buccino et al., 2020, eLife). The first addition is an empirical simulator for extracellular recordings, in which spikes from predefined templates are added up with Gaussian noise. The second addition involves granting user-level access to intermediate processing steps along spike sorting algorithms. The authors demonstrate the toolbox by evaluating functions (e.g., event detection) or sets of functions (e.g., feature extraction + clustering) on their simulated data, and suggest that a specific combination of function implementations provides performance improvement relative to Kilosort4 (Pachitariu et al., 2024, Nature Methods).

      If the authors are interested in making this manuscript a suitable scientific contribution, the entire work has to be revised extensively. In particular, the simulator has to be extended and improved; the implementation of existing spike sorters has to be improved; the feedforward architecture of the modules has to be extended; the reporting of results has to follow accepted reporting standards; new algorithms have to be explained in sufficient detail; and the manuscript has to undergo extensive proofreading.

      Notably, even assuming perfect implementation and descriptions, it is unclear to me whether the scope of the present work warrants a publication in a scientific journal, or is more suitable for an internal technical report or, e.g., a GitHub version release. To go beyond a scientifically sound technical report, the authors may choose to demonstrate the utility of their new proposed sorter ("Lupin") and compare it to existing tools on multiple datasets.

      General comments:

      (1) The simulator itself has to be improved and extended. Right now, it simply generates, for every unit, a mother waveform from a sum of exponentials, scales that over channels, and then adds up multiple instantiations of every unit on every channel, along with noise. This is not a biophysical simulator: it is an ad hoc procedure, and the sentence "we firmly believe that.." (lines 482-483) does not make the procedure convincing. To make the simulator credible, the authors should: (1) use a set of biophysical equations, with multi-compartmental modeling of currents and return currents; (2) use noised data from extracellular recordings; or (3) some combination thereof.

      (2) The simulated dataset has to be extended in time. Maybe I missed something, but 500 units over 10 minutes, with some units having firing rates as low as 0.1 spikes/s, corresponds to some of the units firing an expected 60 spikes. This is clearly too short, and does not replicate the standard situation in extracellular experiments.

      (3) The simulated dataset has to be extended in space. The choice of using NeuroPixels 1.0 geometry is a poor one. Many labs use other monolithic electrode arrays (MEAs, silicon probes, other rigid arrays); tetrodes remain a major tool, and flexible probes (polyimide, mesh) are evolving. Assessing algorithms over a single spatial architecture is likely to lead to local maxima in performance and potentially erroneous conclusions.

      (4) The existing spike sorters evaluated are not completely described. Some sorters (e.g., SpyKING Circus and KS4) were described in previous publications, but it is unclear whether the implementation that was used for the present tests is exactly the same as those previously published. More importantly, some of the sorters evaluated (e.g., TDC, TDC2, SpyKING Circus 2) were never described in a peer-reviewed paper. This does not mean that they cannot be evaluated - but if they are, they must be described in full. Relying on the fact that the code is open source cannot replace a complete and accurate scientific description.

      (5) Related to the above, all relevant code should be made available online in permanent repositories, not only in author-controlled ones.

      (6) It is unclear why SpyKING Circus 2 and TDC2 are evaluated - these could potentially be described as straw men. I recommend reorganizing the manuscript so that after every module is evaluated separately based on a limited ground truth dataset, a single "best" sorter would be constructed, and then tested extensively (and compared to the de facto state of the art). Such reorganization would both demonstrate the utility of a modular approach and clarify the general usefulness of the outcome.

      (7) The new algorithms developed, for example, clustering and template matching, have to be described in more detail, and demonstrated graphically on simple datasets. This can be done in supplementary material if the authors prefer not to extend the manuscript too much.

      (8) This reviewer finds the description and interpretation of the results to be inadequate. As an example, focusing on Figure 5: The results in Figure 5A have to be supplemented and summarized as a scalar point estimate (e.g., median accuracy), an estimate of dispersion (e.g., using MAD, IQR, or SD), evaluated over multiple runs, and compared using statistical tests between tools and conditions (e.g., using a multi-dimensional analysis of variance, a mixed effect model, etc.). The results in Figure 5D must have an indication of dispersion. Any conclusions based on the numerical experiments must be based on these metrics and statistical evaluations.

      (9) The entire MS would benefit from expert proofreading; there are many language errors, mostly in indefinite articles and grammatical numbers.
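
      To make the reporting requested in comment (8) concrete, a minimal sketch follows. It assumes that per-run accuracy values are available as plain lists for each sorter; the function name `summarize`, the sorter labels, and all numbers are hypothetical placeholders for illustration, not results from the manuscript. It computes one point estimate (median) and two dispersion estimates (IQR and MAD) per sorter; statistical comparison between tools would still be needed on top of this.

```python
from statistics import median, quantiles


def summarize(values):
    """Return (median, IQR, MAD) for a list of per-run accuracy values."""
    med = median(values)
    # Quartiles with the "inclusive" method, which treats the data as the
    # full population of runs rather than a sample from a larger one.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    # Median absolute deviation around the median.
    mad = median(abs(v - med) for v in values)
    return med, q3 - q1, mad


if __name__ == "__main__":
    # Hypothetical placeholder accuracies from five repeated runs per sorter.
    runs = {
        "sorter_a": [0.80, 0.81, 0.82, 0.85, 0.90],
        "sorter_b": [0.70, 0.75, 0.78, 0.79, 0.88],
    }
    for name, accuracies in runs.items():
        med, iqr, mad = summarize(accuracies)
        print(f"{name}: median={med:.3f} IQR={iqr:.3f} MAD={mad:.3f}")
```

      Reporting these three numbers per tool and condition, over multiple simulation runs, would give the dispersion estimates that panels such as Figure 5D currently lack.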

    1. eLife Assessment

      This valuable study presents a real-time system for identifying multiple unrestrained marmosets in a home cage setting using a combination of face detection and color-coded beads. However, there is incomplete evidence regarding the generalizability and robustness of the system to unconstrained multi-animal environments.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Yang, Wang, and Cléry presents a lightweight pipeline for real-time identification of common marmosets in a laboratory setting. Models were trained and evaluated on data derived from a family of three closely related adults and a set of juvenile twins. Freely moving animals entered an enclosed space fixed to the housing cage door, which permitted the entry of individual animals for data acquisition. Utilizing YOLOv8-nano, identification was improved through the introduction of uniquely colored collar beads. Analyses of facial similarity showed close morphological relatedness amongst individuals and highlighted the need for highly discriminative classification. Overall, the authors offer a framework for identity tracking that prioritizes real-time inference. The authors demonstrate that combining facial detection with visual markers enables adequate identity assignment under controlled laboratory conditions with minimal cross-individual misclassification.

      Strengths:

      (1) The proposed pipeline offers a solution for real-time identity tracking in common marmosets. Its lightweight design enables deployment across a wide range of hardware configurations. Furthermore, if similar strategies are employed, this methodology is likely adaptable for other species with minimal modification.

      (2) Evaluation of closely related individuals provides a necessary stress test for the discrimination of facial identity tracking.

      Weaknesses:

      (1) The pipeline's reliance on controlled animal isolation and small visual markers raises questions about the approach's generalizability to unconstrained multi-animal environments. The provided confusion matrices (Figures 6-8) indicate that the most common misclassifications are background-related, possibly suggesting that detection specificity is the primary source of error. All things considered, these findings raise concerns about the pipeline's performance in socially dynamic and visually complex environments.

      (2) The manuscript claims performance comparable to that of human experimenters but provides no explicit evidence to support these claims. While it is plausible that human experimenters may be less accurate in facial recognition tasks involving closely related marmosets, the authors don't provide evidence. Moreover, while that might be the case, the color-coded beads provide a salient identity cue for the model, which complicates the interpretation of this comparison grounded in facial recognition.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Yang et al. develop a real-time system for automatic face detection and identification of multiple unrestrained common marmosets in a home cage setting.

      Strengths:

      The study aims to address an unmet need in behavioral neuroscience: the ability to non-invasively identify animals is crucial to the automated and rigorous study of neural behaviors; this is especially true for common marmosets, which are rapidly becoming a model system of choice for the study of complex social cognition. By using a YOLOv8 backbone, the study achieves human-level performance, both in terms of precision and recall of the trained models.

      Weaknesses:

      The robustness of the system is not clear from the limited datasets presented. The use of color-coded beads undercuts the study's premise that the system achieves truly non-invasive tracking. Although the system achieves good performance in face detection, it does not perform as well for classification using faces alone (especially when the faces are similar, as in twin animals). Here, too, the color-coded beads play a key role in identity discrimination. The stated goals of the study and the actual results presented are therefore at odds.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Yang et al. introduce a new method for automatically identifying marmosets in their home cage using a supervised deep learning method that recognizes the face and colored beads on marmoset collars. The authors show a high precision rate of identifying marmosets to levels comparable to a human experimenter. The method overall seems robust at identifying marmosets at different life stages and different settings; however, given the current form, I'm struggling to see the generalizability and experimental utility of this method.

      Strengths:

      (1) The authors provide a near-perfect automatic identification of marmosets in their home cage.

      (2) This method is robust across lighting, camera angles, etc., making it potentially useful for marmoset (and other NHP) identification outside the housing cage as well.

      Weaknesses:

      (1) Despite the almost perfect precision, in its current form, I'm failing to see how this method can be useful to other labs.

      (2) This is a nice methods manuscript, but the authors do not present results to show how their method can be used outside of identifying marmosets inside their home cages in a small field of view.

      (3) Reading the manuscript is strenuous, given its repetitive nature. Consolidating and shortening the results, as well as adding some definitions to the results section, would be helpful.

    1. eLife Assessment

      This useful study addresses the interesting question of how immune cells recognise infected erythrocytes in malaria. It proposes the parasite protein PfGBP-130 as an interaction partner of the human cell surface protein LFA-1, which could help explain how NK cells recognize infected erythrocytes. The conclusions are partially supported by pull-down and cell-based activation data. However, the overall evidence of direct interaction at the cell-cell interface and downstream effects is incomplete; stronger evidence is required to demonstrate surface exposure of PfGBP-130, as well as a direct role of this antigen in killing.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors aim to determine the ligand on Plasmodium falciparum-infected erythrocytes for the NK cell integrin, LFA-1, following up on previous evidence that LFA-1 is important for immune cell-mediated recognition of iRBCs.

      They start by incubating LFA-1 with iRBCs and show by flow analysis that a substantial population of these iRBCs binds to the LFA-1 (Figure 1C). They do conduct the control with uninfected RBCs, but put this in the supplementary material. As this is a critical control, I think that it should be moved to Figure 1C as it is essential to allow interpretation of the iRBC data. The authors also do not state which strain of P. falciparum they used (line 144). This is critical information as different strains have different variant surface antigens and should be included. With these changes, this data seems convincing.

      They next incubated LFA-1 with the iRBCs, cross-linked and conducted a pulldown, identifying PfGBP-130 as a binding partner. Using cross-linkers is a dangerous strategy as it risks non-specific cross-linking. Did they try without cross-linking and find an interaction?

      They raised antibodies to PfGBP and showed IFA, which reveals that these antibodies stain iRBCs (Figure 2Ciii). This experiment lacks a critical control of uninfected RBCs, which needs to be included to show that the staining is specific. Without this, it is not possible to conclude that there is iRBC-specific staining with PfGBP.

      They then conduct a pulldown using LFA-Fc, which does show GBP130 only in the presence of the LFA-Fc, but not when empty beads are used. This is convincing. BLI measurements are also used to study this interaction (Figure 2Ci). The BLI data is presented in such a way that any association phase is obscured by the y-axis, which makes it impossible to know whether there is binding here. I think that the data needs to be shown with some baseline before the addition of the ligand so that the association can be seen. The data is also a bit messy with a downward drift and the curves showing different shapes, for example, with the 1.0 µM curve seeming to have a different association rate. Also, is this n=1? I think that this data needs to be repeated and replicated. This is the only data which shows a direct interaction between LFA1 and GBP, as the pulldowns are done with lysates, which might contain bridging components. I therefore think that it is important to repeat the BLI or use additional biophysical methods to assess binding, to obtain more convincing data.

      The authors next do some modelling of the putative complex. This is done by homology modelling and docking, which is not the most up-to-date method and is overinterpreted. Personally, I would remove this data as I did not find it convincing, and it is not important for the story. If the authors wish to include it, then I think that they should validate the modelling by mutagenesis to show that the residues which the models indicate might bind are involved in the interaction.

      They next made GBP130 and tested the binding of this to THP-1 cells, which are often used as a model for macrophages. They observe greater binding of PfGBP-Fc to these cells when compared with hIgG and show that LFA-1 siRNA reduces this binding. I was a little confused about how the flow plots related to the graph in the bottom right corner of Figure 3Bii. In the flow plots, hIgG control shows 12.8% of cells in the gated region, while the unstained cells have 5.63%, but the MFI data shows a decrease in binding for hIgG vs unstained cells. How is this consistent? Also, the siRNA reduces the number of cells in the gated region from 66.6% to 25.9%, which is still substantially more than 5.63% in the unstained control. This also doesn't seem quite consistent with the MFI data. Could the authors explain this? Also, perhaps an additional experiment would be to add soluble LFA-1 into this assay as an additional control to determine whether this blocks PfGBP binding to the THP-1 cells? It could be that there are additional mechanisms of binding which indicate why the siRNA has a partial effect. The same is true for the NK cell experiments in Figure 3Ci, in which the siRNA has a partial effect. The authors also test binding to HEK, HepG2 and 'stem' cells and claim 'only background levels of binding', but in each case, there is more binding to these cells by PfGBP-Fc than by hIgG, albeit less than in THP-1 and NK cells. Why have the authors decided that these increases are not significant? All in all, these experiments do indicate a role for the GBP-LFA1 interaction in the binding of immune cells to iRBCs, but perhaps not as absolutely as is suggested.
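      As an aside on the percent-gated versus MFI question, the sketch below shows one way the two summaries can move in opposite directions. All intensity values, the gate threshold, and the reading of MFI as a median are hypothetical assumptions for illustration; none of them are taken from the manuscript.

```python
from statistics import mean, median


def percent_in_gate(intensities, threshold):
    """Percentage of events with intensity at or above the gate threshold."""
    above = sum(1 for value in intensities if value >= threshold)
    return 100.0 * above / len(intensities)


# Hypothetical fluorescence intensities (arbitrary units), NOT manuscript data.
unstained = [100] * 94 + [1000] * 6   # dim bulk, small bright tail
higg = [50] * 87 + [1000] * 13        # dimmer bulk, larger bright tail
GATE = 500                            # hypothetical gate threshold

for name, sample in (("unstained", unstained), ("hIgG", higg)):
    print(name, percent_in_gate(sample, GATE), median(sample), mean(sample))
```

      In this toy case the hIgG sample has more events in the gate but a lower median, while its mean is higher. Whether the reported MFI is a mean or a median, and on which population it was computed, would therefore need to be stated to interpret the figure.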

      The authors next produce CHO cells with PfGBP on the surface. These cells bind to LFA-1 specifically. When these cells were incubated with primary NK cells, they did see increases in activation markers, which were reduced by the addition of anti-CD11a, suggesting these to be specific. They also conduct the same experiment with anti-GBP with iRBCs, but this is in a different figure. It would be easier for the reader if Figure 5B were in the same figure as Figure 4B, as it is related data using the same method. I found this data convincing, showing that the LFA1:GBP interaction does contribute to immune cell recognition and activation.

      The authors next conduct an experiment in which they assess parasite growth in the presence of NK cells and in the presence of anti-GBP. They use Hoechst staining as a measure of parasite growth and claim that NK cells reduce the number of parasites, but that anti-GBP abolishes this effect (Figure 5A). I found this experiment very unconvincing as there are small effects and no demonstration of significance. More commonly used approaches to study parasite growth are lactate dehydrogenase GIA assays or calcein-AM labelling. I did not find this experiment convincing and would either remove or supplement with additional data using a more robust assay, with repeats and tests of statistical significance.

      In summary, the authors present a set of data which comes together to indicate an interaction between LFA1 and PfGBP on the Plasmodium-infected erythrocyte surface. Pulldown studies show convincingly that these two proteins co-precipitate, and BLI data suggest that this is direct. Also convincing is that NK cell activation can be reduced using antibodies against either LFA1 or PfGBP, indicating that this interaction does play a role in immune cell recognition of iRBCs.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used an LFA-1 αI-Fc fusion protein to pull down potential ligands, followed by LC-MS/MS, leading to the selection of PfGBP-130 as a potential membrane protein on the surface of infected cells. PfGBP-130 antibodies were raised and used to support the surface localization. This putative ligand interacted strongly with LFA-1 (Kd = 15 nM). A presumed PfGBP-130 ectodomain interacts with monocytes and NK cells but not cells that lack LFA-1. PfGBP-130 antibodies also interfered with NK cell-mediated infected cell killing; the effect, although statistically significant, is modest. The authors propose that NK cells recognize infected cells via LFA-1 interaction with PfGBP-130 exposed on the host cell and that this interaction is critical to initiation of NK cell activation and killing of infected cells.

      Major points:

      (1) PfGBP-130 is proposed to be a membrane protein based on a single predicted transmembrane domain. Figures 2b and 3a show ribbon schematics with this TM domain at residues 51-68, in agreement with TM prediction algorithms such as TMHMM 2.0 and Phobius. However, this predicted TM is upstream of the PEXEL motif (residues 84-88, sequence RILAE), a conserved sequence for parasite protein export to host cytosol that is proteolytically processed at its 4th residue. Thus, residues 1-87 are removed from PfGBP-130 prior to export, yielding a mature protein without predicted TMs. Prior studies have determined that the mature PfGBP-130 lacks TMs and is retained as a soluble protein in host cell cytosol (PMID: 19055692, 35420481). Thus, the authors' model of PfGBP-130 as a surface-exposed membrane protein conflicts with both computational analysis of the mature protein and these prior reporter studies. An important simple experiment would be to evaluate PfGBP-130 membrane association in immunoblots using the authors' PfGBP-130 antibody after hypotonic lysis (PMID: 19055692) and after alkaline extraction (e.g. 100 mM Na2CO3, pH 11 as frequently used, PMID: 33393463). If the prior studies and computational analyses are correct, the protein will be predominantly in the soluble and/or alkaline supernatant fractions.

      (2) Many findings rely on the specificity of antibodies generated against PfGBP-130 or NK cell receptors. Although the authors have included key controls (use of isotype control antibodies, lack of anti-PfGBP-130 binding to uninfected cells), cross-reactivity between P. falciparum antigens is well-recognized and could significantly undermine the interpretation of experiments (PMID: 2654292 and 1730474 provide key examples of antigens recognized by antibodies raised against other proteins). For example, the surface localization in IFA experiments (Figure 2B(iii)) could reflect anti-PfGBP-130 binding to an unrelated parasite surface antigen, a possibility not addressed by any of the authors' controls. As another example, the iRBC lysate immunoblot using this antibody in Fig. 2B(iv) suggests a MW of 95 kDa, which corresponds to the unprocessed pre-protein before export; cleavage in the PEXEL motif yields a processed mature protein of 85 kDa, which should be readily resolved from the pre-protein in immunoblots (PMID: 19055692). A better immunoblot using immature infected cell stages might show both the pre-protein and the mature protein as a doublet band.

      (3) PfGBP-130 is not essential for in vitro cultivation (PMID: 18614010 and MIS of 1.0 in the piggyBac mutagenesis screen as tabulated on plasmodb.org, indicating a highly dispensable gene). The authors should use the knockout line as a control in their IFA localization experiments to address antibody specificity. More fundamentally, their model predicts that NK cells should not recognize or kill infected cells from the knockout line when compared to their untransfected parent. Such results with the knockout line would compellingly support the authors' model without reliance on antibodies that may cross-react with other parasite antigens. PMID: 18614010 reported that the PfGBP-130 knockout exhibited increased membrane rigidity, suggesting an intracellular scaffolding protein rather than a surface localization and use as a ligand for LFA-1 interaction and NK cell-mediated killing.

      (4) PfGBP-130 non-essentiality raises the question of why the gene would be retained if it triggers NK cell-mediated killing of infected cells in vivo. Presumably, this killing would pose strong selective pressure against retention of PfGBP-130. Some speculation is warranted to support the model.

    4. Reviewer #3 (Public review):

      Summary:

      Malhotra and colleagues present evidence that the integrin LFA-1 on NK cells is a ligand for the Plasmodium falciparum protein GBP130 on the infected erythrocyte surface and that this interaction plays a role in the clearance of infected erythrocytes by NK cells.

      The authors first select a subdomain contained within the CD11a subunit of LFA-1 as a probe to discover possible binding proteins on the infected erythrocyte surface. Parasite-infected erythrocytes stained positively with this probe; the level of staining increased as the parasites progressed through the life cycle. Using the LFA-1-based probe in cross-linking pull-down experiments, GBP130 was identified by mass spectrometry as a co-purifying parasite protein. The N-terminal portion of GBP130 was recombinantly expressed and shown to interact with LFA-1 alpha-I by biolayer interferometry experiments. The full-length extracellular domain of GBP130 was then recombinantly expressed and used to stain primary human NK cells and THP-1 cells. Knocking down LFA-1 by siRNA reduced staining by GBP130. To assess the contribution of GBP130 to the activation of NK cells, CHO cells exogenously expressing GBP130 were incubated with primary NK cells. Transfecting CHO cells with GBP130 led to increased activation of co-incubated NK cells compared to mock-transfected and compared to GBP130 transfected cells, with the inclusion of anti-CD11a to block NK cell adhesion. Finally, CHO cells expressing GBP130 led to increased activation of NK cells compared to mock-transfected CHO cells.

      Overall, although the authors present data from NK cell killing assays that include appropriate controls, the data suggesting a direct interaction between PfGBP-130 and LFA-1 does not include the same necessary controls, for example, the use of blocking antibodies. Most critically, the biolayer interferometry experiments use a recombinant fragment of PfGBP-130, which does not include the residues predicted to be important for mediating specific interaction with LFA1. The biolayer interferometry data instead suggest non-specific interactions between PfGBP-130 and LFA1, as binding does not reach saturation.
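
      For context on the saturation argument, a minimal sketch of the equilibrium response expected under a simple 1:1 binding model is given below. The 15 nM Kd and the 1.0 µM concentration are the values quoted in the reviews above; the function name, the normalised `r_max` of 1.0, the other concentrations, and the choice of a 1:1 model are illustrative assumptions and not the authors' analysis.

```python
def equilibrium_response(conc_nm, kd_nm, r_max=1.0):
    """Equilibrium signal for 1:1 binding: R_eq = r_max * C / (Kd + C)."""
    return r_max * conc_nm / (kd_nm + conc_nm)


KD_NM = 15.0  # Kd quoted in the reviews (15 nM)

# Hypothetical titration series in nM; only 1000 nM (1.0 µM) is mentioned in the reviews.
for conc_nm in (5.0, 15.0, 50.0, 250.0, 1000.0):
    fraction = equilibrium_response(conc_nm, KD_NM)
    print(f"{conc_nm:7.1f} nM -> {fraction:.3f} of r_max")
```

      Under these assumptions the equilibrium response at 1.0 µM would be roughly 98.5% of the maximum, which is why a lack of saturation is hard to reconcile with a specific 15 nM 1:1 interaction.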

  2. Mar 2026
    1. eLife Assessment

      This article presents valuable findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework for the various ways in which warming can affect bud set timing. The statistical analysis is compelling, but indicates some factors that may temper the authors' claims, while the designs of experiments offer incomplete support for the current claims as they rely on one population under extreme conditions for only one year each while a confounding effect (time in a chamber) sometimes lacks a control.

    2. Reviewer #1 (Public review):

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

      Comments on revisions:

      This is the second round of review, and I am generally very satisfied with the authors' revisions. However, a few detailed issues still require attention:

      The authors identified the summer solstice (June 21) as a phenological "switch point", but the flexibility of this switch point remains poorly understood. A more precise explanation of what "flexibility" means in this context is needed, along with a description of the specific experimental results that would demonstrate this flexibility.

      The experiment did not directly measure the specific date of the phenological switch point. Instead, it was inferred by comparing temperature effects before and after the solstice. The manuscript should clearly state that this switch point remains an inferred conceptual node rather than a directly measured variable.

      In Experiment 1, the effect of bud type (terminal vs. lateral) was inconsistent across the overall model and the different leafing groups. The authors should provide a more thorough discussion of potential reasons for this inconsistency. In addition, the statistical model for Experiment 1 indicates that the measured variables (summer cooling and leaf emergence date) explain only 23.4% of the variation in bud formation timing. This leaves over 76% of the variation unexplained, suggesting that other important factors are involved. The discussion should address this limitation in greater depth, moving beyond a focus on the measured variables.

    3. Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set [their original title]' Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I think the experiments are interesting, but I found the exact methods of them somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I was also very concerned by the revisions.

      I expand briefly on these concerns and a few others for readers of the paper (see 'The comments below relate to my original review'). Subsequent edits to the paper addressed some of these by providing a new figure and moving around the methods. Further, I am at a loss about their hypothesis, when they write in their letter: "Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21." Why on earth reference the solstice if the authors do not mean to exactly reference the solstice?

      The comments below relate to my original review with many of them still applying.

      Methods: As I read the Results I was surprised the authors did not give more info on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example a low of 2 deg C at night and 7 deg C during the day through end of May and then 7/13 deg C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      I also think the control is confounded with growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2) so I think they need to be more upfront about this. The study is still very valuable, but -- again -- we may need to be more cautious in how much we infer from the results.

      Also, I suggest the authors add a figure to explain their experiments as they are very hard to follow. Perhaps this could be added to Figure 1?

      Finally, given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      Fagus sylvatica: Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late) so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      Measuring end of season (EOS): It's well known that different parts of plants shut down at different times and each metric of end of season -- budset, end of radial expansion, leaf coloring etc. -- relates to different things. Thus I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may differ with a different population of plants.

      Somewhat minor comments:

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      (2) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end of season timing?
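
      The concern in comment (1) about estimating a variance from two levels can be illustrated with a small simulation. This is a deliberately simplified sketch: it draws group effects directly from a normal distribution with a known variance and ignores residual noise and the rest of the mixed model, and all names and settings are hypothetical.

```python
import random
from statistics import variance


def simulated_variance_estimates(n_levels, true_sd=1.0, n_sims=2000, seed=1):
    """Sample variance of n_levels group effects, repeated n_sims times."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        effects = [rng.gauss(0.0, true_sd) for _ in range(n_levels)]
        estimates.append(variance(effects))
    return estimates


def fraction_far_off(estimates, true_var=1.0):
    """Share of estimates below half or above double the true variance."""
    bad = sum(1 for v in estimates if v < 0.5 * true_var or v > 2.0 * true_var)
    return bad / len(estimates)


two_levels = simulated_variance_estimates(n_levels=2)
twenty_levels = simulated_variance_estimates(n_levels=20)
print("2 levels :", fraction_far_off(two_levels))
print("20 levels:", fraction_far_off(twenty_levels))
```

      With only two levels, most simulated estimates miss the true variance by more than a factor of two, whereas with twenty levels few do. This is consistent with the suggestion to treat bud type as a fixed effect or to report the bud types separately.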

    4. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This article presents valuable findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework of the various ways in which warming can affect bud set timing. The support for the findings is incomplete, though extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses can make the conclusions more robust.

      We thank the editors and reviewers for their expert assessment of our findings and their interest in our conceptual framework. Below we respond to the specific reviewer and editor comments.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

      Thank you for your generous description of our study and the manuscript.

      Weaknesses:

      However, there are several issues that need to be addressed.

      (1) In Experiment 1, significant differences were observed in the impact of cooling in July versus August. July cooling induced a delay in bud set dates that was 3.5 times greater in late-leafing trees compared to early-leafing ones, while August cooling induced comparable advances in bud set timing in both early- and late-leafing trees.

      The study did not explain why the timing (July vs. August) resulted in different mechanisms. Can a link be established between phenology and photosynthetic product accumulation? Additionally, can the study differentiate between the direct warming effect and the developmental effect, and quantify their relative contributions?

      We thank the reviewer for pointing out that we could improve our explanation of the different responses to July and August cooling in experiment 1. Whilst we incorporated this in the conceptual model and the figure caption (Fig. 1b), we now also address this topic in more depth in the discussion section, focussing on daylength and photosynthetic assimilation as the possible mediators of this change in responses (L350-371).

      For the early-season development effect vs the late-season temperature effect we can use the leaf-out day-of-year (as a proxy for development), and the summer cooling treatments (direct temperature effect) to assess the relative importance of these two components of our model. We have now included a variance partitioning analysis following this logic, see L246-252 for methods, L278-281 for results.

      (2) The two experimental setups differed in photoperiod: one used a 13-hour photoperiod at approximately 4,300 lux, while the other used an ambient day length of 16 hours with a light intensity of around 6,900 lux. What criteria were used to select these conditions, and do they accurately represent real-world scenarios? Furthermore, as shown in Figure S1, significant differences in soil moisture content existed between treatments - could this have influenced the conclusions?

      This question may reflect a misunderstanding regarding the light availability that we hope to address with improved clarification. The duration and intensity of the lighting in these experiments was always set to reflect the average conditions experienced in Zurich for those respective times of the year. Day length in spring is shorter than it is in summer, so the durations were simply adjusted to reflect this reality. The 13-hour, 4,300 lux conditions in experiment 1 were only for the April-May period, when we reduced developmental rates for the late-leafing trees (L125-129). In July, the photoperiod was set to 16 hours and light intensity was approximately 7,300 lux (L150-154). This is comparable to experiment 2, in which treatments were applied in June and July, the photoperiod was 16 hours, and light intensity was approximately 6,900 lux (L206-207). These conditions reflect the average daylengths in Zurich, and the maximum light intensity output by the chambers.

      As mentioned in our initial author response, we do not think small differences in soil moisture levels should influence our conclusions. All pots were watered sufficiently to avoid water deficit, and all efforts were made to minimise differences in water availability. A Tukey honest significant difference test showed that only one treatment pair (6 - Late_July_Extreme vs. 7 - Early_August_Moderate, difference = 6%, p < 0.05) had significantly different soil water content, a pair whose responses are not compared. We have added words to this effect in the figure legend of Fig. S1.

      (3) The authors investigated how changes in air temperature around the summer solstice affected primary growth cessation, but the summer solstice also marks an important transition in photoperiod. How can the influence of photoperiod be distinguished from the temperature effect in this context?

      We agree that photoperiod likely plays a central role. Our conceptual model (Fig. 1) explicitly incorporates photoperiod as the framework within which temperature responses are regulated (L72-75, L627-629 & L638-641). The Solstice-as-Phenology-Switch hypothesis assumes that the annual progression of daylength sets the physiological “window” for trees’ responsiveness to temperature. Our experiments therefore focused on how temperature responses differ before versus after the solstice, while recognising that this reversal is likely enabled by the photoperiod signal. In other words, photoperiod provides the regulatory backdrop, and our results identify how diel and seasonal temperature cues are interpreted within that photoperiodic framework.

      (4) The study utilized potted trees in a controlled environment, which limits the generalization of the results to natural forests. Wild trees are subject to additional variables, such as competition and precipitation. Moreover, climate differences between years (2022 vs. 2023) were not controlled. As such, the conclusions may be overgeneralized to "all temperate tree species", as the experiment only involved potted European beech seedlings. The discussion would benefit from addressing species-specific differences.

      We agree that extrapolation from our experiments on Fagus sylvatica to other species and natural forests requires caution. However, it is precisely the controlled nature of our design that allowed us to isolate the precise mechanisms that appear to underpin the solstice switch, highlighting the role of diel and seasonal temperature variation. In natural systems, additional variables such as competition, precipitation, and soil heterogeneity can strongly influence phenology, but they also make it difficult to disentangle causal mechanisms. By minimising these confounding factors, our experiment provided a clear test of how temperature before and after the solstice regulates growth cessation.

      To acknowledge the limitation, we have toned down statements about generalisation (e.g. “likely generalisable” to “other temperate tree species may display similarities”; L409-411) and explicitly call for follow-up studies across species and forest contexts (L413–414). At the same time, we highlight that our findings align with independent evidence from manipulative experiments, satellite observations, flux measurements, and ground-based phenology, which suggests the mechanisms we report may extend beyond the specific populations studied here.

      Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I enjoyed reading this paper and found it well written. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I next expand briefly on these concerns and a few others.

      Thank you for the kind comments. We appreciate your concerns regarding the severity of our treatments and the generalisability of our results, and you can find our detailed responses below.

      Concerns:

      (1) As I read the Results, I was surprised the authors did not give more information on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods, I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through the end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      We understand the concern regarding the structure of the manuscript and note that the methods section was moved to the end of the paper in accordance with eLife’s recommended formatting. We have now moved the methods section before the results to ensure that readers are familiar with the treatments before encountering the outcomes.

      We recognise that our temperature treatments were severe and do not mimic real world scenarios. They were deliberately designed to create large contrasts in developmental rates, thereby maximising our ability to detect the mechanisms underpinning the solstice switch. For example, the severe cooling between 4 April and 24 May was specifically designed to slow spring development as much as possible without damaging the plants (L129-L133). We have added text in the Methods to clarify this aim (L129-131 & L156-161).

      Regarding presentation, treatment details are now described in both the Methods and the relevant figure legends. Given this structure, we have chosen not to restate the full treatment conditions in the main Results text to avoid repetition.

      (2) I also think the control is confounded with the growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2), so I think they need to be more upfront about this. The study is still very valuable, but again, we may need to be more cautious in how much we infer from the results.

      We appreciate the reviewer’s concern about the potential confounding effect of chamber exposure in experiment 1. We have now discussed this limitation more explicitly, adding further explanation to the Methods (L146-148) and Discussion (L345-346).

      Note that chamber-related problems (e.g. aphid infestations) primarily occurred under warm chamber conditions, whereas our experiment 1 cooling treatments maintained low temperatures that suppressed such issues. This means that an equivalent “warm chamber control” could have been associated with its own artefacts, as trees kept under warm chamber conditions would have been exposed to additional stressors that were not present under natural growing conditions. To address this point, we included a chamber control in experiment 2. While aphid abundance was indeed higher in the warm chamber controls, chamber exposure itself had no detectable effect on autumn phenology. This suggests that the main findings of experiment 1 are unlikely to be artefacts of chamber conditions (L141-145).

      Nevertheless, we agree that chamber exposure remains a potential limitation of experiment 1, which requires clear acknowledgement. We now state this more explicitly in the manuscript while also emphasising that our results are supported by experiment 2 and by converging lines of external evidence.

      (3) I suggest the authors add a figure to explain their experiments, as they are very hard to follow. Perhaps this could be added to Figure 1?

      We have now added figures to the methods section to depict the experimental timelines and settings more clearly (Figs. 2 and 3).

      (4) Given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      We agree that including more data on photosynthetic assimilation would be valuable for interpreting phenological responses. Indeed, it was our intention to collect this information. However, unfortunately, we experienced technical challenges with the equipment available to us during the experimental period, which prevented us from collecting a full dataset. Nevertheless, we were able to obtain measurements during pre-solstice cooling (now presented as Fig. S12, including data for all treatments), which show that cooling treatments strongly reduced assimilation rates compared to controls. Importantly, these strong reductions occurred across all cooling treatments, yet their phenological outcomes differed markedly, demonstrating that assimilation alone cannot explain the observed responses. As we discuss, our findings are consistent with previous manipulative and observational studies reporting a weak role of late-season assimilation in controlling autumn phenology.

      (5) Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late), so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      We agree that Fagus sylvatica has a stronger photoperiod dependence than many other European tree species. As we note in our response to Reviewer 1 (comment 4), our findings align with previous research across temperate northern forests. Within our framework, interspecific variation in leaf-out timing would not alter the overall response pattern, though it could shift the specific timing of effect reversals. For example, earlier-leafing species may approach completion of development sooner and thus show sensitivity to late-season cooling earlier than F. sylvatica. Nevertheless, we acknowledge the importance of not overstating generality. We have therefore revised the manuscript to phrase conclusions more cautiously (L409-411) and highlight the need for further research across species (L413–414).

      (6) Another concern relates to measuring the end of season (EOS). It is well known that different parts of plants shut down at different times, and each metric of end of season - budset, end of radial expansion, leaf coloring, etc - relates to different things. Thus, I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised that the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may be different with a different population of plants.

      We thank the reviewer for pointing out that our discussion of the responses of different EOS metrics needs more clarity. We agree with much of this perspective, and we have added an additional analysis of leaf chlorophyll content data to use leaf discolouration as an alternative EOS marker (L179-195 for methods, L296-311 for results). On this we would like to make two important points:

      Firstly, we agree that bud set often occurs before leaf discolouration, although this can depend on which definition of leaf discolouration is used. In experiment 1, bud set occurred on average on day-of-year (DOY) 262 and leaf senescence (50% loss of leaf chlorophyll) occurred on DOY 320. However, we do not necessarily agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, responses from different end-of-season indicators (e.g. bud set and loss of leaf chlorophyll) are often similar, even if only directionally. Figure S11 shows how, across both experiments, treatment effects were tightly conserved (R² = 0.49) between the two phenometrics. In accordance with these revisions, we have updated the manuscript title to “Developmental constraints mediate the summer solstice reversal of climate effects on the autumn phenology of European beech” (L1-2).

      Secondly, shifts in bud set timing remain the primary focus of the manuscript as these shifts are of direct physiological relevance to plant development and dormancy induction, whereas leaf discolouration may simply follow bud set as a symptom of developmental completion. This is supported by our results, which show stronger responses of bud set than leaf senescence (Figs. 4 & 5 vs. Figs. S9 & S10).

      Following the reviewer’s suggestion, we have included more references on the topic of bud set and its environmental controls. The reviewer rightly stresses that photoperiod is considered the most important factor. As mentioned above (see Reviewer 1 comment 3), photoperiod is therefore key in our conceptual model. However, the responses we observed in F. sylvatica cannot be explained by photoperiod alone. For example, in experiment 1, July cooling delayed the autumn phenology of late-leafing trees but had negligible impact on early-leafing trees, even though both experienced the exact same photoperiod. Moreover, in experiment 2, day, night and full-day cooling showed substantial variations in their effects despite equal photoperiod across the climate regimes. This is why we suggest that the annual progression of photoperiod modulates the responses to temperature variations instead of eliciting complete control.

      (7) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to the solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end-of season timing?

      We interpret this concern as relating to the flexibility in reversal timing that we observed. Importantly, the Solstice-as-Phenology-Switch hypothesis does not assume that the reversal is fixed to June 21. Rather, the hypothesis implies that reversal occurs around the solstice, when photoperiod cues cause tree individuals to shift from accelerating to decelerating their seasonal development. Our conceptual model (Fig. 1) explicitly incorporates this flexibility by showing how the timing of the reversal depends on developmental speed: individuals that develop more slowly (or leaf out later) cross the compensatory point later in the summer, whereas fast-developing individuals reach it earlier.

      Our experiments support this framework: pre-solstice full-day cooling delayed bud set, whereas post-solstice full-day cooling advanced it, with differences between early- and late-developing individuals consistent with the model. Moreover, the contrasting impacts of daytime vs. night-time cooling demonstrate how diel conditions can further shape when the reversal is expressed. Thus, rather than contradicting the Solstice-as-Phenology-Switch hypothesis, our findings reinforce it and extend it by showing how flexibility arises from interactions between developmental progression, diel temperature responses, and photoperiod.

      We have added an additional section in the Discussion that elaborates on how our results support the Solstice-as-Phenology-Switch hypothesis (L416-432).

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the authors):

      (1) The current strength of evidence is incomplete. Extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses could make the conclusions more solid.

      We agree with the vast majority of the reviewer comments and have made the relevant edits. We believe that these have dramatically improved the clarity of the manuscript. The revised analyses have not changed our conclusions, though we have toned down generalisations.

      (2) The Solstice as Switch hypothesis is about the effect of temperature warming. However, the two experiments did not simulate warming but rather cooling. Although a temperature difference can be obtained compared to the control in both cases, the impacts on plant physiology and phenology should still be different between the two scenarios.

      Thank you for raising this point, which requires clearer communication in our manuscript. The Solstice-as-Phenology-Switch hypothesis posits that changes in temperature before and after the summer solstice have opposite effects on the autumn phenology of northern forest trees. While the hypothesis has most often been framed in terms of warming, the underlying mechanism concerns whether development is accelerated or slowed relative to ambient conditions. In essence, we are exploring the effect of changes in temperature – not warming per se. In warmer springs, development begins earlier and/or proceeds faster, while in colder springs the opposite occurs; the same logic applies to post-solstice conditions. We have extended our explanation in the Introduction (L69-71).

      In our experiments, we applied cooling to create strong contrasts in developmental rates without damaging the trees. These treatments allow us to test the direction of phenological responses relative to ambient conditions. Thus, although we used cooling rather than warming, the results are directly informative for the Solstice-as-Phenology-Switch framework, which concerns the relative effect of temperature changes rather than the absolute direction of manipulation.

      (3) The number of groups for bud type and summer temperature treatment is too small to be used as a random effect; it would be more appropriate to treat them as fixed-effect terms.

      We have revised the analysis to include bud type as a fixed effect. There are only very minor numerical adjustments (e.g. rounding to 4.8 days instead of 4.9, see L271) and inferences are not altered. We also report the bud type effects for experiment 1 (L262-266) and experiment 2 (L292-293).
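
To illustrate what treating a two-level factor as a fixed effect means in practice, the sketch below dummy-codes bud type and estimates its coefficient alongside a treatment effect by ordinary least squares. It is only an illustration under stated assumptions: the data are synthetic and noise-free, the variable names are hypothetical, and the plain OLS fit stands in for whatever model the authors actually fitted.

```python
import numpy as np

# Synthetic, noise-free illustration (NOT data from the study).
# treatment: 0 = control, 1 = cooled; bud_apical: 0 = lateral, 1 = apical.
treatment = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
bud_apical = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)

# Hypothetical bud-set dates (day of year) built from known effects.
bud_set_doy = 260.0 + 5.0 * treatment + 3.0 * bud_apical

# Fixed-effects design matrix: intercept, treatment dummy, bud-type dummy.
# With only two bud types, a single dummy coefficient captures the contrast,
# so no between-group variance has to be estimated from n = 2 levels.
X = np.column_stack([np.ones_like(treatment), treatment, bud_apical])
coef, *_ = np.linalg.lstsq(X, bud_set_doy, rcond=None)
intercept, treatment_effect, bud_type_effect = coef

print(f"intercept={intercept:.1f}, treatment={treatment_effect:.1f}, "
      f"bud type={bud_type_effect:.1f}")
```

Because the synthetic response is an exact linear combination of the predictors, the fit recovers the effects used to build it; real data would add residual noise and standard errors.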

      (4) Please add more clarifications for Figure 4 about what this figure is for and how you derived this figure, whether the data were from your experiments or others.

      We have rewritten the caption for Figure 6 (Fig. 4 in the previous manuscript) to clarify where the data came from and how the figure was generated (L687-693). This figure serves as a visual guide to aid the understanding of the processes that may govern the patterns we have observed. Figure 6a uses data from previous studies on diel patterns in F. sylvatica, specifically growth (Zweifel et al., 2021) and photosynthetic assimilation rates (Urban et al., 2014). To aid visualisation, we linearly interpolated between measurement points, converted the values to a relative percentage (compared to observed maximum), and then smoothed the resulting curves. Based on the evidence from experiment 2, we suggest there may be a temperature threshold below which overwintering responses (e.g. bud set) are induced in F. sylvatica. Figure 6b depicts a theoretical diel pattern of this potential threshold. In simple terms, the threshold must be lower at night because nights are typically colder than days.
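
The three processing steps described for Figure 6a (linear interpolation, rescaling to a percentage of the observed maximum, smoothing) can be sketched as follows. This is an illustrative sketch, not the code used for the figure: the centred moving-average smoother, the grid step, the window size, and the input values are all assumptions, since the response does not state which smoother or resolution was used.

```python
import numpy as np

def relative_smoothed_curve(hours, values, step=0.5, window=5):
    """Interpolate sparse diel measurements, rescale to % of max, then smooth.

    `window` must be odd; a centred moving average is an assumed smoother.
    """
    hours = np.asarray(hours, dtype=float)
    values = np.asarray(values, dtype=float)

    # 1) Linear interpolation onto a regular time grid.
    grid = np.arange(hours.min(), hours.max() + step / 2, step)
    interpolated = np.interp(grid, hours, values)

    # 2) Express each value as a percentage of the observed maximum.
    relative = 100.0 * interpolated / interpolated.max()

    # 3) Centred moving average; edge padding keeps the output length equal
    #    to the grid length.
    pad = window // 2
    padded = np.pad(relative, pad, mode="edge")
    kernel = np.ones(window) / window
    smoothed = np.convolve(padded, kernel, mode="valid")
    return grid, relative, smoothed

# Hypothetical measurements at five times of day (arbitrary units).
hours = [0, 6, 12, 18, 24]
values = [0.0, 2.0, 10.0, 4.0, 0.0]
grid, relative, smoothed = relative_smoothed_curve(hours, values)
```

Smoothing after rescaling means the smoothed peak falls slightly below 100%, which is worth keeping in mind when reading relative curves of this kind.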

      Reviewer #2 (Recommendations for the authors):

      (1) How can a bud type -- which is apical or lateral -- be a random effect? The model needs to try to estimate a variance for each random effect, so doing this for n=2 is quite odd to me. I think the authors should also report the results with bud type as fixed, or report the bud types separately.

      See point (3) in reviewing editor’s recommendations for the authors.

      (2) Could the authors move the methods earlier and remind readers of them in the results?

      We have addressed this issue; please see the detailed response under Reviewer 2’s concerns.

      Urban O, Klem K, Holišová P, Šigut L, Šprtová M, Teslová-Navrátilová P, Zitová M, Špunda V, Marek MV, Grace J. 2014. Impact of elevated CO2 concentration on dynamics of leaf photosynthesis in Fagus sylvatica is modulated by sky conditions. Environmental Pollution 185: 271–280.

      Zweifel R, Sterck F, Braun S, Buchmann N, Eugster W, Gessler A, Häni M, Peters RL, Walthert L, Wilhelm M, et al. 2021. Why trees grow at night. New Phytologist 231: 2174–2185.

    1. eLife Assessment

      The authors previously identified SLAP as a key suppressor of the Src tyrosine kinase and a tumor suppressor. In this important study, the authors show SLAP functions in a cell-autonomous fashion in colon stem cells and propose solid evidence that SLAP reduces tumorigenesis by inhibiting an EphB2-SRC axis.

    2. Reviewer #1 (Public review):

      Naim et al. use genetically engineered mouse models and tissue culture cell lines to investigate the role of the SLAP adaptor protein in colonic epithelium and colon tumour formation. The SLAP adaptor protein is known to be a negative regulator of tyrosine kinase signaling in hematopoietic cells, but its role outside the immune system is less well defined. Here, the authors use genetically engineered SLAP-deficient mice, tissue-specific SLAP KO, and colonic organoids to demonstrate that SLAP is expressed in cells of the colonic epithelium, where it acts as a cell-autonomous regulator of proliferation and differentiation. In addition, they provide biochemical evidence that loss of SLAP expression in cultured colonic organoids results in increased Src family kinase activity and global tyrosine phosphorylation, consistent with its known role as a suppressor of tyrosine kinase activity in immune cells. Consistently, treatment with an SRC kinase inhibitor inhibited the growth of SLAP-deficient organoids. These data provide solid evidence of a cell-autonomous role of SLAP in the colonic epithelium.

      This work would be improved by further description and interpretation of the SLAP expression pattern shown in the constitutive and tissue-specific KO to further support the conclusions made. In Supplementary Figure 1, magnification of the colon epithelium areas with SLAP expression shown by b-gal and anti-SLAP staining, highlighting regions of interest, would better support the conclusions regarding SLAP expression in specific regions of the colon epithelium. In Supplementary Figure 1B, the authors should indicate that the SLAP staining referred to is epithelial and in resident immune cells, as is mentioned in the text. Also, magnification of the boxed area of LGR5 staining in Figure 1 would improve this figure.

      Using a chemically induced model of colitis-associated cancer, the authors demonstrate that inactivation of SLAP shows a trend toward increased tumor formation (though this did not reach significance) as well as increased Src family kinase activity within tumors. Tumor spheres from SLAP-deficient animals showed enhanced growth that was suppressed by treatment with a Src family kinase inhibitor. Of note, the latter effect was specific to SLAP-deficient tumor spheres. These observations are convincing and support the authors' conclusion that SLAP has a tumor suppressor role in CRC through inhibition of SFK signaling.

      Mechanistically, elevated expression of the RTK, EphB2, was detected in immunoblots of SLAP KO colonic crypts, while overexpression of SLAP in CRC cell lines downregulated EphB2 protein levels. Using an EPHB2 inhibitor, the role of EPHB2 in the growth of SLAP-deficient colonic organoids was demonstrated. While these data generally support the authors' conclusion that SLAP limits colonic organoid growth by downregulating RTKs such as EphB2 and downstream Src family kinase activity, they do not show which cell types/regions in the colonic epithelium have increased EPHB2 protein and how this relates to SLAP and phospho-SRC expression, as shown in Figure 1 and Figure S1 immunocytochemistry. The expression of EphB2 and its role in colonic tumorsphere growth were not investigated.

      Overall, this work provides evidence of SLAP adaptor function in restricting tyrosine kinase signaling in the colonic epithelium, and suggests that loss of SLAP expression could promote tumorigenesis in this context.

    3. Reviewer #2 (Public review):

      Summary:

      Protein tyrosine kinases are subject to diverse regulatory mechanisms controlling their activity in normal situations. The authors previously identified SLAP (Src-like adaptor protein), a negative regulator of receptor tyrosine kinase (RTK) signaling, as a key suppressor of the cytoplasmic tyrosine kinase SRC in the normal colon and demonstrated that SLAP is downregulated in a majority of colorectal cancers (CRCs).

      In this study, the authors further explored SLAP functions in mouse models using constitutive and inducible epithelial-specific Slap deletion (villin-CreERT2 model). They found that loss of SLAP augments colonic epithelial cell proliferation and that induction of tumorigenesis by the AOM/DSS protocol mimicking CRC leads to more aggressive tumors in the absence of SLAP. This effect is apparently cell-autonomous as growth of normal and tumoral colonic organoids is SLAP-dependent in in vitro settings. Finally, the authors define that, in colon, SLAP represses EphB2, an RTK lying upstream of SRC, and show that inhibitors of EphB2 can partially limit tumorigenic development in vitro.

      Strengths:

      The manuscript is clearly and concisely written, making it easy to follow. The data obtained in the mouse models are very convincing.

      Weaknesses:

      Direct evidence that EphB2 is activated/phosphorylated in the absence of SLAP is lacking, as conclusions are only based on results obtained with inhibitors. Some other issues have to be addressed before acceptance, in particular, the relevance of the findings in CRC patients.

    4. Author Response:

      Public Reviews:

      Reviewer #1 (Public review):

      Naim et al. use genetically engineered mouse models and tissue culture cell lines to investigate the role of the SLAP adaptor protein in colonic epithelium and colon tumour formation. The SLAP adaptor protein is known to be a negative regulator of tyrosine kinase signaling in hematopoietic cells, but its role outside the immune system is less well defined. Here, the authors use genetically engineered SLAP-deficient mice, tissue-specific SLAP KO, and colonic organoids to demonstrate that SLAP is expressed in cells of the colonic epithelium, where it acts as a cell-autonomous regulator of proliferation and differentiation. In addition, they provide biochemical evidence that loss of SLAP expression in cultured colonic organoids results in increased Src family kinase activity and global tyrosine phosphorylation, consistent with its known role as a suppressor of tyrosine kinase activity in immune cells. Consistently, treatment with an SRC kinase inhibitor inhibited the growth of SLAP-deficient organoids. These data provide solid evidence of a cell-autonomous role of SLAP in the colonic epithelium.

      This work would be improved by further description and interpretation of the SLAP expression pattern shown in the constitutive and tissue-specific KO to further support the conclusions made. In Supplementary Figure 1, magnification of the colon epithelium areas with SLAP expression shown by b-gal and anti-SLAP staining, highlighting regions of interest, would better support the conclusions regarding SLAP expression in specific regions of the colon epithelium. In Supplementary Figure 1B, the authors should indicate that the SLAP staining referred to is epithelial and in resident immune cells, as is mentioned in the text. Also, magnification of the boxed area of LGR5 staining in Figure 1 would improve this figure.

      We thank the reviewer for their positive and constructive evaluation of our work.

      We agree that a more detailed description and visualization of SLAP expression in the colonic epithelium would strengthen our conclusions. In response, we will revise Fig 1 and S1 to better highlight SLAP expression patterns. Specifically, we will include higher-magnification images of the colonic epithelial regions in Suppl Fig 1, with clearly indicated regions of interest. We will also clarify in the legend of Suppl Figure 1B that SLAP staining is observed in both epithelial and resident immune cells, as described in the text. Additionally, we will provide a magnified view of the boxed area showing LGR5 staining in Figure 1 to improve clarity.

      Using a chemically induced model of colitis-associated cancer, the authors demonstrate that inactivation of SLAP shows a trend toward increased tumor formation (though this did not reach significance) as well as increased Src family kinase activity within tumors. Tumor spheres from SLAP-deficient animals showed enhanced growth that was suppressed by treatment with a Src family kinase inhibitor. Of note, the latter effect was specific to SLAP-deficient tumor spheres. These observations are convincing and support the authors' conclusion that SLAP has a tumor suppressor role in CRC through inhibition of SFK signaling.

      Mechanistically, elevated expression of the RTK, EphB2, was detected in immunoblots of SLAP KO colonic crypts, while overexpression of SLAP in CRC cell lines downregulated EphB2 protein levels. Using an EPHB2 inhibitor, the role of EPHB2 in the growth of SLAP-deficient colonic organoids was demonstrated. While these data generally support the authors' conclusion that SLAP limits colonic organoid growth by downregulating RTKs such as EphB2 and downstream Src family kinase activity, they do not show which cell types/regions in the colonic epithelium have increased EPHB2 protein and how this relates to SLAP and phospho-SRC expression, as shown in Figure 1 and Figure S1 immunocytochemistry. The expression of EphB2 and its role in colonic tumorsphere growth were not investigated.

      Overall, this work provides evidence of SLAP adaptor function in restricting tyrosine kinase signaling in the colonic epithelium, and suggests that loss of SLAP expression could promote tumorigenesis in this context.

      We also thank the reviewer for their positive comments regarding our tumor studies and the role of SLAP in regulating SFK signaling.

      Regarding the mechanistic insights involving EphB2, we appreciate the reviewer’s suggestion to further define its spatial expression and relationship with SLAP and phospho-SRC. To address this, we plan to extend our analysis to assess the effect of Slap depletion on EphB2 protein levels throughout the intestinal epithelium.

      We recognize that directly testing EphB2’s role in murine colonic tumorsphere formation would require a new cohort of SLAP knockout mice treated with AOM/DSS for 90 days, which is not feasible in the short term. To address this, we will instead use human colorectal cancer models to assess how SLAP modulation affects the response of tumoroids derived from cell lines to EphB2 inhibition, providing complementary mechanistic insights.

      Overall, we believe these additions will strengthen the manuscript and more fully address the reviewer’s concerns.

      Reviewer #2 (Public review):

      Summary:

      Protein tyrosine kinases are subject to diverse regulatory mechanisms controlling their activity in normal situations. The authors previously identified SLAP (Src-like adaptor protein), a negative regulator of receptor tyrosine kinase (RTK) signaling, as a key suppressor of the cytoplasmic tyrosine kinase SRC in the normal colon and demonstrated that SLAP is downregulated in a majority of colorectal cancers (CRCs).

      In this study, the authors further explored SLAP functions in mouse models using constitutive and inducible epithelial-specific Slap deletion (villin-CreERT2 model). They found that loss of SLAP augments colonic epithelial cell proliferation and that induction of tumorigenesis by the AOM/DSS protocol mimicking CRC leads to more aggressive tumors in the absence of SLAP. This effect is apparently cell-autonomous as growth of normal and tumoral colonic organoids is SLAP-dependent in in vitro settings. Finally, the authors define that, in colon, SLAP represses EphB2, an RTK lying upstream of SRC, and show that inhibitors of EphB2 can partially limit tumorigenic development in vitro.

      Strengths:

      The manuscript is clearly and concisely written, making it easy to follow. The data obtained in the mouse models are very convincing.

      Weaknesses:

      Direct evidence that EphB2 is activated/phosphorylated in the absence of SLAP is lacking, as conclusions are only based on results obtained with inhibitors. Some other issues have to be addressed before acceptance, in particular, the relevance of the findings in CRC patients.

      We thank the reviewer for their positive and constructive evaluation of our work.

      We agree that our conclusions regarding the SLAP–EphB2–SRC signaling axis rely in part on pharmacological inhibition. As outlined in the manuscript, EphB2 was selected primarily as a proof-of-concept receptor to illustrate how SLAP may indirectly regulate SRC activity through modulation of upstream receptor tyrosine kinases. We note that the use of two distinct classes of EphB inhibitors supports the robustness of our observations.

      To further strengthen this aspect of the study, we will assess EphB2 phosphorylation status in SLAP-deficient conditions, which will provide more direct evidence of its activation state and its contribution to SRC signaling.

    1. eLife Assessment

      This work presents an important study of the relationship between morphogen signaling and cell fate choices in the forming zebrafish neural tube, addressing a topical question in developmental biology. The authors provide a solid characterization of the precision limit for gene regulatory networks interpreting Shh, with single-cell resolution and state-of-the-art in vivo approaches. While the depth of analysis is restricted, particularly by the number of cell traces, the study will be of interest to developmental biologists interested in cellular decision-making.

    2. Reviewer #1 (Public Review):

      [Editors' note: This version has been assessed by the Reviewing Editor without further input from the original reviewers. Given the time elapsed since the original data collection, the authors have addressed the previous concerns by providing a more nuanced discussion of their results and acknowledging the limitations of the study to ensure the conclusions are supported by the existing data.]

      Throughout the paper, the authors do a fantastic job of highlighting caveats in their approach, from image acquisition to analysis. Despite this, some conclusions and viewpoints portrayed in this study do not appear well-supported by the provided data. Furthermore, there are a few technical points regarding the analysis that should be addressed.

      (1) Analysis of signaling traces

      - Relevance of "modeled signaling level": It is not clear whether this added complexity and potential for error (below) provides benefits over a more simple analysis such as taking the derivative (shown in Figure 3C). Could the authors provide evidence for the benefits? For example, does the "maximal response" given a simpler metric correlate less well with cell fate than that calculated from the fitted response?

      - Assumptions for "modeled signaling level": According to equation (1) Kaede levels are monotonically increasing. This is assumed given the stability of the fluorescent protein. However, this only holds for the "totally produced Kaede/fluorescence". Other metrics such as mean fluorescence can very well decrease over time due to growth and division. Does "intensity" mean total fluorescence? Visual inspection of the traces shown in Figure 2 suggests that "fluorescence intensity" can decrease. What does this mean for the inferred traces?

      - Estimation of Kaede reporter half-life: It is not clear how the mRNA stability of Kaede is estimated. It sounds like it was just assessed visually, which seems not entirely appropriate given the quantitative aspects of the rest of the study. Also, given that Shh signaling was inhibited at the level of Smoothened, it is not obvious how the dynamics of signaling shutdown affect the estimate. Most results in Figure 7 seem to be quite robust to the estimate of the half-life. That robustness might suggest that the whole analysis is unnecessary in the first place. However, not all results are robust. Thus, it would be important to make this estimate more quantitative.
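
A quantitative half-life estimate of the kind requested in the last point could, for example, come from a log-linear fit to the decaying signal after production is shut off. The following is a minimal sketch under a strong assumption, namely clean first-order decay starting at the first time point; it ignores the shutdown dynamics of Smoothened inhibition raised above, and the time points and values are synthetic, not measurements from the study.

```python
import numpy as np

def estimate_half_life(times, signal):
    """Estimate a first-order decay half-life from a log-linear fit.

    Assumes signal(t) = A * exp(-k * t) with all signal values > 0,
    so that log(signal) is linear in t with slope -k.
    """
    times = np.asarray(times, dtype=float)
    signal = np.asarray(signal, dtype=float)
    slope, _intercept = np.polyfit(times, np.log(signal), 1)
    decay_rate = -slope
    return np.log(2) / decay_rate

# Synthetic example: a signal that halves every 2 hours (hypothetical values).
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # hours after inhibitor addition
true_half_life = 2.0
signal = 100.0 * 0.5 ** (times / true_half_life)

half_life = estimate_half_life(times, signal)
print(f"estimated half-life: {half_life:.2f} h")
```

With real traces, a delay between inhibitor addition and the end of transcription would bias this estimate upward, so the fit window would need to start after shutdown is complete.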

      (2) Assignment of fates and correlations

      - Error estimate for cell-type assignment: Trying to correlate signaling traces to cell fate decisions requires accurate cell fate assignment post-tracking. The provided protocol suggests a rather manual, expert-directed process of making those decisions. Can the authors provide any error-bound on those decisions, for example comparing the results obtained by two experts or something comparable? I am particularly concerned about the results regarding the higher degree of variability in the correlation between signaling dynamics and cell fate in the posterior neural tube. Here, the expression of Olig2 does not seem to segregate between different assigned fates, while it does so nicely in the anterior neural tube. This would suggest to me that cells in the posterior neural tube might not yet be fully committed to a fate or that there could be a relatively high error rate in assigning fates. Thus, the results could emerge from technical errors or differences in pure timing. Could the authors please comment on these possibilities?

      - Clustering and fates: One approach the authors use to analyze the correlation between signaling and fate is clustering of cell traces and comparison of the fate distributions in those clusters. There is a large number of clusters with only single traces, suggesting that the data (number of traces) might not be sufficient for this analysis. Furthermore, I am skeptical about clustering cells of different anterior-posterior identities together, given potential differences in the timing of signal reception and signaling. I am not convinced that this analysis reveals enough about how signaling maps to fate given the heterogeneity in traces in large clusters and the prevalence of extremely small clusters.

      - Signaling vector and hand-picked metrics: As an alternative approach that might be better suited to their data, the authors then pick three metrics (based on their model-predicted signaling dynamics) and show that the maximal response is a very good predictor of fate for different anterior-posterior identities. Previous information-theoretic analysis of signaling dynamics has found that a whole time-vector of signaling can carry much more information than individual metrics (Selimkhanov et al, 2014, PMID: 25504722). Have the authors tried approaches that make use of the whole trace, such as simple classifiers (Granados et al, 2018, PMID: 29784812), or can they comment on why this is not feasible for their data? The authors should at least make clear that their results present a lower bound on how accurately cells can make cell-fate decisions based on signaling dynamics.
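
One way to put an error bound on manual fate assignment, as raised in the first point of this section, is an inter-annotator agreement score such as Cohen's kappa. The sketch below assumes two experts independently label the same tracked cells; the labels "A", "B", "C" and the example annotations are hypothetical placeholders, not data or fate names from the study.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed fraction of cells on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2

    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical fate calls for eight tracked cells by two experts.
expert_1 = ["A", "A", "B", "C", "C", "A", "B", "C"]
expert_2 = ["A", "C", "B", "C", "C", "A", "B", "A"]

kappa = cohens_kappa(expert_1, expert_2)
print(f"Cohen's kappa = {kappa:.3f}")
```

Computing kappa separately for anterior and posterior cells would show whether assignment uncertainty is concentrated in the posterior neural tube.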

      (3) Consequences of signaling heterogeneity

      The authors focus heavily on portraying that signaling dynamics are highly variable, which seems visually true at first glance. However, there is no metric used or a description given of what this actually means. Mainly, the variability seems to relate to the correlation between signaling and fate. However, given the data and analysis, I would argue that the decoding of signaling dynamics into fate is surprisingly accurate. So signaling dynamics that seem quite noisy and variable by visual inspection can actually be very well discriminated by cells, which to me appears very exciting.

      Indeed, simple features of signaling traces can predict cell fate as well as position (for anterior progenitors). Given that signaling should be a function of position, it naively seems as if signaling read-out could be almost perfect. It might be interesting to plot dorsal-ventral position vs the signaling metrics, to also investigate how Shh concentration/position maps to signaling dynamics, this would give an even more comprehensive view of signal transmission.

      There remains the discrepancy between signaling traces and fate in the posterior neural tube. The authors point towards differences in tissue architecture and difficulties in interpreting a "small" Shh gradient. However, the data seems consistent with differences in timing of cell-fate decisions between anterior and posterior cells. The authors show that fate does initially not correlate well with position in the posterior neural tube. So, signaling dynamics should likely also not, as they should rather be a function of position, given they are downstream of the Shh gradient. As mentioned above, not even Olig2 expression does segregate the assigned fates well. All this points towards a difference in the time of fate assignment between the anterior and posterior. Given likely delays in reporter protein production and maturation, it can thus not be expected that signaling dynamics correlate better with cell fate than the reporter "83%". Can the authors please discuss this possibility in the paper?

      Thus, while this paper represents an example of what the community needs to do to gain a better understanding of robust patterning under variability, the provided data is not always sufficient to make clear conclusions regarding the functional consequences of signaling dynamics.

    3. Reviewer #2 (Public Review):

      Summary:

      In this work, Xiong and colleagues examine the relationship between the profile of the morphogen Shh and the resulting cell fate decisions in the zebrafish neural tube. For this, the authors combine high-resolution live imaging of an established Shh reporter with reporter lines for the different progenitor types arising in the forming neural tube. One of the key observations in this manuscript is that, while, on average, cells respond to differences in Shh activity to adopt distinct progenitor fates, at the single cell level there is strong heterogeneity between Shh response and fate choices. Further, the authors showed that this heterogeneity was particularly prominent for the pMN fate, with similar Shh response dynamics to those observed in neighboring LFP progenitors.

      Strengths:

      It is important to directly correlate Shh activity with the downstream TFs marking distinct progenitor types in vivo and with single cell resolution. This additional analysis is in line with previous observations from these authors, namely in Xiong, 2013. Further, the authors show that cells in different anterior-posterior positions within the neural tube show distinct levels of heterogeneity in their response to Shh, which is a very interesting observation and merits further investigation.

      Weaknesses:

      This is a convincing work, however, adding a few more analyses and clarifications would, in my view, strengthen the key finding of heterogeneity between Shh response and the resulting cell fate choices.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Throughout the paper, the authors do a fantastic job of highlighting caveats in their approach, from image acquisition to analysis. Despite this, some conclusions and viewpoints portrayed in this study do not appear well-supported by the provided data. Furthermore, there are a few technical points regarding the analysis that should be addressed.

      We thank the reviewer for the comments. Due to the age of the work and logistic constraints, we are unable to perform further experiments and analysis to address some of the concerns. We have revised the conclusions and viewpoints to reflect the reviewer's concerns.

      (1) Analysis of signaling traces

      Relevance of "modeled signaling level": It is not clear whether this added complexity and potential for error (below) provides benefits over a more simple analysis such as taking the derivative (shown in Figure 3C). Could the authors provide evidence for the benefits? For example, does the "maximal response" given a simpler metric correlate less well with cell fate than that calculated from the fitted response?

      We think the benefit of the modeled signaling level is its conceptual accuracy, to the extent possible with the data. It is true that the assumptions it brings in may introduce certain biases. We therefore present both this analysis and the simplest one (raw data averaging, Fig.2). Intermediate results (such as the first derivative in Fig.3C) may correlate better or worse with fate, but cannot be interpreted biologically.

      Assumptions for "modeled signaling level": According to equation (1) Kaede levels are monotonically increasing. This is assumed given the stability of the fluorescent protein. However, this only holds for the "totally produced Kaede/fluorescence." Other metrics such as mean fluorescence can very well decrease over time due to growth and division. Does "intensity" mean total fluorescence? Visual inspection of the traces shown in Figure 2 suggests that "fluorescence intensity" can decrease. What does this mean for the inferred traces?

      Yes, the segmentations measure intensity in a fixed volume inside a cell; the measurement is therefore a spatial average (concentration) and is susceptible to cell volume changes. This has been noted in the revision. The raw measurement does fluctuate and can decrease; we think the short-time-scale fluctuations are likely measurement variations/errors rather than large underlying changes in concentration.
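      To illustrate why a decreasing intensity matters for an inferred signaling trace: if the reporter were perfectly stable and measured as a total amount, its time derivative could not be negative. The sketch below is a simplification for illustration and not the authors' equation (1). It takes a finite-difference rate from a hypothetical concentration trace and flags the intervals where the apparent rate is negative, as would result from dilution or measurement noise.

```python
def finite_difference_rate(intensity, dt):
    """Apparent production rate between consecutive samples (units per time)."""
    return [(later - earlier) / dt for earlier, later in zip(intensity, intensity[1:])]


def negative_intervals(rates):
    """Indices of intervals whose apparent rate is negative."""
    return [index for index, rate in enumerate(rates) if rate < 0]


# Hypothetical mean-intensity trace (arbitrary units), sampled every 2 time units.
# These numbers are made up for illustration and are not data from the study.
trace = [10, 12, 15, 14, 18, 22]
rates = finite_difference_rate(trace, dt=2)
flagged = negative_intervals(rates)
```

      Intervals flagged this way cannot reflect negative production of a stable protein, so they mark where volume changes or measurement error dominate the raw trace.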

      Estimation of Kaede reporter half-life: It is not clear how the mRNA stability of Kaede is estimated. It sounds like it was just assessed visually, which seems not entirely appropriate given the quantitative aspects of the rest of the study. Also, given that Shh signaling was inhibited on the level of Smoothened, it is not obvious how the dynamics of signaling shutdown affect the estimate. Most results in Figure 7 seem to be quite robust to the estimate of the half-life. That they are might suggest that the whole analysis is unnecessary in the first place. However, not all are. Thus, it would be important to make this estimate more quantitative.

      Yes, we agree. Unfortunately, we don’t have the quantitative data required to better estimate Kaede mRNA stability. The time from cyclopamine (Cyc) inhibition to the cessation of ptch mRNA production is only roughly estimated and is not necessarily precise in this context.

      (2) Assignment of fates and correlations

      Error estimate for cell-type assignment: Trying to correlate signaling traces to cell fate decisions requires accurate cell fate assignment post-tracking. The provided protocol suggests a rather manual, expert-directed process of making those decisions. Can the authors provide any error-bound on those decisions, for example comparing the results obtained by two experts or something comparable? I am particularly concerned about the results regarding the higher degree of variability in the correlation between signaling dynamics and cell fate in the posterior neural tube. Here, the expression of Olig2 does not seem to segregate between different assigned fates, while it does so nicely in the anterior neural tube. This would suggest to me that cells in the posterior neural tube might not yet be fully committed to a fate or that there could be a relatively high error rate in assigning fates. Thus, the results could emerge from technical errors or differences in pure timing. Could the authors please comment on these possibilities?

      This is a very insightful point. We re-examined the posterior data (cross-checked by 2 co-authors) to make sure that the cell fate assignments in the mixed cases are correct. As established by our own and others’ previous studies (see also Fig.1A), the identification of MFPs and LFPs in the zebrafish spinal cord is very robust. The MFPs are the apically constricted single column of cells along the midline on top of the notochord, and the LFPs are the 2 columns of cells next to the MFPs on both sides. LFPs’ expression of olig2:gfp did vary more in the posterior (timing of response/commitment could be a factor, as the reviewer pointed out), but eventually the cells at those positions become V3 interneurons or floor plate and have not been observed to make motoneurons. There are 3 low Olig2:GFP pMNs in the anterior dataset (Fig.2B’) and 3 high Olig2:GFP LFPs in the posterior dataset (Fig.2D’) that we checked carefully. The heterogeneity argument is based on the verified tracking and final positioning of these cells.
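      The two-expert comparison the reviewer asks for could be summarized with an inter-rater agreement statistic. The following is a minimal sketch, assuming each cell receives one fate label from each of two annotators. Cohen's kappa is one common choice; the source does not state that the authors computed it. The labels and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same cells."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of cells given the same fate by both annotators.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical fate calls, for illustration only (not data from the study).
expert_1 = ["MFP", "LFP", "LFP", "pMN", "pMN", "pMN", "dorsal", "dorsal"]
expert_2 = ["MFP", "LFP", "pMN", "pMN", "pMN", "pMN", "dorsal", "dorsal"]
kappa = cohens_kappa(expert_1, expert_2)
disagreement_rate = sum(a != b for a, b in zip(expert_1, expert_2)) / len(expert_1)
```

      A kappa near 1 indicates agreement well beyond chance, and the raw disagreement rate gives a simple empirical error estimate for the fate calls.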

      Clustering and fates: One approach the authors use to analyze the correlation between signaling and fate is clustering of cell traces and comparison of the fate distributions in those clusters. There is a large number of clusters with only single traces, suggesting that the data (number of traces) might not be sufficient for this analysis. Furthermore, I am skeptical about clustering cells of different anterior-posterior identities together, given potential differences in the timing of signal reception and signaling. I am not convinced that this analysis reveals enough about how signaling maps to fate given the heterogeneity in traces in large clusters and the prevalence of extremely small clusters.

      We agree. Due to the age of the work and logistic constraints, we are unable to perform further experiments and analysis to enrich the tracks for this revision. We are aware of upcoming, independent studies with many more systematic tracks and analysis which will address these concerns. We have added the caveats the reviewer raised.

      Signaling vector and hand-picked metrics: As an alternative approach that might be better suited to their data, the authors then pick three metrics (based on their model-predicted signaling dynamics) and show that the maximal response is a very good predictor of fate for different anterior-posterior identities. Previous information-theoretic analysis of signaling dynamics has found that a whole time-vector of signaling can carry much more information than individual metrics (Selimkhanov et al, 2014, PMID: 25504722). Have the authors tried approaches that make use of the whole trace, such as simple classifiers (Granados et al, 2018, PMID: 29784812), or can they comment on why this is not feasible for their data? The authors should at least make clear that their results present a lower bound on how accurately cells can make cell-fate decisions based on signaling dynamics.

      Thanks for these suggestions. Measurement noise, the coverage window of the traces, and the number of tracks limit our ability to use the full dynamics in a more informative manner.
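      As an illustration of what a whole-trace approach could look like, the sketch below classifies each trace by its nearest class-mean trace under leave-one-out cross-validation. This is a simple stand-in chosen for brevity. It is neither the method of Granados et al. nor an analysis performed by the authors, and the traces, fate labels, and function names are hypothetical.

```python
def mean_trace(traces):
    """Pointwise mean of equally long traces."""
    count = len(traces)
    return [sum(values) / count for values in zip(*traces)]


def squared_distance(u, v):
    """Squared Euclidean distance between two equally long traces."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def leave_one_out_accuracy(traces, fates):
    """Fraction of traces assigned their own fate by a nearest-centroid rule."""
    correct = 0
    for held_out, (trace, fate) in enumerate(zip(traces, fates)):
        # Group all other traces by fate, leaving the current one out.
        groups = {}
        for index, (other, other_fate) in enumerate(zip(traces, fates)):
            if index != held_out:
                groups.setdefault(other_fate, []).append(other)
        centroids = {name: mean_trace(group) for name, group in groups.items()}
        predicted = min(centroids, key=lambda name: squared_distance(trace, centroids[name]))
        correct += predicted == fate
    return correct / len(traces)


# Hypothetical, well-separated traces for illustration (not data from the study).
traces = [
    [0, 2, 4, 4], [0, 3, 5, 4], [1, 3, 4, 5],  # labelled "LFP"
    [0, 1, 2, 2], [0, 1, 1, 2], [1, 1, 2, 1],  # labelled "pMN"
]
fates = ["LFP", "LFP", "LFP", "pMN", "pMN", "pMN"]
accuracy = leave_one_out_accuracy(traces, fates)
```

      With few, noisy, or partially overlapping tracks, as the authors describe, the held-out accuracy of such a classifier would itself be a noisy estimate.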

      (3) Consequences of signaling heterogeneity

      The authors focus heavily on portraying that signaling dynamics are highly variable, which seems visually true at first glance. However, there is no metric used or a description given of what this actually means. Mainly, the variability seems to relate to the correlation between signaling and fate. However, given the data and analysis, I would argue that the decoding of signaling dynamics into fate is surprisingly accurate. So signaling dynamics that seem quite noisy and variable by visual inspection can actually be very well discriminated by cells, which to me appears very exciting.

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here. We added this point to make our presentation more balanced.

      Indeed, simple features of signaling traces can predict cell fate as well as position (for anterior progenitors). Given that signaling should be a function of position, it naively seems as if signaling read-out could be almost perfect. It might be interesting to plot dorsal-ventral position vs the signaling metrics, to also investigate how Shh concentration/position maps to signaling dynamics, this would give an even more comprehensive view of signal transmission.

      We’d refer readers to our earlier study Xiong et al., 2013 where ptch2:kaede, nkx2:gfp and olig2:gfp were plotted against position over time in single cell tracks. It was found that position was not a good predictor of signaling levels or cell fates at early stages when the cell fates were specified.

      There remains the discrepancy between signaling traces and fate in the posterior neural tube. The authors point towards differences in tissue architecture and difficulties in interpreting a "small" Shh gradient. However, the data seems consistent with differences in timing of cell-fate decisions between anterior and posterior cells. The authors show that fate does initially not correlate well with position in the posterior neural tube. So, signaling dynamics should likely also not, as they should rather be a function of position, given they are downstream of the Shh gradient. As mentioned above, not even Olig2 expression does segregate the assigned fates well. All this points towards a difference in the time of fate assignment between the anterior and posterior. Given likely delays in reporter protein production and maturation, it can thus not be expected that signaling dynamics correlate better with cell fate than the reporter "83%". Can the authors please discuss this possibility in the paper?

      Yes, this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescent imaging, it’s difficult to determine the time when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either spatial or temporal responses compared to the anterior, and we are not able to distinguish between these. However, signaling dynamics are not necessarily a good function of position or time either; there is no evidence for that in our results here. The 83% correlation is thus striking for the posterior progenitors, indicating a certain robust logic in the GRN to capture a strong (even short-lived) response to Shh, regardless of position or time. This is an interesting possibility (we do not claim it as a mechanism, as we have not tested it with perturbations) that challenges the prevailing view in the field that these progenitors integrate Shh exposure over time, or that they acquire positional information by reading a gradient.

      The discussion has been modified to be more nuanced about these points.

      Thus, while this paper represents an example of what the community needs to do to gain a better understanding of robust patterning under variability, the provided data is not always sufficient to make clear conclusions regarding the functional consequences of signaling dynamics.

      We quite agree. Together with the reviewer, we look forward to the publication of recent, independent work by other colleagues that overcomes the challenges in our study.

      Reviewer #2 (Public Review):

      Summary:

      In this work, Xiong and colleagues examine the relationship between the profile of the morphogen Shh and the resulting cell fate decisions in the zebrafish neural tube. For this, the authors combine high-resolution live imaging of an established Shh reporter with reporter lines for the different progenitor types arising in the forming neural tube. One of the key observations in this manuscript is that, while, on average, cells respond to differences in Shh activity to adopt distinct progenitor fates, at the single cell level there is strong heterogeneity between Shh response and fate choices. Further, the authors showed that this heterogeneity was particularly prominent for the pMN fate, with similar Shh response dynamics to those observed in neighboring LFP progenitors.

      Strengths:

      It is important to directly correlate Shh activity with the downstream TFs marking distinct progenitor types in vivo and with single cell resolution. This additional analysis is in line with previous observations from these authors, namely in Xiong, 2013. Further, the authors show that cells in different anterior-posterior positions within the neural tube show distinct levels of heterogeneity in their response to Shh, which is a very interesting observation and merits further investigation.

      Weaknesses:

      This is a convincing work, however, adding a few more analyses and clarifications would, in my view, strengthen the key finding of heterogeneity between Shh response and the resulting cell fate choices.

      We thank the reviewer for the comments. Due to the age of the work and logistic constraints, we are unable to perform further experiments and analysis to address some of the concerns. We have revised the conclusions and viewpoints to reflect the reviewer's concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      Minor comments:

      y-axis label suddenly changes to Ptch2-reporter level in Figure 5. Is what is plotted different from what is seen as examples in Figure 3?

      Thanks! The Figure 5 tracks are plotted as in Figure 3B; this has been noted in the figure legends.

      There are random bounding boxes in some of the figures.

      Sometimes the m in "More dorsal" is stylized with a capital M and sometimes not. It is somewhat confusing as a name for cell types but it is fine if no alternative can be found.

      This study unfortunately does not include markers that distinguish the interneurons dorsal to pMNs. We categorized them collectively as “more dorsal”.

      Response-time is defined as "the amount of time with an above-basal Shh response". This seems to me as the definition of response duration. I would assume that response-time, means the time it takes until a response is first observed. Please consider changing this.

      We did not use “duration” because a response time course recorded in these tracks may include multiple periods of response (on and off). In the field, the duration of exposure/response has been used specifically to mean a single period of response. Response time here is therefore the sum of the active responding time. This has been clarified in the text.
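      The distinction drawn here between summed active time, a single response duration, and the time to first response can be made concrete. The following is a minimal sketch, assuming evenly sampled tracks and a fixed basal threshold. The function name, the threshold value, and the track are hypothetical and not taken from the study.

```python
def summarize_response(signal, basal, dt):
    """Return (total_active_time, longest_episode, latency) for one track.

    A sample counts as active when it exceeds the basal level, and each
    active sample contributes one sampling interval dt.
    """
    active = [value > basal for value in signal]
    # Sum of all above-basal time, across every on/off episode.
    total_active_time = sum(active) * dt
    # Longest single uninterrupted episode of above-basal response.
    longest_run = 0
    current_run = 0
    for is_active in active:
        current_run = current_run + 1 if is_active else 0
        longest_run = max(longest_run, current_run)
    longest_episode = longest_run * dt
    # Time until the first above-basal sample, or None if never active.
    latency = active.index(True) * dt if any(active) else None
    return total_active_time, longest_episode, latency


# Hypothetical track sampled every 10 minutes (arbitrary intensity units).
track = [0.1, 0.2, 0.9, 1.1, 0.3, 0.2, 0.8, 1.2, 1.0, 0.2]
total_active_time, longest_episode, latency = summarize_response(track, basal=0.5, dt=10)
```

      For this track the three quantities differ (50, 30, and 20 minutes), which shows why the summed quantity is distinct both from a single duration and from the time to first response.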

      Reviewer #2 (Recommendations for The Authors):

      (1) The authors address several possible setbacks of transforming the measured fluorescence intensity of the patched reporter into a readout of the Shh signaling activity over time, however, one aspect that isn't directly addressed is the potential effect of differences in the z position of analyzed cells. These could, at least in principle, be sufficient to introduce significant noise in the fluorescence measurements. Can the authors subset their datasets by initial, as well as average, z position and then re-examine the measured trends for both Shh activity and the intensity of the cell fate reporters used in the study?

      The zebrafish early neural plate/tube has a small thickness in z in dorsal-ventral imaging, and the tissue is transparent. Depth-associated scattering contributes very little, if anything, to the fluorescent signals in the imaged time window. This can be seen in the nuclear/membrane signal of the movies, which is largely uniform across z in the neural tissue. It can also be seen that the notochord cells, further ventral, appear dimmer.

      (2) It is critical for the validity of this study that the intensity of the patched reporter introduced by the authors in 2012, and used again in this study, faithfully represents the signaling activity of Shh. In this study, the authors provide measurements of the transcriptional rate of Kaede and additional modeling for this purpose. However, an important point is to determine how sensitive is the reporter to changes in Shh signaling of different magnitudes?

      We consider this BAC reporter line a good one (probably still the best live reporter), as it resolves the endogenous gradient up to the dorsal interneuron domains (Huang et al., 2012, Xiong et al., 2013) and responds well to perturbations (Notch, Cyclopamine, etc). But it’s true that we don’t have information on how sensitively it responds to changes of different magnitudes. As far as we know, there is no in vivo, single-cell information on how Shh targets respond to signaling of different magnitudes.

      (3) To strengthen the previous point, it would be nice to extend the analysis in Figure 2, at least partially, using other readouts for Shh activity (e.g. GBS-GFP)?

      We have used a GBS-RFP line previously and found it to be lower resolution in terms of showing the DV gradient, compared to ptch2:kaede.

      (4) It is unclear to me what is the relevant time window during which cells respond to Shh in the anterior versus posterior domains to determine progenitor specification. This is a concern to me, since: i) the average heterogeneity of Shh activity seems to increase strongly in time (Figure 2A/C); and ii) it is important to exclude that the finding of heterogeneous relationship between Shh activity and fate choices is largely driven by later timepoints, where potentially its activity is no longer relevant for cell fate specification. Can this point be clarified when this data is introduced in the manuscript and further discussed?

      Yes, this is an important point/caveat of live signaling and fate tracking. As discussed in the manuscript, due to the sensitivity limit of fluorescent imaging, it’s difficult to determine the time when cells start to respond to the signal, and how variable that is from cell to cell. The posterior cells may be more variable in either spatial or temporal responses compared to the anterior, and we are not able to distinguish between these.

      (i) The ptch2:kaede reporter variability is higher in terms of magnitude (the signal gets brighter) at later times, but the heterogeneity (overlap between different cell fate groups) is lower at later times.

      (ii) Similarly, the heterogeneous relationship is more pronounced at early time points. Since we do not know exactly when the activity becomes no longer relevant (from our earlier studies we do think that the cells become specified early, when Shh signaling is noisy), we modelled the response profile and searched for a good predictor. The maximum response stands out, particularly as a good indicator for the posterior cells, suggesting an early window/time of specification.

      Discussion has been modified to clarify these points.

      (5) Is the response of the patched reporter, as well as cell fate reporters, to defined concentrations of exogenously provided Shh heterogeneous, for instance, in in vitro experiments?

      Well-controlled (e.g., microfluidics and labeled Shh molecules) in vitro experiments will be fantastic future directions. Existing tissue explant + Shh dose approaches do not resolve the heterogeneity of exposure at single cell level but may be helpful in testing the limits and variabilities at different magnitudes.

      (6) The source of noise in this system is not entirely clear to me. The authors seem to attribute the heterogeneity they observe to the way cells respond to Shh, but can it be excluded that the morphogen profile is itself noisy to start with? It is currently difficult to distinguish between these two possibilities, given that the Shh activity reporter used in this study is itself a transcriptional output of the pathway. Can the distribution of Shh itself be analyzed (even if in immunostainings) during neural tube formation?

      Yes, we fully agree. More quantitative analysis may help dissect the sources of noise. Measuring the morphogen profile (particularly through time) would be ideal, but currently no reagent is available to achieve that. Studies using an engineered or tagged morphogen suggest that the pattern through the tissue reasonably captures simple diffusion dynamics. However, at the single-cell level considerable randomness may still remain, and it is difficult to compare quantitatively with staining of fixed samples.

      (7) It is unclear to me how the authors define the ultimate cell fate of cells in their analysis in Figure 6. The brief description in the methods and in the manuscript seems to suggest that, in combination with marker expression, the cell position is used as a criterion to assign the fate to the progenitors - if this is the case, I guess the observed relationship in Figure 6 with LMDV distance is almost a control? This could be clarified for the readers.

      Yes, indeed Figure 6 is a control, as LMDV distances lead to final positions, which form part of our determination of cell fates.

      As established by our own and others’ previous studies (see also Fig.1A), the identification of MFPs and LFPs in the zebrafish spinal cord is very robust. The MFPs are the apically constricted single column of cells along the midline on top of the notochord, and the LFPs are the 2 columns of cells next to the MFPs on both sides. LFPs’ expression of olig2:gfp did vary more in the posterior (timing of response/commitment could be a factor, as the reviewer pointed out), but eventually the cells at those positions become V3 interneurons or floor plate and have not been observed to make motoneurons. There are 3 low Olig2:GFP pMNs in the anterior dataset (Fig.2B’) and 3 high Olig2:GFP LFPs in the posterior dataset (Fig.2D’) that we checked carefully.

      The methods of fate determination are described in detail in methods.

      (8) The graphs in Figures 6 and 7 are difficult to interpret. What proportion, and absolute number, of cells are "mis specified" when the authors show the distinct colored lines in the pMN, LFP or more dorsal domains? How do the authors determine where each cell fate domain begins and ends to access for "mis-specified" cells? Can the authors also provide the corresponding experimental images in the figure?

      We apologize for the difficulty in interpreting these figures. Each graph is a ranked list of all cells according to the specified metric. The visual is meant to help build an intuition of how mixed versus clear-cut the pattern is for the tested metric. The graphs are not to be interpreted as the actual pattern in the tissue, and there are no data images that show these patterns.

      (9) Given the experimental limitations/technical challenges discussed by the authors during the paper, the score of around 90% of predictability of cell fate choices is rather high in the anterior domain, suggesting a minor functional role for heterogeneity in this region. Even for the posterior domain, the score of 83% predictability based on the maximum response to Shh is still relatively high. In my view, the authors' conclusions should be adjusted to make this difference clearer in the abstract and discussion, highlighting that the heterogeneity between Shh response and cell fate choices, particularly in the pMN fate, is stronger in the posterior domain, affecting the precision of cell fate decisions particularly in this region. Can the authors further comment on potential mechanisms driving this difference?

      Yes – we agree that most cells are actually accurate in such a highly dynamic tissue. In the literature, the view has been more focused on how the GRN enables this accuracy. We therefore highlighted the heterogeneity and limit of accuracy of the GRN here.

      We have added the fact that the Shh response is still the main determinant of the pattern despite the heterogeneity in the Discussion. We also further discussed possibilities of the anterior posterior differences.

      (10) Following up from the previous point, the data in Figure 7 suggests that there might be different underlying mechanisms in how anterior and posterior cells interpret the Shh profile, with anterior cells potentially responding to the integrated concentration of Shh (since response time, average response, or maximum response to Shh all provide similar predictability scores for cell fate choices). In contrast, only the maximum response to Shh can provide a good prediction of posterior cell fate, consistent with a more instantaneous response to morphogen concentration (and thus potentially more error-prone measurement of the Shh profile?). This is a very interesting observation in my view. Could this be further tested?

      Thank you. Yes, we found this very interesting too. We discussed the possibilities, including the reviewer’s suggestion that these cells may have different contexts or strategies for interpreting the signal. It is also possible that the anterior cells use the same strategy (maximum response at an early time) and that the subsequent response/duration does not matter to their fate commitment. A precise approach to shut down Shh response dynamics in single cells (e.g., optogenetics) would enable these ideas to be tested. We hope follow-up studies will take such approaches.

    1. eLife Assessment

      In this important study, DNA and RNA are co-imaged in single cells to show that the proximity of topologically associated domain (TAD) boundaries is uncoupled from the transcriptional activity of nearby genes. The evidence supporting these conclusions is convincing for the regions examined, with high-throughput imaging providing robust statistics. This work will be of interest to researchers studying genome architecture and its relationship to gene regulation.

    2. Reviewer #2 (Public review):

      Summary:

      Almansour et al., investigate whether the proximity of TAD boundaries is directly linked to gene activity. The authors use high-throughput imaging to simultaneously measure the gene activity and physical distances between boundary regions in an allele-specific manner. Using transcriptional inhibitors, expression induction, and acute depletion of CTCF and cohesin, they test whether proximity of boundaries affects, or is affected by, gene activity.

      Strengths:

      The combined use of DNA and RNA imaging enabled simultaneous measurement of boundary proximity and transcriptional status at individual alleles. This allows single-allele correlation between boundary proximity and gene activity at multiple loci across thousands of alleles.

      The use of both transcription inhibitors and transcription stimulation provides compelling and consistent evidence that boundary proximity can be disconnected from a gene's activity. The data convincingly support the conclusion that stable proximity between boundary regions is not required for ongoing transcription at the loci and timescales examined.

      This work strengthens the emerging view that genome organization at the level of domain boundaries does not impose a deterministic control over transcription.

      Strong disruption of boundary distances is only observed upon depletion of cohesin. Notably, this corresponds with the largest changes in gene activity. In contrast, depletion of CTCF actually had minimal impact on boundary distances and also had minimal impact on gene activity. This makes sense in light of previous work, where live cell imaging demonstrated that cohesin is more important for domain-structure, whereas CTCF is only important for blocking cohesin from continuing on, such that the fully formed loop occurs in a very small percentage of cells. Therefore, the fact that disruption of cohesin (more important for internal domain structure) affects gene activity while disruption of CTCF does not is exceptionally interesting.

      Weaknesses:

      In untreated cells, the distribution of distance measurements between boundary probes is exceptionally narrow. While depletion of RAD21 clearly demonstrates an ability to detect changes in this distribution, this tight baseline distribution may limit sensitivity to more subtle changes (like those one might expect from transcriptional influences).

      This approach primarily tests the role of boundary interactions rather than domain organization as a whole.

    3. Reviewer #3 (Public review):

      Summary:

      This study addresses a central question in genome organization: whether the positions of chromosomal domain boundaries are functionally coupled to gene activity. The authors use high-throughput imaging to simultaneously measure distances between boundary markers and nascent RNA production in thousands of individual cells, enabling direct comparison of boundary positions and transcriptional status at single chromosomal copies. This approach is applied across multiple loci, genes, and cell types, and is combined with acute transcriptional perturbations and depletion of architectural proteins to test the relationship between chromosome structure and gene activity in both directions.

      This work makes a meaningful contribution by providing direct, single-cell evidence that domain boundary positions and gene activity are largely uncoupled in this system.

      Strengths:

      A major strength of the work is its single-cell, single-allele resolution, which overcomes the averaging inherent to population-based assays. The authors consistently find that boundary proximity is largely independent of transcriptional status: active and inactive alleles have similar boundary distances, transcriptional perturbations do not shift boundary distributions, and depletion of the boundary factor CTCF does not alter gene expression, whereas cohesin depletion affects both boundary organization and transcription. These conclusions are supported by large numbers of alleles, multiple loci and cell types, and internal controls that distinguish boundary-specific effects from broader chromatin influences. The study offers a robust, scalable imaging pipeline that will be valuable for future studies linking genome organization and transcription at single-cell resolution.

      Weaknesses:

      The study has important limitations that are acknowledged by the authors. Measurements are restricted to distances between flanking boundaries and do not capture internal domain architecture, sub-domain structure, or finer-scale regulatory contacts. Resolution is limited by probe size and imaging, potentially masking subtle positional changes, and only a small set of loci is examined, leaving open how broadly the uncoupling generalizes. Some perturbation effects, particularly for RAD21, may involve mechanisms beyond boundary disruption.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Conceptual framing and interpretation:

      The central conclusion may require more precise framing to avoid potential overreach. The authors' interpretation equating "physical distance between TAD boundaries" with overall "TAD boundary architecture," and "transcriptional bursting events" with broader "gene activity," could benefit from clarification. This framing may not fully capture the temporal dynamics of transcription or the regulatory complexity within TADs. Furthermore, the broad conclusion of an uncoupled relationship appears to challenge extensive prior evidence from perturbation studies showing that disrupting TAD boundaries can alter gene expression. The authors' own observation of reduced gene activity upon RAD21 degradation suggests that global TAD disruption can affect transcription. A more precise and limited conclusion, acknowledging that their data demonstrate a lack of detectable correlation between boundary distance and bursting activity in their system, would be more accurate and help reconcile these findings with the existing literature.

      We have modified statements throughout the manuscript, including in the title, to enhance the precision of our conclusions and avoid overreach. We have also added, on p. 16 of our Discussion, a separate section on the limitations of the study, noting that our conclusions are limited to TAD boundary distances and do not reflect the structure of TAD boundaries or of TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      (2) Technical methods and data presentation:

      (2.1) Accuracy and dimensionality of distance measurements: The manuscript does not clearly state whether distances are measured in 2D or 3D, nor does it sufficiently address precision limits. The stated Z-step size (1 µm) may be inadequate for accurately measuring sub-micron chromatin distances in 3D.

      We state in both the Results and Methods that our data represent 2D distances derived from maximal-intensity projections of 3D image stacks. We previously published a detailed analysis of the precision of this measurement approach applied to chromatin interactions and documented the effect of 2D vs 3D analysis on these types of measurements. This study by Finn et al., 2022 is cited in the text. We also show in Figure S3 and mention on p. 6 and 10 that we observe similar results using either 2D or 3D analysis.
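
      The 2D-versus-3D distinction discussed here can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: it assumes each FISH spot has already been reduced to a centroid in (x, y, z) micrometres, and the function names and example coordinates are hypothetical. Because a maximal-intensity projection discards the axial (z) separation, the 2D distance can never exceed the 3D distance for the same pair of spots.

```python
import math

def distance_3d(spot_a, spot_b):
    """Euclidean distance between two (x, y, z) centroids."""
    return math.dist(spot_a, spot_b)

def distance_2d(spot_a, spot_b):
    """Distance after projecting along z, i.e. using only (x, y)."""
    return math.dist(spot_a[:2], spot_b[:2])

# Hypothetical centroids (micrometres), chosen only for illustration.
upstream_boundary = (0.0, 0.0, 0.0)
downstream_boundary = (0.3, 0.4, 1.0)

d2 = distance_2d(upstream_boundary, downstream_boundary)
d3 = distance_3d(upstream_boundary, downstream_boundary)
print(f"2D: {d2:.3f} um, 3D: {d3:.3f} um")
```

      In this toy case the two spots are separated mostly along z, so the projection underestimates their separation. Whether such cases matter in practice depends on the axial sampling and on nuclear geometry, which the sketch does not model.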

      (2.2) Probe design and systematic error: The genomic coverage size of the BAC probes used for DNA FISH is not explicitly stated. Large probe coverage could inherently blur the precise spatial location of adjacent DNA loci. The reported average distance (~300 nm) may be influenced by the physical size of the probes, as well as systematic expansion or distortion introduced by sample fixation and FISH processing. Although such technical limitations are currently unavoidable, the authors should clarify how these factors might affect their ability to detect subtle distance changes.

      The genomic location and size of all probes are provided in Supplementary Table 1. We deliberately use relatively large BAC probes both to generate robust, highly reproducible signals and to eliminate effects arising from local chromatin behavior. In line with earlier characterization of BAC probes (Finn et al., Cell, 2019; Finn et al., Methods, 2022), we find a strong correlation between micro-C/Hi-C interaction frequency and distance measurements. Systematic errors such as sample fixation and FISH processing have previously been evaluated by comparison to live cell data (see Finn et al., 2019) and found to be negligible, especially as all our analyses involve pairwise comparisons, which would both be similarly affected by systematic errors. We discuss resolution limits due to probe size in our new section on study limitations on p. 16.

      (2.3) Data Visualization: The manuscript would benefit from including representative, zoomed-in regions of interest from the raw imaging data. This would allow readers to visually assess measured distance differences against background noise.

      Raw images for inspection at any magnification are available at https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      (2.4) Potential impact of resolution limits: In Figure 5, the micro-C data reveal a clear difference in interaction patterns inside versus outside the VARS2 locus TAD, yet the imaging data show no corresponding distance difference. This strongly suggests that the current imaging system, limited by optical resolution, probe size, and localisation accuracy, may be unable to resolve finer-scale spatial reorganizations associated with specific chromatin conformations (e.g., enhancer-promoter loops). The authors should explicitly discuss that their conclusion of "no coupling observed" may be constrained by the resolution and sensitivity of their method and does not preclude the possibility of detecting such associations with higher-precision measurements or in live-cell dynamics.

      We generally see good agreement between micro-C/Hi-C data and distance measurements. Specifically, we consistently find closer proximity of boundaries than non-boundaries and larger boundary distances for larger TADs than for smaller ones, as presented throughout the study. Contrary to the reviewer’s statement, this is also true for the VARS2 TAD, where we find statistically significantly shorter distances for boundary probes (350 nm) vs the outside control region (390 nm), which correlates with the difference in micro-C interaction score of 5847 vs 2308. These data are shown in Figure 3. Regardless, we mention the issue of resolution due to probe size in the study limitations section on p. 16.

      Reviewer #2 (Public review):

      In untreated cells, the distribution of distance measurements between boundary probes is exceptionally narrow. While depletion of RAD21 clearly demonstrates an ability to detect changes in this distribution, this tight baseline distribution may limit sensitivity to more subtle changes (like those one might expect from transcriptional influences). In addition, the correlation analysis is asymmetric, primarily stratifying by transcriptional status and then comparing boundary distances. Given the central claim that boundary architecture does not influence gene activity, the analysis should be done from the opposite perspective (stratifying by boundary distance).

      We mention the limitations on resolution of our approach in our discussion of study limitations on p. 16. An example of an analysis of stratifying by boundary distance is presented in Figure S3C. The conclusion is the same as stratifying by activity status.

      Strong disruption of boundary distances is only observed upon depletion of cohesin. Notably, this corresponds with the largest changes in gene activity. In contrast, depletion of CTCF actually had minimal impact on boundary distances and also had minimal impact on gene activity. This makes sense in light of previous work, where live cell imaging demonstrated that cohesin is more important for domain-structure, whereas CTCF is only important for blocking cohesin from continuing on, such that the fully formed loop occurs in a very small percentage of cells. Therefore, the fact that disruption of cohesin (more important for internal domain structure) affects gene activity while disruption of CTCF does not is exceptionally interesting but is lacking from the discussion.

      We mention the stronger effect of cohesin depletion compared to CTCF loss on gene expression in multiple locations in the Results and Discussion.

      On a related note, this approach primarily tests the role of boundary interactions rather than domain organization as a whole, and it should be acknowledged that internal domain structures are not directly assessed.

      We have modified statements throughout the manuscript to clearly indicate that our conclusions relate to boundary interactions rather than domain organization as a whole. We also discuss this in our section on study limitations.

      The comparison to work in other organisms (particularly the comparisons made to Drosophila) should be handled with care. The mechanisms underlying domain formation differ substantially across these systems, particularly regarding the differences in CTCF's role.

      We have modified our discussion of the data on Drosophila TADs, particularly as it relates to CTCF.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I couldn't locate the image data from figshare with the information provided (DOI: 10.6084/m9.figshare.30728354)

      The link has been updated

      https://figshare.com/projects/_b_TAD_boundaries_and_gene_activity_are_uncoupled_b_/271078.

      Reviewer #2 (Recommendations for the authors):

      Some of the conclusions overreach. I recommend revising the claims and discussion to focus solely on the proximity of boundaries, instead of TADs themselves. This would match better with your experiments.

      We have modified statements throughout the manuscript, including in the title, to enhance the precision of our conclusions and avoid overreach. We have also added, on p. 16, a separate section on limitations of our study, noting that our conclusions are limited to TAD boundary distances and do not reflect on the structure of the TADs themselves. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      I do disagree with the interpretation of the data in some parts, particularly at the end, where you state that disruption of TADs does not impact gene activity. For example, "Altogether, these results demonstrate that disruption of TAD boundary architecture is insufficient to alter gene expression" doesn't seem to match the results. Sure, depletion of CTCF minimally impacted gene expression, but it also minimally impacted the boundary distances. I think it is interesting that depletion of RAD21 had a bigger impact on both gene expression and boundary distances, and this should be discussed.

      We have deleted this statement and now mention on p. 13 that RAD21 depletion affected gene expression, whereas loss of CTCF did not, and on p. 15 that loss of RAD21 had a greater impact on boundary distances than loss of CTCF. We have also expanded our Discussion of possible TAD functions on p. 14/15.

      Related to this, I also recommend expanding the discussion of prior live-cell imaging work (ref 32) that showed that the fully formed CTCF loop is a rare event.

      We have expanded the discussion of prior live-cell imaging work in several locations.

      All the analysis is done from the perspective of the gene expression (e.g. group by expression and then measure distances). It would help to show that the inverse analysis is consistent (e.g. group by distances and measure gene expression).

      Analysis of data stratified by distance measurements is shown in Figure S3C.
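
      The inverse stratification requested by the reviewer can be outlined as follows. This is an illustrative sketch under stated assumptions, not the analysis behind Figure S3C: it assumes each allele is recorded as a (boundary distance, active flag) pair, splits alleles at the median distance unless a threshold is given, and uses made-up example values. The function name is hypothetical.

```python
from statistics import median

def active_fraction_by_distance(alleles, threshold=None):
    """Split alleles into near/far groups by boundary distance and
    return the fraction of transcriptionally active alleles in each.

    alleles: iterable of (distance, is_active) pairs.
    threshold: distance cut-off; defaults to the median distance.
    """
    alleles = list(alleles)
    cut = median(d for d, _ in alleles) if threshold is None else threshold
    near = [active for d, active in alleles if d <= cut]
    far = [active for d, active in alleles if d > cut]

    def fraction(group):
        # An empty group has no defined active fraction.
        return sum(group) / len(group) if group else float("nan")

    return fraction(near), fraction(far)

# Hypothetical alleles: (distance in micrometres, nascent RNA detected?)
example = [(0.20, True), (0.30, False), (0.40, True), (0.50, False)]
near_fraction, far_fraction = active_fraction_by_distance(example)
print(near_fraction, far_fraction)
```

      Similar active fractions in the near and far groups would be consistent with the uncoupling reported when alleles are instead grouped by activity; a real analysis would also need a statistical test and more than two distance bins.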

      The discussion of the Drosophila work is strange, given that CTCF in Drosophila has a very different N-terminus, explaining why it doesn't really form loops. Sure, maybe it contributes to domains in some way, but probably no more than the dozens of other architectural proteins that have been found in that system. This work clearly focuses on CTCF-loop domains, so I would be specific about that. In the introduction, you do a good job of saying "in human cells, TADs are.... marked by binding sites for the CTCF protein". However, then you overgeneralize and state that TADs form via a process of loop extrusion. I think a simple statement before this would help, saying that TADs in human cells have become somewhat synonymous with CTCF loop domains and that this is how you will use the term here. However, other organisms have TADs despite the lack of conservation of the CTCF protein.

      We have modified the text accordingly.

      On a related note, in the discussion, you cite two papers in Drosophila to state that "TADs form prior to the establishment of cell-type-specific gene expression programs", but that's not entirely accurate for those papers. They actually show that TADs occur coincident with ZGA, but loops form before that (ref 23: Espinola et al), or that there are indeed a few boundaries that show up before ZGA, but these correspond to RNA Polymerase (ref 24: Ing-Simmons et al.).

      We have corrected this statement.

    1. eLife Assessment

      The manuscript presents important findings on how C. elegans can utilize distinct molecular mechanisms and circuit engagements to regulate tactile-dependent locomotory behaviours through the AFD thermosensory neuron. The authors use multiple techniques, including microfluidics, genetic manipulations, and single-copy rescue experiments, to provide compelling evidence for the role of AFD/AIB electrical synaptic connections in this behaviour. The reviewers are satisfied with the comprehensive revisions made by the authors.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Rosero and Bai examined how the well-known thermosensory neuron in C. elegans, AFD, regulates context-dependent locomotory behavior based on tactile experience. Here they show that AFD uses discrete cGMP signalling molecules to regulate this locomotory behavior, independently of its dendritic sensory endings. The authors also show that AFD's connection to one of the hub interneurons, AIB, through gap junctions/electrical synapses, is necessary and sufficient for this context-dependent locomotion modulation.

      Strengths:

      This is an interesting paper showcasing how a sensory neuron in C. elegans can employ a distinct set of molecular strategies and different physical parts to regulate a completely distinct set of behaviors, which had not been shown to be regulated by AFD before. The experiments were well performed and the results are clear. However, there are some questions about the mechanism of this regulation. This reviewer thinks that the authors should address these concerns before the final published version of this manuscript.

      Comments on revisions:

      In this revised manuscript, Rosero and Bai satisfactorily addressed all the concerns raised by this reviewer regarding their original manuscript. This reviewer appreciates the authors' effort. This revised and improved manuscript demonstrates that a sensory neuron in C. elegans can utilize distinct molecular strategies and circuit engagements to regulate distinct sets of behaviors. This reviewer believes that the manuscript is suitable for final acceptance in eLife.

    3. Reviewer #2 (Public review):

      The goal of the study was to uncover the mechanisms mediating tactile-context-dependent locomotion modulation in C. elegans, which represents an interesting model of behavioral plasticity. Starting from a candidate genetic screen focusing on guanylate cyclase (GCY) mutants, the authors identified the AFD-specific gcy-18 gene as essential for tactile-context-dependent locomotion modulation. AFD has been primarily characterized as a thermosensory neuron. However, key thermosensory transduction genes and the sensory ending structure of AFD were shown here to be dispensable for tactile-context locomotion modulation. AFD actuates tactile-context locomotion modulation via the cell-autonomous actions of GCY-18 and the CNG-3 cyclic nucleotide-gated channel, and via AFD's connection with AIB interneurons through electrical synapses. At the circuit level, AIB also receives inputs from the mechanosensory neuron FLP, which was also shown to be relevant for tactile-context-dependent locomotion modulation.

      For this study, the authors combined a very clever microfluidic-based behavioral assay with a large set of genetic manipulations to dissect the molecular and cellular pathways involved. Rescue experiments with single-copy transgenes are particularly convincing. The study is very clearly written, and the figures are nicely illustrated with diagrams that effectively convey the authors' interpretation. Overall, the convergence of behavioral assays, genetics, and circuit analysis provides convincing support for the proposed role of the AFD-AIB connection, potentially downstream of FLP via synaptic communication and of other mechanosensory neurons via extra-synaptic communication.

      The facts that AFD mediates tactile-context locomotion modulation, that this role relies on GCY-18, and on electrical synapses linking AFD to AIB are new, somewhat unexpected, and interesting. The study raises intriguing and addressable questions about the role of innexin-based cellular communication in a multimodal sensory-behavior microcircuit, including the direction and nature of the signal(s) transmitted through these electrical synapses. These questions remain difficult to address in most experimental systems. The compact and genetically tractable nervous system of C. elegans provides a powerful entry point for addressing them in the context of an intact in vivo circuit.

    4. Reviewer #3 (Public review):

      Summary:

      Rosero and Bai report an unconventional role of AFD neurons in mediating tactile-dependent locomotion modulation, independent of their well-established thermosensory function. They partially elucidate the signaling mechanisms underlying this AFD-dependent behavioral modulation. The regulation does not require the sensory dendritic endings of AFD but rather the AFD neurons themselves. This process involves a distinct set of cGMP signaling proteins and CNG channel subunits separate from those involved in thermosensation or thermotaxis. Furthermore, the authors demonstrate that AIB interneurons connect AFD to mechanosensory circuits through electrical synapses. They conclude that, beyond its primary function in thermosensation, AFD contributes to context-dependent neuroplasticity and behavioral modulation via broader circuit connectivity.

      While the discovery of multifunctionality in AFD is not entirely unexpected, given the limited number of neurons in C. elegans (302 in total), the molecular and cellular mechanisms underlying this AFD-dependent behavioral modulation, as revealed in this study, provide valuable insights into the field.

      Strengths:

      (1) The authors uncover a novel role of AFD neurons in mediating tactile-dependent locomotion modulation, distinct from their well-established thermosensory function, providing an important conceptual contribution to our understanding of how individual neurons can support multiple, mechanistically separable behavioral functions.

      (2) They provide meaningful mechanistic insight into how AFD, GCY-18-dependent cGMP signaling, and AFD-AIB electrical coupling contribute to this AFD-dependent behavioral modulation.

      (3) The neural behavior assays utilizing two types of microfluidic chambers (uniform and binary chambers) are innovative and well-designed. In the revised manuscript the authors introduce a removable-barrier assay that physically separates exploration and assay phases. This independent behavioral approach addresses prior concerns about ongoing sensory input and confirms that tactile experience alone is sufficient to modulate locomotion.

      (4) By comparing AFD's role in locomotion modulation to its thermosensory function throughout the study, the authors present strong evidence supporting these as two independent functions of AFD.

      (5) The finding that AFD contributes to context-dependent behavioral modulation is significant, further reinforcing the growing evidence that individual neurons can serve multiple functions through broader circuit connectivity.

      Weaknesses:

      While the requirement for AFD, GCY-18, and AFD-AIB electrical coupling is well supported, the directionality of information flow and the precise mode of interaction between mechanosensory neurons, AIB, and AFD remain unclear and are an area for future study.

      Overall, the authors successfully achieve their primary aim of identifying and characterizing a novel role for AFD in tactile experience-dependent locomotion modulation. This work contributes meaningfully to the growing body of literature demonstrating multifunctionality and context-dependent reconfiguration of individual neurons within compact nervous systems.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Although the reviewers agree on the potential importance of this study, they have brought out multiple pertinent queries with respect to the interpretation of some of the results presented in the manuscript, that the authors should consider addressing. The reviewers have also suggested modifications that would increase the clarity of the manuscript.

      We appreciate the thoughtful evaluation of our manuscript by the reviewers and the editor. We are encouraged by their recognition of the importance of our study and have carefully considered all the points raised. In response, we have added new data and revised the text to address the concerns and improve the clarity of the manuscript. Our detailed responses to the reviewers’ comments are provided below.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Rosero and Bai examined how the well-known thermosensory neuron in C. elegans, AFD, regulates context-dependent locomotory behavior based on tactile experience. Here they show that AFD uses discrete cGMP signaling molecules to regulate this locomotory behavior, independently of its dendritic sensory endings. The authors also show that AFD's connection to one of the hub interneurons, AIB, through gap junctions/electrical synapses, is necessary and sufficient for this context-dependent locomotion modulation.

      Strengths:

      This is an interesting paper showcasing how a sensory neuron in C. elegans can employ a distinct set of molecular strategies and different physical parts to regulate a completely distinct set of behaviors, which had not been shown to be regulated by AFD before. The experiments were well performed and the results are clear. However, there are some questions about the mechanism of this regulation. This reviewer thinks that the authors should address these concerns before the final published version of this manuscript.

      Weaknesses:

      (1) The authors argued about the role of prior exposure to different physical contexts which might be responsible for the difference in their locomotory behavior. However, the worms in the binary chamber (with both non-uniformly sized and spaced pillars) experienced both sets of pillars for one hour prior to the assay and they were also free to move between two sets of environments during the assay. So, this is not completely a switch between two different types of tactile barriers (or not completely restricted to prior experience), but rather a difference between experiencing a more complex environment vs a simple uniform environment. They should rephrase their findings. To strictly argue about the prior experience, the authors need to somehow restrict the worms from entering the uniform assay zone during the 1hr training period.

      We agree that, in the original design, worms in the binary chamber experience a more complex physical environment while retaining access to both exploration and assay zones. We have therefore revised the manuscript to more clearly distinguish between behavioral differences due to exposure to a complex environment and modulation driven by prior experience.

      To directly test whether locomotion modulation can be sustained by prior physical experience in the absence of continued access to the exploration zone, we introduced a barrier-based assay that prevents worms from re-entering the exploration zone before locomotion is measured. The results section has been revised accordingly to explicitly address this point.

      Revisions to the manuscript:

      Lines 122-139: Added two paragraphs describing the new assay and summarizing the corresponding results.

      “Because worms in the binary chamber are exposed to both pillar types and remain free to move between exploration and assay zones, the behavioral differences described above could reflect exposure to a more complex physical environment rather than prior experience alone. To directly test whether locomotion is modulated by prior physical experience independently of continued access to the exploration zone, we designed microfluidic chambers in which the assay zone could be separated from the exploration zone by a removable barrier (Fig. 1–Supplement 1A). In these chambers, worms were initially allowed to explore the entire device, including exploration zones that either matched or differed from the assay zone. A barrier was then inserted to prevent worms in the assay zone from re-entering the exploration zones.

      Under these conditions, locomotion immediately after barrier insertion was higher in worms that had previously explored physical settings matching the assay zone (205 ± 8 µm/s) than in worms that had explored non-matching settings (151 ± 7 µm/s; p = 0.006; Fig. 1–Supplement 1B). This difference persisted when worms were recorded 40 minutes after barrier insertion, with animals in matching chambers retaining their higher locomotion rates (218 ± 11 µm/s) compared to those in non-matching chambers (185 ± 8 µm/s; p = 0.02; Fig. 1–Supplement 1B). These findings demonstrate that prior exploration of distinct physical environments can modulate locomotion even when worms are prevented from returning to those environments, supporting a role for prior physical experience independent of ongoing sensory input.”

      Figure 1–Supplement 1: New figure showing the experimental design and behavioral results.

      (2) The authors here argued that the sensory endings of AFD are not required for this novel role of AFD in context-dependent locomotion modulation. However, gcy-18 has been shown to be exclusively localized to the ciliated sensory endings of AFD, and even misexpression of GCY-18 in other sensory neurons leads to localization in sensory endings (Nguyen et al., 2014 and Takeishi et al., 2016). They should check whether gcy-18 or tax-2 gets mislocalized in kcc-3 or ttx-1 mutants.

      As the reviewer suggested, we examined GCY-18 localization in wild-type animals and in mutants with defective sensory microvilli using a split-GFP strategy (He et al., 2019). We generated a gcy-18::gfp11×7 knock-in strain using CRISPR–Cas9 to visualize endogenous GCY-18 localization. Consistent with prior studies, GCY-18 localized strongly to the AFD dendritic ending in wild-type animals (Figure 4–Supplement 1A, A′, A′′), with an additional weaker signal detectable near the soma and axon (Figure 4–Supplement 1A′′′).

      In kcc-3 mutants, GCY-18 remained localized to the distal dendrite despite disruption of sensory microvillar morphology (Figure 4–Supplement 1B–B′′). Similarly, in ttx-1 mutants, which completely lack AFD sensory microvilli, GCY-18 still localized to the distal dendrite (Figure 4–Supplement 1C–C′′) and remained detectable near the soma and axon (Figure 4–Supplement 1C′′′).

      In the revised manuscript, we clarify both the implications and the limitations of these imaging experiments, noting that “although these experiments do not identify the precise subcellular site at which GCY-18 acts, they show that disruption of sensory microvilli does not substantially alter GCY-18 localization within AFD.” The exact site at which GCY-18 functions to support locomotion modulation therefore remains an important open question for future investigation.

      Revisions to the manuscript:

      Figure 4-Supplement 1: Added a new figure reporting GCY-18 localization in wild type and mutant worms.

      Lines 268-280: Added a new paragraph reporting GCY-18 localization in wild type, kcc-3, and ttx-1 mutants and clarifying its relevance to the reviewer’s concern.

      “Given that gcy-18 is required for context-dependent locomotion modulation and that GCY-18 localizes to the distal dendrite of AFD, we next examined how disruption of sensory microvilli affects its localization in AFD. We used a split-GFP strategy to visualize endogenous GCY-18 [73]. A tandem array of seven GFP11 β-strands (GFP11x7) was inserted at the C-terminus of GCY-18 using CRISPR-Cas9. When complemented with GFP1-10, GCY-18::GFP11x7 fluorescence was strongly enriched at the AFD sensory microvilli near the nose (Fig. 4–Supplement 1A-A′′), consistent with previous reports [42,74,75]. In addition, weaker but reproducible GCY-18 signal was detected near the AFD soma and axon (Fig. 4–Supplement 1A′′′). Importantly, in kcc-3 mutants, which exhibit disrupted sensory microvilli, and ttx-1 mutants, which lack sensory microvilli, GCY-18 remained localized to the distal dendrite and was still detectable near the soma and axon (Fig. 4–Supplement 1B-B′′′ and 1C-C′′′). Although these experiments do not identify the precise subcellular site at which GCY-18 acts, they show that disruption or loss of sensory microvilli does not substantially alter GCY-18 localization within AFD.”

      (3) MEC-10 was shown to be required for physical space preference through its action in FLP and not the TRNs (PMID: 28349862). Since FLP is involved in harsh touch sensation while TRNs are involved in gentle touch sensation, which are the neuron types responsible for tactile sensation in the assay arena? Does mec-10 rescue in TRNs rescue the phenotype in the current paper?

      We performed cell-specific rescue experiments of mec-10. Single-copy expression of mec-10 cDNA in either FLP neurons alone (egl-44p) or TRNs alone (mec-18p) did not restore context-dependent locomotion modulation (Fig. 5A). In contrast, co-expression in both FLP and TRNs (egl-44p::mec-10 + mec-18p::mec-10), as well as expression from the mec-10 promoter, rescued the phenotype.

      These results indicate that input from multiple mec-10-expressing neurons, including both FLP and TRNs, is required for context-dependent locomotion adjustment. This requirement differs from spatial preference behavior, where mec-10 acts specifically in FLP (Han et al., 2017), suggesting distinct mechanosensory circuits are engaged by different tactile-driven behaviors.

      Revisions to the manuscript:

      Fig. 5A: Updated to include the cell-specific rescue data.

      Lines 317-331: Added a new paragraph describing these findings.

      “The mec-10 gene is expressed in several mechanosensory neurons, including the six touch receptor neurons (TRNs) and the polymodal nociceptors FLP and PVD [77,79]. To determine which neurons are required for tactile-dependent locomotion modulation, we expressed mec-10 cDNA under cell-specific promoters: mec-18p (TRNs) [80], egl-44p (FLP) [81], or mec-10p (TRNs, FLP, and PVD) [79]. Expression in either FLP or TRNs alone did not restore modulation, as worms carrying egl-44p::mec-10 (Δspeed: -11 ± 4%) or mec-18p::mec-10 (Δspeed: -13 ± 4%) transgenes showed significantly reduced Δspeed compared to wild type (Δspeed: N2: 33 ± 6%; p < 0.0001 for both; Fig. 5A). By contrast, mec-10 co-expression in both FLP and TRNs (Δspeed: 16 ± 4%), or expression from the mec-10 promoter (Δspeed: 23 ± 4%), restored Δspeed to wild type levels (p = 0.20 and p = 0.57, respectively; Fig. 5A). These findings indicate that mec-10 expression across multiple mechanosensory neuron types is required for context-dependent locomotion modulation. It is also worth noting that, while both tactile-dependent locomotion modulation and previously reported spatial preference require FLP, only the former depends on TRNs. Together, these findings suggest that distinct subsets of mechanosensory neurons differentially contribute to behaviors shaped by tactile experience.”

      (4) The authors mention that the most direct link between TRNs and AFD is through AIB, but as far as I understand, there are no reports to suggest synapses between TRNs and AIB. However, FLP and AIB are connected through both chemical and electrical synapses, which would make more sense as per their mec-10 data. (The authors mention the FLP-AIB-AFD circuit in their discussion but refer to TRNs as the sensory modality.) A mec-10 rescue experiment in TRNs would clarify this ambiguity.

      We agree with the reviewer that there are no reported synapses between TRNs and AIB, and we have revised Fig. 5 and the corresponding text to clarify this point. In the revised manuscript, we removed any implication of a direct TRN-AIB connection and instead focus on the established FLP-AIB-AFD pathway, while considering potential indirect contributions from TRNs.

      As the reviewer suggested, we performed cell-specific mec-10 rescue experiments. Expression of mec-10 in either FLP alone or TRNs alone was insufficient to restore tactile-dependent locomotion modulation, whereas co-expression in both cell types rescued the phenotype (revised Fig. 5A). These results indicate that FLP is essential for this behavior, consistent with the known FLP-AIB-AFD connectivity, and that TRNs are also required.

      Given that TRNs lack direct synapses with AIB, the TRN requirement suggests the involvement of indirect communication, likely mediated through modulatory mechanisms such as neuropeptide signaling. Accordingly, we have revised the model (revised Fig. 5C) and the corresponding text to clarify that tactile-dependent locomotion modulation integrates inputs from multiple mec-10-expressing neurons and does not rely on a direct TRN-AIB synaptic connection.

      Revisions to the manuscript:

      Lines 334–345: Revised paragraph to clarify circuit logic and remove implication of direct TRN-AIB synapses.

      “Touch-sensitive neurons that express mec-10, including TRNs, FLP, and PVD, do not form direct synapses with AFD, suggesting that tactile information is relayed through intermediary neurons. Because the interneuron AIB receives synaptic input from FLP and forms electrical synapses with AFD, we hypothesized that AIB could serve as a conduit for mechanosensory signals to reach AFD. To test whether AIB is required for tactile-dependent modulation, we examined locomotion in worms with genetically ablated AIB neurons using npr-9p::caspase expression [82]. AIB-ablated worms failed to adjust locomotion speed, showing a near-complete loss of modulation (∆speed: -1 ± 5%) compared to wild type (30 ± 8%, p = 0.001, Fig. 5B). These results demonstrate that AIB is required for AFD-mediated tactile-dependent locomotion modulation. However, because mec-10-expressing TRNs are also required, additional pathways beyond AIB likely contribute to transmitting tactile information to AFD, potentially involving indirect synaptic connections through other interneurons or long-distance signaling via neuropeptides or other modulators (Fig. 5C).”

      Fig. 5: Updated to include new cell-specific mec-10 rescue data and revised model.

      (5) Do inx-7 or inx-10 rescue in AFD and AIB using cell-specific promoters rescue the behavior?

      Yes. We tested this during revision. Using the AFD-specific srtx-1b promoter, we expressed inx-10 cDNA selectively in AFD neurons of inx-10 mutant worms. This manipulation significantly restored tactile-dependent locomotion modulation compared to non-transgenic inx-10 mutants (Fig. 6D), demonstrating that inx-10 expression in AFD alone is sufficient to rescue the behavioral defect.

      Revisions to the manuscript:

      Lines 366-370: Added a description of the AFD-specific inx-10 rescue results.

      “We next tested whether restoring inx-10 specifically in AFD would be sufficient to rescue the behavioral defect. Using the AFD-specific srtx-1b promoter, we expressed inx-10 cDNA in inx-10 mutant worms. These transgenic animals displayed significantly improved locomotion modulation (∆speed: 42 ± 5%) compared to non-transgenic inx-10 mutants (15 ± 4%; p = 0.018; Fig. 6D), indicating that inx-10 expression in AFD alone is sufficient to restore function.”

      Fig. 6D: Updated to include new cell-specific inx-10 rescue data.

      (6) How is the function of the guanylyl cyclase gcy-18 related to the electrical synapse activity between AFD and AIB? Is AFD downstream or upstream of AIB in this context?

      At present, the precise relationship between GCY-18 signaling and the AFD-AIB electrical synapse is not fully resolved. Given that AIB receives mechanosensory input from FLP, it is likely that AIB acts upstream of AFD during tactile-dependent locomotion modulation. However, because the AIB-AFD connection is mediated by gap junctions, communication could also be bi-directional, especially since small signaling molecules such as cGMP and Ca<sup>2+</sup> are known to diffuse through electrical synapses.

      We have therefore revised the manuscript to state explicitly that the directionality of information flow between AFD and AIB remains open, and that this will be an important question for future investigation (Line 455-458).

      “Together, these findings support a model in which AIB functions as a hub neuron that relays mechanosensory input from FLP to AFD to modulate locomotion (Fig. 5C). However, because electrical synapses are often bidirectional, information flow may also occur in the opposite direction, from AFD to AIB.”

      Reviewer #2 (Public review):

      Summary:

      The goal of the study was to uncover the mechanisms mediating tactile-context-dependent locomotion modulation in C. elegans, which represents an interesting model of behavioral plasticity. Starting from a candidate genetic screen focusing on guanylate cyclase (GCY) mutants, the authors identified the AFD-specific gcy-18 gene as essential for tactile-context-dependent locomotion modulation. AFD is primarily characterized as a thermo-sensory neuron. However, key thermosensory transduction genes and the sensory ending structure of AFD were shown here to be dispensable for tactile-context locomotion modulation. AFD actuates tactile-context locomotion modulation via the cell-autonomous actions of GCY-18 and the CNG-3 cyclic nucleotide-gated channel, and via AFD's connection with AIB interneurons through electrical synapses. This represents a potentially relevant synaptic connection linking AFD to the mechanosensory-behavior circuit.

      Strengths:

      (1) The fact that AFD mediates tactile-context locomotion modulation is new, rather surprising, and interesting.

      (2) The authors have combined a very clever microfluidic-based behavioral assay with a large set of genetic manipulations to dissect the molecular and cellular pathways involved. Rescue experiments with single-copy transgenes are very convincing.

      (3) The study is very clearly written, and figures are nicely illustrated with diagrams that effectively convey the authors' interpretation.

      Weaknesses:

      (1) Whereas GCY-18 in AFD and the AFD-AIB synaptic connection clearly play a role in tactile-context locomotion modulation, whether and how they actually modulate the mechanosensory circuit and/or locomotion circuit remains unclear. The possibility of non-synaptic communication linking mechanosensory neurons and AFD (in either direction) was not explored. Thus, in the end, we have not learned much about what GCY-18 and the AFD-AIB module are doing to actuate tactile context-dependent locomotion modulation.

      We agree with the reviewer that although GCY-18 in AFD and the AFD-AIB connection are clearly required for tactile context-dependent locomotion modulation, the precise mechanisms by which they influence mechanosensory and locomotor circuits remain unresolved. In particular, the possibility of nonsynaptic communication or bidirectional signaling between mechanosensory neurons and AFD cannot be addressed by the current experiments and warrants future investigation.

      At the same time, we believe this study reveals several previously unrecognized aspects of tactile-dependent locomotion modulation that provide a foundation for future mechanistic investigation.

      Specifically, we show that (i) GCY-18 functions in AFD to support tactile-dependent locomotion modulation; (ii) the cGMP-gated channel TAX-4, required for thermosensation, is dispensable for this process, whereas CNG-3 is required, revealing functional specialization within AFD; (iii) the interneuron AIB is necessary for this modulation; and (iv) restoring a single electrical connection between AFD and AIB using mammalian Cx36 is sufficient to rescue tactile-dependent modulation in innexin mutants.

      Accordingly, we now explicitly state in the revised Discussion that “a limitation of this study is that the directionality and mode of information flow between AFD and AIB remain unresolved, and defining this relationship will be an important goal for future investigation” (Line 472-475).

      (2) The authors only focused on speed readout, and we don't know if the many behavioral parameters that are modulated by tactile context are also under the control of AFD-mediated modulation.

      We used locomotion speed as the primary behavioral readout because it provides a robust measure for detecting whether behavior is modified by prior tactile experience, rather than to capture the full spectrum of motor outputs. This strategy is often used to assess experience-dependent behavioral plasticity across sensory modalities and enabled us to uncover the unexpected role of AFD in tactile-dependent plasticity.

      In the revised manuscript, we expanded our analysis to include additional behavioral parameters. As described in the Results, AFD-ablated worms showed a complete loss of context-dependent modulation not only in speed, but also in idle time and turning frequency, with no detectable differences between uniform and binary chambers (Fig. 4E). These data strengthen the conclusion that AFD broadly supports tactile-dependent behavioral modulation rather than selectively affecting a single locomotor parameter.

      Revisions to the manuscript:

      Fig. 4E: Revised panel to include additional locomotion parameters, including idle time and turning frequency, in wild type and AFD-ablated worms.

      Lines 283–285: Expanded the results to describe changes in locomotion speed, idle time, or turning frequency of AFD-ablated mutant worms. “These animals showed no detectable differences between uniform and binary chambers in locomotion speed, idle time, or turning frequency (Fig. 4E).”

      (3) The AFD-AIB gap junction reconstruction experiment was conducted in an innexin double mutant background, in which the whole nervous system's functioning might be severely impaired, and its results should be interpreted with this limitation in mind.

      We appreciate the reviewer’s concern that the innexin double-mutant background may broadly affect nervous system function, and we agree that loss of innexins is not restricted to the AFD-AIB synapse and could introduce global circuit perturbations.

      Importantly, however, the specificity of the rescue is informative. In an innexin double-mutant background, where electrical coupling is broadly disrupted, re-establishing a single electrical synapse between AFD and AIB using Cx36 was sufficient to restore tactile-dependent locomotion modulation (Fig. 6D). The ability of a targeted AFD-AIB connection to rescue behavior despite the absence of many other electrical synapses argues against a purely global network defect and instead identifies the AFD-AIB electrical synapse as a critical locus for this modulation.

      To further address this concern, we performed an additional rescue experiment in a less perturbed genetic background. In the revised manuscript, we show that AFD-specific expression of inx-10 rescues locomotion modulation in inx-10 single mutants (Fig. 6D). Together, these complementary rescue approaches, one restoring endogenous innexin function in AFD and the other reconstituting an electrical synapse using Cx36, support the conclusion that AFD-AIB electrical coupling is sufficient to enable tactile-dependent locomotion modulation, rather than reflecting nonspecific recovery of global circuit function.

      Revision to the manuscript:

      Fig. 6D and Lines 366-370: Added new data and revised text showing that AFD-specific inx-10 expression restores tactile-dependent locomotion modulation.

      “We next tested whether restoring inx-10 specifically in AFD would be sufficient to rescue the behavioral defect. Using the AFD-specific srtx-1b promoter, we expressed inx-10 cDNA in inx-10 mutant worms. These transgenic animals displayed significantly improved locomotion modulation (∆speed: 42 ± 5%) compared to non-transgenic inx-10 mutants (15 ± 4%; p = 0.018; Fig. 6D), indicating that inx-10 expression in AFD alone is sufficient to restore function.”

      Reviewer #3 (Public review):

      Summary:

      Rosero and Bai report an unconventional role of AFD neurons in mediating tactile-dependent locomotion modulation, independent of their well-established thermosensory function. They partially elucidate the signaling mechanisms underlying this AFD-dependent behavioral modulation. The regulation does not require the sensory dendritic endings of AFD but rather the AFD neurons themselves. This process involves a distinct set of cGMP signaling proteins and CNG channel subunits separate from those involved in thermosensation or thermotaxis. Furthermore, the authors demonstrate that AIB interneurons connect AFD to mechanosensory circuits through electrical synapses. They conclude that, beyond its primary function in thermosensation, AFD contributes to context-dependent neuroplasticity and behavioral modulation via broader circuit connectivity.

      While the discovery of multifunctionality in AFD is not entirely unexpected, given the limited number of neurons in C. elegans (302 in total), the molecular and cellular mechanisms underlying this AFD-dependent behavioral modulation, as revealed in this study, provide valuable insights into the field.

      Strengths:

      (1) The authors uncover a novel role of AFD neurons in mediating tactile-dependent locomotion modulation, distinct from their well-established thermosensory function.

      (2) They provide partial insights into the signaling mechanisms underlying this AFD-dependent behavioral modulation.

      (3) The neural behavior assays utilizing two types of microfluidic chambers (uniform and binary chambers) are innovative and well-designed.

      (4) By comparing AFD's role in locomotion modulation to its thermosensory function throughout the study, the authors present strong evidence supporting these as two independent functions of AFD.

      (5) The finding that AFD contributes to context-dependent behavioral modulation is significant, further reinforcing the growing evidence that individual neurons can serve multiple functions through broader circuit connectivity.

      Weaknesses:

      (1) Limited Behavioral Assays: The study relies solely on neural behavior assays conducted using two types of microfluidic chambers (uniform and binary chambers) to assess context-dependent locomotion modulation. No additional behavioral assays were performed. To strengthen the conclusions, the authors should validate their findings using an independent method, at the very least by testing AFD-ablated animals and gcy-18 mutants with a second behavioral approach.

      The reviewer points out that the original study relied on locomotion assays in two microfluidic environments (uniform and binary chambers) and suggests validation using an independent behavioral approach, particularly for AFD-ablated animals and gcy-18 mutants.

      To address this concern, we developed an independent behavioral assay in which the exploration and assay environments are physically separated by a removable barrier (Figure 1–Supplement 1A). In this design, worms first explored distinct physical settings, after which a barrier was inserted to confine them to an identical assay zone. This approach allowed us to directly test whether context-dependent locomotion modulation can be maintained when worms are prevented from re-entering the exploration environment and must rely solely on prior experience.

      Using this assay, we found that wild-type worms that had previously explored environments matching the assay zone moved significantly faster than those that had explored non-matching environments (Figure 1–Supplement 1B-C). These results demonstrate that context-dependent locomotion modulation is retained even when ongoing sensory input from the exploration zone is eliminated, independently validating our original findings using a distinct behavioral paradigm.

      Further, using this same assay, we found that locomotion modulation was significantly impaired in both gcy-18 mutants and AFD-ablated worms (Figure 4–Supplement 2A). Together, these results provide independent behavioral evidence supporting the conclusion that AFD and gcy-18 are required for context-dependent locomotion modulation.

      Revision to the manuscript:

      Figure 1–Supplement 1A: Added schematic and results from the removable-barrier assay in wild type animals.

      Lines 120-137: Added corresponding Results text describing the new assay and wild-type behavior.

      “Because worms in the binary chamber are exposed to both pillar types and remain free to move between exploration and assay zones, the behavioral differences described above could reflect exposure to a more complex physical environment rather than prior experience alone. To directly test whether locomotion is modulated by prior physical experience independently of continued access to the exploration zone, we designed microfluidic chambers in which the assay zone could be separated from the exploration zone by a removable barrier (Fig. 1–Supplement 1A). In these chambers, worms were initially allowed to explore the entire device, including exploration zones that either matched or differed from the assay zone. A barrier was then inserted to prevent worms in the assay zone from re-entering the exploration zones.

      Under these conditions, locomotion immediately after barrier insertion was higher in worms that had previously explored physical settings matching the assay zone (205 ± 8 µm/s) than in worms that had explored non-matching settings (151 ± 7 µm/s; p = 0.006; Fig. 1–Supplement 1B). This difference persisted when worms were recorded 40 minutes after barrier insertion, with animals in matching chambers retaining their higher locomotion rates (218 ± 11 µm/s) compared to those in non-matching chambers (185 ± 8 µm/s; p = 0.02; Fig. 1–Supplement 1B). These findings demonstrate that prior exploration of distinct physical environments can modulate locomotion even when worms are prevented from returning to those environments, supporting a role for prior physical experience independent of ongoing sensory input.”

      Figure 4–Supplement 2A: Added data for gcy-18 mutants and AFD-ablated worms in the removable barrier assay.

      Lines 288-296: Added text describing behavioral defects in gcy-18 mutants and AFD-ablated worms using the new assay.

      “Building on our finding that locomotion modulation can be driven by prior physical experience even after worms are prevented from re-entering the exploration zones, we next tested whether AFD is required for this modulation using chambers in which the exploration and assay zones were separated by a removable barrier (Fig. 1–Supplement 1A). Under these conditions, locomotion modulation was significantly reduced in AFD-ablated worms (∆speed: -AFD = 1 ± 6% vs. N2 = 23 ± 7%; p = 0.036; Fig. 4–Supplement 2A). Similarly, gcy-18 mutants showed defective locomotion modulation (∆speed: gcy-18 = -1 ± 8% vs. N2 = 23 ± 7%; p = 0.034; Fig. 4–Supplement 2A). These results indicate that AFD and gcy-18 are required to generate locomotion modulation in response to recent physical experience, even when continued access to surrounding environments is restricted.”

      (2) Clarity in Behavioral Assay Methodology: The methodology for conducting the behavioral assays is unclear. It appears that worms were free to move between the exploration and assay zones, with no control over the duration each worm spent in either zone. This lack of regulation may introduce variability in tactile experience across individuals, potentially affecting the reproducibility and quantitativeness of the method. The authors should clarify whether and how they accounted for this variability.

      In the primary assay, worms were allowed to move freely between the exploration and assay zones for one hour, and each animal’s tactile experience depended on its exploratory trajectory. To address the resulting variability, we performed an a priori power analysis, which determined that approximately 160 worms distributed across more than 20 chambers per condition were sufficient to obtain reliable population-level measurements. This sampling strategy was applied consistently across all experiments. Accordingly, analyses emphasize well-powered population means rather than individual trajectories, ensuring robust and reproducible comparisons despite variability in individual experience.
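
As an illustration of how an a priori power analysis of this kind can be set up, the sketch below uses the standard normal-approximation formula for a two-sided comparison of two group means. The function name, the default significance level and power, and the standardized effect size are all hypothetical; the response does not report the parameters used in the actual analysis, and this sketch does not reproduce it.

```python
import math
from statistics import NormalDist


def animals_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-sample comparison of means.

    effect_size_d is the standardized difference (difference in means divided by
    the common standard deviation). All inputs here are illustrative assumptions,
    not values taken from the study.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile corresponding to the target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_d) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes: smaller expected effects require more animals per group.
    for d in (0.3, 0.5, 0.8):
        print(d, animals_per_group(d))
```

The normal approximation slightly underestimates the sample size that an exact t-test calculation would give, so it is best read as a lower bound when planning.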

      In addition, as described above, we developed a removable-barrier assay that eliminates variability from ongoing exploration by confining worms to the assay zone after a defined exploration period. The consistency of behavioral effects across both assays further supports the robustness and reproducibility of the approach.

      (3) Potential Developmental and Behavioral Confounds in Mutant Analysis: Several neuronal mutant strains were used in this study, yet the effects of these mutations on development and general behavior (e.g., movement ability) were not discussed. Although young adult worms were used for behavioral assays, were they at similar biological ages? To rule out confounding factors, locomotion assays assessing movement ability should be conducted (see reference PMID 25561524).

      To address the possibility that behavioral phenotypes in mutant strains arise from developmental defects or impaired general locomotion, we directly measured locomotion speed on agar plates and body length in gcy-18 mutant and AFD-ablated worms. Neither genotype showed defects in basal locomotion speed or body length compared to wild type animals (Figure 4–Supplement 2B-C), indicating that the observed modulation defects are not explained by impaired development or gross motor ability.

      To further control for developmental variability, all behavioral assays were performed using age-synchronized populations. Animals were selected at a defined gravid adult stage, identified by the presence of 5-10 eggs arranged in a single row within the gonad. All mutant strains reached this developmental stage approximately three days after egg laying, comparable to wild type animals.

      Revision to the manuscript:

      Figure 4–Supplement 2B-C: Added quantification of locomotion speed on agar plates and body length for gcy-18 mutants and AFD-ablated worms.

      Lines 297-304: Added text describing the data presented in Figure 4–Supplement 2B-C.

      “Finally, to determine whether the modulation defects observed in gcy-18 mutants and AFD-ablated worms could be attributed to developmental abnormalities or gross motor impairments, we measured locomotion speed and body length on standard NGM plates. Both day-1 adult AFD-ablated worms (speed: 281 ± 10 µm/s; p = 0.33; body length: 1.12 ± 0.01 mm; p = 0.76) and gcy-18 mutants (speed: 291 ± 13 µm/s; p = 0.22; body length: 1.15 ± 0.02 mm; p = 0.86) showed locomotion speeds and body lengths comparable to wild type controls (speed: 252 ± 30 µm/s; body length: 1.14 ± 0.02 mm; Fig. 4–Supplement 2B, C). These results indicate that the loss of context-dependent locomotion modulation is not due to developmental defects or gross impairments in locomotion.”

      (4) Definition and Baseline Measurements for Locomotion Categories: The finding that tax-4 and kcc-3 contribute to basal locomotion but not to context-dependent locomotion modulation is intriguing. The authors argue that distinct mechanisms regulate these two processes; however, the study does not clearly define the concepts of "basal locomotion" and "context-dependent locomotion," nor does it provide baseline measurements. A clear definition and baseline data are needed to support this conclusion.

      We define basal locomotion as the locomotion speed of worms measured in the binary chamber, where wild-type animals consistently exhibit lower locomotion rates. Measurements from the binary chamber therefore serve as the baseline reference for locomotion speed in our microfluidic assays. Context-dependent locomotion modulation is defined as the quantified difference in locomotion speed between worms in uniform chambers and those in binary chambers. These definitions are now stated in:

      Lines 199-201: “We examined the locomotion speed of mutant worms in the binary chambers, which we refer to as the basal speed because wild type worms consistently move slowest in this environment.”

      Lines 645-646: “Asterisks above horizontal black lines indicate statistically significant differences in basal speed, defined as speed of worms in the binary chamber.”
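
The two definitions above can be made concrete with a short sketch. It takes basal speed as the mean speed in the binary chamber and expresses ∆speed as a percentage of that baseline. The normalization by basal speed is an assumption made here because ∆speed is reported in percent; the response describes it only as the quantified difference between chambers. The function names and the example numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def basal_speed(binary_speeds: Sequence[float]) -> float:
    """Mean locomotion speed in the binary chamber, used as the baseline reference."""
    return mean(binary_speeds)


def delta_speed_percent(uniform_speeds: Sequence[float], binary_speeds: Sequence[float]) -> float:
    """Context-dependent modulation as percent change of uniform-chamber speed over basal speed.

    Assumption: the difference is normalized to the binary-chamber (basal) mean.
    """
    basal = basal_speed(binary_speeds)
    if basal <= 0:
        raise ValueError("basal speed must be positive")
    return 100.0 * (mean(uniform_speeds) - basal) / basal


if __name__ == "__main__":
    # Hypothetical speeds in µm/s, not data from the study.
    uniform = [128.0, 132.0, 130.0]
    binary = [98.0, 102.0, 100.0]
    print(f"basal speed: {basal_speed(binary):.1f} µm/s")
    print(f"delta speed: {delta_speed_percent(uniform, binary):.1f}%")
```

Under this convention a strain can have an elevated basal speed and still show a positive ∆speed, which is the distinction drawn for tax-4 and kcc-3.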

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The availability of strains has not been mentioned. This should be addressed.

      The revised Methods section now includes a complete list of strains used in this study, and we have added a statement indicating that all strains are available upon request.

      Minor comment:

      Figure 1C - it should be Idle, not Idel.

      We have corrected the y-axis label in Figure 1C to ‘Idle.’

      Reviewer #2 (Recommendations for the authors):

      This is an interesting and well-written article, which I greatly appreciated reading. There are a few concerns that the authors should address, in my opinion, to provide a more complete and convincing story.

      Major points:

      (1) Maybe the material transmitted to me was incomplete, but I did not find the gcy gene screen results. It seems important to present the screen results in full, together with the description of the alleles tested for the 24 gcy genes.

      The revised manuscript now includes the complete results of the gcy mutant screen in Figure 2– Supplement 1, with the alleles tested for all 24 gcy genes listed in Table S1.

      (2) I did not find the actual p-values, sample sizes for each condition, or raw data; nor a data availability statement indicating where to retrieve these.

      Statistical significance is indicated by asterisks in all figures, with definitions provided in each figure legend (n.s., p > 0.05; *, p < 0.05; **, p < 0.01; ***, p < 0.001). Sample sizes are shown as individual data points in the plots, and we have now added explicit n values to each figure legend for clarity. A Data Availability Statement has also been added to indicate where the raw data can be accessed. Where possible, we have included exact p-values. For analyses using Tukey-Kramer post hoc tests, p-values are reported to four decimal places, reflecting the output limits of the statistical software used.
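
For readers reproducing the figures, the asterisk convention stated above can be written as a small lookup. This is a minimal sketch: the function name is hypothetical, and treating a p-value of exactly 0.05 as n.s. is an assumption, since the legend defines only p > 0.05 and p < 0.05.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the asterisk notation defined in the figure legends."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "n.s."  # assumption: p == 0.05 is grouped with non-significant results


if __name__ == "__main__":
    # p-values quoted elsewhere in this response, used only to show the mapping.
    for p in (0.20, 0.018, 0.006, 0.0001):
        print(p, significance_label(p))
```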

      (3) It is not clear why the authors only quantified animal speed for most of the study. What about idle time, turns, and reversals? This choice limits the reach of the study, as we only partly understand what AFD is doing, notably to explain the phenotype in the preference assay.

      Data on idle time, turning frequency, and reversal frequency for wild-type worms are now included in Figure 1F. In addition, we present new data showing that AFD ablation disrupts context-dependent modulation of locomotion speed, idle time, and turning frequency (Figure 4E).

      (4) Figure 2D and related text: these conclusions are based on a single mutant analysis. Were the million-mutation project lines outcrossed? It would be much more convincing if more gcy alleles were tested (this should be relatively easy since classical alleles are available at the CGC for gcy-8 and gcy-18).

      The million-mutation project lines used in this study were outcrossed prior to analysis. In addition, we confirmed that the observed defects were specifically due to loss of gcy-18 function by rescuing the phenotype through expression of gcy-18 cDNA under AFD-specific promoters. This cell-specific rescue shows that the behavioral defects arise from disruption of gcy-18 rather than from background mutations.

      (5) It is hard to interpret the speed phenotype when the authors switch between Delta speed and absolute speed display from one figure to another, or even from one panel to another. If only tax-4 and kcc-3 display a constitutive speed phenotype, then there should be no problem showing the absolute speed data in every panel. This is important to convince the reader that major speed changes in mutants are not biasing the interpretation based on Deltas. Indeed, if some mutants move very fast, there might be a ceiling effect. Conversely, if they move very slowly, there might be a 'sickness' effect. Both effects could prevent seeing a tactile-context-dependent modulation, and the results would need to be interpreted much more carefully. Providing the full view on absolute speed levels would also really help support the whole discussion paragraph about the differential regulation of constitutive versus context-dependent locomotion (from L339 onward).

      We focus on ∆speed because it directly quantifies experience-dependent locomotion modulation relative to each strain’s own baseline, making it an appropriate metric for comparing tactile plasticity across genotypes. This approach avoids confounding effects from strain-specific differences in overall locomotion levels.

      At the same time, we agree that absolute locomotion speed is important to consider when interpreting behavioral phenotypes. To address this, we added plate-based locomotion speed and body length measurements for two key genotypes that lack modulation, gcy-18 mutants and AFD-ablated worms (Figure 4–Supplement 2B–C). Both exhibit normal locomotion on agar plates, indicating that their defects in tactile-dependent modulation are not due to impaired motor ability or general sickness.

      In addition, among the mutants tested in microfluidic chambers, tax-4 mutants display elevated basal speed yet retain robust context-dependent modulation, indicating that ceiling effects do not limit detection of modulation.

      (6) The gap junction expression is a nice experiment. But there is a major limitation that should be stated: the electrical synapse re-construction is made in a double mutant background in which the whole animal circuitry might be severely affected. It might well be that the restoration of behavioral plasticity represents something totally irrelevant to wild-type nervous system functioning. A cell-specific innexin knockout is needed to fully support the relevance of the AFD-AIB connection.

      We agree that reconstruction of an electrical synapse in an innexin double-mutant background carries the limitation that global circuit function may be broadly affected. To address this concern, we performed an additional rescue experiment in a less perturbed genetic background.

      As described above, we show that AFD-specific expression of inx-10 is sufficient to restore tactile-dependent locomotion modulation in inx-10 single mutants (Fig. 6D). This cell-specific rescue does not rely on a double-mutant background and converges on the same outcome as the Cx36-based electrical synapse reconstruction. Together, these complementary approaches support the conclusion that restoring AFD-AIB coupling is sufficient to enable tactile-dependent locomotion modulation, rather than reflecting nonspecific recovery from global circuit disruption.

      (7) How was developmental age controlled? It seems that all genotypes were grown for a fixed duration (72h). Some mutants, like gcy-8, might grow slower. It would be useful to at least provide control data in wildtype animals showing that behavioral performance is similar even in slightly younger animals (covering the developmental age of the youngest mutant).

      Developmental age was controlled by strict age synchronization and staging criteria rather than growth duration alone. Worms were synchronized by allowing 40-50 young adults to lay eggs on OP50-seeded NGM plates for two hours, after which adults were removed. Developmental stage was further assessed by gonadal morphology, and only young adult animals with 5-10 eggs arranged in a single row were selected for behavioral assays. Using these criteria, all strains, including mutants, consistently reached the assayed stage approximately three days after egg laying, comparable to wild type animals.

      To further address the possibility that subtle developmental differences could influence behavior, we measured locomotion speed on agar plates and body length for genotypes that show defects in context-dependent modulation. gcy-18 mutants and AFD-ablated worms exhibited normal locomotion rates and body size, indicating that their behavioral phenotypes are unlikely to arise from developmental delay or impaired general motor ability. These control data are now included in the revised manuscript (Figure 4–Supplement 2B–C).

      (8) Plasmid construction description is entirely lacking.

      Description of plasmid construction has been added to the revised Methods.

      Minor points:

      (1) 'Context-dependent locomotion' should be replaced by 'tactile context-dependent locomotion' or something similar throughout the manuscript when referring to the impact of the pillar environment.

      Presently, this phrasing shortcut makes the communication too vague throughout, and even confusing when presenting the result of supplementary Figure 2 (where both thermal and tactile contexts are manipulated).

      We appreciate this suggestion and have revised the terminology for clarity where appropriate. Prior to introducing the mechanosensory origin of the modulation (that is, before presenting the mec-10 data), we retain the broader term “context-dependent modulation” to avoid presupposing a tactile mechanism before it is experimentally established.

      (2) L97: Suggested change along the same lines: "prior experience" -> "prior tactile experience".

      We have made this change as suggested.

      (3) Figure 1A: Would the author consider swapping the order of conditions displayed in this diagram? It would make more sense to have the same left-to-right order in the whole figure with the binary chamber on the left, particularly since the author describes the results considering the binary chamber as the 'reference point'.

      The order of chambers in Figure 1A has been revised as suggested, with the binary chamber now shown on the left.

      (4) Figure 1C: 'idel' typo in the axis label.

      The y-axis label has been updated from “idel” to “idle.”

      (5) Without AFD-specific manipulations, the data with tax-4 and tax-2 mutants provide limited information regarding the role of TAX-4 and TAX-2 in AFD. It should be explicitly mentioned in the Results section that they might work in other neurons.

      The revised manuscript now explicitly states that the tax-2(p694) allele affects multiple neurons, including BAG, ASE, ADE, and AFD (Lines 421–422).

      (6) L220-222: The strict meaning of this sentence implies that one attributes a role to AFD in controlling constitutive locomotion, but none of the presented data directly shows this (both kcc-3 and tax-4 mutant phenotypes could arise from additional neurons, regardless of any perturbation in AFD). This should be corrected.

      To avoid implying that AFD directly controls constitutive locomotion, we have removed the sentence in question, “Together, these findings suggest that the role of AFD neurons in modulating context-dependent locomotion is distinct from their thermosensory functions and differs from the mechanisms controlling basal locomotion”, from the revised manuscript.

      (7) L328-329: Overstatement. Without AFD-specific manipulation of TAX-2 and TAX-4, the different mutant phenotypes could be due to different cell types, rather than different protein pairs in the channel heteromers. I would recommend addressing this alternative possibility directly in the discussion, rather than focusing only on one (I agree, very cool) possibility.

      We have clarified this point in the revised text. We now explicitly note that the tax-2(p694) mutation affects tax-2 expression in multiple neurons (AFD, BAG, ASE, and ADE) (Lines 421–422).

      Reviewer #3 (Recommendations for the authors):

      (1) Clarification of inx Gene Expression Analysis (Lines 270-271): The authors should specify how the expression of inx genes in individual neurons was identified.

      The revised manuscript now specifies that innexin expression patterns were identified using the CeNGEN single-cell transcriptomic database (Lines 352–354).

      (2) Cx36 Expression in AFD and AIB (Lines 287-288): Further clarification is needed on how Cx36 expression was achieved in AFD and AIB.

      We have clarified that Cx36 was expressed specifically in AFD using the srtx-1b promoter and in AIB using the inx-1 promoter, as stated in the main text (Lines 372–373) and the Fig. 6 legend.

    1. eLife Assessment

      This important study deepens our understanding of how populations of a given species may diverge in their molecular and physiological patterns as a result of adaptation to different thermal regimes. By approaching this question from multiple directions, the authors provide solid evidence for adaptive changes in three strains of the diamondback moth after only three years of experimental evolution, and support the causal involvement of the PxSODC gene in thermal adaptation to both cold and hot temperatures. This work would benefit from more sophisticated phylogenetic analyses, better statistical support, and a more detailed discussion of the differences in the three strains at the pathway level.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Lei and co-workers aim to uncover the genetic underpinnings of thermal adaptation across three strains of the diamondback moth (Plutella xylostella) through experimental evolution over three years under three different thermal regimes. They identify systematic differences in trait responses (e.g., survival, fecundity), metabolic profiles, gene expression, and in the amino acid sequence of the PxSODC gene, among others. These results suggest that the diamondback moth has a strong potential for rapid physiological adaptation to different thermal regimes. Overall, this is a comprehensive and generally well-executed study that addresses an important question in the face of ongoing climate change.

      Strengths:

      The authors employ multiple approaches to identify signatures of thermal adaptation across the three strains, such as trait performance comparisons, metabolomics, transcriptomics, and amino acid sequence comparisons. All these different angles form a convincing picture of the underlying factors that underpin thermal adaptation in this experimental system. The manuscript is also generally well written and easy to understand.

      Weaknesses:

      I am unable to judge the validity of all aspects of this work; I will focus only on areas within my core expertise.

      (1) The authors identify pathways that are enriched in different strain comparisons (Figure 3E), but do not provide a detailed interpretation of these results. It would be great if the authors could explain in more detail how the physiological processes of a cold-adapted strain of this species may differ from those of a warmer-adapted strain.

      (2) The authors reconstruct a phylogenetic tree of the PxSODC gene using the neighbor-joining algorithm. The limitations of this algorithm have been known for many years now, especially for sequences separated by long evolutionary distances. According to Wang et al. (2016), the last common ancestor of the species shown in Figure S4C occurred 392-350 million years ago. Given this, I would strongly recommend that the authors infer a phylogenetic tree using model-based methods, such as those implemented in RAxML-NG or IQ-TREE. Also, in the absence of a valid outgroup sequence, I would show the gene tree as unrooted or rooted based on the corresponding species tree.

      (3) There is a key piece of the puzzle that is currently missing: the structural mechanism behind the mutational effects described in this study (e.g., Figure 5). The authors could leverage AlphaFold to generate structural models of different mutants and conduct molecular dynamics simulations to examine their conformational dynamics.

      References:

      Wang, Yh., Engel, M., Rafael, J. et al. Fossil record of stem groups employed in evaluating the chronogram of insects (Arthropoda: Hexapoda). Sci Rep 6, 38939 (2016). https://doi.org/10.1038/srep38939

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors set out to better understand the genetic mechanisms underlying thermal adaptation in insects. They experimentally evolved diamondback moth (Plutella xylostella) populations - a pest species with a wide distribution - under both hot (12h:12h 32°C/27°C) and cold (15°C/10°C) thermal conditions, and conducted phenotypic assays and metabolic and transcriptomic profiling to analyze how populations changed to deal with this thermal stress compared to the nonevolved ancestral population (constant 26°C). Phenotypic assays showed that evolved hot populations had increased survival at high temperatures (42-43°C) while evolved cold populations had lower freezing points compared to the ancestral population. When measured at the constant 26°C conditions, metabolic and transcriptomic profiles of 3rd instar larvae from the evolved population were distinctive from the ancestral population, with a set of overlapping metabolic and transcriptomic pathways that were significantly differentially expressed in both hot and cold evolved populations compared to the ancestral. The authors narrowed down this set of candidate genes further by focusing on genes with high expression levels overall, whose expression profile was correlated with differentially expressed metabolites, and that contained mutants in both hot and cold strains. From this set, they chose the PxSODC gene for further functional validation, as it has previously been shown to be involved in the response of insects to abiotic stress with its antioxidative role in cellular defense. At the constant 26°C, this gene showed lower expression across development in evolved strains compared to the ancestral population, while it showed similar expression patterns under thermal stress. Knockdown of PxSODC resulted in decreased survival rates at high temperatures and higher freezing points compared to the ancestral population. Based on this validation, the authors hypothesize that the non-synonymous mutation in the PxSODC gene that they found in the cold and hot evolved populations might alter the conformation of the PxSODC protein, increasing enzyme capacity. Their experimental evolution experiment furthermore indicates the capacity of the pest species, the diamondback moth, to adapt to a wide range of temperatures, providing insights into its capacity for global dispersal.

      Strengths:

      (1) The authors did a tremendous amount of work to characterize the mechanisms underlying thermal adaptation in the diamondback moth, artificially selecting populations for three years in the lab and characterizing how they evolved as a result at different biological levels: from phenotypes in different life stages, to larval metabolites and gene transcription, to functionally validating how one of the resulting gene candidates influences the capacity to deal with thermal stress.

      (2) The paper identifies and provides further evidence for candidate genetic mechanisms that might be particularly important for thermal adaptation in insects, including lipid metabolism, oxidoreductase activity, and DNA methylation. It is furthermore interesting that the authors found similar mechanisms to be involved in both the adaptation to cold and hot environments. Their functional validation of some of the genes involved in these mechanisms is very useful to understand how these genes might be causally involved in insect thermal adaptation.

      (3) The paper also has applied value: the diamondback moth is a pest species with a wide distribution, so understanding its adaptive capacity to different thermal environments is important for predicting the prevalence and potential further range expansion of this species under future climate change.

      Weaknesses:

      (1) The paper in its current form is hard to digest and would benefit from improved clarification of the storyline, as well as a tighter integration between the phenotypic, omics, and functional validation data. Currently, it is not always clear what the relevance is of all the reported results, nor why certain decisions were made, or how all the different methods the authors used fit together. For example, the authors functionally validated a second gene, PxDnmt1, but it is unclear why this particular gene was chosen, nor how it relates to their selection regimes when looking at the results obtained with the phenotyping and omics data collection. Seeing how much work the authors did, this makes the paper overwhelming and difficult to read.

      (2) The authors at times stretch their results too far, as the ecological relevance of their study design and results is not clear, limiting the generalizability and value of the results for understanding species' adaptive potential under climate change. For example, the selection regimes used present the minimum and maximum known temperatures at which the species can survive and develop, but it is unclear how the temperatures relate to the natural environment of the source population, to what extent wild populations might experience these temperatures, and whether they would experience them at the extended duration used (12h at max/min temperature). Moreover, I wonder whether the comparisons made would identify the genes that matter under natural conditions, as unevolved populations were kept under constant conditions compared to 12h:12h temperature regimes for the evolved populations, and the metabolic and transcriptomic profiling was done under a constant favorable 26°C rather than under thermal stress in a, as far as I can tell, randomly chosen life stage (larval stage).

      (3) The paper in its current form does not adequately describe the statistical analyses underlying the results, nor do the authors share their code, making it very hard to judge whether the analyses used are appropriate and the results trustworthy. I have concerns about the inappropriate use of t-tests, the lack of correcting for confounding variables, and the need for multiple testing corrections.

    4. Author Response:

      Public Review:

      We thank you and the reviewers for the thoughtful and constructive comments. The feedback helps us strengthen the manuscript substantially, and we plan to address the key points in the revised version as follows.

      Reviewer #1 (Public review):

      First, in response to the request for a clearer biological interpretation of the pathway enrichment results, we will expand the Discussion to more directly integrate these findings with the observed life-history divergence between strains.

      Second, we agree with the concern regarding the phylogenetic inference of PxSODC. We will therefore re-infer the phylogeny using a model-based Maximum Likelihood approach implemented in IQ-TREE, and, in the absence of an appropriate outgroup, the revised tree will be presented as unrooted.

      Third, to address the suggestion for a structural explanation of the mutational effects, we will add new structural analyses using AlphaFold modeling and 100 ns molecular dynamics simulations of the wild-type and mutant PxSODC proteins across three physiologically relevant temperatures.

      Reviewer #2 (Public review):

      First, we will restructure the Results and streamline the presentation to better emphasize the central narrative. Extensive descriptive datasets will be moved to the Supplementary Materials, and the rationale linking different analytical layers will be stated more explicitly.

      Second, we will also revise the manuscript to better frame the ecological relevance and limitations of the experimental design. Specifically, we will clarify that the thermal selection regimes were chosen to reflect ecologically relevant extremes for the source population from subtropical Fuzhou, where summer and winter temperatures can approach the ranges used in the experiment. We will further explain that the cycling temperature treatments were designed to approximate severe but naturally occurring diurnal fluctuations.

      Third, in response to concerns about statistical rigor and reproducibility, we will substantially expand the statistical methods throughout the manuscript. The revised version will provide a clearer description of the analyses used for each dataset, including sample sizes, comparison structure, and statistical thresholds. We will also clarify the application of multiple-testing correction for both transcriptomic and metabolomic analyses, specify the criteria used in network analyses, and more clearly distinguish the statistical approaches used for pairwise versus multi-group comparisons.
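
      As an illustration of the multiple-testing correction mentioned above, the following is a minimal sketch of the Benjamini-Hochberg false discovery rate adjustment. The response does not state which correction will be applied, so the choice of Benjamini-Hochberg, the function name, and the example p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative helper (hypothetical name); not taken from the manuscript.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypothetical features (genes or metabolites).
example = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(example)
significant = [q <= 0.05 for q in adjusted]
```

      A feature would then be reported as differentially expressed only if its adjusted value falls below the chosen threshold (0.05 here, also an assumption).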

    1. eLife Assessment

      This is a potentially important work on the organization of visual information in the rodent superior colliculus. It reports that the selectivity of neurons to line orientation and motion in the visual image is largely governed by the sensitivities of retinal neurons and their ordered projection to the superior colliculus. If confirmed, these conclusions could substantially revise prior thinking in this field. However, in the present state, the methods and analysis are incomplete and cannot justify all the claims.

    2. Reviewer #1 (Public review):

      Summary:

      When contemplating the role of any sensory area in the brain, an essential question is: How much of the neural code is inherited from the inputs versus constructed de novo by the local circuitry? This study tackles that important question for the case of the mouse superior colliculus (SC), a visual brain area that receives direct input from the retina. The specific aspects of the neural code are the representation of line orientation and direction of motion in the visual image. Over the past 10 years or so, there have been reports that the preferred directions and orientations of neurons vary systematically across the SC in a global map that is not present in the retina, and therefore computed locally.

      Here, the authors revisit this question by expanding the range of measurements: They record from the axonal boutons of retinal ganglion cells in the input layer of the SC, from the post-synaptic neurons there, and from neurons in deeper layers of the SC. They conclude that at any given location in the SC, the signals in retinal boutons recapitulate the tuning of retinal ganglion cells, and that SC neurons follow that organization, though it is lost in the deeper layers. Notably, they find no evidence for a global map of these response properties other than what is contributed by retinal input.

      Strengths:

      The study combines multiple recording methods - calcium imaging and electrical recording - to capture the activity of retinal inputs to the colliculus, the tuning of neurons in the superficial layers close to the input, as well as neurons in deeper layers. Furthermore, the work connects to the recent literature on gradients of tuning properties among retinal ganglion cells. All these set the stage for testing the correspondence between retinal inputs and collicular outputs.

      Weaknesses:

      The methods used to identify direction-selective and orientation-selective neurons based on visual responses are overly permissive and don't match common usage in this research area. Furthermore, the measurements covered only a small fraction of the visual field, which limits their ability to distinguish between different hypotheses for the global map of visual response properties. Relatedly, the claim that retinal input patterns explain much of the layout in the superior colliculus should be made more quantitative.

    3. Reviewer #2 (Public review):

      In this study, the authors investigate the spatial organization of direction and orientation selectivity in the mouse superior colliculus (SC) and its retinal inputs. By combining two-photon imaging of retinal boutons and SC neurons with Neuropixels recordings, they assess whether tuning preferences form structured maps or are arranged in a salt-and-pepper fashion. They further compare SC tuning organization to previously described retinal geometric models to determine the extent to which collicular responses inherit retinal topography. The authors conclude that SC inherits a cardinally biased topographic scaffold from the retina, which progressively weakens with depth, and that strong global maps are absent.

      A major strength of the study is the impressive combination of methodologies, including imaging of retinal boutons, imaging of SC neurons, and large-scale electrophysiological recordings across SC depth. The direct comparison to retinal geometric models is particularly valuable, as it situates the SC within a broader framework of retinotopic information transfer and advances our understanding of how retinal computations are transformed in downstream targets.

      A limitation of the study, however, is that the imaging experiments sample only a relatively small and spatially homogeneous region of the visual field, whereas the electrophysiological recordings cover a different portion of SC. This separation makes it difficult to form a fully integrated, global picture of the spatial organization of direction and orientation selectivity across the entire collicular map.

    4. Reviewer #3 (Public review):

      Summary:

      The authors studied the organisation of orientation and direction-selective retinal ganglion cells' boutons in the mouse superior colliculus. They confirmed the results already published (Molotkov, 2023; de Malmazet, 2024; Vita, 2024; Laniado, 2025), that retinal ganglion cells' boutons in the superior colliculus conserve the retinal organisation. Thereby, orientation and direction preferences of retinal boutons at each collicular location reflect the tuning of retinal ganglion cells found at the corresponding retinal location, that is covering a matching receptive field location.

      The authors also studied the organization of orientation and direction-selective neurons in the superior colliculus. They report a lack of functional organisation in the superior colliculus for neurons preferring the same stimulus orientation or direction of movement. This goes against several published reports (Ahmadlou and Heimel, 2015; Liang et al., 2023; De Malmazet et al., 2018; Feinberg and Meister, 2014; Kasai and Isa, 2021; Li et al., 2020) but echoes a study from Chen et al. (Chen, 2021). The latter authors contested the strength of the anatomical clustering of similarly tuned direction-selective neurons. They found, however, that in about a quarter of their recordings, direction-selective cells with similar preferred directions did cluster anatomically in the superior colliculus.

      Here, the authors of the current manuscript under review report that local clustering of tuning was weak in all neural populations and confined to very small spatial scales (10-20 μm). This is one order of magnitude smaller than previously reported clusters of around 100-300 μm wide. Therefore, the authors conclude that orientation and direction tuning in the mouse superior colliculus follows a salt-and-pepper organisation.

      Strengths & Weaknesses:

      Although the authors performed a solid analysis contesting the functional clustering of direction and orientation selective neurons, there seemed to be some elements in their data in favour of a functional clustering of neurons.

      As an illustration, the authors plotted in Figure 1Q the distribution of preferred orientations from all their recorded orientation-selective cells. The curve shows a clear bias, indicating that neurons preferring horizontal orientations were found twice as often as neurons encoding any other orientation. Moreover, the authors recorded all their neurons from a defined anatomical location of the colliculus, marked by the dotted rectangle in Figure 3A-C. Therefore, this suggests that orientation-selective cells in this particular collicular location are biased towards preferring horizontal orientations. This supports an anatomical clustering of similarly tuned orientation-selective cells in the superior colliculus.

      Similarly, Figure 1P shows a bias in the preferred directions of direction-selective neurons in the same recording area. Cells tended to respond more to upward and forward-moving stimuli. The bias is more modest than the one described above for preferred orientations. However, it still seems significant. For example, cells preferring upwards movements appeared to be four times more abundant than cells preferring downward movements. As a consequence, it indicates that preferred directions might not be uniformly distributed and equally represented across the superior colliculus.

      These anatomical biases are also visible in the receptive field analysis of the paper. In Figure 3G, the authors plotted the distribution of preferred orientations for each 10-degree bin within the recorded field of view. Out of 26 bins containing more than one neuron, only 6 seemed to include cells not overwhelmingly preferring a single orientation. These were located towards the top right of the figure. Therefore, over almost 80% of the recorded superior colliculus, the data seem in agreement with the view that orientation-selective cells tend to prefer the same orientation at a given receptive location.

      The patch analysis in Figures 4G and H also seems to show some degree of coherence in the preferred orientation and direction of neighbouring tuned collicular cells. In both Figures 4G and H, clear patches of similar preferred orientation and direction appeared to emerge. For example, in Figure 4H, there is a predominance of horizontally tuned patches. This was expected given the recording bias consisting of a majority of horizontally tuned cells. In addition, vertical and 45-degree patches are also visible, in blue and red, respectively. These patches overlap with the corresponding retinotopic locations in Figure 3G, where the histograms show that cells tend to prefer the same orientations, horizontal, vertical or 45 degrees.

      It is important to note that in the previous studies on functional clustering of orientation and direction, variability in the tuning of cells within clusters was always reported (Ahmadlou and Heimel, 2015; Chen et al., 2021; De Malmazet et al., 2018; Feinberg and Meister, 2014; Kasai and Isa, 2021; Li et al., 2020). This was more marked for direction-selective cells than for orientation-selective cells. In general, cells preferring all four cardinal directions were often recorded at any given collicular location. Similarly, orientation-selective cells could be found to prefer deviant orientations compared to adjacent cells. Therefore, it is not surprising to see locally mixed tuning in collicular neurons. However, what appeared significant in these studies was the overall proportion of cells with similar tuning in patches of the superior colliculus. As described above, this also seems to be the case in the data of this manuscript.

      To conclude, it seems that the authors tend to overlook the sources of agreement between their data and previous reports showing functional clustering of cells in the superior colliculus. Instead, the authors tend to emphasise the dissimilarities and variability to put forward a contentious view on the organisation of orientation and direction selectivity in neurons of the superior colliculus. This, I fear, is detrimental to the field because it creates a sort of manufactured chaos that produces unnecessary confusion for readers who do not attentively read the manuscript. It would be valuable for the authors to consider rewriting the manuscript, acknowledging where their data, in fact, support some level of functional clustering.

    5. Author Response:

      We thank the reviewers and editors for their thoughtful and constructive assessment. We are encouraged that the reviewers viewed the combination of retinal bouton imaging, collicular neuron imaging, and depth-resolved electrophysiology, together with the comparison to retinal geometric models, as a strength of the study. As the reviewers note, our findings are consistent with previous in vitro studies showing topographic organization of tuning in the retina and with recent work demonstrating the precision of retinotopic mapping from retina to superior colliculus (SC). In revision, we will refine our definition of tuning, sharpen our claims about the spatial organization across SC and its correspondence to retinal topography, and make clearer our aim of reconciling seemingly opposing findings in the literature. In addition, we will provide a detailed response to all other points raised by the reviewers.

      A central point raised in the reviews concerns our definition of direction- and orientation-selective cells. We agree that relying only on statistical significance is not sufficient for our purposes, because the resulting classification can depend on recording duration and statistical power. In the revised manuscript, we will therefore introduce thresholding criteria for direction and orientation selectivity indices (DSI and OSI) in addition to significance-based testing. We will also make clearer that our primary question is which stimulus directions and orientations are represented at a given receptive field location, rather than the distribution of preferences among neurons classified as purely direction- or orientation-selective.
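
      As an illustration of combining index thresholds with significance testing, the following is a minimal sketch that assumes the common vector-sum definitions of DSI and OSI. The response does not state which definitions or thresholds will be used, so the function names, the 0.2 cut-offs, and the 0.05 significance level are hypothetical placeholders rather than the authors' criteria.

```python
import numpy as np

def selectivity_indices(directions_deg, responses):
    """Return (DSI, OSI) using vector-sum definitions (an assumed convention).

    DSI = |sum_k r_k exp(i*theta_k)| / sum_k r_k
    OSI = |sum_k r_k exp(2i*theta_k)| / sum_k r_k
    """
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    # Negative responses (e.g. after baseline subtraction) are clipped to zero.
    r = np.clip(np.asarray(responses, dtype=float), 0.0, None)
    total = r.sum()
    if total == 0:
        return 0.0, 0.0
    dsi = np.abs(np.sum(r * np.exp(1j * theta))) / total
    osi = np.abs(np.sum(r * np.exp(2j * theta))) / total
    return float(dsi), float(osi)

def classify(dsi, osi, p_dir, p_ori, dsi_min=0.2, osi_min=0.2, alpha=0.05):
    """Require BOTH a significant test and an index above threshold.

    dsi_min, osi_min and alpha are hypothetical values chosen for illustration.
    """
    return {
        "direction_selective": (p_dir < alpha) and (dsi >= dsi_min),
        "orientation_selective": (p_ori < alpha) and (osi >= osi_min),
    }
```

      Under this scheme, a cell with a significant but very weak bias (for example DSI = 0.05 with p = 0.001 after a long recording) would not be classified as direction-selective, which is the dependence on statistical power that the thresholds are meant to remove.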

      We will also revise the text to define more precisely what our data do and do not establish about the large-scale organization across SC. Our intended conclusion is not that we identify a novel global organization, which would require sampling a larger portion of visual space, but rather that the regions we sampled are not well explained by previously proposed global maps in which each visual field location is dominated by a single tuning preference and the same organization is conserved across individuals. Instead, our data are more consistent with a retinal organization of biases toward specific directions and orientations that vary systematically across visual space.

      We will further clarify how we quantified the correspondence between our data and the previously established retinal model of direction and orientation tuning. In the current manuscript, we report the errors between model predictions and measured tuning preferences at the corresponding visual field locations. We then assess model performance by comparing the distribution of these errors with the errors obtained from two surrogate datasets: one in which the correspondence between visual field location and tuning preference is destroyed, and one in which the prior distribution of tuning preferences is assumed to be uniform. In the revised manuscript, we will make the interpretation of this comparison more explicit, so that the reported errors are clearly presented as the relevant effect-size measure alongside significance.
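
      The comparison described above can be sketched as follows. This is an illustrative outline only: the use of the median as the summary statistic, the number of surrogates, and all function and key names are assumptions, not details taken from the manuscript. The `period` argument would be 360 for direction preferences and 180 for orientation preferences.

```python
import numpy as np

def circular_error(pred_deg, meas_deg, period=360.0):
    """Absolute angular difference, wrapped to the range [0, period/2]."""
    d = np.abs(np.asarray(pred_deg, float) - np.asarray(meas_deg, float)) % period
    return np.minimum(d, period - d)

def surrogate_comparison(pred_deg, meas_deg, period=360.0,
                         n_surrogates=1000, seed=0):
    """Compare model-vs-data error against two surrogate datasets.

    shuffled: measured preferences permuted across visual field locations,
              which destroys the location-preference correspondence.
    uniform:  preferences drawn uniformly on the circle, which replaces the
              prior distribution of preferences with a flat one.
    """
    rng = np.random.default_rng(seed)
    pred = np.asarray(pred_deg, float)
    meas = np.asarray(meas_deg, float)
    observed = np.median(circular_error(pred, meas, period))

    shuffled = np.array([
        np.median(circular_error(pred, rng.permutation(meas), period))
        for _ in range(n_surrogates)
    ])
    uniform = np.array([
        np.median(circular_error(pred, rng.uniform(0.0, period, meas.size), period))
        for _ in range(n_surrogates)
    ])

    # One-sided p-values: fraction of surrogates with error at or below observed.
    p_shuffled = (np.sum(shuffled <= observed) + 1) / (n_surrogates + 1)
    p_uniform = (np.sum(uniform <= observed) + 1) / (n_surrogates + 1)
    return {
        "observed_median_error": float(observed),      # effect size, in degrees
        "shuffled_median_error": float(np.median(shuffled)),
        "uniform_median_error": float(np.median(uniform)),
        "p_vs_shuffled": float(p_shuffled),
        "p_vs_uniform": float(p_uniform),
    }
```

      Reporting the observed error next to the two surrogate errors gives the effect size in degrees, while the p-values indicate only whether the difference is distinguishable from chance.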

      Finally, we appreciate the reviewers’ concern that the manuscript may currently emphasize disagreement with previous studies too strongly. We will revise the Discussion to better acknowledge where our data support some degree of local bias or weak clustering, while clarifying that we do not find evidence for a robust, stereotyped global map that is consistent across animals. Our goal is to sharpen the manuscript so that it better reconciles seemingly divergent findings in the literature rather than setting them in opposition.

    1. eLife Assessment

      This important study advances our understanding of the neural substrate of planning trajectories towards a goal by using recurrent neural networks. The manuscript provides solid evidence for most of the claims, but it remains unclear whether the dynamics do indeed bear the defining characteristics of attractors, and the interpretation and scope of some claims may need to be reassessed in light of prior work. The work will be of broad interest to theoretical and systems neuroscientists and to cognitive scientists.

    2. Reviewer #1 (Public review):

      Summary:

      This work builds a theory to implement planning trajectories towards a goal in a known environment, inspired by analyses of prefrontal neural recordings. Unlike standard neural architectures for this task, such as value-based learning and successor representations, their proposed theory is able to adapt to novel goal locations within a trial. The key to the theory is that future times are represented by orthogonal groups of neurons. The recurrent connectivity between groups of neurons selective to specific future times and locations reflects the learned knowledge of the task. Finally, the authors show that standard networks trained on the task approximate their proposed theory.

      Strengths:

      The structure of the work is clear, and the presentation of the results is very well written, which is particularly noticeable given the considerable amount of results presented. The authors are able to link their theory with experimental findings in neural recordings. The reverse-engineering of trained recurrent neural networks is very thorough, covering both dynamics and connectivity. The assumptions and predictions of their model are clearly stated.

      Weaknesses:

      It is unclear whether their proposed theory, "space-time attractors", is actually an attractor network. The authors used recurrent neural networks with very few timesteps and with single-neuron time constants that are long relative to the task time scales. Attractor networks, such as the ones the authors cite, are networks that generate nontrivial patterns of activity through recurrent interactions, after long periods of time.

      The authors gloss over how the reward inputs are calculated. Computing these reward inputs should be part of the planning process, and the authors are implicitly leaving this problem aside. How does the reward input, which includes future time and location, depend on the actions that have not yet been taken by the agent? It feels like most of the planning computation is already provided by these reward inputs at the beginning of the trial. It could be that the network is only learning to process the planned sequence of actions present in the inputs.

    3. Reviewer #2 (Public review):

      This well-written manuscript proposes to use attractors in space and time (STA) as a mechanistic explanation for planning in the prefrontal cortex. The main conceptual hypothesis is that planning is implemented as attractor dynamics in a representation that encodes states at each time step jointly. Depending on inputs, the network relaxes to a trajectory that already contains future states that will be visited at each time step, rather than computing a scalar value at each point in time and space like other classical approaches from RL. The authors compare this approach to implementations such as TD learning and successor representation, and further show that trained recurrent neural networks on specific tasks involving planning develop structured subspaces resembling the ones postulated in STA.

      The idea of treating attracting trajectories unfolding in time as the computational substrate for planning is very interesting and potentially important. The explicit construction of a state x time representational space and its implementation via recurrent dynamics are appealing and convincing in the idealized tasks considered. I found the manuscript to be refreshingly explicit regarding several of the assumptions and limitations of the models, for example, the fact that certain advantages can be viewed as properties of the state space itself and not necessarily of a fundamentally new planning mechanism.

      Overall, the manuscript presents a cool attractor model that extends in time and explores its performance in a subset of illustrative tasks involving planning. My doubts concern mostly the interpretation and scope of the claims made in the manuscript. Here are a few comments where I detail my questions/concerns:

      (1) The authors nicely discuss that much of the difference between STA and classical TD or SR agents is "in some sense a property of the state space rather than the decision making algorithm," and that TD and SR could in principle be implemented in a comparable space x time representation. This is fair, but it also suggests that the central contribution of the manuscript lies primarily in the representational factorization (state x time tiling) and its dynamical implementation via attractors, rather than in a fundamentally new planning algorithm or theory, mechanistic or not. I think theory should be distinguished from mechanism, and it would therefore help the reader to describe the conceptual advancement more as a novel mechanism or implementation than a novel (mechanistic) theory for decision/planning.

      (2) Related to my previous point, I think it would be helpful to position STA more explicitly relative to computational/theoretical literature in which attractor networks encode temporally ordered patterns (so effectively including future times). For example, classical extensions of Hopfield networks with asymmetric connectivity implement retrieval of sequences and ordered transitions between patterns (Sompolinsky & Kanter, 1986). More recently, sequential attractors and limit-cycle dynamics have been constructed in structured recurrent networks by the Morrison group (Parmelee et al., 2021). These works do not implement an explicit discretized state x future-time tiling as in STA and do not specifically discuss the usage for planning. However, they do provide concrete precedents for attractor dynamics over temporally structured trajectories in terms of mechanism. It would be useful to discuss this literature and clarify a little what's new mechanistically in the view of the authors.

      (3) A central claim of the manuscript is that space-time trajectories are attractors of the STA dynamics. The manuscript does provide empirical evidence consistent with attractor-like behavior. However, it is not explicitly shown whether trajectory representations persist in the absence of sustained external inputs. So it's not clear to me whether the trajectories should be interpreted as intrinsic attractors of the recurrent system, which can be selected by delivering transient inputs, or whether they must be stabilized by a specific continuous external drive. It would be useful if the authors could clarify/discuss this point.

      As far as I understand it, reward information is provided as input to specific populations encoding future time steps, and that's essential for rapid adaptation without rewiring connectivity. How such future-time-specific reward inputs would be generated and routed to distinct neural populations isn't entirely clear to me. Since this seems to be an essential component of the model, I think it would be important to discuss more deeply the source and plausibility of these reward signals related to different timesteps.

      (4) The authors note that vanilla STA scales linearly with planning horizon, and discuss potentially hierarchical extensions for longer horizons. They acknowledge that learning abstractions remains an open challenge, yet the examples of planning in the manuscript are restricted to very short temporal horizons and limited branching complexity. It is not obvious to me in what cases the current implementation and interpretation of STA remains viable (for example, in terms of relaxation iterations) as the horizon and branching factor increase. Relatively simple planning can be managed by simpler, less costly models/algorithms, whereas complex planning is a lot harder to deal with, and it's something that a mechanistic "theory" should address. In the context of the claims of the paper in its present form, I think this is possibly the most important conceptual and practical limitation in the manuscript.

      (5) The RNN analyses show that trained networks develop structured subspaces aligned with future time indices and exhibit perturbation behavior consistent with attractor-like dynamics. The manuscript also explicitly notes differences between the trained RNN and the handcrafted STA (e.g., long-range couplings between subspaces and differences in behavior of lower-value trajectories under perturbation), which I much appreciated. My doubt is on the specificity of this result, as trained RNNs on fixed-horizon tasks can develop latent dimensions correlated with temporal progress within a trial or time-to-goal. I think it would help the reader to clarify whether the results demonstrate that STA-like computations emerge in RNNs trained on planning tasks, or that RNNs generally develop some kind of structured spacetime representations when tasks involve future timesteps and some degree of flexibility in the decisions.

      A few more minor points, mainly concerning clarity:

      (1) The main dynamical equation combines a log-domain recurrent term, a floor operation, and a log-sum-exp normalization step, followed by exponentiation. The intuition/logic behind this specific formulation could be clarified for the reader. For example, it would be helpful to explain why the recurrent input appears inside a log, and also whether/how these operations relate to any multiplicative constraint.

      (2) While the computational cost of successor representation in an expanded NT x NT representation is discussed, the corresponding scaling of STA in terms of number of units and connections (as a function, for example, of the planning horizon) isn't clear to me. Perhaps the authors could compare costs more explicitly.

      (3) In the RNN analyses, structured subspaces aligned with future time indices are shown. I couldn't find a quantification of how much variance is captured by the subspaces, relative to other latent dimensions. Adding it would help get a feeling for the strength of the alignment.
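
      Regarding minor point (1), a generic sketch may help show how such operations can relate to a multiplicative constraint. The code below is not the manuscript's equation: the array shapes, the weight matrix `W`, the `floor` value, and the assumption that normalization acts within each future-time group are all hypothetical. It only illustrates the general identity that adding terms in the log domain, subtracting a log-sum-exp, and exponentiating is equivalent to multiplying the terms and dividing by their sum.

```python
import numpy as np

def update_step(r, W, log_input, floor=-20.0):
    """One hypothetical update: log recurrent drive, floor, log-sum-exp, exp.

    r         : (T, N) activity, one row per future-time group (assumed layout)
    W         : (T*N, T*N) non-negative recurrent weights (assumed)
    log_input : (T, N) external input expressed in the log domain (assumed)
    """
    T, N = r.shape
    recurrent = (W @ r.ravel()).reshape(T, N)
    with np.errstate(divide="ignore"):
        log_rec = np.log(recurrent)          # log(0) becomes -inf
    log_rec = np.maximum(log_rec, floor)     # floor keeps the log drive finite
    a = log_rec + log_input                  # addition here = product outside the log
    # Numerically stable log-sum-exp over the N units within each time group.
    m = a.max(axis=1, keepdims=True)
    lse = m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))
    return np.exp(a - lse)                   # each row now sums to 1
```

      In this form the output of each time group equals the floored recurrent drive multiplied by the exponentiated input and divided by the group total, so the normalization enforces a sum-to-one constraint on each group.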

    1. eLife Assessment

      This important study presents evidence that the Chromatin-linked adaptor for MSL complex proteins (CLAMP) GA-binding transcription factor (TF) regulates ~75% of HS-induced repression in Drosophila and suggests that CLAMP is the first known transcription factor to induce heat-stress-mediated repression of gene expression. While mechanistic details remain to be sorted out, this manuscript provides convincing evidence that novel pathways involving the CLAMP transcription factor repress gene expression during heat shock stress.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims to identify the transcription factor responsible for targeting constitutively active genes for repression during heat stress. While the mechanisms underlying heat-stress-induced gene activation are well characterized - primarily involving Heat Shock Factor (HSF), the GA-binding factor GAF, and RNA Polymerase II pausing regulators - far less is known about how repression of constitutive genes is directed. Because known activation factors such as HSF and GAF do not account for repression, the authors sought a DNA-binding factor that could selectively target these genes. They focused on CLAMP (Chromatin-linked adaptor for MSL complex proteins) for several reasons. First, CLAMP recognizes GA-rich DNA motifs similar to those bound by GAF, suggesting it could compete with GAF at regulatory elements and shift transcriptional outcomes. Second, CLAMP has been shown to antagonize GAF binding in certain genomic contexts, suggesting it could counteract activation mechanisms. Third, CLAMP interacts with Negative Elongation Factor (NELF), a factor known to regulate transcriptional repression during heat stress. Finally, CLAMP promotes long-range chromatin interactions, indicating it may influence local chromatin architecture during the heat-stress response. Together, these properties led the authors to hypothesize that CLAMP helps mediate heat-stress-induced transcriptional repression of constitutively active genes.

      To test this hypothesis, the authors use immunofluorescence along with three techniques: (1) nascent RNA-sequencing (SLAM-seq) to define the function of CLAMP in heat stress-induced transcriptional activation and repression; (2) Micro-C to identify CLAMP-dependent and independent genome-wide, high-resolution local changes in chromatin organization after heat stress, and (3) HiChIP to identify CLAMP-bound 3D chromatin loop anchors associated with heat-stress-dependent transcriptional regulation.

      Analysis of heat-shocked salivary glands or KC cells showed results that aligned across all experiments, indicating that CLAMP is the primary repressor of gene activation upon heat shock. CLAMP also inhibits chromatin loop formation.

      Strengths:

      The techniques used here are comprehensive, and impressively, the data is unambiguous.

      Weaknesses:

      These techniques do not reveal the molecular mechanisms, but the authors provide a strong rationale and molecular hypotheses for future studies to dissect.

    3. Reviewer #2 (Public review):

      In this manuscript, Aguilera et al. investigate the mechanisms underlying transcriptional repression of constitutively expressed genes during heat stress. While the activation of heat-shock genes has been extensively studied, the mechanisms responsible for widespread transcriptional repression remain poorly understood. The authors propose that the GA-binding transcription factor CLAMP acts as a major regulator of heat-stress-induced transcriptional repression in Drosophila. Using nascent RNA-sequencing approaches, they report that CLAMP contributes to the repression of a large fraction of genes whose transcription decreases upon heat stress. In addition, the authors generate high-resolution Micro-C datasets to analyze changes in chromatin architecture during heat stress and report widespread alterations in chromatin looping associated with transcriptional changes. Based on these results, the study proposes that CLAMP regulates repression through both direct transcriptional mechanisms and modulation of local 3D genome architecture.

      The study addresses an important question in gene regulation: how transcription is rapidly repressed during environmental stress. The work is timely because most previous studies have focused on transcriptional activation of heat-shock genes, whereas repression mechanisms remain comparatively less understood. The integration of transcriptional profiling with high-resolution chromatin conformation data is a major strength of the manuscript and provides a valuable resource for the community studying genome organization and stress responses.

      The nascent RNA-sequencing experiments appear carefully designed and allow the authors to capture rapid transcriptional responses to heat stress. These data provide convincing evidence that heat stress leads to widespread transcriptional repression of constitutive genes and that CLAMP contributes substantially to this process. The genomic analyses linking CLAMP binding to repressed genes are also informative and support the idea that CLAMP plays a direct regulatory role at many loci.

      Another strength of the study is the generation of Micro-C datasets under heat stress conditions. These datasets provide a high-resolution view of chromatin architecture and reveal dynamic changes in local chromatin looping associated with transcriptional responses. The authors' analysis suggests that heat stress induces widespread reorganization of chromatin contacts, and that CLAMP may contribute to these structural changes. This dataset is likely to be useful for future studies exploring how environmental cues influence genome organization.

      Despite these strengths, several aspects of the study would benefit from further clarification. First, the mechanism by which CLAMP mediates transcriptional repression remains insufficiently defined. While the data support a role for CLAMP in the repression of a subset of genes during heat stress, the molecular basis of this effect is not fully explored. Second, although the Micro-C dataset represents a valuable resource for studying chromatin architecture during heat stress, the functional interpretation of the observed structural changes could be further developed. In particular, it would be helpful to better establish the relationship between the identified chromatin loops and gene regulation, and to clarify whether these structural changes play a causal role in transcriptional repression or instead reflect broader chromatin reorganization associated with the stress response.

    4. Reviewer #3 (Public review):

      Summary:

      Exposure to heat shock results in major changes to gene expression programs within the cell, and over the past decades, there has been extensive characterization of the mechanisms through which heat shock activates transcription. However, heat shock also leads to widespread repression of many genes, and the transcriptional mechanisms that mediate this repression have not been well understood. Here, the authors show that the transcription factor CLAMP mediates this heat shock-dependent repression via changes in local 3D chromatin looping. Intriguingly, CLAMP is already bound to chromatin prior to heat shock, but is necessary for the loss of local chromatin loops at its bound sites and repression of genes located within the loops. This study is significant because it defines chromatin looping, depending on a key transcription factor CLAMP, as the major mechanism through which genome-wide changes in gene repression occur in response to an inducible stimulus, heat shock.

      Strengths:

      The use of the SLAM-seq and Micro-C techniques to measure the necessity of CLAMP for heat shock-dependent transcription repression and local chromatin looping is excellent, and these approaches provide valuable insight into the role of CLAMP in heat shock-dependent repression that was not apparent with older approaches. The HiChIP approach provides an excellent method to test whether CLAMP is bound at sites where there are changes in looping upon heat shock, providing good support for their conclusions that CLAMP induces heat shock repression by decreasing loops. Appropriate controls are present, and there is robust statistical analysis of the bioinformatics data.

      Weaknesses:

      The study does not provide insight into how CLAMP mechanistically affects loops upon heat shock, although the discussion raises the possibility that this could result from biophysical changes since CLAMP is an intrinsically disordered protein.

    1. eLife Assessment

      This study investigates how the HIV inhibitor lenacapavir influences capsid mechanics and interactions with the nuclear pore complex. It provides important insights into how drug-induced hyperstabilization of the viral shell can compromise its structural integrity during nuclear entry. The modeling is technically sophisticated, and the analyses provide solid support for the mechanistic conclusions.

    2. Reviewer #1 (Public review):

      The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.

      I found the paper interesting. I have a few suggestions for clarification and/or improvement.

      (1) How directly comparable are the NPC-capsid and capsid-only simulations? A major result rests on the conclusion that the kinetics of rupture are faster inside the NPC, but are the numbers of LENs bound identical? Is the time really comparable, given that the simulations have different starting points? I'm not really doubting the result, but I think it could be made more rigorous/quantitative.

      (2) Related to the above, it is stated on page 12 that, based on the estimated free-energy barrier, pentamer dissociation should occur in ~10 us of CG time. But certainly, the simulations cover at least this length of time?

      (3) At first, I was surprised that even in a CG simulation, LEN would spontaneously bind to the correct site. But if I read the SI correctly, LEN was parameterized specifically to bind to hexamers and not pentamers. This is fine, but I think it's worth describing in the main text.

      Comments on revisions:

      I found that the authors addressed my concerns satisfactorily. The other reviewer raised a number of important points regarding the nuances of the model and the interpretation of the simulations, which the authors rebutted. I think the paper in its current form now is a worthwhile addition to the literature.

    3. Reviewer #3 (Public review):

      I have carefully reviewed the manuscript, the two referee reports, and the authors' detailed responses. I appreciate the substantial effort the authors have invested in addressing the reviewers' comments, and I also recognize the strength and ambition of the work. This is a technically sophisticated study that integrates coarse-grained modeling with live-cell imaging to address an important and timely question regarding HIV-1 capsid inhibition by lenacapavir.

      Embedded within Reviewer #2's report are several substantive points that warrant careful consideration, particularly with respect to framing, terminology, and engagement with the broader literature. I view my role here as distinguishing those issues from claims that I do not find to be supported.

      First, I do not agree with Reviewer #2's central assertion that the manuscript lacks novelty or fails to present meaningful new findings. While individual elements of the system studied here (capsid docking at the NPC, lenacapavir-induced capsid hyperstabilization, capsid rupture, and competition with FG-nucleoporins) have been observed previously, this work provides a coherent, mechanistic account of how these elements are coupled. In particular, the proposed sequence linking LEN-induced lattice hyperstabilization, preferential pentamer loss at the narrow end, NPC-induced mechanical stress, and failure of nuclear import represents a nontrivial integration that goes beyond prior phenomenological observations. I therefore do not view this work as redundant with existing literature.

      That said, Reviewer #2 is correct to note that the manuscript would benefit from broader and more explicit engagement with recent independent studies, including computational and hybrid modeling efforts that address capsid mechanics, nuclear entry, and LEN effects using different frameworks. While the authors' bottom-up coarse-grained approach is clearly distinct and, in many respects, more systematically derived, eLife readers would benefit from a clearer discussion of how the present results relate to, complement, or differ from these other approaches. I strongly encourage the authors to add a short discussion paragraph situating their work within this broader context, without disparaging alternative models.

      Second, I find that some mechanistic claims in the manuscript would benefit from more careful language distinguishing model-conditioned interpretation from de novo prediction. This applies in particular to discussions of LEN binding heterogeneity and stoichiometry, as well as to conclusions drawn from biased enhanced-sampling simulations. While I agree with the authors that parameterization does not invalidate mechanistic insight, it is important to be precise about what aspects of the behavior emerge from the simulations versus what is constrained by prior experimental knowledge. Modest tightening/revising of language (e.g., "suggests," "is consistent with," "within the model") would address this concern without weakening the scientific conclusions.

      Third, Reviewer #2 raises a legitimate semantic issue regarding the use of the term "elasticity." The manuscript infers changes in capsid mechanical response using heterogeneous elastic network models, which quantify effective stiffness and deformability rather than elasticity in the macroscopic materials sense. I recommend that the authors clarify this definition explicitly in the text to avoid confusion and unnecessary debate.

      Finally, I note that several of Reviewer #2's objections, particularly those asserting circular reasoning, misuse of enhanced sampling methods, or invalidity of coarse-grained predictions, reflect a misunderstanding of contemporary bottom-up coarse-grained modeling rather than genuine methodological flaws. I do not believe these points require further rebuttal or revision beyond what the authors have already provided.

      In summary, in my view, the manuscript represents a solid contribution to the field, provided that the authors undertake a limited set of targeted revisions aimed at improving framing, clarity, and engagement with the broader literature. Addressing these points will strengthen the manuscript and ensure that its contributions are clearly and fairly communicated to the community.

    4. Author response:

      The following is the authors’ response to the original reviews.

      It is important to make a few key points about our work. First, our paper is largely a computational biophysics paper, augmented by experimental results. Generally speaking, computational biophysics work intends to achieve one of two things (or both). One is to provide more molecular-level insight into various behaviors of biomolecular systems that have not been (or cannot be) provided by qualitative experimental results alone. The second general goal of computational biophysics is to formulate new hypotheses to be tested subsequently by experiment. In our paper, we have achieved both of these goals and then confirmed the key computational results by experiment.

      eLife Assessment

      This study investigates how the HIV inhibitor lenacapavir influences capsid mechanics and interactions with the nuclear pore complex. It provides important insights into how drug-induced hyperstabilization of the viral shell can compromise its structural integrity during nuclear entry. While the modeling is technically sophisticated and the results are promising, some mechanistic interpretations rely on assumptions embedded in the simulations, leaving parts of the evidence incomplete.

      Given our response below, regarding the rigor and “completeness” of our work, we do not feel that an editorial judgement of “leaving parts of the evidence incomplete” is justified.

      We also note that another recent experimental paper has validated essentially every prediction made in our eLife paper: https://www.biorxiv.org/content/10.64898/2026.01.05.697065v1

      We thus disagree that the evidence we have presented in our paper is incomplete.

      Public Reviews:

      Reviewer #1 (Public review):

      The paper from Hudait and Voth details a number of coarse-grained simulations as well as some experiments focused on the stability of HIV capsids in the presence of the drug lenacapavir. The authors find that LEN hyperstabilizes the capsid, making it fragile and prone to breaking inside the nuclear pore complex.

      I found the paper interesting. I have a few suggestions for clarification and/or improvement. 

      (1) How directly comparable are the NPC-capsid and capsid-only simulations? A major result rests on the conclusion that the kinetics of rupture are faster inside the NPC, but are the numbers of LENs bound identical? Is the time really comparable, given that the simulations have different starting points? I'm not really doubting the result, but I think it could be made more rigorous/quantitative.

      We note (also in the manuscript) that it is difficult to compare the timescales obtained from coarse-grained MD simulations and experiments (“real time”) given that, by design, the CG simulations are accelerated to greatly enhance sampling. However, we can qualitatively compare the timescales of different CG simulations (without directly comparing the corresponding experimental timescales).

      We agree with the reviewer that the starting point of NPC-capsid and capsid-only simulations is different, as is the biological environment in which the rupture occurs. When analyzing the NPC-only and capsid-only simulations, what was striking to us was that at the NPC the capsid-LEN complex ruptures in a multicomponent environment, where several FG-NUPs compete to displace the LENs. It is well established in experiments that LEN has a detrimental effect on capsid integrity.

      In Figure 2, we plot the number of LEN molecules as a function of CG simulation time. The initial capsid-LEN complex was equilibrated without NPC and then placed at the cytoplasmic end of the NPC for docking. The number of LEN molecules for the capsid-only simulations and the NPC-docked simulations is nearly identical, and an insignificant number of LEN molecules unbind at the NPC. Hence, we added the following clarification:

      Page 10, paragraph 11

      “Note that the number of LEN molecules bound to the capsid for the free capsid and NPC-docked capsids are nearly identical. Hence, the disparity in timescale of lattice rupture is not only because of the effect of LEN on capsid lattice properties.”

      Is the time really comparable, given that the simulations have different starting points?

      Yes, the CG timescales of both the NPC and freely diffusing capsid unbiased simulations are comparable, since they were done using identical simulation settings.

      (2) Related to the above, it is stated on page 12 that, based on the estimated free-energy barrier, pentamer dissociation should occur in ~10 us of CG time. But certainly, the simulations cover at least this length of time?

      Our implicit solvent CG MD simulations are designed to access timescales far beyond the capabilities of the fully atomistic simulations. We reiterate here that it is difficult to directly compare the timescales obtained from CG MD simulations and experiments.

      As described in the text, there are 12 pentamers in the capsid (7 in the wide end and 5 in the narrow end). For the narrow end to rupture, all 5 pentamers should progressively dissociate. In our unbiased simulations (Fig. S5), in 25 us of CG time, we observe (partial) dissociation of one or two pentamers. Hence, our unbiased CG simulation timescales were not long enough to observe rupturing of the narrow end.

      (3) At first, I was surprised that even in a CG simulation, LEN would spontaneously bind to the correct site. But if I read the SI correctly, LEN was parameterized specifically to bind to hexamers and not pentamers. This is fine, but I think it's worth describing in the main text.

      We modified (see below) the main text to include the details.

      Page 4, paragraph 1

      “We model LEN and CA interactions such that LEN molecules can only bind to CA hexamers, and all interactions to CA pentamers are turned off, as in experiments, CA selectively associates with hexamers (25, 36).”

      Reviewer #2 (Public review):

      Here, Hudait et al. use CG modeling to investigate the mechanism by which Lenacapavir (LEN) treats HIV capsids that dock to the nuclear pore complex (NPC). However, the manuscript fails to present meaningful findings that were previously unreported in the literature and is thus of low impact. Many claims made in the manuscript are not substantiated by the presented data. Key mechanistic details that the work purports to reveal are artifacts of the parameterization choices or simulation/analysis design, with the simulations said to reveal details that they were specifically biased to reproduce. This makes the manuscript highly problematic, as its contributions to the literature would represent misconceptions based on oversights in modeling and thus mislead future readers. 

      We strongly disagree with these statements, and they do not reflect the facts. We provide a rebuttal to these statements in the “Author Response” statements below.

      (1) Considering the literature, it is unclear that the manuscript presents new scientific discoveries. The following are results from this paper that have been previously reported:

      (a) LEN-bound capsid can dock to the nuclear pore (Figure 2; see e.g. 10.1016/j.cell.2024.12.008 or 10.1128/mbio.03613-24). 

      (b) NUP98 interacts with the docked capsid (Figure 2; see e.g. 10.1016/j.virol.2013.02.008 or 10.1038/s41586-023-06969-7 or 10.1016/j.cell.2024.12.008). 

      (c) LEN and NUP98 compete for a binding interface (Figure 2; see e.g. 10.1126/science.abb4808 or 10.1371/journal.ppat.1004459). 

      (d) LEN creates capsid defects (Figure 3 and 5, see e.g. 10.1073/pnas.2420497122). 

      (e) RNP can emerge from a damaged capsid (Figure 3 and 5; see e.g. 10.1073/pnas.2117781119 or 10.7554/eLife.64776). 

      (f) LEN hyperstabilizes/reduces the elasticity of the capsid lattice (Figure 6; see e.g. 10.1371/journal.ppat.1012537). 

      The goal of our simulations (in combination with experiments from the Pathak group) is to provide molecular-level insight into the sequence of events of NPC docking of capsid and the effect of LEN binding leading to sequential dissociation of pentamers and leading to rupturing of the narrow end of the cone-shaped capsid. We also compare the events leading to capsid rupture at the NPC with the same for a freely diffusing capsid, akin to that in cytoplasm. The reviewer should carefully read the abstract of our paper. In fact, the above are all papers that present qualitative experimental results that help validate our model, but they do not provide details on the molecule-scale events. For example, the paper (10.1073/pnas.2420497122 written by our coauthors in the Pathak group) is extensively used to compare the behavior of LEN-bound capsid in the cytoplasm.

      (2) The mechanistic findings related to how these processes occur are problematic, either based on circular reasoning or unsubstantiated, based on the presented data. In some cases, features of parameterization and simulation/analysis design are erroneously interpreted as predictions by the CG models. 

      We strongly disagree with this assessment. Our CG NPC model is largely a “bottom-up” model derived from molecular-scale interactions sampled in atomistic simulations (see our previous paper in PNAS https://doi.org/10.1073/pnas.2313737121). The reviewer appears to be ignorant of the “bottom-up” approach, which is based on rigorous statistical mechanics, to deriving molecule-scale models (please refer to a detailed review on bottom-up coarse-graining: J. Chem. Theory Comput., 2022, 18, 5759-5791).

      Using the “bottom-up” CG model of the NPC, we predicted several molecular-level details of capsid import and docking to the NPC. Our key predictions were that there is an intrinsic capsid lattice elasticity and that the pleomorphic nature of the NPC channel is key for successful capsid docking (https://doi.org/10.1073/pnas.2313737121). Our computational predictions have been, for example, validated in a recently published paper by an experimental group: Hou, Z., Shen, Y., Fronik, S. et al. HIV-1 nuclear import is selective and depends on both capsid elasticity and nuclear pore adaptability. Nat Microbiol 10, 1868–1885 (2025). https://doi.org/10.1038/s41564-025-02054-z. Our work is an excellent example of how systematically derived “bottom-up” CG models can accurately predict molecular details of complex biological processes.

      We have now added the following statement:

      Page 3, Paragraph 1

      “Importantly, the computational predictions of capsid docking to the NPC central channel have been recently validated in a HIV-1 core import at the NPC using cryo-ET (33), demonstrating how systematically derived “bottom-up” CG models can accurately predict molecular details of complex biomolecular processes.”

      (a) Claim: LEN-bound capsids remain associated with the NPC after rupture. CG simulations did not reach the timescale needed to demonstrate continued association or failure to translocate, leaving the claim unsubstantiated.

      The reviewer fails to recognize that the statement is based on the experimental results showing that the LEN-bound capsid remains bound to the NPC after rupture and fails to translocate to the nuclear side (from the Pathak group in the section “Ruptured LEN-viral complexes remain bound to the NPC”). The reviewer’s comment is incorrect.

      (b) Claim: LEN contributes to loss of capsid elasticity. The authors do not measure elasticity here, only force constants of fluctuations between capsomers in freely diffusing capsids. Elasticity is defined as the ability of a material to undergo reversible deformation when subjected to stress. Other computational works that actually measure elasticity (e.g., 10.1371/journal.ppat.1012537) could represent a point of comparison but are not cited. The changes in force constants in the presence of LEN are shown in Figure 6C, but the text of the scale bar legend and units of k are not legible, so one cannot discern the magnitude or significance of the change.

      The concept of elasticity can extend down to the mesoscopic scale. Many examples can be found in the large number of elastic network models (ENMs) of proteins published by many authors. The reviewer also fails to comprehend the meaning of the effective spring constants in the HeteroENM model and how they relate to the response of the capsid to stress (e.g., in the NPC). Note that in the NPC central channel, the capsid encounters several nucleoporins (including disordered FG nucleoporins that do not have specific interactions with the rest of the proteins), as well as a confined environment. This environment can exert inward stress on the capsid, which is also reflected in stress on the capsid lattice. Furthermore, the cited computational AFM studies are very far from a realistic in vivo or even in vitro set of conditions. In contrast, our study presents a realistic environment that the capsid will encounter in the NPC, and these predictions are then validated by experimental results.
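      As a rough illustration of how an effective spring constant relates to observed fluctuations, the sketch below applies the single-spring equipartition relation k = k_B T / var(d) to a list of pair distances. This is a simplified stand-in, not the HeteroENM procedure used in the manuscript; the function name, the units (kcal/mol and Å), the temperature, and the sample distances are all assumptions chosen for illustration.

```python
from statistics import pvariance

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def effective_spring_constant(distances, temperature=300.0):
    """Estimate k (kcal/mol/A^2) for one pair from its distance fluctuations.

    Uses the harmonic equipartition relation k = k_B * T / var(d).
    Hypothetical helper: a one-spring approximation, not HeteroENM itself.
    """
    variance = pvariance(distances)
    if variance <= 0.0:
        raise ValueError("distances must fluctuate to define a spring constant")
    return K_B * temperature / variance


# Made-up distance samples (Angstrom): narrower fluctuations give a stiffer spring.
flexible_pair = [9.8, 10.2, 9.9, 10.1]
stiff_pair = [9.95, 10.05, 9.98, 10.02]
k_flexible = effective_spring_constant(flexible_pair)
k_stiff = effective_spring_constant(stiff_pair)
```

      In this picture, a ligand that narrows the distance fluctuations between two capsomers shows up as a larger effective k for that pair.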

      (c) Claim: Capsid defects are formed along striated patterns of capsid disorder. Data is not presented that correlates defects/cracks with striations. 

      We presented data on the formation of striated patterns of lattice stress in the capsid, running from the narrow end to the wide end, in a coarse-grained model (https://doi.org/10.1073/pnas.2313737121) and an atomistic model (https://doi.org/10.1073/pnas.2117781119). Both of our papers are extensively cited in the current manuscript. Also, when the capsid is ruptured, one cannot visualize the striated patterns.

      (d) Claim: Typically 1-2 LEN, but rarely 3 bind per capsid hexamer. The authors state: "The magnitude of the attractive interactions was adjusted to capture the substoichiometric binding of LEN to CA hexamers (Faysal et al., 2024). ... We simulated LEN binding to the capsid cone (in the absence of NPC), which resulted in a substoichiometric binding (~1.5 LEN per CA hexamer), consistent with experimental data (Singh et al., 2024)." This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim: "This indicates that the probability of binding a third LEN molecule to a CA hexamer is impeded, likely due to steric effects that prevent the approach of an incoming molecule to a CA hexamer where 2 LEN molecules are already associated. ... Approximately 20% of CA hexamers remain unoccupied despite the availability of a large excess of unbound LEN molecules. This suggests a heterogeneity in the molecular environment of the capsid lattice for LEN binding." These statements represent gross over-interpretation of a bias deliberately introduced during parameterization, and the "finding" represents circular reasoning. Also, if "steric effects" play any role, the authors could analyze the model to characterize and report them rather than simply speculate.

      Reviewer comment: “This means LEN was specifically parameterized to reproduce the 1-2 binding ratio per hexamer apparent from experiments, so this was a parameterization choice, not a prediction by CG simulations as the authors erroneously claim.” – This comment by the reviewer is deeply flawed and we strongly disagree. In our CG model there is no restriction on the number of LEN molecules that can bind to a CA hexamer. We restate that the experimental results on LEN binding to CA hexamers and the inability of LEN to bind to pentamers were used because no all-atom (AA) force field yet exists.

      The steric effect of the lack of third LEN binding to a hexamer is a likely hypothesis (which one is allowed to make). More importantly, an investigation of the steric effect of LEN binding to the CA hexamer is not the main goal of the manuscript.

      (e) Claim: Competition between NUP98 and LEN regulates capsid docking. The authors state: "A fraction of LEN molecules bound at the narrow end dissociate to allow NUP98 binding to the capsid ... Therefore, LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm." Capsid docking occurs regardless of the presence of LEN, and appears to occur at the same rate as the LEN-free capsid presented in the authors' previous work (Hudait & Voth, 2024). The presented data simply show that there is a fluctuation of bound LEN, with about 10 fewer (<5%) bound at the end of the simulation than at the beginning, and the curve (Figure 2A) does not clearly correlate with increased NUP98 contact. In that case, no data is shown that connects LEN binding with the regulation of the docking process. Further, the two quoted statements contradict each other. The presented data appear to show that NUP outcompetes LEN binding, rather than LEN inhibiting NUP binding. The "Therefore" statement is an attempt to reconcile with experimental studies, but is not substantiated by the presented data.

      We disagree with this spurious statement, and we see no real contradiction. We have now added a minor clarification that LEN can inhibit efficient capsid binding at significantly high concentration.

      Page 6, Paragraph 1

      “Therefore, at significantly high concentration LEN can inhibit the efficient binding of the viral cores to the NPC, resulting in an increased number of cores in the cytoplasm.”

      (f) Claim: LEN binding leads to spontaneous dissociation of pentamers. The CG simulation trajectories show pentamer dissociation. However, it is quite difficult to believe that a pentamer in the wide end of the capsid would dissociate and diffuse 100 nm away before a hexamer in the narrow end (previously between two pentamers and now only partially coordinated, also in a highly curved environment, and further under the force of the extruding RNA) would dissociate, as in Figure 2B. A more plausible explanation could be force balance between pent-hex versus hex-hex contacts, an aspect of CG parameterization. No further modeling is presented to explain the release of pentamers, and changes in pent-hex stiffness are not apparent in the force constant fluctuation analysis in Figure 6C.

      This is both a misrepresentation of the simulations and a failure to understand them (as well as the supporting experiments) on the part of the reviewer. In the presence of LEN, the hexameric lattice is hyperstabilized. In contrast, the pentamers are not. As a consequence, the pentamers are dissociated. The pentamers at the narrow end are dissociated first, due to high curvature. The reviewer, from a point of being uninformed, simply speculates on what they think should happen. Moreover, as emphasized earlier, and as the reviewer fails to comprehend, ours is a “bottom-up CG model”, so it predicts, rather than builds in, these effects.

      (g) Claim: WTMetaD simulations predict capsid rupture. The authors state: "In WTMetaD simulations, we used the mean coordination number (Figure S6) between CA proteins in pentamers and in hexamers as the reaction coordinate." This means that the coordination number, the number of pent-hex contacts, is the bias used to accelerate simulation sampling. Yet the authors then interpret a change in coordination number leading to capsid rupture as a discovery, representing a fundamental misuse of the WTMetaD method. Changes in coordination number cannot be claimed as an emergent property when they are in fact the applied bias, when the simulation forced them to sample such states. The bias must be orthogonal to the feature of interest for that feature to be discoverable. While the reported free energies are orthogonal to the reaction coordinate, the structural and stepwise-mechanism "findings" here represent circular reasoning.

      Unfortunately, the reviewer appears to be quite uninformed on the WTMetaD method and what it does. The chosen collective variable (CV) in our case is the coordination number, and the MetaD samples along that variable (the conditional free energy) as it is designed to do. The reviewer may wish to educate themselves by reading Dama et al. (https://doi.org/10.1103/PhysRevLett.112.240602). We also note that “emergent properties” are not along some other, uncoupled coordinate.
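      For readers unfamiliar with coordination-number collective variables, the sketch below shows one common way such a quantity can be defined: a smooth switching function of pair distance, summed over neighbours and averaged over sites. The functional form, the cutoff `r0`, and the function names are assumptions for illustration; the exchange above does not state the exact definition used in the manuscript (it refers to Figure S6).

```python
import math


def switching(r, r0):
    """Smooth contact indicator: ~1 for r << r0, 0.5 at r = r0, ~0 for r >> r0.

    Equal to the rational form (1 - x**6) / (1 - x**12) with x = r / r0,
    simplified to avoid the 0/0 case at x = 1.
    """
    return 1.0 / (1.0 + (r / r0) ** 6)


def mean_coordination(group_a, group_b, r0):
    """Average, over sites in group_a, of the smooth number of group_b neighbours."""
    totals = [
        sum(switching(math.dist(a, b), r0) for b in group_b)
        for a in group_a
    ]
    return sum(totals) / len(totals)


# Hypothetical coordinates: one "pentamer" site and two "hexamer" sites.
pentamer_sites = [(0.0, 0.0, 0.0)]
hexamer_sites = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cv_value = mean_coordination(pentamer_sites, hexamer_sites, r0=1.0)
```

      Because the variable is a differentiable function of the coordinates, a bias potential deposited along it can push the system toward lower coordination, which is the behaviour the two sides are debating.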

      (3) Another major concern with this work is the excessive self-citation, and the conspicuous lack of engagement with similar computational modeling studies that investigate the HIV capsid and its interactions with LEN, capsid mechanical properties relevant to nuclear entry, and other capsid-NPC simulations (e.g., 10.1016/j.cell.2024.12.008 and 10.1371/journal.ppat.1012537). Other such studies available in the literature include examination of varying aspects of the system at both CG and all-atom levels of resolution, which could be highly complementary to the present work and, in many cases, lend support to the authors' claims rather than detract from them. The choice to omit relevant literature implies either a lack of perspective or a lack of collegiality, which the presentation of the work suffers from. Overall, it is essential to discuss findings in the context of competing studies to give readers an accurate view of the state of the field and how the present work fits into it. It is appropriate in a CG modeling study to discuss the potential weaknesses of the methodology, points of disagreement with alternative modeling studies, and any lack of correlation with a broader range of experimental work. Qualitative agreement with select experiments does not constitute model validation.

      We disagree with this statement and point out where we have cited other work, including the ones mentioned above. However, our CG model is a largely bottom-up CG model which differs from other more ad hoc CG approaches (and some well-known CG models). We do not wish to emphasize the obvious flaws in those other CG approaches and models, since that is not the focus of our manuscript.

      (4) Other critiques, questions, concerns:

      (a) The first Results sub-heading presents "results", complete with several supplementary figures and a movie that are from a previous publication about the development of the HIV capsid-NPC model in the absence of LEN (Hudait & Voth, 2024). This information should be included as part of the introduction or an abbreviated main-text methods section rather than being included within Results as if it represents a newly reported advancement, as this could be misleading.

      The movie in question (capsid docking to NPC without LEN) is essential for comparison of LEN-binding dynamics. Unlike in our previous paper, we simulated significantly longer timescales of capsid docking and performed several additional analyses that are relevant to this paper. Moreover, the first section of the Results is titled “Coarse-grained modeling and simulation”; hence, we only present a summary of the CG models and key validation steps in this section.

      (b) The authors say the unbiased simulations of capsid-NPC docking were run as two independent replicates, but results from only one trajectory are ever shown plotted over time. It is not mentioned if the time series data are averaged or smoothed, so what is the shadow in these plots (e.g., Figures 1,2, and Supplementary Figure 5)?

      The plotted time series are averages over the two replicas. “For all the plots, the solid lines are the mean values calculated from the time series of two independent replicas, and the shaded region is the standard deviation at each timestep.” This was mentioned in the original figure caption.
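      The caption's recipe (mean line plus a standard-deviation band across replicas) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, pstdev


def mean_and_band(replicas):
    """Return per-timestep mean and standard deviation across replicas.

    `replicas` is a list of equally long time series, one per replica.
    Assumption: population standard deviation (pstdev) across replicas.
    """
    n_steps = len(replicas[0])
    if any(len(series) != n_steps for series in replicas):
        raise ValueError("all replicas must have the same number of timesteps")
    means = [mean([series[t] for series in replicas]) for t in range(n_steps)]
    stds = [pstdev([series[t] for series in replicas]) for t in range(n_steps)]
    return means, stds


# Made-up values for two replicas sampled at three timesteps.
replica_1 = [1.0, 2.0, 4.0]
replica_2 = [3.0, 2.0, 6.0]
line, band = mean_and_band([replica_1, replica_2])
```

      With only two replicas, the band at each timestep is simply half the absolute difference between the two series, so it indicates spread between runs rather than a well-converged uncertainty.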

      (c) Why do the insets showing LEN binding in Figure 2A look so different from the models they are apparently zoomed in on? Both instances really look like they are taken from different simulation frames, rather than being a zoomed-in view.

      It is difficult to discern a high curvature region of the capsid due to object overlap of different regions of the capsid. This is likely a case of “perspective distortion” in image processing.

      (d) What are the sudden jerks apparent in the SI movies? Perhaps this is related to the rate at which trajectory frames are saved, but occasionally, during the relatively smooth motion of the capsid-NPC complex, something dramatic happens all of a sudden in a frame. For example, significant and apparently instantaneous reorientation of the cone far beyond what preceding motions suggest is possible (SI movie 2, at timestamp 0.22), RNP extrusion suddenly in a single frame (SI movie 2, at timestamp 0.27), and simultaneous opening of all pentamers all at once starting in a single frame (SI movie 2, at timestamp 0.33). This almost makes the movie look generated from separate trajectories or discontinuous portions of the same trajectory. If movies have been edited for visual clarity (e.g., to skip over time when "nothing" is happening and focus on the exciting aspects), then the authors should state so in the captions.

      This is due to the rate at which trajectory frames are saved for movie generation, which was chosen to speed up processing of the movies. We added the following to the movie caption:

      “The movie frames correspond to snapshots every 250000 τ_CG.”

      (e) Figure 3c presents a time series of the degree of defects at pent-hex and hex-hex interfaces, but I do not understand the normalization. The authors state, "we represented the defects as the number of under-coordinated CA monomers of the hexamers at the pentamer-hexamer-pentamer and hexamer-hexamer interface as N_Pen-Hex and N_Hex-Hex ... Note that in N_Pen-Hex and N_Hex-Hex are calculated by normalizing by the total number of CA pentamer (12) and hexamer rings (209) respectively." Shouldn't the number of uncoordinated monomers be normalized by the number of that type of monomer, rather than the number of capsomers/rings? E.g., 12*5 and 209*6, rather than 12 and 209?

      We prefer to continue with the current normalization, since in the HIV-1 literature capsids are typically represented as a collection of hexamers and pentamers (rather than by the total number of CA monomers).
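      The two normalizations under discussion differ only by a constant factor per interface type, which the short sketch below makes explicit. The ring counts (12 pentamers, 209 hexamers) come from the text above; the defect counts and function names are hypothetical.

```python
N_PENTAMERS = 12   # pentamer rings in the capsid (from the text)
N_HEXAMERS = 209   # hexamer rings in the capsid (from the text)


def per_ring(count, n_rings):
    """Normalize a defect count by the number of rings (authors' convention)."""
    return count / n_rings


def per_monomer(count, n_rings, monomers_per_ring):
    """Normalize a defect count by the number of monomers (reviewer's suggestion)."""
    return count / (n_rings * monomers_per_ring)


# Hypothetical raw counts of under-coordinated CA monomers.
raw_pen_hex = 6
raw_hex_hex = 209

n_pen_hex_ring = per_ring(raw_pen_hex, N_PENTAMERS)
n_pen_hex_monomer = per_monomer(raw_pen_hex, N_PENTAMERS, 5)
n_hex_hex_ring = per_ring(raw_hex_hex, N_HEXAMERS)
n_hex_hex_monomer = per_monomer(raw_hex_hex, N_HEXAMERS, 6)
```

      Because the conversion factors (5 and 6) are fixed, the shape of each time series is unchanged by the choice; only the absolute scale, and the relative scale between the two curves, shifts.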

      (f) The authors state that "Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate the high curvature ends of the capsid." The defects being reported are apparently propagating from (not towards) the high curvature ends of the capsid. 

      We corrected the statement as follows:

      “Although high computational cost precluded us from continuing these CG MD simulations, we expect these defects at the hexamer-hexamer interface to propagate from the high curvature to low curvature end of the capsid.”

      (g) The first half of the paper uses the color orange in figures to indicate LEN, but the second half uses orange to indicate defects, and this could be confusing for some readers. Both LEN and "defects" are simply a cluster of spheres, so highlighted defects appear to represent LEN without careful reading of captions.

      We only show LEN in Figure 1, and in the rest of the figures the bound LEN molecules are not shown for clarity. The defects are shown in a darker shade of orange (amber).

      (h) SI Figure S3 caption says "The CA monomers to which at least one LEN molecule is bound are shown in orange spheres. The CA monomers to which no LEN molecule is bound are shown in white spheres." While in contradiction, the main-text Fig 2 says "The CA monomers to which at least one LEN molecule is bound are shown in white spheres. The CA monomers to which no LEN molecule is bound are shown in orange spheres." One of these must be a typo.

      We have corrected the erroneous caption in Fig. S3. The color schemes in Fig. 2 and Fig. S3 are now consistent.

      (i) The authors state that: "CG MD simulations and live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated to the NPC after rupture." However, the live cell imaging data do not show where rupture occurs, such that this statement is at least partially false. It is also unclear that CG simulations show that cores remain bound following rupture, given that simulations were not extended to the timescale needed to observe this, again rendering the statement partially false.

      We modified the statement as follows:

      “CG MD simulations complemented by the outcome of live-cell imaging demonstrate that LEN-treated capsids dock at the NPC and rupture at the narrow end when bound to the central channel and then remain associated with the NPC after rupture.”

      (j) The authors state: "We previously demonstrated that the RNP complex inside the capsid contributes to internal mechanical strain on the lattice driven by CACTD-RNP interactions and condensation state of RNP complex (Hudait & Voth, 2024)." In that case, why do the present CG models detect no difference in results for condensed versus uncondensed RNP?

      In our previous paper, the differences arising from the condensation state of the RNP complex appear only in the pill-shaped capsid, and not in the cone-shaped capsid. In this manuscript, we only investigated the cone-shaped capsid.

      (k) The authors state: "The distribution demonstrates that the binding of LEN to the distorted lattice sites is energetically favorable. Since LEN localizes at the hydrophobic pocket between two adjoining CA monomers, it is sterically favorable to accommodate the incoming molecule at a distorted lattice site. This can be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed. This also allows the drug molecule to avoid the multitude of unfavorable CA-LEN interactions and establish the energetically favorable interactions leading to a successful binding event. " What multitude of unfavorable interactions are the authors referring to? Data is not presented to substantiate the claim of increased void volume between hexamers in the distorted lattice. Capsomer distortion is shown as a schematic in Figure 6A rather than in the context of the actual model.

      “What multitude of unfavorable interactions are the authors referring to?” We have now added the following sentence to clarify:

      “Here we denote unfavorable CA-LEN interactions as all interactions other than the electrostatic and van der Waals interactions that lead to CA-LEN binding (17).”

      The expectation of increased void volume in the distorted lattice is based on a standard solid-state physics understanding. We added the word “likely” to the statement: “This can likely be attributed to the higher available void volume at the distorted lattice relative to an ordered lattice, the latter being tightly packed (41).”

      Moreover, in one of our previous manuscripts, we established that compressive or expansive strain induces a more closely packed or a more expanded lattice, respectively (A. Yu et al., Strain and rupture of HIV-1 capsids during uncoating. Proceedings of the National Academy of Sciences 119, e2117781119 (2022)).

      (l) The authors state that "These striated patterns also demonstrate deviations from ideal lattice packing. " What does ideal lattice packing mean in this context, where hexamers are in numerous unique environments in terms of curvature? What is the structural reference point?

      The ideal lattice packing definition is provided in our previous manuscripts: 1. A. Yu et al., Strain and rupture of HIV-1 capsids during uncoating. Proceedings of the National Academy of Sciences 119, e2117781119 (2022), 2. A. Hudait, G. A. Voth, HIV-1 capsid shape, orientation, and entropic elasticity regulate translocation into the nuclear pore complex. Proceedings of the National Academy of Sciences 121, e2313737121 (2024).

      These manuscripts are cited in the previous statement. The ideal lattice packing is defined based on lattice separations in each core (in cryo-ET and atomistic simulations) using a local order parameter, which measures the near-neighbor contacts of a particle. Moreover, the ideal packing reference is calculated from all available capsid shapes (cone, ellipsoid, and tubular), and takes into account different curvatures.

      (m) If pentamer-hexamer interactions are weakened in the presence of LEN, why are differences at these interfaces not apparent in the Figure 6C data that shows stiffening of the interactions between capsomer subunits?

      We have added a statement as follows:

      “Based on our analysis, we hypothesize that LEN binding hyperstabilizes the CA hexamer-hexamer interactions relative to CA hexamer-pentamer interactions.”

      (n) The authors state: "Lattice defects arising from the loss of pentamers and cracks along the weak points of the hexameric lattice drive the uncoating of the capsid." The word rupture or failure should be used here rather than uncoating; it is unclear that the authors are studying the true process of uncoating and whether the defects induced by LEN binding relate in any way to uncoating. 

      We have now changed “uncoating” to “rupture” throughout the manuscript.

      (o) The authors state: " LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC." But no data is presented to demonstrate that capsid stability is increased by NUP98 interaction. In fact, the presented data could suggest the opposite since capsids in contact with NUP98 in the NPC appeared to rupture faster than freely diffusing capsids.

      We have modified the statement as follows:

      “We hypothesize that LEN-treated broken cores are stabilized by the interaction with the disordered FG-NUP98 mesh at the NPC.”

      (p) The authors state: "LEN binding stimulates similar changes in free capsids, but they occur with lower frequency on similar time scales, suggesting that the cores docked at the NPC are under increased stress, resulting in more frequent weakening of the hexamer-pentamer and hexamer-hexamer interactions, as well as more nucleation of defects at the hexamer-hexamer interface. ... Our results suggest that in the presence of the LEN, capsid docking into the NPC central channel will increase stress, resulting in more frequent breaks in the capsid lattice compared to free capsids." The first is a run-on sentence. The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.

      We have fixed the run-on sentence.

      The results shown support that LEN stimulates changes in free capsids to happen faster, but not more frequently. The frequency with which an event occurs is separate from the speed with which the event occurs.

      We disagree with the reviewer. The statement was intended to provide a comparison between free capsid and NPC-bound capsid.

      (q) The authors state: "A possible mechanistic pathway of capsid disassembly can be that multiple pentamers are dissociated from the capsid sequentially, and the remaining hexameric lattice remains stabilized by bound LEN molecules for a time, before the structural integrity of the remaining lattice is compromised." This statement is inconsistent with experimental studies that say LEN does not lead to capsid disassembly, and may even prevent disassembly as part of its disruption of proper uncoating (e.g., 10.1073/pnas.2420497122 previously published by the authors).

      We disagree with the interpretation of the reviewer. Our interpretation, based on our results, is that LEN binding accelerates capsid rupture (from the pentamer-rich, high-curvature ends), and the rest of the broken hexameric lattice is hyperstabilized. Ultimately, lattice rupture will lead to release of the RNP, and hence the intended goal of the drug is achieved.

      (r) Finally, it remains a concern with the authors' work that the bottom-up solvent-free CG modeling software used in this and supporting works is not open source or even available to other researchers like other commonly used molecular dynamics software packages, raising significant questions about transparency and reproducibility.

      The simulations were performed in LAMMPS, which is open source. This software is already stated in the Methods. Input data is provided upon request.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1: In part B, it appears the middle panel was screenshotted from a ppt, given the red line underneath Lenacapavir. You can export it to an image instead.

      The figure is fixed.

      (2) Figure 6: In part A, the LEN_d in the graph is illegible. Also, in the panel next to it, it also appears to have been screenshotted from a ppt.

      The figure is fixed.

      (3) Page 6: There's an errant quotation mark at the end of a paragraph.

      We have removed the errant quotation mark.

      Reviewer #2 (Recommendations for the authors):

      The code used to perform bottom-up solvent-free CG modeling simulations is not made available.

      This is not true. LAMMPS was used as stated in Methods.

    1. eLife Assessment

      In this fundamental work, Horne et al. present compelling evidence that YbjP is a novel binding partner of the TolC channel protein. YbjP is characterized using cryo-EM, and its role is probed using pull-down experiments, in vivo crosslinking, and functional assays along with phylogenetic analysis, all of which are properly performed and presented and support the main conclusions. While the study does not identify a clear role for this protein, the revised manuscript offers improved clarity and contributes invaluable insight into membrane transport and antimicrobial resistance.

    2. Reviewer #2 (Public review):

      This article focuses on the study of two E. coli tripartite efflux pumps, both using TolC as a partner in the outer membrane, namely MacAB-TolC and AcrABZ-TolC.

      By preparing MacAB-TolC in Peptidiscs rather than in detergent for cryo-EM structure determination, they visualized an extra protein localized around TolC. The resolution was sufficient to build part of the structure, and using the AlphaFold2 database and DALI topology recognition program, they identified it as the lipoprotein YbjP. This protein has an anchorage in the outer membrane, and it was suggested that it could act as a support for TolC, which is the only OMF that does not have an N-terminal extension anchored in the outer membrane, which is very puzzling for the community working in this field of research.

      The authors used a large number of different approaches to evaluate the importance of YbjP (structure, genomic evolution, microbiology, in vivo photocrosslinking, proteomic profiling), but have not so far succeeded in finding a clear role for it, even if it could be important depending on environmental stress. Nevertheless, their results, obtained with extreme rigour, are of major interest for understanding the complexity of such systems and deserve publication.

      Comments on revisions:

      Thank you for clarifying the points that puzzled me concerning the crosslink experiments. This version does not need further modifications.

    3. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The presentation and especially main-text illustrative material seem to focus disproportionately on MacAB-TolC-YbjP complex, and the AcrABZ-TolC-YbjP is relegated to supplementary data which is somewhat confusing. There is no high-resolution side view of the AcrABZ-TolC-YbjP side-by-side to MacAB-TolC-YbjP which may be helpful to spot parallels and differences in the organisation of the two systems.

      This was previously presented in Supplementary Figure S2. However, because the models were shown at a small scale, we have now included the comparison in the main manuscript (Figure 4). This figure presents AcrABZ-TolC-YbjP and MacAB-TolC-YbjP side-by-side, a structural alignment of TolC-YbjP in the two pumps, and close-up views of the interaction interface.

      Supplementary Figure 2 may also be better presented in the main text, as it shows specific displacements of residues upon binding of the YbjP relative to the apo-complexes, although this can be left at the authors' discretion.

      We added more text to describe the displacements of residues upon YbjP binding: ‘Nonetheless, the side chains of a few residues in TolC, which mainly correspond to positively charged amino acids (R18, R24, K214, R227, R234), reorient to interact with the YbjP lipoprotein partner (Figure 2B).’

      Reviewer #1 (Recommendations for the authors):

      The work is of high quality and requires minimal modifications, which are mentioned as suggestions above and are mostly connected to the illustrative material.

      One additional suggestion, which is connected to the earlier BioRxiv preprint, the data seen in Fig 6 of the preprint seems to have been edited out from the current version, and perhaps can be included in a revised version, as it seems to support the "rapid adaptation under stress" role for YbjP, which currently is only speculatively mentioned in p.11, line 365 of the manuscript.

      We acknowledge that Figure 6 of the BioRxiv preprint can support the "rapid adaptation under stress" role for YbjP. However, upon sequencing the ΔybjP strain from the Keio collection used in the preprint, we identified a large deletion in the yecT-flhD region. We therefore generated a new ΔybjP strain without the yecT-flhD deletion and repeated the experiment. The results with the corrected strain did not support the previous conclusion, and these data were consequently removed from the current manuscript.

      Reviewer #2 (Public review):

      In Figure 3C, the experiment performed with AcrA is clear and the extra band appears at the proper size. On the right panel, it is clear that the crosslink doesn't work when pBPA is placed on residues too far from TolC. Only when introduced on N113 or T110 does a band appear.

      This is in accordance with an interaction in vivo. Nevertheless, 17 + 54 = 71kDa, which is more than the two bands appearing on the gel. This difference in size migration can occur, but it is not clear when looking at Figure S3. In Figure S3a, the purified proteins are highlighted at approximately the expected size (≈20kDa instead of 17 for YbjP and between 56 and 60kDa in two bands for TolC instead of 54kDa). On the right panel, it seems that the bands are present exactly at the same position, instead of an upper band as expected for the crosslinked YbjP-TolC (at 71kDa). It would be clearer if having the control of the same sample without illumination, revealed by anti-TolC, to see the difference.

      We thank the reviewer for pointing out this discrepancy. We identified an error in the molecular weight ladder, as one band was missing. This has now been corrected: YbjP migrates just below 17 kDa, consistent with Figure 3C. In addition, we previously reported a size of 54 kDa for TolC, whereas matured TolC, after signal peptide cleavage, is actually 52 kDa.

      We believe that the differences in the apparent molecular weight observed in Figures 3A, 3C and S3 (now S2) mainly result from tagging and post-translational modifications.

      In Figure 3A, we used the soluble construct His-YbjP<sub>28-1711</sub> (theoretical M<sub>w</sub> ~18 kDa), as also done for the controls in Figures 3C and S3 (now S2). However, for the crosslinking samples, we used full-length His-tagged YbjP, which carries a post-translational lipid modification (theoretical M<sub>w</sub> ~19 kDa, considering the protein lipidation). The presence of the lipid chains alters the migration as this species migrates at ~15 kDa (Fig 3A). Increased hydrophobicity, due here to YbjP lipidation, could accelerate the migration (Emmanuel et al. 2025 FEBS Open Bio).

      In Figure 3A, we used the TolC-FLAG whose apparent M<sub>w</sub> is ~52 kDa, as previously reported (Fig S3, Fitzpatrick et al. 2017). In Figure S3 (now S2), we used His-tagged TolC (theoretical M<sub>w</sub> 55 kDa) for the control, which migrates above 56 kDa. In the crosslinking samples, however, we detect tag-free, endogenous TolC, with a theoretical M<sub>w</sub> of ~51 kDa.

      In conclusion, the crosslinked complex composed of lipidated FL YbjP (~15 kDa) and endogenous TolC (~51 kDa) would be expected to migrate at ~66 kDa, which is consistent with what is observed in Figures 3C and S3 (now S2).

      A second point that could be discussed further is the comparison of the structure of the pump in the presence of the peptidoglycan with the images previously obtained by tomography. It is not totally clear to me if YbjP could have been positioned in these maps.

      There is density corresponding to YbjP in the map obtained in the presence of peptidoglycan. To improve clarity, we have specified the location of the peptidoglycan relative to the pumps in the revised Figure 4, and Supplementary Figure S4, together with the position of YbjP. In both figures, the lipoprotein appears distant from the peptidoglycan density.

      Reviewer #2 (Recommendations for the authors):

      In addition, please add explanations in the legend of Figure 3C concerning the structures.

      We added the following description of the structures: ‘As shown underneath, AcrA residues Q136 and Y137, proximal to TolC in the structure of the AcrABZ-TolC pump (PDB 5NG5), were replaced by pBPA. For YbjP, the two residues N113 and T110 proximal to TolC in the MacAB-TolC-YbjP complex (PDB 9QGY) and the three residues N43, N90 and H104 distal to TolC were mutated.’

      It would be clearer if having the control of the same sample without illumination, revealed by anti-TolC, to see the difference.

      As the amount of crosslinked material is low, samples were enriched via His-tag purification of YbjP prior to Western blotting. In the absence of illumination (see sample N113, UV-), no crosslink would be formed, and therefore TolC would not be co-purified.

      In addition, some typo errors have been noted.

      Table S1 minus is missing for the defocus range for AcrABZ-TolC-YbjP.

      Thank you for noting the typo. We have added the minus sign.

      Table S3, please specify what is N in the legend.

      N is the stoichiometry parameter, which is now specified in the table legend.

      Line 237, I suppose it has to refer to Figure S6, not S5.

      Thank you for noting the error. We have verified the text matches the figures here and in the entire manuscript.

      Several errors are present in the legend of Figure 6.

      No letters are indicated for the different panels; line 841 must be C, F and I; the indicated colors for the differentially expressed proteins do not correspond to the volcano plots.

      Thank you for suggesting the improvements for the labels. We have modified the plot accordingly.

      Reference Glavier 2020 has been cited as Glacier on line 72.

      We have modified the writing accordingly and checked the reference.

    1. eLife Assessment

      This is an important study that takes a key step towards understanding developmental disorders linked to mutations in the O-GlcNAc transferase enzyme by generating a mouse model harboring the C921Y mutation. While the mechanisms remain open, the study thoroughly examines behavioral and anatomical differences in these mice and provides convincing evidence for behavioral hyperactivity and learning/memory deficits, as well as phenotypic differences in skull and brain formation. This study will be of interest to those studying neurodevelopmental disorders and associated mechanisms.

    2. Reviewer #1 (Public review):

      This study established a C921Y OGT-ID mouse model, systematically demonstrating in mammals the pathological link between O-GlcNAc metabolic imbalance and neurodevelopmental disorders (cortical malformation, microcephaly) as well as behavioral abnormalities (hyperactivity, impulsivity, learning/memory deficits). Researchers comprehensively assessed the model phenotype through integrated multi-level analysis methods, including long-term behavioral monitoring, high-resolution brain structural imaging (micro-CT and MRI), histopathology, and quantitative proteomics.

      The core strength of this study lies in its multimodal experimental design. The evidence chain spanning in vivo behavior, brain structure, and molecular characteristics demonstrates high consistency and correlation. Of particular note is the combination of non-invasive behavioral tracking with quantitative neuroimaging techniques, providing objective validation for the observed phenotypes. The findings support the authors' core conclusion: O-GlcNAc homeostasis imbalance correlates with neurodevelopmental deficits, including structural abnormalities in specific brain regions and altered cognitive behaviors. Furthermore, this model reproduces certain clinical features observed in human patients.

      Nevertheless, several avenues remain open for further exploration. For instance, sample sizes in certain omics analyses remain relatively small, and investigations into downstream molecular mechanisms are still confined to the level of correlation; direct causal validation through genetic or pharmacological interventions is still required. Furthermore, as this model focuses on a single recurrent mutation, the generalizability of its findings to other OGT-ID variants remains to be verified.

      It provides the first actionable vertebrate model for neurodevelopmental disorders with unclear mechanisms, filling a critical gap in this field. The multidimensional research methods established in the paper, such as the digital behavioral phenotyping workflow, also offer valuable references for related disease studies.

    3. Reviewer #2 (Public review):

      Summary:

      The authors are trying to understand why certain mutants of O-GlcNAc transferase (OGT) appear to cause developmental disorders in humans. As an important step towards that goal, the authors generated a mouse model with one of these mutations that disrupts OGT activity. They then go on to test these mice for behavioral differences, finding that the mutant mice exhibit some signs of hyperactivity and differences in learning and memory. They then examine alterations to the structure of the brain and skull, and again find changes in the mutant mice that have been associated with developmental disorders. Finally, they identify proteins that are up- or down-regulated between the two mice as potential mechanisms to explain the observations.

      Strengths:

      The major strength of this manuscript is the creation of this mouse model, as a key step in beginning to understand how OGT mutants cause developmental disorders. This line will prove important for not only the authors but other investigators as well, enabling the testing of various hypotheses and potentially treatments. The experiments are also rigorously performed and the conclusions are well supported by the data.

      Weaknesses:

      The only weakness is a lack of mechanistic insight. However, this certainly may come in the future through more targeted experimentation using this mouse model. I do not recommend that these experiments need to be performed in this manuscript.

      Comments on revisions:

      The authors have addressed all of my suggestions proactively.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study established a C921Y OGT-ID mouse model, systematically demonstrating in mammals the pathological link between O-GlcNAc metabolic imbalance and neurodevelopmental disorders (cortical malformation, microcephaly) as well as behavioral abnormalities (hyperactivity, impulsivity, learning/memory deficits). However, critical flaws in the current findings require resolution to ensure scientific rigor.

      The most concerning finding appears in Figure S12. While Supplementary Figure S12 demonstrates decreased OGA expression without significant OGT level changes in C921Y mutants via Western blot/qPCR, previous reports (Florence Authier, et al., Dis Model Mech. 2023) described OGT downregulation in Western blot and an increase in qPCR in the same models. The opposite OGT expression outcomes in supposedly identical mouse models directly challenge the model's reliability. This discrepancy raises serious concerns about either the experimental execution or the interpretation of results. The authors must revalidate the data with rigorous controls or provide a molecular biology-based explanation.

      We thank the reviewer for their time and effort in improving the quality of our manuscript.

      We would like to point out that the results presented in the previous Fig. S12 (now Fig. S13) are from mice of a different age and are restricted to the prefrontal cortex, whereas in the previous report (Florence Authier, et al., Dis Model Mech. 2023) we showed OGT and OGA mRNA/protein expression in total brain homogenates. In this previous study, we observed a significant reduction in OGT protein levels, while OGT mRNA levels were significantly increased, in the brains of 3-month-old mutant C921Y mice compared to WT controls. However, in our current study (Figure S12, now S13), the OGA and OGT mRNA/protein expression data a) are restricted to the prefrontal cortex and b) are from 4-month-old male mice. Therefore, a direct comparison of findings from total brain vs. prefrontal cortex would be speculative. In our present work, OGT protein levels are not changed in the prefrontal cortex, while OGT mRNA levels are increased (similarly to the total brain data), albeit not significantly.

      It is plausible that the different levels of OGT protein expression in total brain (previous study) and prefrontal cortex (current study) reflect regional differences in the regulation of OGT protein levels/stability, since OGT mRNA levels are increased in both cases. This notion is also supported by additional analyses in three other brain regions (hippocampus, striatum and cerebellum), and these data are now included in Figures S13 and S14.

      A few additional comments to the author may be helpful to improve the study.

      Major

      (1) While this study systematically validated multi-dimensional phenotypes (including neuroanatomical abnormalities and behavioral deficits) in OGT C921Y mutant mice, there is a lack of relevant mechanisms and intervention experiments. For example, the absence of targeted intervention studies on key signaling pathways prevents verification of whether proteomics-identified molecular changes directly drive phenotypic manifestations.

      We agree with the reviewer that the suggested experiments would further strengthen our work. However, the extensive nature of the suggested studies would result in considerable delay in sharing this work with the scientific and patient communities. Nevertheless, we appreciate the reviewer's comment and will continue to work along these lines, and report in a follow-up manuscript in the future.

      (2) Although MRI detected nodular dysplasia and heterotopia in the cingulate cortex, the cellular basis remains undefined. Spatiotemporal immunofluorescence analysis using neuronal (NeuN), astrocytic (GFAP), and synaptic (Synaptophysin) markers is recommended to identify affected cell populations (e.g., radial glial migration defects or intermediate progenitor differentiation abnormalities).

      Following the reviewer's suggestion, we have performed additional analyses to identify the cellular composition of the observed nodular dysplasia using neuronal and glial markers. These new analyses indicate that the nodular collections in layers II/III were predominantly neurons; for example, see the cresyl violet staining (Fig. 6E). Moreover, we have also performed immunofluorescence imaging using NeuN and GFAP (Fig. 6G-H), which indicates that the dystrophic collections are predominantly neurons. To further corroborate these findings, we have also performed multiplex IHC analyses, presented in Fig. S12, which indicate that: i) the nodular cortical malformations were populated by neurons and oligodendrocytes and ii) they predominantly affected layers II-V, as reflected by the distribution of the neuronal markers Reelin and POU class 3 homeobox 2 (POU3F2). Collectively, these data (Fig. 6 and Fig. S12) reflect neuronal disorganisation due to migration defects rather than differentiation defects. We appreciate the reviewer's suggestion to perform spatiotemporal analyses of these cellular features; however, tissue from defined stages of development is not available.

      (3) While proteomics revealed dysregulation in pathways including Wnt/β-catenin and mTOR signaling, two critical issues remain unresolved: a) O-GlcNAc glycoproteomic alterations remain unexamined; b) The causal relationship between pathway changes and O-GlcNAc imbalance lacks validation. It is recommended to use co-immunoprecipitation or glycosylation sequencing to confirm whether the relevant proteins undergo O-GlcNAc modification changes, identify specific modification sites, and verify their interactions with OGT.

      We agree with the referee that these experiments would further strengthen the work. However, we respectfully point out that the inference that altered proteins must themselves be O-GlcNAc modified is not necessarily correct. For instance, O-GlcNAcylation of an unknown protein kinase X, E3 ligase/DUB Y, or transcription factor Z could indirectly affect these pathways/proteins. Nevertheless, we have performed further experiments to explore whether Wnt/β-catenin and mTOR signalling are functionally affected, as pointed out by the referee. In the qPCR analyses, we did not observe significant changes in the expression of Wnt target genes (Cdkn1a, Ccnd1, Myc, Ramp3, Tfrc), nor did western blots show significant changes in the levels of key proteins involved in Wnt/β-catenin (non-phosphorylated β-catenin) and mTOR (phosphorylated rpS6) signalling (data not shown). These results suggest that neither pathway is functionally deregulated to a significant extent in the prefrontal cortex of adult OGT<sup>C921Y</sup> mice.

      (4) Given that OGT-ID neuropathology likely originates embryonically, we recommend serial analyses from E14.5 to P7 to examine cellular dynamics during critical corticogenesis phases.

      We appreciate the reviewer's suggestion to perform spatiotemporal analyses of these cellular dynamics; however, tissue from defined stages of development is not available. As stated above, we want to share our current findings with the scientific and patient communities in a timely manner, and the suggested experiments could form the foundation of a follow-up study in the future.

      (5) The interpretation of Figure 8A constitutes overinterpretation. Current data fail to conclusively demonstrate impairment of OGT's protein interaction network and lack direct evidence supporting the proposed mechanisms of HCF1 misprocessing or OGA loss.

      Thank you for the comment. To avoid misleading the readers, we have removed panel A from the previous version of Figure 8 and updated the version of record.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to understand why certain mutants of O-GlcNAc transferase (OGT) appear to cause developmental disorders in humans. As an important step towards that goal, the authors generated a mouse model with one of these mutations that disrupts OGT activity. They then go on to test these mice for behavioral differences, finding that the mutant mice exhibit some signs of hyperactivity and differences in learning and memory. They then examine alterations to the structure of the brain and skull and again find changes in the mutant mice that have been associated with developmental disorders. Finally, they identify proteins that are up- or down-regulated between the two mice as potential mechanisms to explain the observations.

      Strengths:

      The major strength of this manuscript is the creation of this mouse model, as a key step in beginning to understand how OGT mutants cause developmental disorders. This line will prove important for not only the authors but other investigators as well, enabling the testing of various hypotheses and potentially treatments. The experiments are also rigorously performed, and the conclusions are well supported by the data.

      Weaknesses:

      The only weakness identified is a lack of mechanistic insight. However, this certainly may come in the future through more targeted experimentation using this mouse model.

      We agree with the reviewer that the suggested experiments would further strengthen our work. However, the extensive nature of the suggested studies would result in considerable delay in sharing this work with the scientific and patient communities. Nevertheless, we appreciate the reviewer's comment and will continue to work along these lines, and report in a follow-up manuscript in the future.

      Recommendations for the authors:

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Statistics including exact p-values have been included in the main text for all key questions where appropriate.

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 1F, the y-axis labels and scale values are partially obscured by graphical elements, compromising accurate interpretation of the data range.

      Panel 1F has been adjusted to make the y-axis label visible.

      (2) Regarding the histological analyses in Figure 6, the current H&E staining and Luxol Fast Blue myelin staining results lack age-matched wild-type control samples processed in parallel, which undermines experimental comparability. To enhance methodological rigor, control group staining results should be displayed adjacent to each experimental group image.

      The original Figure 6 already contained comparison between WT and OGT<sup>C921Y</sup> tissues. The Figure has been updated with additional data from the WT and C921Y mutant groups shown side by side.

      Reviewer #2 (Recommendations for the authors):

      (1) I believe that Figures S1 and S2 were switched during the submission. The legends are correct, so the authors should just be careful with the order when they upload the final versions.

      Figures S1 and S2 have been re-ordered.

      (2) On page 18, the authors state, "Although no significant changes in the expression of OGT were observed in OGTC921Y cortex (Figure S12A, C), there was a significant increase in OGT/OGA protein ratio in OGTC921Y mice (Fig. S12D). As a functional consequence, global O-GlcNAcylation of proteins in the brain was drastically impaired in the OGTC921Y brain compared to WT (Figure S12E, F)."

      To me, this statement suggests that the incorrect ratio of OGT to OGA is responsible for the altered O-GlcNAc levels. I think this is missing important information. The authors are, I'm sure, aware that OGT and OGA expression is linked to O-GlcNAc levels. I think it would be better to describe the situation here as the tissue attempting to respond to lower OGT activity by lowering OGA levels. However, the tissue is not fully successful, resulting in lower overall O-GlcNAc levels as seen by RL2. If the difference were only driven by the OGT/OGA ratio, one would expect increased O-GlcNAc levels due to decreased OGA. I think it is important to point out more details here for non-expert readers.

      Thank you for the insightful comment. We have included these aspects in the revised text; please see page 20.

      (3) I am a little surprised that the authors did not explore differences in O-GlcNAc-modified proteins through a more targeted enrichment of these proteins for analysis of potential modification differences, in addition to just changes in protein abundance.

      We agree that these experiments would further strengthen the work. However, it is not yet known whether OGT-CDG is caused by loss of O-GlcNAc modification on specific proteins or by mechanisms that remain to be deciphered (e.g. OGT interactome, HCF1 processing, feedback on OGA levels), which we are not able to confirm in the current manuscript. Therefore, as a starting point, we have performed whole-proteome analysis to establish candidate hypotheses that could lead to the discovery of cellular and molecular mechanisms underlying OGT-CDG. Lastly, we appreciate the reviewer's comment and will continue to work along these lines, and report in a follow-up manuscript in the future.

    1. eLife Assessment

      This important study presents a compelling link between nutrient signaling and chromosome regulation, demonstrating that reduced activity in a central nutrient-sensing pathway improves chromosome stability and alters gene expression through effects on cohesin. The convincing evidence from a combination of genetic, biochemical and cell biological approaches supports a model in which TORC1-dependent phosphorylation of Mis4 and the cohesin subunit Psm1/Smc1 can modulate cohesin loading to enhance faithful chromosome transmission. While the underlying mechanisms and biological importance of this newly described circuit are not yet fully known, the overall body of evidence is strong and supports the main conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Besson et al. investigate how environmental nutrient signals regulate chromosome biology through the TORC1 signaling pathway in Schizosaccharomyces pombe. Specifically, the authors explore the impact of TORC1 on the function of cohesin, a protein complex essential for chromosome segregation and transcriptional regulation. Through a combination of genetic screens, biochemical analysis, phospho-proteomics, and transcriptional profiling, they uncover a functional and physical interaction between TORC1 and cohesin. The data suggest that reduced TORC1 activity enhances cohesin binding to chromosomes and improves chromosome segregation, with implications for stress-responsive gene expression, especially in subtelomeric regions.

      Strengths:

      This work presents a compelling link between nutrient sensing and chromosome regulation. The major strength of the study lies in its comprehensive and multi-disciplinary approach. The authors integrate genetic suppression screens, live-cell imaging, chromatin immunoprecipitation, co-immunoprecipitation, and mass spectrometry to uncover the functional connection between TORC1 signaling and cohesin. The use of phospho-mutant alleles of cohesin subunits and their loader provides mechanistic insight into the regulatory role of phosphorylation. The addition of transcriptomic analysis further strengthens the biological relevance of the findings and places them in a broader physiological context. Altogether, the dataset convincingly supports the authors' main conclusions and opens up new avenues of investigation.

      Points that remain open but are appropriately discussed by the authors:

      (1) The authors propose that nutrient status influences cohesin regulation. While this is not directly tested under defined nutrient conditions (e.g., by systematically examining cohesin dynamics or phosphorylation across nutrient states), the rationale is well explained in the text, and the study provides a strong foundation for addressing this question in future work.

      (2) The signaling cascade between TORC1 and cohesin remains to be fully elucidated. In particular, the identity of the relevant kinases (e.g., whether Sck1/Sck2 or other effectors are involved) and whether TORC1 directly phosphorylates Mis4 or Psm1 are not resolved. The authors acknowledge these mechanistic gaps, which represent logical next steps rather than shortcomings of the current study.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors follow up on a previous suppressor screen of a temperature-sensitive allele of mis4 (mis4-G1487D), the cohesin loading factor in S. pombe, and identify additional suppressor alleles tied to the S. pombe TORC1 complex. Their analysis suggests that these suppressor mutations attenuate TORC1 activity while enhanced TORC1 activity is deleterious in this context. Suppression of TORC1 activity also ameliorates chromosome segregation and spindle defects observed in the mis4-G1487D strain, although some more subtle effects are not reconstituted. The authors provide evidence that this genetic suppression is also tied to the reconstitution of cohesin loading. Moreover, disrupting TORC1 also enhances Mis4/cohesin association with chromatin (likely reflecting enhanced loading) in WT cells while rapamycin treatment can enhance the robustness of chromosome transmission. These effects likely arise directly through TORC1 or its downstream effector kinases as TORC1 co-purifies with Mis4 and Rad21; these factors are also phosphorylated in a TORC1-dependent fashion. Disrupting Sck2, a kinase downstream of TORC1, also suppresses the mis4-G1487D allele while simultaneous disruption of Sck1 and Sck2 enhances cohesin association with chromatin, albeit with differing effects on phosphorylation of Mis4 and Psm1/Smc1. Phosphomutants of Mis4 and Psm1 that mimic observed phosphorylation states identified by mass spectrometry that are TORC1-dependent also suppressed phenotypes observed in the mis4-G1487D background. Lastly, the authors provide evidence that the mis4-G1487D background and TORC1 mutant backgrounds display an overlap in the dysregulation of genes that respond to environmental conditions.

      Overall, the authors provide compelling evidence from genetics, biochemistry and cell biology to support a previously unknown mechanism by which nutrient sensing regulates cohesin loading with implications for the stress response. The technical approaches are generally sound, well-controlled, and comprehensive.

      The specific points that I raised in the first review have been addressed by changes/additions to the manuscript or have been determined to be beyond the scope of the study by the authors.

      One major question that remains open is the relationship between local changes in cohesin loading and gene expression through this TORC1 regulatory signaling pathway and the details of the underlying mechanisms.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Besson et al. investigate how environmental nutrient signals regulate chromosome biology through the TORC1 signaling pathway in Schizosaccharomyces pombe. Specifically, the authors explore the impact of TORC1 on the function of cohesin, a protein complex essential for chromosome segregation and transcriptional regulation. Through a combination of genetic screens, biochemical analysis, phospho-proteomics, and transcriptional profiling, they uncover a functional and physical interaction between TORC1 and cohesin. The data suggest that reduced TORC1 activity enhances cohesin binding to chromosomes and improves chromosome segregation, with implications for stress-responsive gene expression, especially in subtelomeric regions.

      Strengths:

      This work presents a compelling link between nutrient sensing and chromosome regulation. The major strength of the study lies in its comprehensive and multi-disciplinary approach. The authors integrate genetic suppression screens, live-cell imaging, chromatin immunoprecipitation, co-immunoprecipitation, and mass spectrometry to uncover the functional connection between TORC1 signaling and cohesin. The use of phospho-mutant alleles of cohesin subunits and their loader provides mechanistic insight into the regulatory role of phosphorylation. The addition of transcriptomic analysis further strengthens the biological relevance of the findings and places them in a broader physiological context. Altogether, the dataset convincingly supports the authors' main conclusions and opens up new avenues of investigation.

      Weaknesses:

      While the study is strong overall, a few limitations are worth noting. The consistency of cohesin phosphorylation changes under different TORC1-inhibiting conditions (e.g., genetic mutants vs. rapamycin treatment) is unclear and could benefit from further clarification. The phosphorylation sites identified on cohesin subunits do not match known AGC kinase consensus motifs, raising the possibility that the modifications are indirect. The study relies heavily on one TORC1 mutant allele (mip1-R401G), and additional alleles could strengthen the generality of the findings. Furthermore, while the results suggest that nutrient availability influences cohesin function, this is not directly tested by comparing growth or cohesin dynamics under defined nutrient conditions.

      We thank the reviewer for his overall positive assessment and constructive criticism. We broadly agree with the few limitations he pointed out, which we will comment on below.

      (1) The consistency of cohesin phosphorylation changes under different TORC1-inhibiting conditions (e.g., genetic mutants vs. rapamycin treatment) is unclear and could benefit from further clarification.

      The basis of our study was to search for suppressor mutants, a situation in which an unviable strain becomes viable. It turns out that the suppressor mutants affect TORC1, necessarily in a partial manner given that TORC1 kinase activity is essential for proliferation. Likewise, rapamycin partially inhibits TORC1 and does not prevent proliferation of wild-type S. pombe cells. TORC1 mutants cause a constitutive decrease in activity with possible adaptive effects, whereas rapamycin is applied for a single cell cycle. In addition, it is known that bona fide TORC1 substrates respond differently to rapamycin. Some phosphosites show acute sensitivity, while others are less sensitive or even insensitive (Kang et al., 2013, PMID: 23888043). Therefore, both hypomorphic TORC1 genetic mutants and rapamycin treatment result in partial inhibition of TORC1 kinase activity. While the lists of affected TORC1 substrates may overlap, they are unlikely to be identical. Furthermore, the phosphorylation level of the relevant substrates is not necessarily altered to the same extent. Nevertheless, both conditions suppress the heat-sensitive phenotype of the mis4 mutant, although the suppressor effect of rapamycin is weaker. Consequently, some phosphorylation sites involved in mis4-ts suppression may behave similarly in rapamycin and TORC1 mutants (e.g. Psm1-S1022), while others (e.g. Mis4-S183) may behave differently.

      It is clear that there are phenotypic differences between the suppression of mis4-ts by rapamycin treatment or by genetic alteration of TORC1. This can be seen also in our ChIP analysis of Rad21 distribution at CARs. The trend is upward, but the pattern is not identical. We have added the following text to summarize the above considerations:

      “It is important to note at this stage that, although rapamycin and TORC1 mutants both decrease TORC1 kinase activity, the two are not equivalent. The mechanisms by which TORC1 kinase activity is reduced are different, and TORC1 mutants suppress the mis4-G1487D phenotype more effectively than rapamycin. It is known that bona fide TORC1 substrates respond differently to rapamycin. Some phosphosites show acute sensitivity, while others are less sensitive or even insensitive (Kang et al, 2013). TORC1 mutants cause a constitutive decrease in activity with possible adaptive effects, whereas rapamycin is applied for a single cell cycle. While the lists of affected TORC1 substrates may overlap, they are unlikely to be identical. Furthermore, the phosphorylation level of the relevant substrates is not necessarily altered to the same extent. It is therefore remarkable that negative regulation of TORC1 by rapamycin or a genetic mutation both alleviate mis4-G1487D phenotypes and have a fairly similar effect on cohesin dynamics.”

      (2) The phosphorylation sites identified on cohesin subunits do not match known AGC kinase consensus motifs, raising the possibility that the modifications are indirect.

      The genetic and biochemical analyses provided in this study show that the AGC kinases Sck1 and Sck2 influence cohesin phosphorylation and function. Whether Sck1, Sck2 or TORC1 directly phosphorylates cohesin components is the next question to address. The fact that the phosphorylation of Psm1-S1022 and Mis4-S183 was never abolished in the sck1-2 mutants may suggest they are indirectly involved. This should be taken with caution because we have been using deletion mutants. In this situation, cells adapt and other kinases may substitute, at least partially (Plank et al, 2020, PMID: 32102971). Asking whether cohesin components display consensus sites for AGC kinases is a complementary approach. The consensus site for Sck1 and Sck2 is unknown. If we assume some conservation with budding yeast SCH9, the consensus sequence would be RRxS/T. Psm1-S1022 (DQMSP) and Mis4-S183 (QLCSP) do not fit the consensus. However, this kind of information should be taken with care as many SCH9-dependent phosphorylation sites did not fall within the consensus in a study using analogue-sensitive AGC kinases and phosphoproteomics (Plank et al, 2020, PMID: 32102971). Alternatively, Sck1-2 may regulate other kinases. Indeed, Psm1-S1022 and Mis4-S183 lie within CDK consensus sites and Psm1-S1022 phosphorylation is Pef1-dependent. In summary, yes, the changes may be indirect; that remains to be seen, but in any case they are influenced by TORC1 signalling. The following paragraph was added:

      “The consensus site for Sck1 and Sck2 is unknown. If we assume some conservation with budding yeast SCH9, the consensus sequence would be RRxS/T. Psm1-S1022 (DQMSP) and Mis4-S183 (QLCSP) do not fit the consensus. However, this should be taken with care as many SCH9-dependent phosphorylation sites did not fall within the consensus in a study using analogue-sensitive AGC kinases and phosphoproteomics (Plank et al, 2020). Alternatively, Sck1-2 may regulate other kinases. Indeed Psm1-S1022 and Mis4-S183 lie within CDK consensus sites and Psm1-S1022 phosphorylation is Pef1-dependent.”
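
      The motif comparison described above can be illustrated with a short sketch. It is not part of the authors' analysis. It assumes that the phospho-acceptor serine is the fourth residue of each five-residue window quoted in the response, that the SCH9-like consensus is written as the pattern RR.[ST], and that the minimal proline-directed CDK motif is [ST]P. The function name `classify` and the dictionary keys are hypothetical.

```python
import re

# Assumed motif definitions (illustrative only):
# SCH9-like AGC consensus RRxS/T, with the acceptor as the 4th pattern residue.
RRXST = re.compile(r"RR.[ST]")
# Minimal proline-directed CDK motif S/T-P, with the acceptor as the 1st residue.
ST_P = re.compile(r"[ST]P")

# Sequence windows quoted in the response; the acceptor serine is at index 3.
SITES = {"Psm1-S1022": "DQMSP", "Mis4-S183": "QLCSP"}


def classify(window, site_index):
    """Check both motifs with the acceptor residue aligned to site_index."""
    agc_start = site_index - 3  # RRxS/T begins three residues upstream
    agc = agc_start >= 0 and RRXST.match(window, agc_start) is not None
    cdk = ST_P.match(window, site_index) is not None
    return {"agc_like": agc, "cdk_minimal": cdk}


results = {name: classify(seq, 3) for name, seq in SITES.items()}

for name, hit in results.items():
    print(name, hit)
```

      Under these assumptions, neither window satisfies the RRxS/T pattern, while both serines are followed by a proline, which is consistent with the statement that the sites lie within CDK consensus sites. As the response itself cautions, a motif match or mismatch does not establish which kinase acts on a site in vivo.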

      (3) The study relies heavily on one TORC1 mutant allele (mip1-R401G), and additional alleles could strengthen the generality of the findings.

      It is true that we focused our attention on mip1-R401G, which is present in all the experiments presented. That said, other alleles were used in one or more figures. Five mip1 alleles and one tor2 allele were identified as mis4-ts suppressors (Fig. 1). We have also shown that another mip1 allele, mip1-Y533A, created by another group (Morozumi et al, 2021), is also a suppressor of mis4-ts and affects the phosphorylation of Mis4-S183 and Psm1-S1022 (Fig. 1, Figure 5—figure supplement 1). To this we can add the effect of mutants that render TORC1 hyperactive (Fig. 1E, Fig. 2H) as well as AGC kinase mutants (Figure 5—figure supplement 3) and, finally, the effect of rapamycin. So yes, mip1-R401G has been used extensively, but we have still broadly covered the TORC1 signalling pathway.

      (4) Furthermore, while the results suggest that nutrient availability influences cohesin function, this is not directly tested by comparing growth or cohesin dynamics under defined nutrient conditions.

      We agree that studying the dynamics of cohesin, genome folding and gene expression in relation to nutrient availability is a very exciting topic, and we hope to address these issues in detail in the future.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors follow up on a previous suppressor screen of a temperature-sensitive allele of mis4 (mis4-G1487D), the cohesin loading factor in S. pombe, and identify additional suppressor alleles tied to the S. pombe TORC1 complex. Their analysis suggests that these suppressor mutations attenuate TORC1 activity, while enhanced TORC1 activity is deleterious in this context. Suppression of TORC1 activity also ameliorates chromosome segregation and spindle defects observed in the mis4-G1487D strain, although some more subtle effects are not reconstituted. The authors provide evidence that this genetic suppression is also tied to the reconstitution of cohesin loading. Moreover, disrupting TORC1 also enhances Mis4/cohesin association with chromatin (likely reflecting enhanced loading) in WT cells, while rapamycin treatment can enhance the robustness of chromosome transmission. These effects likely arise directly through TORC1 or its downstream effector kinases, as TORC1 co-purifies with Mis4 and Rad21; these factors are also phosphorylated in a TORC1-dependent fashion. Disrupting Sck2, a kinase downstream of TORC1, also suppresses the mis4-G1487D allele while simultaneous disruption of Sck1 and Sck2 enhances cohesin association with chromatin, albeit with differing effects on phosphorylation of Mis4 and Psm1/Smc1. Phosphomutants of Mis4 and Psm1 that mimic observed phosphorylation states identified by mass spectrometry that are TORC1-dependent also suppressed phenotypes observed in the mis4-G1487D background. Last, the authors provide evidence that the mis4-G1487D background and TORC1 mutant backgrounds display an overlap in the dysregulation of genes that respond to environmental conditions, particularly in genes tied to meiosis or other "stress".

      Overall, the authors provide compelling evidence from genetics, biochemistry, and cell biology to support a previously unknown mechanism by which nutrient sensing regulates cohesin loading with implications for the stress response. The technical approaches are generally sound, well-controlled, and comprehensive.

      Specific Points:

      (1) While the authors favor the model that the enhanced cohesin loading upon diminished TORC1 activity helps cells to survive harsh environmental conditions, as starvation of S. pombe also drives commitment to meiosis, it seems as plausible that enhanced cohesin loading is related to preparing the chromosomes to mate.

      (2) Related to Point 1, the lab of Sophie Martin previously published that phosphorylation of Mis4 characterizes a cluster of phosphotargets during starvation/meiotic induction (PMID: 39705284). This work should be cited, and the authors should interrogate how their observations do or do not relate to these prior observations (are these the same phosphosites?).

      We agree this is a possibility and the following paragraph was added in the discussion section:

      “TORC1-based regulation of cohesin may be relevant to preparing cells for meiosis. Since nitrogen deprivation stimulates meiosis initiation, subsequent TORC1 down-regulation may regulate the cohesin complex, preparing the chromosomes for fusion and meiosis. A recent phosphoproteomic study conducted by Sophie Martin's laboratory showed that Mis4-S107 phosphorylation increases during cellular fusion (Bérard et al, 2024). It is unknown whether the phosphorylation of S107 is controlled by TORC1 signalling. As the phosphorylation of Mis4-S183 and Psm1-S1022 was not detected in these experiments, the potential involvement of the TORC1-cohesin axis in the sexual programme remains to be investigated.”

      (3) It would be useful for the authors to combine their experimental data sets to interrogate whether there is a relationship between the regions where gene expression is altered in the mis4-G1487D strain and changes in the loading of cohesin in their ChIP experiments.

      (4) Given that the genes that are affected are predominantly sub-telomeric while most genes are not affected in the mis4-G1487D strain, one possibility that the authors may wish to consider is that the regions that become dysregulated are tied to heterochromatic regions where Swi6/HP1 has been implicated in cohesin loading.

      We agree that it would be interesting to see if there are correlations between cohesin positioning, heterochromatin and gene expression. That said, this would need to be done at the whole-genome level and include many other parameters (genome folding, histone modifications, Pol2 occupancy). These issues require substantial investment and may be addressed in a follow-up project.

      (5) It would be helpful to show individual data points from replicates in the bar graphs - it is not always clear what comprises the data sets, and superplots would be of great help.

      We verified that the figure captions clearly indicate the data sets considered, their mean, standard deviation, and statistical analysis method. As for the type of plot, we used the tools at our disposal.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Besson et al. investigate how the nutrient-responsive TORC1 signaling pathway modulates cohesin function in S. pombe. Using a genetic screen, the authors identify TORC1 mutants that suppress the thermosensitive growth defects of a cohesin loader mutant (mis4-G1487D). They show that reducing TORC1 activity, either genetically or pharmacologically, enhances cohesin binding to chromosomal sites (CARs), improves chromosome segregation, and alters the phosphorylation state of cohesin and its loader. They also show, through co-immunoprecipitation, that TORC1 and cohesin physically associate, and that this functional interaction extends to the transcriptional regulation of stress-responsive, subtelomeric genes. Together, the data suggest that environmental cues influence chromosome stability and gene expression via a TORC1-cohesin axis.

      Overall, the study is well-supported by thoughtful genetic epistasis analyses and a combination of genetic, biochemical, cell biological, and transcriptomic approaches. While not all data are equally strong, the cumulative evidence convincingly supports the authors' conclusions.

      Specific Concerns and Suggestions

      (1) Figure 2A - Division rates of wild-type and mip1-R401G cells are missing and should be provided for proper comparison.

      This is now done in revised Figure 2A. We also made a change in the manuscript, replacing “The mip1-R401G mutation efficiently suppressed the proliferation and viability defects (Figure 2A)” by “The mip1-R401G mutation efficiently attenuated the proliferation and viability defects (Figure 2A)”, to acknowledge the fact that the proliferation rate did not return to wild-type levels.

      (2) Figure 3 - Figure Supplement 1 - The authors claim that "Rapamycin treatment during a single cell cycle provoked a similar effect although less pronounced." However, for most CARs, the effect appears insignificant. This should be acknowledged in the text.

      The text has been changed accordingly:

      “Rapamycin treatment during a single cell cycle provoked a similar stimulation of Rad21 binding at CARs (Figure 3—figure supplement 1), albeit with noticeable differences. In mis4+ cells, both mip1-R401G and rapamycin induced a significant increase in Rad21 binding at several CARs (tRNA-left, cc2, 3323, NTS, Tel1-R). However, some CARs that exhibited increased Rad21 binding in the mip1 mutant did not respond significantly to rapamycin (dg2-R, tRNA-R). Conversely, rapamycin (but not mip1-R401G) induced a significant increase in Rad21 binding at imr2-L and CAR1806 (Figure 3D and Figure 3—figure supplement 1). In the mis4-G1487D mutant background, mip1-R401G induced a significant increase in Rad21 binding at all examined sites (Figure 3B). Similarly, rapamycin did increase Rad21 binding at all sites but only at the Tel1-R site did this reach statistical significance (Figure 3—figure supplement 1).”

      (3) Figure 4 - The analysis of interactions between TORC1 and the cohesin complex is somewhat limited. The authors may wish to test interactions between Mip1 and cohesin subunits (e.g., Rad21). More interestingly, it would be valuable to explore whether MIP1 mutations that suppress cohesin mutants affect the interaction between Tor2 and Rad21.

      We have added some additional data that answer this question (Figure 4—figure supplement 1) and a paragraph in the manuscript:

      “Tor2, the kinase subunit of TORC1, is particularly well detected in Rad21 and Mis4 co-immunoprecipitation experiments (Figure 4 and Figure 4—figure supplement 1). To determine whether the R401G mutation in Mip1 affects these interactions, co-immunoprecipitation experiments were repeated in both the mip1-R401G and mip1+ contexts. The data obtained indicate that Tor2 co-immunoprecipitation with Mis4 and Rad21 is largely unaffected by the mip1-R401G mutation (Figure 4—figure supplement 1). If mip1-R401G affects the regulation of cohesin by TORC1, this does not appear to stem from a gross defect in their interaction, at least at this level of resolution.”

      (4) Figure 5 - There appears to be a lack of correlation between cohesin subunit phosphorylation in TORC1-reducing mutants and in response to rapamycin. The reason for this discrepancy is unclear.

      This point was addressed in the previous section (Public review, reviewer 1, point 1). The response is pasted below:

      The basis of our study was to search for suppressor mutants, a situation in which an unviable strain becomes viable. It turns out that the suppressor mutants affect TORC1, necessarily in a partial manner given that TORC1 kinase activity is essential for proliferation. Likewise, rapamycin partially inhibits TORC1 and does not prevent proliferation of wild-type S. pombe cells. TORC1 mutants cause a constitutive decrease in activity with possible adaptive effects, whereas rapamycin is applied for a single cell cycle. In addition, it is known that bona fide TORC1 substrates respond differently to rapamycin. Some phosphosites show acute sensitivity, while others are less sensitive or even insensitive (Kang et al., 2013, PMID: 23888043). Therefore, both hypomorphic TORC1 genetic mutants and rapamycin treatment result in partial inhibition of TORC1 kinase activity. While the lists of affected TORC1 substrates may overlap, they are unlikely to be identical. Furthermore, the phosphorylation level of the relevant substrates is not necessarily altered to the same extent. Nevertheless, both conditions suppress the heat-sensitive phenotype of the mis4 mutant, although the suppressor effect of rapamycin is weaker. Consequently, some phosphorylation sites involved in mis4-ts suppression may behave similarly in rapamycin and TORC1 mutants (e.g. Psm1-S1022), while others (e.g. Mis4-S183) may behave differently.

      It is clear that there are phenotypic differences between the suppression of mis4-ts by rapamycin treatment or by genetic alteration of TORC1. This can be seen also in our ChIP analysis of Rad21 distribution at CARs. The trend is upward, but the pattern is not identical. We have added the following text to summarize the above considerations:

      “It is important to note at this stage that, although rapamycin and TORC1 mutants both decrease TORC1 kinase activity, the two are not equivalent. The mechanisms by which TORC1 kinase activity is reduced are different, and TORC1 mutants suppress the mis4-G1487D phenotype more effectively than rapamycin. It is known that bona fide TORC1 substrates respond differently to rapamycin. Some phosphosites show acute sensitivity, while others are less sensitive or even insensitive (Kang et al, 2013). TORC1 mutants cause a constitutive decrease in activity with possible adaptive effects, whereas rapamycin is applied for a single cell cycle. While the lists of affected TORC1 substrates may overlap, they are unlikely to be identical. Furthermore, the phosphorylation level of the relevant substrates is not necessarily altered to the same extent. It is therefore remarkable that negative regulation of TORC1 by rapamycin or a genetic mutation both alleviate mis4-G1487D phenotypes and have a fairly similar effect on cohesin dynamics.”

      (5) The phosphorylation sites examined on cohesin subunits are not canonical AGC kinase consensus motifs, suggesting they are unlikely to be direct targets of Sck1 or Sck2. I suggest that this point should be mentioned in the manuscript.

      This is now done:

      “The consensus site for Sck1 and Sck2 is unknown. If we assume some conservation with budding yeast SCH9, the consensus sequence would be RRxS/T. Psm1-S1022 (DQMSP) and Mis4-S183 (QLCSP) do not fit the consensus. However, this should be taken with care as many SCH9-dependent phosphorylation sites did not fall within the consensus in a study using analogue-sensitive AGC kinases and phosphoproteomics (Plank et al, 2020). Alternatively, Sck1-2 may regulate other kinases. Indeed Psm1-S1022 and Mis4-S183 lie within CDK consensus sites and Psm1-S1022 phosphorylation is Pef1-dependent.”

      (6) Figure 5 - Figure Supplement 3 - The reduction in Psm1 phosphorylation in the sck1Δ sck2Δ double mutant is not convincing without replicates and statistical analysis.

      This is now done and the data are presented in Figure 5—figure supplement 3. Panel D shows the data for Psm1-S1022p and Panel E for Mis4-S183p. Each graph shows the mean ratios +/- SD from 3 experiments.

      (7) Figure 5C - It would be helpful if the authors validated the effect of pef1 deletion on Mis4 phosphorylation by Western blotting, rather than relying solely on mass spectrometry data.

      This is now done. The data appears in Figure 5—figure supplement 2, panel B.

      (8) The statement: "The frequency of chromosome segregation defects of mis4‐G1487D was markedly reduced in a sck2‐deleted background and further decreased by the additional deletion of sck1 (Figure 5-figure supplement 3)" is not supported by the data. According to the figure, the difference between sck2Δ and sck1Δ sck2Δ is not statistically significant.

      The sentence was changed to:

      “The frequency of chromosome segregation defects in the mis4-G1487D strain remained unchanged in a sck1-deleted background, but was significantly reduced when either the sck2 or both the sck1 and sck2 genes were deleted (Figure 5—figure supplement 3).”

      (9) Figure 6A - The data shown are not convincing. The double mutants carrying the phosphomimetic and phospho-null psm1 alleles should be shown on the same plate for direct comparison.

      This is now done. The new data are shown in Figure 6A.

      (10) Figure 6E - The wild-type control is missing. Including it would provide an essential reference point to assess whether the mutants rescue cohesin binding to wild-type levels.

      It is true that the effects were small when compared to wild-type, but they were still significant when compared to mis4-G1487D. The comparison with wild-type is now available in Figure 6—figure supplement 1 and the paragraph was modified accordingly:

      “Cohesin binding to CARs as assayed by ChIP tends to increase for the mutants mimicking the non-phosphorylated state and to decrease with the phospho-mimicking forms (Figure 6E). The rescue of mis4-G1487D by the non-phosphorylatable form was modest but significant, notably within centromeric regions (imr2-L, dg2-R) and at the telomere (Tel1-R) site (Figure 6E and see Figure 6—figure supplement 1 for comparison with wild-type levels). Conversely, the mutant mimicking the phosphorylated state displayed a significant reduction of Rad21 binding at those sites as well as at several other sites at the centromere (cc2, tRNA-R), CAR2898, and at the ribosomal non-transcribed spacer site (NTS).”

      Limitations of the Study (not requiring additional experiments for publication, but worth noting).

      (11) The authors suggest that nutrient status affects cohesin, but this is not directly demonstrated (e.g., by comparing growth or cohesin dynamics or phosphorylation under defined nutrient conditions). That said, the paper is sufficiently detailed to allow this question to be addressed in follow-up work.

      We agree that studying the dynamics of cohesin, genome folding and gene expression in relation to nutrient availability is a very exciting topic, and we hope to address these issues in detail in the future.

      (12) The upstream signaling cascade remains unresolved. The identity of kinases downstream of TORC1 (e.g., whether Sck1/Sck2 or other factors are responsible) and whether TORC1 directly phosphorylates Mis4 or Psm1 are not established.

      This is something we can all agree on, and it might be something we look at in a future project.

      (13) The conclusions rely heavily on one TORC1 mutant allele (mip1-R401G). While this allele is informative, additional alleles or orthogonal methods could further support the generality of the findings.

      It is true that we focused our attention on mip1-R401G, which is present in all the experiments presented. That said, other alleles were used in one or more figures. Five mip1 alleles and one tor2 allele were identified as mis4-ts suppressors (Fig. 1). We have also shown that another mip1 allele, mip1-Y533A, created by another group (Morozumi et al, 2021), is also a suppressor of mis4-ts and affects the phosphorylation of Mis4-S183 and Psm1-S1022 (Fig. 1, Figure 5—figure supplement 1). To this we can add the effect of mutants that render TORC1 hyperactive (Fig. 1E, Fig. 2H) as well as AGC kinase mutants (Figure 5—figure supplement 3) and, finally, the effect of a transient treatment with rapamycin. So yes, mip1-R401G has been used extensively, but we have still broadly covered the TORC1 signalling pathway.

      Reviewer #2 (Recommendations for the authors):

      (1) Given the lack of CTCF in fission yeast, it is worth noting that cohesin ChIP data nonetheless can predict topological domains, which reinforces its important role in dictating chromatin folding (PMID: 39543681).

      We thank the reviewer for this suggestion. We now refer to this study in the discussion section.

      (2) Providing context for the S. pombe nomenclature for the conserved cohesin subunits would help the reader navigate the manuscript, possibly using a cartoon as for the TORC complexes. For example, Psm1 (aka Smc1) is not introduced and therefore its phosphorylation comes into the manuscript without explanation.

      Cohesin subunits and their names are given in the introduction section.

    1. eLife Assessment

      This convincing study examines a novel interaction of RAB5 with VPS34 complex II. Structural data are combined with site-directed mutagenesis, sequence analysis, biochemistry, yeast mutant analysis, and prior data on RAB1-VPS34 and RAB5-VPS34 interactions to provide a new perspective on how RAB GTPases recruit related but distinct VPS34 complexes to different organelles. The judgment is that this work represents a fundamental advance in our understanding of VPS34 localization and regulation.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents high-resolution cryoEM structures of VPS34-complex II bound to Rab5A at 3.2 Å resolution. The Williams group previously reported the structure of VPS34 complex II bound to Rab5A on liposomes using tomography, and therefore the previous structure, although very informative, was at lower resolution.

      The first new structure they present is of the 'REIE>AAAA' mutant complex bound to RAB5A. The structure resembles the previously determined one except an additional molecule of RAB5A was observed bound to the complex in a new position, interacting with the solenoid of VPS15.

      Although this second binding site exhibited reduced occupancy of RAB5A in the structure, the authors determined an additional structure in which the primary binding site was mutated to prevent RAB5A binding ('REIE>ERIR'). In this structure, there is no RAB5A bound to the primary binding site on VPS34, but the RAB5A bound to VPS15 now has strong density. The authors note that the way in which RAB5A interacts with each site is distinct, though both interfaces involve the switch regions. The authors confirm the location of this additional binding site using HDX-MS.

      The authors then determine multiple structures of the wild-type complex bound to RAB5A from a single sample, as they use 3D classifications to separate out versions of the complex bound to 0, 1, or 2 copies of RAB5A. Overall the structure of VPS34-Complex II does not change between the different states, and the data indicate that both RAB5A binding sites can be occupied at the same time.

      The authors then design a new mutant form of the complex (SHMIT>DDMIE) that is expected to disrupt the interaction at the secondary site between VPS15 and RAB5A. This mutation had a minor impact on the Kd for RAB5A binding, but when combined with the REIE>ERIR mutation of the primary binding site, RAB5A binding to the complex was abolished.

      Comparison of sequences across species indicated that the RAB5A binding site on VPS15 was conserved in yeast while the RAB5A binding site on VPS34 is not.

      The authors tested the impact of a corresponding yeast Vps15 mutation (SHLITY>DDLIEY) predicted to disrupt interaction with yeast Rab5/Vps21, and found that this mutant Vps15 protein was mislocalized and caused defective CPY processing.

      The authors then compare these structures of the RAB5A-class II complex to recently published structures from the Hurley group of the RAB1A-class I complex, and find that in both complexes the Rab protein is bound to the VPS34 binding site in a somewhat similar manner. However, a key difference is that the position of VPS34 is slightly different in the two complexes because of the unique ATG14L and UVRAG subunits in the class I and class II complexes, respectively. This difference creates a different RAB binding pocket that explains the difference in RAB specificity between the two complexes.

      Finally, the higher resolution structures enable the authors to now model portions of BECLIN1 and UVRAG that were not previously modeled in the cryoET structure.

      Strengths:

      Overall I found this to be an interesting and comprehensive study of the structural basis for interaction of RAB5A with VPS34-complex II. The authors have performed experiments to validate their structural interpretations, and they present a clear and thorough comparative analysis of the Rab binding sites in the two different VPS34 complexes. The result is a much better understanding of how two different Rab GTPases specifically recruit two different, but highly similar complexes to the membrane surface.

      Weaknesses:

      No significant weaknesses noted.

    3. Reviewer #2 (Public review):

      The work by Spokaite et al describes the discovery of a novel Rab5 binding site present in complex II of class III PI3K using a combination of HDX and Cryo EM. Extensive mutational and sequence analysis define this as the primordial Rab5 interface. The data presented are convincing that this is indeed a biologically relevant interface, and is important in defining mechanistically how vps34 complexes are regulated.

      This paper is a very nice expansion of their previous cryo-ET work from 2021, and is an excellent companion piece on high resolution cryo-EM of the complex I class III complex bound to Rab1 from the Hurley lab in 2025. Overall, this work is of excellent technical quality, and answers important unexplained observations on some unexpected mutational analysis from the previous work.

      They used their increased-affinity vps34 mutant to determine the 3.2 Å structure of Rab5 bound to vps34-CII. Clear density was seen for the original Rab5 interface, but an additional site was also observed. Based on this structure, they mutated out the vps34 interface, allowing for a high-resolution structure of the Rab5 bound at the Vps15 interface.

      They extensively validated the vps15 interface in the yeast variant of vps34, showing that the Vps15-Rab5 (Vps21) interface identified is critical in controlling complex II vps34 recruitment.

      The major strengths of this paper are that the experiments appear to be done carefully and rigorously and I have very few experimental suggestions.

      Here is what I recommend, based on some very minor weaknesses I observed:

      (1) My main concern has to do a little bit with presentation. My main issue is how the authors use mutant description. They clearly indicate the mutant sequence in the human isoform (for example, see Fig 2A, Vps15 described as 579-SHMIT-583>DDMIE); however, when they shift to the yeast version, they shift to saying vps15 mutant, but don't define the mutant (Fig 2G). I would recommend they just include the same sequence numbering and WT to mutant replacement every time a new mutant (or species) is described. It is always easier to interpret what is being shown when the authors are jumping between species if the exact mutant is included. This is particularly important in this paper, where we are jumping between both different subunits and different species, so a clear description in the figure/figure legends makes it much easier to read for non-specialists.

      (2) The HDX data very clearly show that Rab5 is likely able to bind at both sites, which backs up the cryo EM data nicely. I am slightly confused by some of the HDX statements described in the methods.

      (3) The authors state, "Only statistically significant peptides showing a difference greater than 0.25 Da and greater than 5% for at least two timepoints were kept." It is confusing why they required multiple timepoints, and earlier they also describe that they required a p value of less than 0.05. It might be clearer to state that significant differences required a 0.25 Da, 5%, and p value of <0.05 (n=3). Also, what do they mean by kept? Does this mean that they only fully processed the peptides with differences?

      (4) They show peptide traces for a selection in the supplement, but it would be ideal to include the full set of HDX data as an Excel file, including peptides with no differences, as there is a lot of additional information (deuteration levels for everything) that would be useful to share, as recommended in the Masson et al 2019 recommendations paper. This may be attached, but this reviewer could not see an example of it in the shared data dropbox folder.

      Comments on revisions:

      The authors have addressed all of my issues.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript of Spokaite et al. focuses on the Vps34 complex involved in PI3P production. This complex exists in two variants, one (class I) specific for autophagy, and a second one (class II) specific for the endocytic system. Both differ only in one subunit. The authors previously showed that the Vps34 complexes interact with Rab GTPases, Rab1 or Rab5 (for class II), and the identified site was found at Vps34. Now, the authors identify a conserved and overlooked Rab5 binding site in Vps15, which is required for the function of the Class II complex. In support of this, they show cryo-EM data with a second Rab5 bound to Vps15, identify the corresponding residues, and show by mutant analysis that impaired Rab5 binding also results in defects using yeast as a model system.

      Overall, this is a most complete study with little to criticize. The paper shows convincingly that the two Rab5 binding sites are required for Vps34 complex II function, with the Vps15 binding site being critical for endosomal localization. The structural data is very much complete. What I am missing are a few controls that show that the mutations in Vps15 do not affect autophagy. I also found the last paragraph of the results section a bit out of place, even though this is a nice observation that the N-terminal part of BECLIN has these domains. However, what does it add to the story?

      Comments on revisions:

      The authors answered all my questions. I have no further requests.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents high-resolution cryoEM structures of VPS34-complex II bound to Rab5A at 3.2 Å resolution. The Williams group previously reported the structure of VPS34 complex II bound to Rab5A on liposomes using tomography, and therefore, the previous structure, although very informative, was at lower resolution.

      The first new structure they present is of the 'REIE>AAAA' mutant complex bound to RAB5A. The structure resembles the previously determined one, except that an additional molecule of RAB5A was observed bound to the complex in a new position, interacting with the solenoid of VPS15.

      Although this second binding site exhibited reduced occupancy of RAB5A in the structure, the authors determined an additional structure in which the primary binding site was mutated to prevent RAB5A binding ('REIE>ERIR'). In this structure, there is no RAB5A bound to the primary binding site on VPS34, but the RAB5A bound to VPS15 now has strong density. The authors note that the way in which RAB5A interacts with each site is distinct, though both interfaces involve the switch regions. The authors confirm the location of this additional binding site using HDX-MS.

      The authors then determine multiple structures of the wild-type complex bound to RAB5A from a single sample, as they use 3D classifications to separate out versions of the complex bound to 0, 1, or 2 copies of RAB5A. Overall, the structure of VPS34-Complex II does not change between the different states, and the data indicate that both RAB5A binding sites can be occupied at the same time.

      The authors then design a new mutant form of the complex (SHMIT>DDMIE) that is expected to disrupt the interaction at the secondary site between VPS15 and RAB5A. This mutation had a minor impact on the Kd for RAB5A binding, but when combined with the REIE>ERIR mutation of the primary binding site, RAB5A binding to the complex was abolished.

      Comparison of sequences across species indicated that the RAB5A binding site on VPS15 was conserved in yeast, while the RAB5A binding site on VPS34 is not.

      The authors tested the impact of a corresponding yeast Vps15 mutation (SHLITY>DDLIEY) predicted to disrupt interaction with yeast Rab5/Vps21, and found that this mutant Vps15 protein was mislocalized and caused defective CPY processing.

      The authors then compare these structures of the RAB5A-class II complex to recently published structures from the Hurley group of the RAB1A-class I complex, and find that in both complexes the Rab protein is bound to the VPS34 binding site in a somewhat similar manner. However, a key difference is that the position of VPS34 is slightly different in the two complexes because of the unique ATG14L and UVRAG subunits in the class I and class II complexes, respectively. This difference creates a different RAB binding pocket that explains the difference in RAB specificity between the two complexes.

      Finally, the higher resolution structures enable the authors to now model portions of BECLIN1 and UVRAG that were not previously modeled in the cryoET structure.

      Strengths:

      Overall, I found this to be an interesting and comprehensive study of the structural basis for the interaction of RAB5A with VPS34-complex II. The authors have performed experiments to validate their structural interpretations, and they present a clear and thorough comparative analysis of the Rab binding sites in the two different VPS34 complexes. The result is a much better understanding of how two different Rab GTPases specifically recruit two different, but highly similar complexes to the membrane surface.

      Weaknesses:

      No significant weaknesses were noted.

      Reviewer #2 (Public review):

      Summary:

      The work by Spokaite et al describes the discovery of a novel Rab5 binding site present in complex II of class III PI3K using a combination of HDX and Cryo EM. Extensive mutational and sequence analysis define this as the primordial Rab5 interface. The data presented are convincing that this is indeed a biologically relevant interface, and is important in defining mechanistically how VPS34 complexes are regulated.

      This paper is a very nice expansion of their previous cryo-ET work from 2021, and is an excellent companion piece on high-resolution cryo-EM of the complex I class III complex bound to Rab1 from the Hurley lab in 2025. Overall, this work is of excellent technical quality and answers important unexplained observations on some unexpected mutational analysis from the previous work.

      They used their increased-affinity VPS34 mutant to determine the 3.2 Å structure of Rab5 bound to VPS34-CII. Clear density was seen for the original Rab5 interface, but an additional site was also observed. Based on this structure, they mutated out the VPS34 interface, allowing for a high-resolution structure of the Rab5 bound at the VPS15 interface.

      They extensively validated the VPS15 interface in the yeast variant of VPS34, showing that the Vps15-Rab5 (Vps21) interface identified is critical in controlling complex II VPS34 recruitment.

      The major strengths of this paper are that the experiments appear to be done carefully and rigorously, and I have very few experimental suggestions.

      Here is what I recommend, based on some very minor weaknesses I observed:

      (1) My main concern has to do a little bit with presentation. My main issue is how the authors use mutant description. They clearly indicate the mutant sequence in the human isoform (for example, see Figure 2A, VPS15 described as 579-SHMIT-583>DDMIE); however, when they shift to the yeast version, they shift to saying VPS15 mutant, but don't define the mutant (Figure 2G). I would recommend they just include the same sequence numbering and WT to mutant replacement every time a new mutant (or species) is described. It is always easier to interpret what is being shown when the authors are jumping between species if the exact mutant is included. This is particularly important in this paper, where we are jumping between different subunits and different species, so a clear description in the figure/figure legends makes it much easier to read for non-specialists.

      The reviewer has made an excellent point here. To clarify the yeast mutation, we have revised the manuscript main text to refer to the yeast mutant as SHLITY>DDLIEY, and we have added this to the legend for Figs. 2F,G.

      (2) The HDX data very clearly show that Rab5 is likely able to bind at both sites, which backs up the cryo EM data nicely. I am slightly confused by some of the HDX statements described in the methods.

      (3) The authors state, "Only statistically significant peptides showing a difference greater than 0.25 Da and greater than 5% for at least two timepoints were kept." It is confusing why they required multiple timepoints, and earlier they also describe that they required a p-value of less than 0.05. It might be clearer to state that significant differences required a 0.25 Da, 5%, and p-value of <0.05 (n=3). Also, what do they mean by kept? Does this mean that they only fully processed the peptides with differences?

      (4) They show peptide traces for a selection in the supplement, but it would be ideal to include the full set of HDX data as an Excel file, including peptides with no differences, as there is a lot of additional information (deuteration levels for everything) that would be useful to share, as recommended in the Masson et al 2019 recommendations paper. This may be attached, but this reviewer could not see an example of it in the shared data dropbox folder.

      We have revised the HDX method description to clarify. All peptides were kept and fully processed. However, for the results displayed, we have illustrated only peptides meeting the criteria described.

      The Excel file for all peptides (as recommended by Masson et al) was deposited with PRIDE under the dataset identifier PXD061277. In addition, we have included this Excel file in our supplementary material.
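
      For readers who want the display criteria discussed in points (3) and (4) spelled out, the following is a minimal Python sketch of one possible reading of them: a peptide is shown only if, at two or more timepoints, the difference exceeds 0.25 Da, exceeds 5%, and has p < 0.05. The record layout, field names, and example values are hypothetical; applying the p-value per timepoint and comparing absolute differences are assumptions, and this is not the authors' processing software.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the review; how they combine is an assumption of this sketch.
MIN_DIFF_DA = 0.25       # minimum deuteration difference in Da
MIN_DIFF_PERCENT = 5.0   # minimum deuteration difference in percent
MAX_P_VALUE = 0.05       # significance cutoff
MIN_TIMEPOINTS = 2       # number of timepoints that must pass


@dataclass
class TimepointDiff:
    """Hypothetical record of one peptide at one labelling timepoint."""
    seconds: float
    diff_da: float
    diff_percent: float
    p_value: float


def timepoint_passes(tp: TimepointDiff) -> bool:
    # Absolute values are used so that protection and exposure are treated alike.
    return (
        abs(tp.diff_da) > MIN_DIFF_DA
        and abs(tp.diff_percent) > MIN_DIFF_PERCENT
        and tp.p_value < MAX_P_VALUE
    )


def peptide_is_displayed(timepoints: List[TimepointDiff]) -> bool:
    """True if enough timepoints pass all three thresholds."""
    passing = sum(1 for tp in timepoints if timepoint_passes(tp))
    return passing >= MIN_TIMEPOINTS


# Made-up example values, for illustration only.
shown = [
    TimepointDiff(3, 0.40, 8.0, 0.01),
    TimepointDiff(30, 0.30, 6.0, 0.02),
    TimepointDiff(300, 0.10, 2.0, 0.50),
]
hidden = [
    TimepointDiff(3, 0.40, 8.0, 0.01),
    TimepointDiff(30, 0.20, 6.0, 0.02),
]
print(peptide_is_displayed(shown), peptide_is_displayed(hidden))
```

      Under this reading, every peptide is still processed and exported; the filter only decides which peptides are drawn in the figures, which matches the distinction made in the response above.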

      Reviewer #3 (Public review):

      Summary:

      The manuscript of Spokaite et al. focuses on the Vps34 complex involved in PI3P production. This complex exists in two variants, one (class I) specific for autophagy, and a second one (class II) specific for the endocytic system. Both differ only in one subunit. The authors previously showed that the Vps34 complexes interact with Rab GTPases, Rab1 or Rab5 (for class II), and the identified site was found at Vps34. Now, the authors identify a conserved and overlooked Rab5 binding site in Vps15, which is required for the function of the Class II complex. In support of this, they show cryo-EM data with a second Rab5 bound to Vps15, identify the corresponding residues, and show by mutant analysis that impaired Rab5 binding also results in defects using yeast as a model system.

      Overall, this is a most complete study with little to criticize. The paper shows convincingly that the two Rab5 binding sites are required for Vps34 complex II function, with the Vps15 binding site being critical for endosomal localization. The structural data is very much complete.

      Weaknesses:

      What I am missing are a few controls that show that the mutations in Vps15 do not affect autophagy. I am wondering if this mutant is still functional in autophagy. This can be simply tested by sorting of Atg8 to the vacuole lumen using established assays or by following PhoΔ60 sorting. This analysis would reveal that the corresponding mutant is specific for the Class II complex.

      One of the first noted features of the VPS34 complexes was that the ATG14-containing complex (VPS34-CI) is important for autophagy, while the VPS38 (yeast orthologue of UVRAG) subunit characteristic of VPS34-CII is important for endocytic sorting (PMID 11157979). However, the VPS34, VPS15 and BECLIN1 subunits are present in both complexes; as such, mutations in them may affect both processes.

      We agree with the reviewer that it is an important undertaking to examine the effect of the SHLITY>DDLIEY mutation in yeast Vps15 on autophagy. However, the focus of the current manuscript is VPS34-complex II and RAB5 interaction/activation. An autophagy effect would be more relevant for VPS34 complex I and RAB1. We have not presented any results for human VPS34-complex I - RAB1 nor yeast Vps34-complex I – Ypt1 (yeast RAB1 orthologue). We are preparing another manuscript focusing entirely on this, and it is not a simple story. While we think this is an important question, we believe that this is beyond the scope of the current manuscript.

      It would be helpful if the authors could clarify whether they believe that Vps34 kinase activity is stimulated by Rab binding or whether this stimulation is a consequence of better membrane localization of Vps34. In other words, is the complex active with soluble PI3P in solution, and does the activity change if Rab5 is added to the complex? This might have been addressed in the past, but I did not see evidence for this, as the authors only addressed the activity of the Vps34 complexes on membranes.

      The reviewer has raised an excellent question, which was addressed briefly in the introduction to the manuscript. We have now somewhat expanded on these issues near the end of the discussion in the revised manuscript. In our previously published study, we found that soluble RAB5-GTP did not stimulate the complex II activity (supplementary figure 2b of PMID: 33692360). This is consistent with our finding in this manuscript showing that RAB5 did not cause large conformational changes in solution. However, our previous single-molecule study showed that once complex II is recruited to the membrane by RAB5, RAB5 increases the turnover rate on membranes, indicating an additional allosteric activation (Figure 7 of PMID: 33137306). This study indicated that the primary role of RAB5 is to anchor complex II on the membrane. Once the complex is anchored on the membrane by RAB5, the kinase domain is in the vicinity of its substrate, PI, leading to higher turnover.

      The Echelon Class III PI3K ELISA Kit (Echelon, K-3000) comes with a soluble PI, diC8, to measure the VPS34 activity, and the complex is certainly active with this soluble substrate. However, if the substrate is in membranes, the VPS34 activity is greatly dependent on the character of the membrane.

      I also found the last paragraph of the results section a bit out of place, even though this is a nice observation that the N-terminal part of BECLIN has these domains. However, what does it add to the story?

      The reviewer is correct that the high-resolution features of BECLIN1 at the base of the V-shaped complex that we observed are not related to RAB5 binding, but they are characteristic of VPS34-CII and likely to be important for the specific role of VPS34-CII. This is the first high-resolution structure of the VPS34-CII that has been reported, and we believe it would be irresponsible not to briefly describe them, since they are unique to VPS34-CII. For this reason, we have placed this section at the end of the results, and we now clarify that we do not see a relevance to RAB5 function, but we describe the arrangement of a region (the BH3) that has been functionally noted in many previous studies, in the absence of a structure.

      Reviewing Editor Comments:

      Please address the following suggestions for minor changes to the manuscript. Use your best scientific judgment in addressing the comments and describe the modifications together with your reasoning in a cover letter. We look forward to seeing the revised version of this very nice study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I found a portion of the description of the cryoEM complexes on the top of page 9 to be redundant with similar descriptions near the top of page 7, and it was not clear to me at first that these were describing the same structures. Part of my confusion was due to the redundancy, including the statement near the bottom of page 7: 'Models were built and refined for all RAB5-associated VPS34-CII assemblies', and then the similar statement on page 9: 'We fit and refined atomic models into both densities'. I believe these are describing the same models? To clarify for the reader, perhaps on page 9, the authors could begin this part with a statement such as "as described above", and eliminate the redundant descriptions.

      The reviewer is correct. Both sections describe the same set of cryo-EM classes from the same sample. The only difference is what we analysed in the two sections: number of RAB5s bound in the first section and the effect of RAB5 binding in the second section. We have revised the text to make this clear, and to make the second section more succinct.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors show nicely that a mutation in Vps15 disrupts binding to Vps21 in vivo, with defects in the endocytic pathway as analyzed by CPY sorting. I am wondering if this mutant is still functional in autophagy. This can be simply tested by sorting of Atg8 to the vacuole lumen using established assays or by following Pho∆60 sorting. This analysis would reveal that the corresponding mutant is specific for the Class II complex. If the authors were to find evidence that this Vps15 mutant also affects autophagy, it would indicate that there is possibly also another Rab1 binding site in Vps15.

      As we stated above, an autophagy effect would be more relevant for VPS34 complex I and RAB1. We have not presented any results for human VPS34-complex I - RAB1 nor yeast Vps34-complex I – Ypt1 (yeast RAB1 orthologue). We are preparing another manuscript focusing entirely on this, and it is not a simple story. While we think this is an important question, we believe that this is beyond the scope of the current manuscript.

      (2) It would be helpful if the authors could clarify whether they believe that Vps34 kinase activity is stimulated by Rab binding or whether this stimulation is a consequence of better membrane localization of Vps34. In other words, is the complex active with soluble PI3P in solution, and does the activity change if Rab5 is added to the complex? This might have been addressed in the past, but I did not see evidence for this, as the authors only addressed the activity of the Vps34 complexes on membranes.

      As in our response to reviewer #3 above, this point was addressed in previous publications and was described in the introduction to our manuscript.

    1. eLife Assessment

      This important study provides compelling evidence that fever-like temperatures enhance the export of Plasmodium falciparum transmembrane proteins, including the cytoadherence protein PfEMP1 and the nutrient channel PSAC, to the red blood cell surface, thereby increasing cytoadhesion. Using rigorous and well-controlled experiments, the authors convincingly demonstrate that this effect results from accelerated protein trafficking rather than changes in protein production or parasite development. These findings significantly advance our understanding of parasite virulence mechanisms and offer insights into how febrile episodes may exacerbate malaria severity.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase that the parasites were exposed to. Here the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees) that they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain containing proteins.

      Impact and conclusions:

      The study shows that the export of proteins, including PfEMP1 and PSAC, is accelerated in response to mild heat shock. This has implications for disease severity as well as our understanding of protein trafficking in these unique organisms. There is increasing interest in asymptomatic infections, which have been proposed to be a major reservoir for transmission and generally are not associated with fever. It will be interesting to consider whether reduced (or slower) trafficking of these proteins has a selective advantage for parasites in asymptomatic infections.

    3. Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

      Comments on Revision:

      Although in any study there are going to be residual weaknesses, this reviewer is happy to see that the authors have gone to lengths to address many of my main concerns, and also those of other reviewers.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper it is established that high, fever-like temperatures of 39 °C cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, so the cells become stickier in order to remain trapped in blood vessels and stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection this would cause parasite infected red blood cells to stop circulating through the spleen where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      Minor weaknesses in the original version have now been satisfactorily addressed with additional work which is very convincing.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study provides compelling evidence that fever-like temperatures enhance the export of Plasmodium falciparum transmembrane proteins, including the cytoadherence protein PfEMP1 and the nutrient channel PSAC, to the red blood cell surface, thereby increasing cytoadhesion. Using rigorous and well-controlled experiments, the authors convincingly demonstrate that this effect results from accelerated protein trafficking rather than changes in protein production or parasite development. These findings significantly advance our understanding of parasite virulence mechanisms and offer insights into how febrile episodes may exacerbate malaria severity.

      We thank all reviewers for their constructive feedback on our manuscript.

      We believe we have addressed all the questions in the rebuttal below in writing, including planned experiments we will perform to strengthen the conclusions of the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript from Jones and colleagues investigates a previously described phenomenon in which P. falciparum malaria parasites display increased trafficking of proteins displayed on the surface of infected RBCs, as well as increased cytoadherence in response to febrile temperatures. While this parasite response was previously described, it was not uniformly accepted, and conflicting reports can be found in the literature. This variability likely arises due to differences in the methods employed and the degree of temperature increase to which the parasites were exposed. Here, the authors are very careful to employ a temperature shift that likely reflects what is happening in infected humans and that they demonstrate is not detrimental to parasite viability or replication. In addition, they go on to investigate what steps in protein trafficking are affected by exposure to increased temperature and show that the effect is not specific to PfEMP1 but rather likely affects all transmembrane domain-containing proteins that are trafficked to the RBC. They also detect increased rates of phosphorylation of trafficked proteins, consistent with overall increased protein export.

      Strengths:

      The authors used a relatively mild increase in temperature (39 degrees), which they demonstrate is not detrimental to parasite viability or replication. This enabled them to avoid potential complications of a more severe heat shock that might have affected previously published studies. They employed a clever method of fractionation of RBCs infected with a var2csa-nanoluc fusion protein expressing parasite line to determine which step in the export pathway was likely accelerating in response to increased temperature. This enabled them to determine that export across the PVM is being affected. They also explored changes in phosphorylation of exported proteins and demonstrated that the effect is not limited to PfEMP1 but appears to affect numerous (or potentially all) exported transmembrane domain-containing proteins.

      Weaknesses:

      All the experiments investigating changes resulting from increased temperature were conducted after an increase in temperature from 16 to 24 hours, with sampling or assays conducted at the 24 hr mark. While this provided consistency throughout the study, this is a time point relatively early in the export of proteins to the RBC surface, as shown in Figure 1E. At 24 hrs, only approximately 50% of wildtype parasites are positive for PfEMP1, while at 32 hrs this approaches 80%. Since the authors only checked the effect of heat stress at 24 hrs, it is not possible to determine if the changes they observe reflect an overall increase in protein trafficking or instead a shift to earlier (or an accelerated) trafficking. In other words, if a second time point had been considered (for example, 32 hrs or later), would the parasites grown in the absence of heat stress catch up?

      We did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs, whilst they differ at 24 hours. This is true for the DMSO-treated (control, wildtype-resembling) HA-tagged lines of HSP70x and PF3D7_0702500 (Supplementary Figures 9 and 12, respectively). Given that protein levels appear unchanged, we conclude that trafficking is accelerated during these earlier timepoints but remains comparable at later stages. This would still increase the overall bound parasite mass, as parasites start to adhere earlier during or after a heat stress.

      Reviewer #2 (Public review):

      This manuscript describes experiments characterising how malaria parasites respond to physiologically relevant heat-shock conditions. The authors show, quite convincingly, that moderate heat-shock appears to increase cytoadherence, likely by increasing trafficking of surface proteins involved in this process.

      While generally of a high quality and including a lot of data, I have a few small questions and comments, mainly regarding data interpretation.

      (1) The authors use sorbitol lysis as a proxy for trafficking of PSAC components. This is a very roundabout way of doing things and does not, I think, really show what they claim. There could be a myriad of other reasons for this increased activity (indeed, the authors note potential PSAC activation under these conditions). One further reason could be a difference in the membrane stability following heat shock, which may affect sorbitol uptake, or the fragility of the erythrocytes to hypotonic shock. I really suggest that the authors stick to what they show (increased PSAC) without trying to use this as evidence for increased trafficking of a number of non-specified proteins that they cannot follow directly.

      This is a valid point; however, uninfected RBCs do not lyse following heat stress, nor do much younger iRBCs, indicating that the observed effect is specific to infected RBCs at a defined stage. The sorbitol sensitivity assay is performed at 37 °C under normal conditions after cells are returned to non-heat-stress temperatures, so the effect is not due to transient changes in membrane permeability at elevated temperature.

      Planned experiment: However, to increase the strength of our conclusions and further test our hypothesis, we will perform sorbitol sensitivity assays on >20 hours post infection iRBCs following heat stress in the presence and absence of furosemide, a PSAC inhibitor. If iRBC lysis is abolished with furosemide present, this would confirm that the effect is PSAC-dependent. However, the effect could also possibly be due to altered PSAC activity during heat stress which is maintained at lower temperatures, as outlined in the discussion.

      New Results:

      We performed sorbitol sensitivity assays on >20 hours post-infection iRBCs following heat stress in the presence and absence of the PSAC inhibitor furosemide. These additional experiments were added to the supplementary figures (Supplementary Figure 3). Importantly, sorbitol-mediated lysis of iRBCs, with or without prior heat stress, was reduced when furosemide was present, demonstrating that the observed effect is likely PSAC-dependent. We also observed that uninfected RBCs did not lyse with sorbitol, regardless of heat stress, confirming that the effect is specific to infected cells.

      (2) Supplementary Figure 6C/D: The KAHRP signal does not look like it should. In fact, it doesn't look like anything specific. The HSP70-X signal is also blurry and overexposed. These pictures cannot be used to justify the authors' statements about a lack of colocalisation in any way.

      Planned experiment: We agree that the IFAs are not the best as presented and will include better quality supplementary images in a revised version.

      New Results:

      Immunofluorescence microscopy, including the localisation of the two HA-tagged proteins (PF3D7_1039000 and PF3D7_0702500), has been repeated and higher-quality images are now included in the updated manuscript (Supplementary Figures 9 and 11). These images include co-staining with the P. falciparum proteins KAHRP and SBP1 to assess possible co-localisations. Furthermore, following the reviewer’s suggestion, we have softened the statement regarding PF3D7_1039000-HA to better reflect the data, changing “...does not colocalise” to “...does not strongly colocalise”.

      (3) Figure 6: This experiment confuses me. The authors purport to fractionate proteins using differential lysis, but the proteins they detect are supposed to be transmembrane proteins and thus should always be found associated with the pellet, whether lysis is done using equinatoxin or saponin. Have they discovered a currently unknown trafficking pathway to tell us about? Whilst there is a lot of discussion about the trafficking pathways for TM proteins through the host cell, a number of studies have shown that these proteins are generally found in a membrane-bound state. The authors should elaborate, or choose an experiment that is capable of showing compartment-specific localisation of membrane-bound proteins (protease protection, for example).

      We do not believe we have identified a novel trafficking pathway; rather, we capture trafficking intermediates of PfEMP1 between the PVM and the RBC periphery, in small vesicles and possibly also in Maurer’s clefts. These would still be membrane-embedded but, because of their small size, would not be pelleted at the centrifugation speeds used in our study (we did not use ultracentrifugation). This explanation, we believe, is in line with the current hypothesis of how PfEMP1 and other exported TMD proteins traffic to the periphery or the Maurer’s clefts.

      (4) The red blood cell contains, in addition to HSP70-X, a number of human HSPs (HSP70 and HSP90 are significant in this current case). As the name suggests, these proteins non-specifically shield exposed hydrophobic domains revealed upon partial protein unfolding following thermal insult. I would thus have expected to find significantly more enrichment following heat shock, but this is not the case. Is it possible that the physiological heat shock conditions used in this current study are not high enough to cause a real heat shock?

      As noted by the reviewer, we do not see enrichment of red blood cell heat shock proteins following heat stress, either with FIKK10.2-TurboID or in the phosphoproteome. We used a physiologically relevant heat stress that significantly modifies the iRBC, as shown by our functional assays. While a higher temperature might induce an association of red blood cell heat shock proteins, such conditions may not accurately reflect those most commonly found in the context of malaria infection.

      Reviewer #3 (Public review):

      Summary:

      In this paper, it is established that high fever-like 39 °C temperatures cause parasite-infected red blood cells to become stickier. It is thought that high temperatures might help the spleen to destroy parasite-infected cells, and they become stickier in order to remain trapped in blood vessels, so they stop passing through the spleen.

      Strengths:

      The strength of this research is that it shows that fever-like temperatures can cause parasite-infected red blood cells to stick to surfaces designed to mimic the walls of small blood vessels. In a natural infection, this would cause parasite-infected red blood cells to stop circulating through the spleen, where the parasites would be destroyed by the immune system. It is thought that fevers could lead to infected red blood cells becoming stiffer and therefore more easily destroyed in the spleen. Parasites respond to fevers by making their red blood cells stickier, so they stop flowing around the body and into the spleen. The experiments here prove that fever temperatures increase the export of Velcro-like sticky proteins onto the surface of the infected red blood cells and are very thorough and convincing.

      Weaknesses:

      A minor weakness of the paper is that the effects of fever on the stiffness of infected red blood cells were not measured. This can be easily done in the laboratory by measuring how the passage of infected red blood cells through a bed of tiny metal balls is delayed under fever-like temperatures.

      Previous work by Marinkovic et al. (cited in this manuscript) reported that all RBCs, both infected and uninfected, increase in stiffness at 41 °C compared with 37 °C, with trophozoites and schizonts exhibiting a particularly pronounced increase. We agree that it would be interesting to determine whether similar changes occur at physiological fever-like temperatures, and whether this increase in stiffness coincides with the period of elevated protein trafficking. However, here we focused on enhanced protein export using multiple complementary approaches, and have chosen to address rigidity questions in a different study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, a second time point in many of the assays (for example, 36 hrs or later) would be useful to determine if heat stress simply accelerates trafficking of proteins to the RBC or if instead it results in an overall increase in trafficking.

      As mentioned earlier, we did not assess cytoadhesion at later stages, but in the supplementary figures we show that at 40 hours post infection both heat stress and control conditions have comparable proportions of VAR2CSA-positive iRBCs. This is true for the DMSO-treated (control, wildtype-resembling) HA-tagged lines of HSP70x and PF3D7_0702500 (Supplementary Figures 9 and 12, respectively). The end level of VAR2CSA is the same in both conditions, but at 24 hours post infection it is higher following heat stress, indicating that trafficking is accelerated.

      In the text, the authors frequently mention changes in the parasites' phenotype in response to heat stress; however, the way it is described is a bit ambiguous and can be confusing. For example, on page 3, they state that "Following heat stress, significantly more iRBCs (57.6% +/-19.4%) cytoadhered.....". From this sentence, it is not initially clear if the end result is cytoadherence of 57.6% of iRBCs or if this refers to an increase of 57.6%. This could be stated explicitly (e.g., "an increase of 57.6% +/- 19.4%") to avoid confusion. Similar descriptions of the results are found throughout the paper.

      We agree this is confusing and have altered the text accordingly.

      The authors might consider citing and discussing the paper from Andrade et al (Nat Med, 2020, 26:1929-1940), which describes longer circulation times (less cytoadherence) by parasites in the dry season (asymptomatic patients) than in febrile patients in the wet season (stronger cytoadhesion of younger stages). This would seem to be consistent with the data presented here.

      We are aware of the Andrade study, but chose not to cite it in this context since the reported differences in cytoadhesion appear more consistent with PfEMP1 expression levels, as hypothesized by the authors, than with altered trafficking.

      Reviewer #2 (Recommendations for the authors):

      General comments on the text:

      (1) "Approximately 10% of the proteins encoded by P. falciparum are predicted to be exported beyond the parasite plasma membrane (PPM) into the parasitophorous vacuole lumen (PVL) and subsequently across the parasitophorous vacuole membrane (PVM) into the RBC cytosol."

      To my knowledge, it has not been really demonstrated that all exported proteins take this route (transfer step in the PVL), and how transmembrane proteins transfer from the parasite to the erythrocyte is still poorly understood. I recommend that the authors rephrase this for precision.

      We agree with this reviewer and will change the statement.

      Changes:

      We have clarified these statements to accurately reflect the current understanding of protein export: “Approximately 10% of P. falciparum encoded proteins are predicted to be exported beyond the parasite plasma membrane, with many thought to pass through the parasitophorous vacuole lumen (PVL) and parasitophorous vacuole membrane (PVM) into the RBC cytosol, although the exact routes for transmembrane proteins are not fully understood.”

      (2) "Charnaud et al. 25, but not Cobb et al. 26, found HSP70x to be essential for normal PfEMP1 trafficking, although both studies concluded that HSP70x is dispensable for intraerythrocytic parasite growth at 37 °C."

      The trafficking block in Charnaud is likely due to a delay in parasite development and cannot thus really be directly related to PfEMP1 trafficking.

      Charnaud et al. report: “Microscopy of Giemsa stained IE indicated that ΔHsp70-x appeared similar to CS2 with no obvious abnormalities (Fig 2c). To more accurately quantify changes in maturation through the cell cycle, the DNA content of parasites stained with ethidium bromide was measured by flow cytometry (Fig 2d). This indicated that most parasites had the same DNA content at each timepoint and were maturing at the same rate.”

      Thus, we cannot conclude that the trafficking phenotype reported in the Charnaud study can be attributed to a growth delay. This is also supported by only minor changes in the transcriptome, which would likely be more widely perturbed if there was a significant growth delay. However, we will change the statement “Charnaud et al. found HSP70x to be essential for normal PfEMP1 trafficking” to “…important for PfEMP1 trafficking” to more precisely reflect the data.

      (3) "NanoLuciferase (NanoLuc) fusion proteins and compartment-specific isolation confirmed a greater abundance of PfEMP1 in the RBC cytosol following heat stress."

      Please see my comments about the differentiation between soluble and TM-containing proteins. One would expect that PfEMP1 is membrane-integrated, and thus should not be found in the cytosol (implying a soluble form).

      See our response above.

      (4) "Importantly, heat stress did not accelerate parasite development through the asexual life cycle (Supplementary Figure 1)."

      The authors should constrain this statement to the time frame in which the heat-shock was given. Previous publications have shown a speeded-up development only in younger-stage parasites, which the authors did not study.

      We will re-phrase.

      Changes:

      We have rephrased the sentence to clarify the time window of heat stress: “Importantly, heat stress between 16-24 hours post-invasion did not accelerate parasite development through the asexual life cycle (Supplementary Figure 1).” The supplementary figure title has also been updated to match.

      (5) I recommend that the authors include line numbers. This makes the reviewers' lives much easier.

      We agree and apologize for this oversight.

      We have now added line numbers.

      Reviewer #3 (Recommendations for the authors):

      (1) All the experiments have been performed to a very high standard, and I have no major questions about the results. However, the paper would go up to the next level if the effect of fever temperatures on the stiffness of the iRBCs had been investigated by measuring the passage of iRBCs through an artificial spleen where a bed of metal spheres mimics interendothelial splenic slits.

      See our comment from above.

      (2) With respect to Figures 5E, 6C, and 6E, why was there not a decrease in bioluminescence levels at 39 °C for Sap and NP40 to match the increase in EqtII?

      The assay is not performed as a sequence of permeabilisation steps. Instead, samples are split into three parallel treatments: one with EqtII, one with Saponin, and one with NP40. The protein measured in each case reflects the total released under that specific condition rather than being cumulative. Therefore, the NP40 fraction includes proteins from the Saponin-accessible compartment, the EqtII-accessible compartment, and the parasite cytosol.
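
      The logic of the parallel treatments described above can be illustrated with a minimal sketch. It is not the authors' analysis: the compartment names, the mapping of each detergent to the compartments it releases, and all signal values are hypothetical and chosen only to show why a rise in the EqtII fraction need not be mirrored by a fall in the Saponin or NP40 fractions.

```python
# Hypothetical model of three PARALLEL permeabilisation treatments.
# Each treatment releases everything in the compartments it can access,
# so fractions overlap rather than partition the total signal.
ACCESSIBLE = {
    "EqtII": ["rbc_cytosol"],                           # assumed: RBC membrane only
    "Saponin": ["rbc_cytosol", "pv_lumen"],             # assumed: RBC membrane + PVM
    "NP40": ["rbc_cytosol", "pv_lumen", "parasite"],    # assumed: everything
}


def released_signal(compartments, treatment):
    """Sum the signal in every compartment the treatment can release."""
    return sum(compartments[name] for name in ACCESSIBLE[treatment])


# Made-up luminescence units: heat stress moves protein from the PV lumen
# into the RBC cytosol without changing the total amount of protein.
control = {"rbc_cytosol": 10, "pv_lumen": 30, "parasite": 60}
heat = {"rbc_cytosol": 25, "pv_lumen": 15, "parasite": 60}

for treatment in ACCESSIBLE:
    print(treatment, released_signal(control, treatment), released_signal(heat, treatment))
```

      In this toy case only the EqtII value changes between conditions, because the Saponin and NP40 treatments already include the compartment the protein left.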

      (3) In the Supplementary gene maps, I could not read the white text on the black gene boxes.

      We apologize: these have not converted well and will be altered in the revised version.

      Changes:

      We have significantly increased the size of all fonts within the gene maps and improved the resolution of the figures to improve readability.

      (4) In Figure S6, why does HSP70-x look different between parts C and D IFAs, with the latter showing much more export?

      We agree these IFAs are not optimal and we will provide better images.

      New Results:

      Immunofluorescence microscopy, including the localisation of the two HA-tagged proteins (PF3D7_1039000 and PF3D7_0702500), has been repeated and higher-quality images are now included in the updated manuscript (Supplementary Figures 9 and 11). These figures now include multiple images of HA-tagged staining to more accurately represent the observed localisation and export patterns.

      (5) Would the authors care to comment on what kinase might be additionally phosphorylating at 39 °C?

      We presume these are Maurer’s clefts FIKK kinases as most of the hyperphosphorylated proteins are MC residents. However, without directly testing for this using conditional KO parasite lines, we cannot exclude that host kinases are also playing a role.

      (6) Could the additional assembly of PSAC at the iRBC membrane be important for survival at 39 °C?

      We have tested to see if nutrient uptake helps parasite survival during heat stress in the presence of furosemide and lower nutrient concentrations, but did not see a difference in growth following heat stress compared to control temperature conditions.

      New Results:

      We have added a new supplementary figure (Supplementary Figure 4) detailing experiments testing parasite growth under altered nutrient availability using two approaches (sub-lethal furosemide concentrations or reduced-nutrient RPMI) and with or without a 40°C heat stress applied between 16-24 hpi.

      The main text now references this data: “Culturing parasites in sub-lethal furosemide concentrations or in reduced nutrient media lead to reduced parasitaemia (Supplementary Figure 4). However, the parasitaemia is not further reduced following heat stress. This shows that increased PSAC levels/activity do not enhance parasite survival under conditions of limited nutrient availability either from furosemide-induced nutrient deprivation or a reduced nutrient media composition.”

      These experiments show that nutrient uptake does not improve parasite survival during heat stress compared to control temperature conditions.

      (7) Would the authors like to speculate on how higher temperatures increase the transport of exported proteins with TMDs?

      There are many possible explanations, one of which is that unfolding of the hydrophobic TMDs is favoured at elevated temperatures. However, we have no data to support this hypothesis and therefore refrained from explicitly stating this possibility.

    1. eLife Assessment

      Du et al. present a valuable study examining neural activation in medial prefrontal cortex (mPFC) subpopulations projecting to the basolateral amygdala (BLA) and nucleus accumbens (NAc) during behavioral tasks assessing anxiety, social preference, and social dominance. The strength of the evidence linking in vivo neural physiology to behavioral outcomes was considered solid; however, the electrophysiology data and their interpretation were less well received. Overall, the reviewers felt that the revised work provides insight into how distinct mPFC→BLA and mPFC→NAc pathways influence anxiety, exploration, and social behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      It is well known that neurons in the medial prefrontal cortex (mPFC) are involved in higher cognitive functions such as executive planning, motivational processing and internal state mediated decision-making. These internal states often correlate with the emotional states of the brain. While several studies point to the role of mPFC in regulating behavior based on such emotional states, the diversity of information processing in its sub-populations remains a less explored territory. In this study, the authors try to address this gap by identifying and characterizing some of these sub-populations in mice using a combination of projection-specific imaging, function-based tagging of neurons, multiple behavioral assays and ex-vivo patch clamp recordings.

      Strengths:

      The authors targeted mPFC projections to the nucleus accumbens (NAc) and basolateral amygdala (BLA). Using the open field task (OFT), the authors identified four relevant behavioral states as well as neurons active while the animal was in the center region ("center-ON neurons"). By characterizing single unit activity and using dimensionality reduction, the authors show differentiated coding of behavioral events at both the projection and functional levels. They further substantiate this effect by showing higher sensitivity of mPFC-BLA center-ON neurons during time spent in the open arms of the elevated plus maze (EPM). The authors then pivoted to the three-chamber social interaction (SI) assay to show that different subsets of neurons encode preference for a social stimulus over a non-social one. This reveals an interesting diversity in the function of these sub-populations on multiple levels. Lastly, the authors used the tube test as a manipulation of the anxiety state of mice and compared behavioral differences before/after in the OFT and social interaction tasks. This experiment revealed that "losers" of the tube test spend less time in the center of the open field while "winners" show a stronger preference for the familiar mouse over the object. Using patch-clamp experiments, the authors also found that "winners" exhibit stronger synaptic transmission in the mPFC-NAc projection while "losers" exhibit stronger synaptic transmission in the mPFC-BLA projection. Given the popularity of the tube test assay in rank determination, this provides useful insights into possible effects on anxiety levels and synaptic plasticity. Overall, the many experiments performed by the authors reveal interesting differences in mPFC neurons relative to their involvement in high or low anxiety behaviors, social preference and social rank.

      Weaknesses:

      The authors focused primarily on female mice, limiting generalizability and leaving readers with questions about the impact of sex differences on their results. The tube test is used as a manipulation of the "emotional state" in several of the experiments. While the authors show the changes to corticosterone levels as a consequence of win/loss in the tube test, stronger claims might be made with comparisons to other gold standard stressors such as forced social defeat or social isolation.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this proposal was to understand how two separate projection neurons from the medial prefrontal cortex, those innervating the basolateral amygdala (BLA) and nucleus accumbens (NAc), contribute to the encoding of emotional behaviors. The authors record the activity of these different neuron classes across three different behavioral environments. They propose that, although both populations are involved in emotional behavior, the two populations have diverging activity patterns in certain contexts. A subset of projections to the NAc appear particularly important for social behavior. They then attempt to link these changes to the emotional state of the animal and changes in synaptic connectivity.

      Strengths:

      The behavioral data build on previous studies of these projection neurons, supporting distinct roles in behavior, and extend previous work by examining the heterogeneity within different projection neurons across contexts. This is important for understanding the "neural code" within the PFC that contributes to such behaviours and how it is relayed to other brain structures.

      Weaknesses:

      The diversity of neurons mediating these projections and their targeting within the BLA and NAc is not explored. These are not homogeneous structures, so one possibility is that some of the diversity within their findings may relate to targeting of different sub-structures within the BLA or NAc, or to the diversity of projection neuron subtypes that mediate these pathways. This is an important future direction for this work but does not detract from the main finding as reported. The electrophysiological data in Figure 7 have significant experimental confounds that make their interpretation challenging.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Public review:

      Reviewer #1 (Public review):

      Weaknesses:

      The authors focused primarily on female mice without commenting on the effect that sex differences would have on their results.

      We agree that sex is an important biological variable. Our experiments were performed primarily in female mice to align with the higher prevalence of affective disorders in females and to maintain consistency across experiments. We now explicitly acknowledge this as a limitation in the Discussion and note that future studies will be needed to determine whether the projection-specific coding principles identified here generalize to male animals. Relevant literature on sex-specific mPFC→BLA/NAc function has also been incorporated.

      While the authors have identified relevant behavioral states across the various behavioral tasks, there is still a missing link between them and "emotional states" - the phrase used by them emphatically throughout the manuscript. The authors have neither provided adequate references to satisfy this gap nor shared any data pertaining to relevant readouts such as cortisol levels.

      We appreciate the reviewer’s concern regarding the use of the term “emotional states.” In the revised manuscript, we have clarified our terminology and now use “behavioral states associated with affective valence” where appropriate. We have also added references supporting the use of open field center vs. corner occupancy, elevated plus maze performance, and social interaction assays as established proxies for anxiety-like and affect-related behaviors.

      Importantly, to provide physiological support for these interpretations, we now include data showing that repeated win/loss outcomes in the tube test are associated with increased corticosterone levels in loser mice. These results indicate that the behavioral manipulations used in this study are accompanied by measurable physiological changes linked to stress-related processes.

      For both the projection-specific recordings and the patch-clamp experiments, including histology reports in the manuscript would provide essential information for anyone trying to replicate the results, especially since it is known that sub-populations in the BLA and NAc can have vastly different functions.

      We agree that detailed reporting of projection targeting is important for reproducibility. We have expanded the Methods and Results to more clearly describe viral targeting, recording locations, and histological verification of mPFC projections to the lateral BLA and NAc shell. We also now explicitly acknowledge the anatomical and cellular heterogeneity within these regions as a limitation and discuss this as an important direction for future work.

      The population-level analysis in the manuscript requires more rigor to reduce bias and statistical controls for establishing the significance of their results.

      We have strengthened the statistical analyses throughout the manuscript. Specifically, we have incorporated permutation-based controls for key analyses, clarified how behavioral and neural features were defined, and provided additional details on dimensionality reduction and clustering approaches. Exact p values, sample sizes, and statistical tests are now reported throughout the manuscript and figure legends.
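
      For readers unfamiliar with permutation-based controls, the general idea can be sketched as follows. This is an illustrative sketch only, not the analysis code used in the manuscript: the function name `permutation_p_value`, the choice of the difference in group means as the test statistic, and the example values are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference between group means.

    Hypothetical helper for illustration; not taken from the manuscript.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values breaks any link between label and value.
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed:
            extreme += 1

    # Add-one correction: a finite number of shuffles never yields p = 0.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up example values, e.g. a response measure in two conditions.
    condition_1 = [0.9, 1.1, 1.3, 1.0, 1.2]
    condition_2 = [0.4, 0.6, 0.5, 0.7, 0.3]
    print(permutation_p_value(condition_1, condition_2))
```

      The p value here is the fraction of label shuffles that produce a difference at least as large as the observed one, so it makes no assumption about the shape of the underlying distribution.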

      Lastly, the tube test is used as a manipulation of the "emotional state" in several of the experiments. While the tube test can cause a temporary spike in anxiety of the participating mice, it is not known to produce a sustained effect - unless there are additional interventions such as forced social defeat. Thus, additional controls for these experiments are essential to support claims based on changes in the emotional state of mice.

      We agree that the tube test is not a classical chronic stress paradigm such as social defeat. In our study, the tube test was used to establish social hierarchy rather than to model sustained stress. We have revised the manuscript to clarify this point and have tempered our language accordingly. At the same time, our corticosterone measurements indicate that repeated social competition induces measurable physiological changes, suggesting that the paradigm captures aspects of social hierarchy–related stress. We now frame these effects conservatively and acknowledge the need for future studies using additional stress paradigms.

      Apart from the methodology, the manuscript could also be improved with the addition of clear scatter points in all the plots along with detailed measures of the statistical tests such as exact p values and size of groups being compared.

      We have revised all figures to include individual data points (scatter overlays) wherever appropriate and have improved reporting of statistical details, including exact p values and group sizes, to enhance transparency and reproducibility.

      Taken together, these revisions clarify our interpretations, improve methodological transparency, and strengthen the rigor of the analyses while preserving the main conclusions of the study.

      Reviewer #2 (Public Review):

      Weaknesses:

      The diversity of neurons mediating these projections and their targeting within the BLA and NAc is not explored. These are not homogeneous structures and so one possibility is that some of the diversity within their findings may relate to targeting of different sub-structures within each region.

      We agree that both the basolateral amygdala (BLA) and nucleus accumbens (NAc) are highly heterogeneous. Our study was designed to focus on projection-defined mPFC outputs (presynaptic activity) rather than resolving postsynaptic subregional or cell-type diversity. We have now:

      - Clarified targeting strategies (PL→NAc shell and PL→BLA basal region)

      - Added histological descriptions of injection and recording sites

      - Expanded the Discussion to acknowledge how subregional and cellular heterogeneity may contribute to the observed variability

      We also highlight this as an important direction for future work.

      The electrophysiological data have significant experimental confounds and more methodological information is required to support other conclusions related to these data.

      We have significantly strengthened the electrophysiological component by:

      - Providing detailed recording conditions (access resistance, membrane properties, inclusion criteria)

      - Clarifying stimulus protocols and normalization procedures

      - Including representative traces and quantification of exclusion rates

      - Addressing potential confounds such as viral expression variability and stimulation parameters

      These revisions improve both interpretability and reproducibility of the electrophysiological findings.

      Reviewer #3 (Public Review):

      Major Weaknesses:

      (1) The manuscript does not clearly and consistently specify the sex of the mice used for behavioral and imaging experiments. Given the known influence of sex on emotional behaviors and neural activity, this omission raises concerns about the generalizability of the findings. The authors should make clear throughout the manuscript whether male, female, or mixed-sex cohorts were used and provide a rationale for their choice. If only one sex was used, the potential limitations of this approach should be explicitly discussed.

      We agree that sex is an important biological variable. We have now clearly specified throughout the manuscript that experiments were performed primarily in female mice and have added a rationale for this choice in the Methods. Briefly, we focused on females to align with the higher prevalence of affective disorders in females and to maintain consistency across experiments. We now explicitly acknowledge this as a limitation in the Discussion and note that future studies will be needed to determine whether these findings generalize to male animals.

      (2) Mice lacking "center-ON" neurons were excluded from analysis, yet the manuscript draws broad conclusions about the encoding of emotional states by mPFC pathways. It is critical to justify this exclusion and discuss how it may limit the generalizability of the findings. The inclusion of data or contextualization for animals without center-ON neurons would strengthen the interpretation.

      We thank the reviewer for raising this important point. Mice lacking identifiable center-ON neurons were excluded from analyses that specifically relied on this functional classification, as inclusion of such datasets would preclude meaningful comparison of this neuronal population. We have now clarified this criterion in the Methods and Results. Importantly, this exclusion does not affect analyses performed at the population level or those not dependent on center-ON classification. We now explicitly discuss this limitation and note that variability in the presence of center-ON neurons may reflect biological heterogeneity across animals.

      (3) The manuscript lacks baseline activity comparisons for mPFC→BLA and mPFC→NAc pathways across subjects. Providing baseline data would contextualize the observed activity changes during behavior testing and help rule out inter-individual variability as a confounding factor.

      We have added baseline comparisons of mPFC→BLA and mPFC→NAc activity across subjects to control for inter-individual variability and better contextualize behavior-related changes.

      (4) Extensive behavioral testing across multiple paradigms may introduce stress and fatigue in the animals, which could confound the induction of emotional states. The authors should describe the measures taken to minimize these effects (e.g., recovery periods, randomized testing order) and discuss their potential impact on the results.

      We now provide detailed descriptions of experimental design, including habituation, randomized testing order, and recovery periods between assays. We also discuss potential cumulative stress effects as a limitation.

      (5) Grooming is described as a "non-anxiety" behavior, which conflicts with its established role as a stress-relieving behavior that may indicate anxiety. This discrepancy requires clarification, as the distinction is central to the conclusions about the mPFC→BLA pathway's role in differentiating anxiety-related and non-anxiety behaviors.

      We thank the reviewer for this important clarification. We agree that grooming can be associated with both stress-related and self-soothing behaviors. In the revised manuscript, we have clarified that grooming is not strictly a “non-anxiety” behavior but instead represents a distinct behavioral state that may reflect stress regulation or internal state transitions. We have revised the text accordingly to avoid oversimplification and to better align with the literature.

      (6) While the study highlights pathway-specific neural activity, it lacks a cohesive integration of these findings with the behavioral data. Quantifying the overlap or decorrelation of neuronal activity patterns across tasks would solidify claims about the specialization of mPFC→NAc and mPFC→BLA pathways. Likewise, the discussion should be expanded to place these findings in light of prior studies that have probed the roles of these pathways in social/emotion/valence-related behaviors.

      We agree that stronger integration between neural and behavioral findings would strengthen the manuscript. In the revised version, we have added quantitative analyses examining the similarity and divergence of activity patterns across behavioral contexts (e.g., cross-context comparisons and correlation-based analyses). We have also expanded the Discussion to better integrate our findings with prior studies on mPFC→NAc and mPFC→BLA pathways in reward, aversion, and social behavior, thereby providing a more cohesive interpretation of pathway-specific functions.
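
      As a rough illustration of what a correlation-based cross-context comparison can look like, the sketch below computes the Pearson correlation between per-neuron activity vectors for every pair of contexts. It is not the analysis from the manuscript: the function names `pearson_r` and `cross_context_similarity`, the context labels, and the activity values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def cross_context_similarity(activity_by_context):
    """Return {(context_a, context_b): r} for every pair of contexts.

    Each value in `activity_by_context` is a list holding one mean activity
    value per neuron, with neurons in the same order in every context.
    """
    contexts = sorted(activity_by_context)
    return {
        (a, b): pearson_r(activity_by_context[a], activity_by_context[b])
        for a, b in combinations(contexts, 2)
    }


if __name__ == "__main__":
    # Hypothetical mean activity of four neurons in three contexts.
    activity = {
        "open_field": [0.2, 0.8, 0.5, 0.1],
        "plus_maze": [0.3, 0.7, 0.6, 0.2],
        "social": [0.9, 0.1, 0.3, 0.8],
    }
    for pair, r in cross_context_similarity(activity).items():
        print(pair, round(r, 2))
```

      Values of r near 1 would indicate that the same neurons are active in both contexts, whereas values near 0 or below would indicate decorrelated or opposing population patterns.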

      Minor Weaknesses:

      (1) The manuscript does not explicitly state whether the same mice were used across all behavioral assays. This information is critical for evaluating the validity of group comparisons. Additionally, more detail on sample sizes per assay would improve the manuscript's transparency.

      (2) In Figure 2G, the difference between BLA and NAc activity during exploratory behaviors (sniffing) is difficult to discern. Adjusting the scale or reformatting the figure would better illustrate the findings.

      (3) While the characteristics of the first social stimulus (M1) are specified, there is no information about the second social stimulus (M2). This omission makes it difficult to fully interpret the findings from the three-chamber test.

      (4) The methods section lacks detailed information about statistical approaches and animal selection criteria. Explicitly outlining these procedures would improve reproducibility and clarity.

      We have addressed all these minor concerns, including:

      - Clarifying whether the same mice were used across assays

      - Reporting sample sizes for each experiment

      - Improving figure clarity (e.g., scaling, labeling, scatter points)

      - Providing details for social stimuli (M1 vs. M2)

      - Expanding statistical methods and animal selection criteria

      Summary

      In summary, we have made substantial revisions to:

      - Improve conceptual precision (behavior vs. emotional state)

      - Increase methodological transparency and statistical rigor

      - Strengthen physiological validation

      - Clarify experimental design and limitations

      - Enhance integration with existing literature

      We believe these revisions significantly improve the clarity, rigor, and interpretability of the manuscript, and we are grateful for the reviewers’ guidance in strengthening this work.

    1. eLife Assessment

      This is a detailed and well-designed simulation study of whether replication metrics are useful for assessing animal-to-human translation, a critical consideration in turning laboratory research findings into tangible, real-world applications that directly benefit human health. The study approaches are solid, and the findings are important, as they offer insights into clinical research translation that can advance health decision-making. The strength and applicability of the presented evidence could be improved upon revision.

    2. Reviewer #1 (Public review):

      A well-designed and preregistered simulation study investigating whether replication-success metrics can be applied to assess animal-to-human translation. The study is comprehensive, uses realistic parameter settings, and provides valuable insights into how different metrics behave under varied conditions.

      Strengths:

      (1) Methodologically rigorous and transparently preregistered.

      (2) Comprehensive simulation design covering a wide range of plausible scenarios.

      (3) Clear description of metrics and decision rules.

      (4) Valuable contribution to understanding the limitations of applying replication metrics to translation questions.

      Weaknesses:

      (1) The conceptual distinction between replication and translation could be more clearly emphasized.

      (2) Interpretation of results is dense and can be challenging to follow without a clear summary.

      (3) Some simulation parameters (effect sizes, heterogeneity, and number of animal studies) require more substantial justification.

      (4) Practical recommendations could be more explicit to guide applied researchers.