    1. eLife Assessment

      This important work describes wing mechanosensory neurons in detail, extending our understanding of sensorimotor processing in the fruit fly. The evidence presented convincingly supports the authors' identification of these neurons and leverages state-of-the-art methods to generate a near-complete map of wing mechanosensory circuitry. Overall, this study provides new hypotheses and invaluable tools for investigating proprioceptive motor control of the wing in Drosophila.

    2. Reviewer #1 (Public review):

      Summary:

      Lesser et al provide a comprehensive description of Drosophila wing proprioceptive sensory neurons at the electron microscopy resolution. This "tour-de-force" provides a strong foundation for future structural and functional research aimed at understanding wing motor control in Drosophila with implications for understanding wing control across other insects.

      Strengths:

      (1) The authors leverage previous research that described many of the fly wing proprioceptors, and combine this knowledge with EM connectome data such that they now provide a near-complete morphological description of all wing proprioceptors.

      (2) The authors cleverly leverage genetic tools and EM connectome data to tie the location of proprioceptors on the wings with axonal projections in the connectome. This enables them to both align with previous literature as well as make some novel claims.

      (3) In addition to providing a full description of wing proprioceptors, the authors also identified a novel population of sensors on the wing tegula that make direct connections with the B1 wing motor neurons, implicating the role of the tegula in wing movements that was previously underappreciated.

      (4) Despite being the most comprehensive description so far, it is reassuring that the authors clearly state the missing elements in the discussion.

      Weaknesses:

      (1) The authors do their main analysis on data from the FANC connectome but provide corresponding IDs for sensory neurons in the MANC connectome. I wonder how the connectivity matrix compares across FANC and MANC if the authors perform a similar analysis to the one they have done in Fig. 2. This could be a valuable addition and potentially also pick up any sexual dimorphism.

      (2) The authors speculate about the presence of gap junctions based on the density of mitochondria. I'm not convinced about this, given that mitochondrial densities could reflect other things that correlate with energy demands in sub-compartments.

      Overall, I consider this an exceptional analysis which will be extremely valuable to the community.

    3. Reviewer #2 (Public review):

      Summary:

      Lesser et al. present an atlas of Drosophila wing sensory neurons. They proofread the axons of all sensory neurons in the wing nerve of an existing electron microscopy dataset, the female adult fly nerve cord (FANC) connectome. These reconstructed sensory axons were linked with light microscopy images of full-scale morphology to identify their origin in the periphery of the wing and encoded sensory modalities. The authors described the morphology and postsynaptic targets of proprioceptive neurons as well as previously unknown sensory neurons.

      Strengths:

      The authors present a valuable catalogue of wing sensory neurons, including previously undescribed sensory axons in the Drosophila wing. By providing connectivity information together with linked genetic driver lines, this research facilitates future work on the wing motor-sensory network and applications relating to Drosophila flight. The findings were linked to previous research as well as their putative role in the proprioceptive and nerve cord circuitry, providing testable hypotheses for future studies.

      Weaknesses:

      With future use as an atlas, it should be noted that the evidence is based on sensory neurons on only one side of the nerve cord. Fruit flies have stereotyped left/right hemispheres in the brain and left/right hemisegments in the nerve cord. Comparison of left and right neurons of the nervous system can give a sense of how robust the morphological and connectivity findings are. Unfortunately, this dataset has damage to the right side, making such comparisons unreliable.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to identify the peripheral end organ origin in the fly's wing of all sensory neurons in the Anterior Dorsal Mesothoracic nerve. They reconstruct the neurons and their downstream partners in an electron microscopy volume of a female ventral nerve cord, analyse the resulting connectome and identify their origin with review of the literature and imaging of genetic driver lines. While some of the neurons were already known through previous work, the authors expand on the identification and create a near complete map of the wing mechanosensory neurons at synapse resolution.

      Strengths:

      The authors elegantly combine electron microscopy neuron morphology, connectomics and light microscopy methods to bridge the gap between fly wing sensory neuron anatomy and ventral nerve cord morphology. Further, they use EM ultrastructural observations to make predictions on the signaling modality of some of the sensory neurons and thus their function in flight.

      The work is as comprehensive as state-of-the-art methods allow, creating a near-complete map of the wing mechanosensory neurons. This work will be of importance to the field of fly connectomics and modelling of fly behavior, as well as a useful resource to the Drosophila research community.

      Through this comprehensive mapping of neurons to the connectome the authors create a lot of hypotheses on neuronal function partially already confirmed with the literature and partially to be tested in the future. The authors achieved their aim of mapping the periphery of the fly's wing to axonal projections in the ventral nerve cord, beautifully laying out their results to support their mapping.

      The authors identify the neurons in a previously published connectome of a male fly ventral nerve cord to enable cross-individual analysis of connections and find no indication of sexual dimorphism at the sensory neuron level. Further, together with their companion paper Dhawan et al., 2025 describing the haltere sensory neurons in the same EM dataset, they cover the entire mechanosensory space involved in Drosophila flight.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Lesser et al provide a comprehensive description of Drosophila wing proprioceptive sensory neurons at the electron microscopy resolution. This “tour-de-force” provides a strong foundation for future structural and functional research aimed at understanding wing motor control in Drosophila with implications for understanding wing control across other insects.

      Strengths:

      (1) The authors leverage previous research that described many of the fly wing proprioceptors, and combine this knowledge with EM connectome data such that they now provide a near-complete morphological description of all wing proprioceptors.

      (2) The authors cleverly leverage genetic tools and EM connectome data to tie the location of proprioceptors on the wings with axonal projections in the connectome. This enables them to both align with previous literature as well as make some novel claims.

      (3) In addition to providing a full description of wing proprioceptors, the authors also identified a novel population of sensors on the wing tegula that make direct connections with the B1 wing motor neurons, implicating the role of the tegula in wing movements that was previously underappreciated.

      (4) Despite being the most comprehensive description so far, it is reassuring that the authors clearly state the missing elements in the discussion.

      Weaknesses:

      (1) The authors do their main analysis on data from the FANC connectome but provide corresponding IDs for sensory neurons in the MANC connectome. I wonder how the connectivity matrix compares across FANC and MANC if the authors perform a similar analysis to the one they have done in Figure 2. This could be a valuable addition and potentially also pick up any sexual dimorphism.

      We agree that systematic comparisons will provide valuable insights as more connectome datasets become available. However, the primary goal of this study was to link central axon morphology with peripheral structures in the wing. We deliberately omitted more detailed and quantitative analyses of the downstream VNC circuitry, apart from providing a global view of the connectivity matrix and using it to cluster the sensory axon types. A more detailed and systematic comparison of wing sensorimotor circuit connectivity across different connectome datasets (FANC, MANC, BANC, IMAC) is the subject of ongoing work in our lab, which we feel is beyond the scope of this study. Here, we chose to match the wing proprioceptors to axons in MANC to demonstrate their stereotypy across individuals and to make them more accessible to other researchers. We found no obvious sexual dimorphism at the level of wing sensory neurons. We now note this in the Discussion.

      (2) The authors speculate about the presence of gap junctions based on the density of mitochondria. I’m not convinced about this, given that mitochondrial densities could reflect other things that correlate with energy demands in sub-compartments.

      We have moved speculation about mitochondria and gap junctions to the Discussion.

      (3) I’m intrigued by how the tegula CO is negative for iav. I wonder if authors tried other CO labeling genes like nompc. And what does this mean for the nature of this CO. Some more discussion on this anomaly would be helpful.

      Based on this suggestion, we have added an image showing that tegula CO neurons are labeled by nompC-Gal4.

      (4) The authors conclude there are no proprioceptive neurons in sclerite pterale C based on Chat-Gal4 expression analysis. It would be much more rigorous if authors also tried a pan-neuronal driver like nsyb/elav or other neurotransmitter drivers (Vglut, GAD, etc) to really rule this out. (I hope I didn’t miss this somewhere.)

      To address this, we imaged OK371-GFP, which labels glutamatergic neurons, in the wing and wing hinge. We saw expression in the wing, as others have reported (Neukomm et al., 2014), but we saw no expression at the wing hinge. Apart from a handful of glutamatergic gustatory neurons in the leg, we are not aware of any other sensory neurons in the fly that are not labeled by Chat-Gal4.

      Overall, I consider this an exceptional analysis that will be extremely valuable to the community.

      We sincerely appreciate the reviewer’s positive feedback.

      Reviewer #2 (Public review):

      Summary:

      Lesser et al. present an atlas of Drosophila wing sensory neurons. They proofread the axons of all sensory neurons in the wing nerve of an existing electron microscopy dataset, the female adult fly nerve cord (FANC) connectome. These reconstructed sensory axons were linked with light microscopy images of full-scale morphology to identify their origin in the periphery of the wing and encoded sensory modalities. The authors described the morphology and postsynaptic targets of proprioceptive neurons as well as previously unknown sensory neurons.

      Strengths:

      The authors present a valuable catalogue of wing sensory neurons, including previously undescribed sensory axons in the Drosophila wing. By providing connectivity information together with linked genetic driver lines, this research facilitates future work on the wing motor-sensory network and applications relating to Drosophila flight. The findings were linked to previous research as well as their putative role in the proprioceptive and nerve cord circuitry, providing testable hypotheses for future studies.

      Weaknesses:

      (1) With future use as an atlas, it should be noted that the evidence is based on sensory neurons on only one side of the nerve cord. Fruit flies have stereotyped left/right hemispheres in the brain and left/right hemisegments in the nerve cord. The comparison of left and right neurons of the nervous system can give a sense of how robust the morphological and connectivity findings are. Here, the authors have not compared the left and right side sensory axons from the wing nerve, leaving potential for developmental variability across samples and left/right hemisegments.

      The right ADMN nerve in the FANC dataset is partially severed, making left/right comparisons unreliable (see Azevedo 2024, Extended Data Figure 4). We have updated the text to explain this within the Methods section of the paper.

      (2) Not all links between the EM reconstructions and driver lines are convincing. To strengthen these, for all EM-LM matches in Figures 3-7, rotated views of the driver line (matching the rotated EM views) should be shown to provide a clearer comparison of the data. In particular, Figure 3G and Figure 7B are not very convincing based on the images shown. MCFO imaging of the driver lines in Figure 3G and 7B would make this position stronger if a clone that matches the EM reconstruction could be identified.

      Many of the z-stack images in the paper are from the Janelia FlyLight collection, and unfortunately their imaging parameters were not optimized for orthogonal views. Rotated views are blurry and not especially helpful for comparison to EM reconstruction. We now point out in the text that interested readers can access the z-stacks from FlyLight to see the dorsal-ventral projections.

      Regarding Figure 3G and 7B, we have added markers to the image with corresponding descriptions in the legend to guide the reader through the image of the busy driver line. Although these lines label many cells in the VNC as a whole, they sparsely label cells in the ADMN, making them nonetheless useful for identifying peripheral sensory neurons.

      (3) Figure 7B looks like the driver line might have stochastic expression in the sensory neuron, which further reduces confidence in the result shown in Figure 7C. Is this expression pattern in the wing consistently seen? Many split-GAL4s have stochastic expressions. The evidence would be strengthened if the authors presented multiple examples (~4-5) of each driver line’s expression pattern in the supplement.

      Figure 7B shows sparse labeling of the driver line using the MCFO technique, as specified in the legend. Its unilateral expression is therefore not due to stochastic expression of the Gal4 line. We have added the “MCFO” label to the image to clarify.

      (4) Certain claims in this work lack quantitative evidence. On line 128, for instance, “Overall, our comprehensive reconstruction revealed many morphological subgroups with overlapping postsynaptic partners, suggesting a high degree of integration within wing sensorimotor circuits.” If a claim of subgroups having shared postsynaptic partners is being made, there should have been quantitative evidence. For example, the cosine similarity amongst members of each group could be compared to the cosine similarity of shuffled/randomised sets of axons from different groups. The heat map of cosine similarity in Figure 2B alone is not sufficient.

      We agree that illustrating the extent of shared postsynaptic partners across subgroups strengthens this point. We added a visualization showing pairwise similarity scores for within- and between-cluster neuron pairs (Figure 2B inset). We also performed a permutation test to determine that within-cluster similarity is significantly higher than between clusters, and we report the test in the results as well as the figure legend. This analysis provides a more quantitative summary of the qualitative trends in connectivity that are summarized in Figure 2B.

      (5) Similarly, claims about putative electrical connections to b1 motor neurons are very speculative. The authors state that “their terminals contain very densely packed mitochondria compared to other cells”, without providing a quantitative comparison to other sensory axons. There is also no quantitative comparison to the one example of another putative electrical connection from the literature. Further, it should be noted that this connection from Trimarchi and Murphey, 1997, is also stated as putative on line 167, which further weakens this evidence. Quantification would strongly strengthen this position. Identification of an example of high mitochondrial density at a confirmed electrical connection would be even better. In the related discussion section “A potential metabolic specialization for flight circuitry”, it should be more clearly noted that the dense mitochondria could be unrelated to a putative electrical connection. If the authors have an alternative hypothesis about the mitochondria density, this should be stated as well.

      We agree with the reviewer that the link between mitochondrial density and metabolic specialization is purely speculative in this context. Based on reviewer feedback, we have moved all mention of the relationship between mitochondrial density and gap junction coupling to the Discussion. We acknowledge that this may seem like a somewhat random and not quantitatively supported observation. However, we found the coincidence striking and worthy of mention, though it is only tangentially relevant to the rest of the paper. From conversations with colleagues, we have also heard that this relationship is consistent with as yet unpublished work in other model organisms (e.g., zebrafish, mouse).

      The electrical coupling to b1 motor neurons is well-established (Fayyazuddin and Dickinson, 1999), and we have updated the text to state this more clearly. However, we agree that whether the specific neurons we have identified based on their anatomy are the same ones functionally identified through whole-nerve recordings remains unknown.

      (6) It would be appropriate to cite previous work using a similar strategy to match sensory axons to their cell bodies/dendrites at the periphery using driver lines and connectomics (see Figure 5 for example in the following paper: https://doi.org/10.7554/eLife.40247 ).

      At this point, there are now dozens of papers that match the axons of sensory neurons to their cell bodies/dendrites in the periphery by comparing light microscopy and connectomics. When we dug in, we found examples in C. elegans, Ciona intestinalis, zebrafish, and mouse, all published prior to the study cited above. For basically every animal for which scientists have acquired EM volumes of neural tissue, they have used other anatomical labeling methods to determine cell types inside and outside the imaged volume. In summary, we found it difficult to establish a single primary citation for this approach. In lieu of this, we have added a citation to an earlier review by a pioneer in EM connectomics that discusses the general approach of matching cells across different labeling/imaging modalities (Meinertzhagen et al., 2009).

      The methods section is very sparse. For the sake of replicability, all sections should be expanded upon.

      We have expanded the methods section, and also now include a STAR methods table.
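
      As an illustration of the kind of permutation test described in the response to point (4) above, the following is a minimal sketch and not the authors' analysis code. It assumes a hypothetical neurons-by-partners synapse-count matrix `conn` and a cluster label per sensory axon; the function names, the test statistic (mean within-cluster minus mean between-cluster cosine similarity), and the number of permutations are all assumptions chosen for clarity.

```python
import numpy as np


def cosine_similarity_matrix(conn):
    """Pairwise cosine similarity between rows of a (neurons x partners) matrix."""
    norms = np.linalg.norm(conn, axis=1, keepdims=True)
    # Avoid division by zero for neurons with no outputs.
    unit = conn / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def within_minus_between(sim, labels):
    """Mean similarity of same-cluster pairs minus mean of different-cluster pairs."""
    rows, cols = np.triu_indices(len(labels), k=1)  # each unordered pair once
    same = labels[rows] == labels[cols]
    pair_sims = sim[rows, cols]
    return pair_sims[same].mean() - pair_sims[~same].mean()


def permutation_test(conn, labels, n_perm=2000, seed=0):
    """One-sided test: is within-cluster similarity higher than between-cluster?

    Cluster labels are shuffled across neurons to build the null distribution.
    Returns the observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    sim = cosine_similarity_matrix(np.asarray(conn, dtype=float))
    observed = within_minus_between(sim, labels)
    null = np.array(
        [within_minus_between(sim, rng.permutation(labels)) for _ in range(n_perm)]
    )
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value
```

      Shuffling labels rather than matrix entries keeps each neuron's connectivity profile intact, so the null distribution only breaks the association between profile and cluster membership. Note that if the clusters were themselves derived from the same similarity matrix, such a test is somewhat circular and is best read as a descriptive summary.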

      Reviewer #3 (Public review):

      Summary:

      The authors aim to identify the peripheral end-organ origin in the fly’s wing of all sensory neurons in the anterior dorsal mesothoracic nerve. They reconstruct the neurons and their downstream partners in an electron microscopy volume of a female ventral nerve cord, analyse the resulting connectome, and identify their origin with a review of the literature and imaging of genetic driver lines. While some of the neurons were already known through previous work, the authors expand on the identification and create a near-complete map of the wing mechanosensory neurons at synapse resolution.

      Strengths:

      The authors elegantly combine electron microscopy, neuron morphology, connectomics, and light microscopy methods to bridge the gap between fly wing sensory neuron anatomy and ventral nerve cord morphology. Further, they use EM ultrastructural observations to make predictions on the signaling modality of some of the sensory neurons and thus their function in flight.

      The work is as comprehensive as state-of-the-art methods allow, creating a near-complete map of the wing mechanosensory neurons. This work will be of importance to the field of fly connectomics and modelling of fly behavior, as well as a useful resource to the Drosophila research community.

      Through this comprehensive mapping of neurons to the connectome, the authors create a lot of hypotheses on neuronal function, partially already confirmed with the literature and partially to be tested in the future. The authors achieved their aim of mapping the periphery of the fly’s wing to axonal projections in the ventral nerve cord, beautifully laying out their results to support their mapping.

      The authors identify the neurons in a previously published connectome of a male fly ventral nerve cord to enable cross-individual analysis of connections. Further, together with their companion paper, Dhawan et al. 2025, describing the haltere sensory neurons in the same EM dataset, they cover the entire mechanosensory space involved in Drosophila flight.

      Weaknesses:

      The connectomic data are only available upon request; the inclusion of a connectivity table of the reconstructed neurons would aid analysis reproducibility and cross-dataset comparisons.

      We have added a connectivity table as well as analysis scripts in the github repository for the paper (https://github.com/EllenLesser/Lesser_eLife_2025).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The methods section should be expanded in every aspect. Most pressing sections are:

      (1) Data and Code availability: All code should be included as a Zenodo database, the suggestion to ask authors for code upon request is inappropriate.

      We have added all code to a public github repository, which is now linked in the Methods section.

      (2) Samples: Standard cornmeal and molasses medium should have a reference, as many institutes use different recipes.

      The recipe used by the University of Washington fly kitchen is based on the Bloomington standard Cornmeal, Molasses and Yeast Medium recipe, which can be found at https://bdsc.indiana.edu/information/recipes/molassesfood.html. The UW recipe is slightly modified for different antifungal ingredients and includes tegosept, propionic acid, and phosphoric acid.

      (3) Table 3: Driver lines labelling wing sensory neurons: The genetic driver lines should have associated Bloomington stock centre numbers. Additionally, relevant information for effector lines used should be included in the methods.

      We now include the Bloomington stock numbers and more information on effector lines in the STAR methods table.

      Minor corrections:

      (1) Lines 119-120: “Notably, many of the axons do not form crisp cluster boundaries, suggesting that multimodal sensory information is integrated at early stages of sensory processing.” We do not follow the logic of this statement and suspect it is a bit too speculative.

      We removed this sentence from the manuscript.

      (2) Figure 1: The ADMN is missing in the schematics and would be helpful to depict for non-experts. Is this what is highlighted in Figure 1D?

      Yes, and we now label 1D as the ADMN wing nerve.

      (3) Figure 1B: Which driver lines are being depicted here? Looking at Table 3 does not clarify. It should be specified at least in the figure legend.

      As stated in the legend, we include a table of all of the driver lines we screened and which sensory structures they label.

      (4) Figure 1C: There are some minor placement issues with the text in the schematic. There is an arrow very close to the “CO” on the top right, which makes the “O” look like the symbol for male. “ax ii” is a bit too close to the wing hinge.

      We updated the figure to address this issue.

      (5) Figure 1D: The outlined grey masks are not clear. The use of colour would be very useful for the reader to help understand what the authors are referring to here.

      We now use color for the masks.

      (6) Figure 2A: It is unclear if the descending neuron and non-motor efferent neuron are not shown because they are under the described threshold, or to simplify the plot. They should be included in the plot if over the threshold.

      We have updated the legend to specify that the descending and non-motor efferent neurons are excluded to visually simplify the plot. We include the percentage of sensory output to each of these neurons in the legend, and they are included in the connectivity matrix data in the public GitHub repository associated with the paper, which is linked in the Methods.

      (7) Figure 2B: What clustering is used specifically? The method says it’s from Scikit-learn, but there are many types of clustering available in this package.

      We now include the specific clustering type used in the Methods section, which is agglomerative clustering.

      (8) Figure 3A: What does the green box behind the plot represent?

      The green box represents the tegula CO axons, which we now specify in the legend.

      (9) Figure 3C: the “C” is clipped at the top.

      We updated the figure to address this issue.

      (10) Figure 4A: the main text says a “group of four axons” (line 203) while the figure says 5 axons.

      We updated the text to address this issue.

      (11) Line 360: “We found that the campaniform sensilla on the tegula provide the most direct feedback onto wing steering motor neurons”. We struggled to find where this was directly shown, because several sensory axon types directly synapse onto motor neurons.

      We now specify in the text that this finding is shown in Figure 3.

      Reviewer #3 (Recommendations for the authors):

      I would like to congratulate the authors on their beautiful, easy-to-read, and easy-to-comprehend manuscript, with clear figures and nice visualizations. This work provides a valuable resource that will contribute to the interpretability of connectomic data and further to connectome-based modeling of fly behavior.

      We sincerely appreciate the reviewer’s positive feedback.

    1. eLife Assessment

      This important work examines the effects of side-wall confinement on chemotaxis of swimming bacteria in a shallow microfluidic channel. The authors present convincing experimental evidence, combined with geometric analysis and numerical simulations of simplified models, showing that chemotaxis is enhanced when the distance between the side walls is comparable to the intrinsic radius of chiral circular swimming near open surfaces. This study should be of interest to scientists specializing in bacteria-surface interactions.

    2. Reviewer #1 (Public review):

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, a value close to the average radius of the circle trajectories of the unconfined bacteria in 2D. These chiral circles impose that the bacteria swim preferentially along the right-side wall, which indeed yields chemotaxis in the presence of a chemotactic gradient. These observations are backed by numerical simulations and a geometrical analysis.

    3. Reviewer #3 (Public review):

      This paper addresses, through experiment and simulation, the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established, to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      The authors have included a useful intuitive explanation of their results via a geometric model of the trajectories. In future work it would be interesting to analyze further the voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, this might help understand how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. As these would be characterized by significant heterogeneity in pore sizes and geometries, further work will be necessary to translate the present results to those situations.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This article deals with the chemotactic behavior of E. coli bacteria in thin channels (a situation close to 2D). It combines experiments and simulations.

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8µm, close to the average radius of the circle trajectories of the unconfined bacteria in 2D. It is known that these circles are chiral and impose that the bacteria swim preferentially along the right-side wall when there is no chemotactic gradient. In the presence of a chemotactic gradient, this larger proportion of bacteria swimming on the right wall yields chemotaxis. This effect is backed by numerical simulations and a geometrical analysis.

      Although the conclusions drawn from the experiments presented in this article seem clear and interesting, I find that the key elements of the mechanism of this wall-directed chemotaxis are not sufficiently emphasized. Moreover, the paper would be clearer with more details on the hypotheses and the essential ingredients of the analyses.

      We thank the reviewer for these constructive suggestions. We agree that emphasizing the underlying mechanism is crucial for the clarity of our findings. In the revised manuscript, we have now explicitly highlighted the critical roles of chiral circular motion and the alignment effect following side-wall collisions in both the Abstract (lines 25-27) and the Discussion (lines 391-393). Furthermore, we have added a new analysis of bacterial trajectories post-collision (Fig. S2), which demonstrates that cells predominantly align with and swim along the sidewalls. We have also clarified the assumptions in our numerical simulations, specifically how the radius of circular trajectories and the alignment effect are incorporated into the equations of motion. Please refer to our detailed responses in the "Recommendations for the authors" section for further specifics.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigated the chemotaxis of E. coli swimming close to the bottom surface in gradients of attractant in channels of increasingly smaller width but fixed height = 30 µm and length ~160 µm. In relatively large channels, they find that on average the cells drift in response to the gradient, despite cells close to the surface away from the walls being known to not be chemotactic because they swim in circles.

      They find that this average drift is due to the cell localization close to the side walls, where they slide along the wall. Whereas the bacteria away from the walls have no chemotaxis (as shown before), the ones on the left side wall go down-gradient on average, but the ones on the right-side wall go up-gradient faster, hence the average drift. They then study the effect of reducing channel width. They find that chemotaxis is higher in channels with a width of about 8 µm, which approximately corresponds to the radius of the circular swimming R. This higher chemotactic drift is concomitant to an increased density of cells on the RSW. They do simulations and modeling to suggest that the disruption of circular swimming upon collision with the wall increases the density of cells on the RSW, with a maximal effect at w = ~ 2/3 R, which is a good match for their experiments.

      Strengths:

      The overall result that confinement at the edge stabilises bacterial motion and allows chemotaxis is very interesting although not entirely unexpected. It is also important for understanding bacterial motility and chemotaxis under ecologically relevant conditions, where bacteria frequently swim under confinement (although its relevance for controlling infections could be questioned). The experimental part of the study is nicely supported by the model.

      Weaknesses:

      Several points of this study, in particular the interpretation of the width effect, need better clarification:

      (1) Context:

      There are a number of highly relevant previous publications that should have been acknowledged and discussed in relation to the current work:

      https://pubs.rsc.org/en/content/articlehtml/2023/sm/d3sm00286a

      https://link.springer.com/article/10.1140/epje/s10189-024-00450-7

      https://doi.org/10.1016/j.bpj.2022.04.008

      https://doi.org/10.1073/pnas.1816315116

      https://www.pnas.org/doi/full/10.1073/pnas.0907542106

      https://doi.org/10.1038/s41467-020-15711-0

      http://doi.org/10.1038/s41467-020-15711-0

      http://doi.org/10.1039/c5sm00939a

      We appreciate the reviewer bringing these important publications to our attention. We have now cited and discussed these works in the Introduction (lines 55-62 and 76-85) to better contextualize our study regarding bacterial motility and chemotaxis in confined geometries.

      (2) Experimental setup:

      a) The channels are built with asymmetric entrances (Figure 1), which could trigger a ratchet effect (because bacteria swim in circle) that could bias the rate at which cells enter into the channel, and which side they follow preferentially, especially for the narrow channel. Since the channel is short (160 µm), that would reflect on the statistics of cell distribution. Controls with straight entrances or with a reversed symmetry of the channel need to be performed to ensure that the reported results are not affected by this asymmetry.

      We appreciate the reviewer's insight regarding the potential ratchet effect caused by asymmetric entrances. To rule this out, we fabricated a control device with straight entrances and repeated the measurements. As shown in Figure S3, the chemotactic drift velocity follows the same trend as observed in the original setup, confirming an optimal width of ~9 µm. These results demonstrate that the entrance geometry does not bias the reported statistics. We have updated the manuscript text at lines 233-235.

      b) The authors say the motile bacteria accumulate mostly at the bottom surface. This is strange, for a small height of 30 µm, the bacteria should be more-or-less evenly spread between the top and bottom surface. How can this be explained?

      We apologize for not explaining this clearly in the text. As shown by Wei et al., Phys. Rev. Lett. 135, 188401 (2025), significant surface accumulation occurs in channels with heights exceeding 20 µm. In our specific experimental setup, we did not use Percoll to counteract gravity. Therefore, the bacteria accumulated mostly at the bottom surface under the combined influence of gravity and hydrodynamic attraction. This bottom-surface localization is supported by our observation that the bacterial trajectories were predominantly clockwise (characteristic of the bottom surface) rather than counter-clockwise (characteristic of the top surface). We have added this explanation to Line 141.

      c) At the edge, some of the bacteria could escape up in the third dimension (http://doi.org/10.1039/c5sm00939a). What is the magnitude of this phenomenon in the current setup? Does it have an effect?

      We thank the reviewer for raising this important point regarding 3D escape. We have quantified this phenomenon and found the escape rate from the edge into the third dimension to be 0.127 s<sup>-1</sup>. This corresponds to a mean residence time that allows a cell moving at 20 µm/s to travel approximately 157.5 µm along the edge. Since this distance is comparable to the full length of our lanes (~160 µm), most cells traverse the entire edge without escaping. Furthermore, our analysis is based on the average drift of the surface trajectories per unit of time; this metric is independent of the absolute number of cells present. Therefore, the escape phenomenon does not significantly impact our conclusions. We have added a statement clarifying this at line 154.
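
      The arithmetic behind this estimate can be checked with a short sketch. It assumes that the mean residence time at the edge is the reciprocal of the escape rate (as for an exponentially distributed dwell time); the function name and variable names are illustrative and do not come from the manuscript.

```python
# Minimal sketch: distance travelled along the edge before a 3D escape.
# Assumption: mean residence time = 1 / escape rate (exponential dwell times).

def mean_travel_distance_um(escape_rate_per_s, speed_um_per_s):
    """Return the mean distance (µm) covered during one mean residence time."""
    mean_residence_time_s = 1.0 / escape_rate_per_s
    return speed_um_per_s * mean_residence_time_s

# Values quoted in the response: 0.127 s^-1 escape rate, 20 µm/s swimming speed.
distance_um = mean_travel_distance_um(0.127, 20.0)
lane_length_um = 160.0  # approximate lane length quoted in the response
print(f"mean travel distance: {distance_um:.1f} µm (lane length ~{lane_length_um:.0f} µm)")
```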

      d) What is the cell density in the device? Should we expect cell-cell interactions to play a role here? If not, I would suggest to de-emphasize the connection to chemotaxis in the swarming paper in the introduction and discussion, which doesn't feel very relevant here, and rather focus on the other papers mentioned in point 1.

      The cell density in our experiments was approximately 1.3×10<sup>-3</sup> μm<sup>-2</sup>. Given this low density, we do not expect cell-cell interactions to play a role in the observed behaviors.

      Regarding the connection to swarming chemotaxis: We agree that our low-density setup differs from a high-density swarm; however, we believe the comparison remains relevant for two reasons. First, it provides a necessary contrast to studies showing surface inhibition of chemotaxis. Second, while we eliminate cell-cell interactions, we isolate the geometric aspect of swarming. In a swarm, cells move within narrow lanes created by their neighbors. Our device mimics this specific physical confinement by replacing neighboring cells with PDMS sidewalls. This allows us to decouple the effects of physical confinement from cell-cell interactions. We have added text (line 370) to clarify this rationale and have incorporated the additional references in the Introduction as suggested in point 1.

      e) We are not entirely convinced by the interpretation of the results in narrow channels. What is the causal relationship between the increased density on the RSW and the higher chemotactic drift? The authors seem to attribute higher drift to this increased RSW density, which emerges due to the geometric reasons. But if there is no initial bias, the same geometric argument would induce the same increased density of down-gradient swimmers on the LSW, and so, no imbalance between RSW and LSW density. Could it be the opposite that the increased RSW density results from chemotaxis (and maybe reinforces it), not the other way around? Confinement could then deplete one wall due to the proximity of the other, and/or modify the swimming pattern - 8 µm is very close to the size of the body + flagellum. To clarify this point, we suggest measuring the bacterial distributions in the absence of a gradient for all channel widths as a control.

      We thank the reviewer for this insightful comment regarding the causal relationship between cell density and chemotactic drift. We apologize if the initial explanation was unclear.

      Regarding the no-gradient control: Without an attractant gradient (and no initial bias), there is no breaking of symmetry and the labels "LSW" and "RSW" are arbitrary. Therefore, in the absence of a gradient, there will be no asymmetry between the bacterial distributions on the two sides (within experimental fluctuations) for any channel width.

      Regarding the causality and density imbalance: We agree that the increased RSW density is a result of chemotaxis, which is then reinforced by the lane geometry especially at narrow lane width. The mechanism relies on the coupling of chemotactic bias with surface circularity. The angle ranges that lead to RSW-UG accumulation (Fig. 6A-C) coincide with the up-gradient direction. Because these cells experience suppressed tumbling (longer runs), they can maintain the steady circular trajectories required to reach and align with the RSW. Conversely, while pure geometric analysis suggests a similar potential for LSW-DG accumulation, these trajectories coincide with the down-gradient direction. These cells experience enhanced tumbling, which distorts the circular trajectories. This prevents them from effectively reaching the LSW and also increases the probability of them leaving the wall. Therefore, the causality is indeed a positive feedback loop: the attractant gradient creates an initial bias that allows the RSW-UG fraction to form stable trajectories; the optimal lane width (matching the swimming radius) then maximizes this capture efficiency, further enriching the RSW fraction and enhancing the overall drift.

      We have added clarifications regarding these points in the revised manuscript (the last paragraph of “Results”).

      (3) Simulations:

      The simulations treat the wall interaction very crudely. We would suggest treating it as a mechanical object that exerts elastic or "hard sphere" forces and torques on the bacteria for more realistic modeling.

      We appreciate the reviewer's suggestion to incorporate more detailed mechanical interactions, such as elastic or hard-sphere forces, for the wall collisions. While we agree that a full hydrodynamic or mechanical model would offer higher fidelity, our experimental observations suggest that a simplified kinematic approach is sufficient for the specific phenomena studied here.

      As shown in the new Fig. S2, our analysis of cell trajectories in the 44-µm-wide channels reveals that cells colliding with the sidewalls tend to align with the surface almost instantaneously. The timescale required for this alignment is negligible compared to the typical wall residence time (see also Ref. 6). Consequently, to maintain computational efficiency without sacrificing the essential physics of the accumulation effect, we employed a coarse-grained phenomenological model where a bacterium immediately aligns parallel to the wall upon contact, similar to approaches used previously (Ref. 43). We have added relevant text to the manuscript on lines 168-171.

      Notably, the simulations have a constant (chemotaxis independent) rate of wall escape by tumbling. We would expect that reduced tumbling due to up-gradient motility induces a longer dwell time at the wall.

      We apologize for the confusion. The chemotaxis effect is indeed fully integrated into our simulation. Specifically, the simulated cells sense the chemical gradient and adjust their motor CW bias (B) accordingly. This adjustment directly modulates the tumble rate (k), calculated as k = B/0.31 s<sup>-1</sup>. Consequently, the wall escape rate is not constant but varies with the chemotactic response. We also imposed a maximum detention time limit which, when combined with the variable tumble rate, results in an average wall residence time of approximately 2 s, consistent with our experimental observations (Fig. S6B). We have clarified these details in the final section of 'Materials and Methods'.

      Reviewer #3 (Public review):

      This paper addresses through experiment and simulation the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      While the data analysis is reasonably convincing, I think that the authors could make much better use of what must be voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, I would like to see much more analysis of how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis. In essence, there needs to be a much clearer control analysis of trajectories without sidewalls to understand the mechanism in their presence.

      We thank the reviewer for this insightful suggestion. We agree that understanding how circular trajectories are interrupted by wall collisions is central to explaining the enhanced chemotaxis. While we did not explicitly formulate a Fokker-Planck equation, we have addressed the reviewer's core point by employing two complementary mathematical approaches that model the probability distribution of swimming directions and wall interactions:

      (1) Stochastic simulations (Langevin approach): As detailed in the "Simulation of E. coli chemotaxis within lane confinements" subsection of “Results” and Figure 5, we modeled cells as self-propelled particles performing random walks. This model explicitly accounts for the "interruption" of circular trajectories by incorporating a constant angular velocity (circular swimming) and an alignment effect upon collision with sidewalls. These simulations successfully reproduced the experimental trends, confirming that the interplay between circular radius and lane width determines the optimal drift velocity.

      (2) Geometric probability analysis: To provide the "intuitive understanding", we included a specific Geometrical Analysis section (the last subsection of “Results”) and Figure 6. This analysis mathematically formulates the problem by calculating the exact proportion of swimming angles that allow a cell to transition from a circular trajectory in the bulk to an up-gradient trajectory along the Right Sidewall (RSW). By integrating over the possible swimming directions, we derived the probability of wall interception as a function of lane width (w) and swimming radius (r). This analysis reveals that the interruption of circular paths is most favorable for chemotaxis when w ≈ (0.7-0.8)×r.

      (3) Control analysis: Regarding the "control analysis of trajectories without sidewalls," we utilized the cells in the Middle Area (MA) of the wide lanes as an internal control. As shown in Fig. 2B and 4A, these cells exhibit typical surface-associated circular swimming (Fig. 3B) but generate zero net drift. This serves as the baseline "no sidewall" condition, demonstrating that the chemotactic enhancement is strictly driven by the rectification of circular swimming into wall-aligned motion at the boundaries.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. Yet, each of these would be characterized by significant heterogeneity in pore sizes and geometries, and thus it is very unclear whether or how the findings in this work would carry over to those situations.

      We thank the reviewer for this important observation regarding environmental heterogeneity. We agree that we should be cautious about directly extrapolating to complex ecological contexts without qualification. We have revised the last sentence of the abstract to adopt a more measured tone: "Our results may offer insights into bacterial navigation in complex biological environments such as host tissues and biofilms, providing a preliminary step toward exploring microbial ecology in confined habitats and potential strategies for controlling bacterial infections."

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Key elements of the mechanism of wall-directed chemotaxis are not sufficiently emphasized:

      For instance, the chirality of the trajectories is an essential part of the analysis but is mentioned only briefly in the introduction. In the geometrical analysis, I understand that one of the critical parameters is the angle at which bacteria "collide" with the walls. But, again, this remains largely implicit in the discussion. This comes to the point that these ideas are not even mentioned in the abstract which doesn't provide any hint of a mechanism. An analysis of the actual trajectories of the cells after they hit the walls, as a function of their initial angle would be helpful in comparison with the simulations and the geometrical analysis.

      We appreciate the reviewer's insightful comment regarding the need to better emphasize the mechanism of wall-directed chemotaxis. We agree that the chirality of trajectories and the geometry of wall collisions are central to our analysis and were previously under-emphasized.

      To address this, we have made the following revisions:

      (1) We have revised the Abstract (lines 25-27) and the Discussion (lines 391-393) to explicitly highlight the crucial role of chiral circular motion and the alignment effect following sidewall collisions.

      (2) We further analyzed bacterial trajectories at different collision angles. Typical examples are shown in Supplementary Fig. S2. We observed that cells tend to align with and swim along the sidewalls regardless of their initial collision angles. This finding is now described in the main text at lines 168-171.

      The motion of the bacteria is modelled as run-and-tumble at several places in the manuscript, and in particular in the simulations. Yet, the trajectories of the bacteria seem to be smooth in this almost 2D geometry, except of course when they directly interact with the walls (I hardly see tumbles in the MA region in Figure 1B). Can the authors elaborate on the assumptions made in the numerical simulations? In particular, how is the radius of the trajectories included in these equations of motion (line 514)?

      We apologize for the lack of clarity regarding the bacterial motion model. It has been established that while bacteria do tumble near solid surfaces, they exhibit a smaller reorientation angle compared to bulk fluids; in fact, the most probable reorientation angle on a surface is zero (Ref. 41). Consequently, tumbles are often difficult to distinguish from runs with the naked eye. Additionally, the trajectories in Figure 1B are plotted on a 44 µm × 150 µm canvas with unequal coordinate scales, which may further obscure the visual distinctness of tumbling events.

      Regarding the equations of motion: We modeled the bacteria as self-propelled particles governed by the internal chemotaxis pathway, alternating between run and tumble states. As noted in the equations on lines 286 & 578, we incorporated the circular motion by introducing a constant angular velocity, −ν<sub>0</sub>/r, during the run state. Here, ν<sub>0</sub> represents the swimming speed, r denotes the radius of circular swimming, and the negative sign indicates clockwise chirality. Furthermore, to model the hydrodynamic interaction with the boundaries, we assumed that when a cell collides with a sidewall, its velocity vector instantly aligns parallel to that wall.
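
      The run-state rule described above can be illustrated with a minimal sketch. It is not the authors' simulation code: the function name `run_step`, the explicit Euler update, the placement of the sidewalls at y = 0 and y = width, and the omission of tumbles, rotational noise, and the chemotaxis pathway are all assumptions made for illustration.

```python
import math

def run_step(x, y, theta, v0, r, dt, width):
    """Advance one run-state step (hypothetical helper, illustration only).

    x is the coordinate along the lane, y the coordinate across it.
    """
    # Clockwise circular swimming: constant angular velocity -v0 / r.
    theta = theta - (v0 / r) * dt
    # Self-propulsion at speed v0 along the current heading.
    x = x + v0 * math.cos(theta) * dt
    y = y + v0 * math.sin(theta) * dt
    # Assumed sidewalls at y = 0 and y = width: on contact, keep the cell
    # inside the lane and align its heading parallel to the wall at once.
    if y <= 0.0 or y >= width:
        y = min(max(y, 0.0), width)
        theta = 0.0 if math.cos(theta) >= 0.0 else math.pi
    return x, y, theta
```

      Far from the walls this rule traces a closed circle of radius close to r; a cell that reaches a wall keeps the along-lane direction it had at impact and then slides along the wall until the circular turning carries it away again.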

      The comparison of Figure 5B (simulations) with Figure 4B (experiments) does not strike me as so "similar". Why are the points at small widths so noisy (Figure 5AB)? Figure 5C is cut at these widths, it should be plotted over the entire scale.

      We acknowledge that the agreement between simulation and experiment is less robust in the narrowest channels. The discrepancy and "noise" at small widths in Figure 5 arise from the limitations of the self-propelled particle model in highly confined geometries. Specifically, our simulation treats bacteria as point particles and does not explicitly calculate the physical exclusion (steric effects) caused by the finite size of the flagella and cell body.

      In the experimental setup, steric constraints within narrow channels (comparable to the cell size) restrict the cells' ability to turn freely, effectively stabilizing their motion. However, because our model allows particles to reorient more freely than actual cells would in such confined spaces, it produces fluctuations and an overestimation of the drift velocity at small widths. If these confinement effects were fully incorporated, the cell density mismatch between the left and right sidewalls would be reduced, leading to lower drift velocities that match the experimental data more closely.

      Regarding Figure 5C: Since the "active particle" assumption loses physical validity in channels narrower than the scale of the bacterium, the simulation results in this regime are not representative of biological reality. Plotting these non-physical points would distort the analysis. Therefore, we have maintained the truncation of Figure 5C at 4 µm to ensure the data presented is physically meaningful. We have added a clear discussion of these model limitations to the manuscript at lines 310-314.

      These important precisions should be added to the text or in a supplementary section. A validated mechanism describing in detail the impact of the walls on the cell trajectories would greatly improve the conclusions.

      We thank the reviewer for the suggestions. As noted in the responses above, we have incorporated the details concerning the simulation assumptions and the model limitations at narrow widths into the revised manuscript. We have performed further analysis of the collision trajectories between bacteria and the sidewalls. As illustrated in the new Fig. S2, the data confirms that cells tend to align with and swim along the sidewalls following a collision, regardless of the initial impact angle.

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Related to swimming in 3D: The authors should specify the depth of field of the objective in their setup.

      We thank the reviewer for pointing this out. We have calculated the depth of field (DOF) of our objective to be approximately 3.7 µm. This estimate is based on the standard formula:

      DOF = λn/NA<sup>2</sup> + ne/(M·NA)

      where λ = 610 nm (emission wavelength), n = 1.0 (refractive index), NA = 0.45 (numerical aperture), M = 20 (magnification), and e = 6.5 µm (camera resolution). We have added this specification to the "Microscopy and Data Acquisition" section of “Materials and Methods”.
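
      The quoted value can be reproduced with a short sketch. It assumes the widely used expression DOF = λn/NA² + ne/(M·NA), which is consistent with the parameters and the ~3.7 µm result given above; the function name is illustrative.

```python
# Minimal sketch: depth of field from the parameters quoted in the response.
# Assumed formula: DOF = wavelength * n / NA**2 + n * e / (M * NA).

def depth_of_field_um(wavelength_um, n, na, magnification, e_um):
    """Return the depth of field in µm (wave-optical plus geometric term)."""
    wave_term = wavelength_um * n / na**2
    geometric_term = n * e_um / (magnification * na)
    return wave_term + geometric_term

# 610 nm emission, n = 1.0, NA = 0.45, M = 20, e = 6.5 µm.
dof_um = depth_of_field_um(0.610, 1.0, 0.45, 20.0, 6.5)
print(f"DOF = {dof_um:.2f} µm")
```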

      (2) Related to the interpretation of the width effect: We think plotting the cell enrichment, ie the probabilities P in Figure 4B normalized to the expected value if cells were homogeneously distributed ((3µm)/w for the side walls, (w - 6µm)/w for the middle) would help understand the strength of the wall 'siphoning' effect.

      We thank the reviewer for the suggestion. We have calculated the cell enrichment by normalizing the observed probabilities against the expected values for a homogeneous distribution, as suggested. The resulting relationship between cell enrichment and lane width is presented in Figure S4.
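
      The normalization suggested by the reviewer can be sketched as follows, assuming a 3-µm-wide zone along each sidewall as in the reviewer's expression. The function names and the observed probability used in the example are hypothetical and are not taken from Figure S4.

```python
# Minimal sketch: enrichment = observed probability / probability expected
# for a homogeneous distribution across the lane width.

def expected_fractions(width_um, wall_zone_um=3.0):
    """Expected probabilities (per sidewall, middle) for a uniform distribution."""
    if width_um <= 2.0 * wall_zone_um:
        raise ValueError("lane too narrow to define a middle region")
    side = wall_zone_um / width_um
    middle = (width_um - 2.0 * wall_zone_um) / width_um
    return side, middle

def enrichment(observed_probability, expected_probability):
    """Ratio > 1 means more cells than a homogeneous distribution predicts."""
    return observed_probability / expected_probability

side_expected, middle_expected = expected_fractions(44.0)
# Hypothetical observed probability of 0.30 on one sidewall of a 44-µm lane.
example_enrichment = enrichment(0.30, side_expected)
print(f"expected side fraction: {side_expected:.3f}, enrichment: {example_enrichment:.1f}")
```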

      Related to simulations:

      (1) Showing vd for the 3 regions in Figure S5 would be helpful also to understand the underlying mechanism.

      We thank the reviewer for the suggestion. The V<sub>d</sub> values for the three regions are shown in Fig. S5.

      (2) Figure 5B vs 4B: There is a mismatch in the right vs left side density at w=6µm in the simulations that is not here in the experiments. What could explain this difference?

      We appreciate the reviewer pointing this out. The mismatch in the simulations is due to the simplified treatment of cells as self-propelled particles, which overlooks the physical volume of the cell body and flagella. In narrow channels (w = 6 µm), these physical constraints would restrict the cells' ability to change direction freely - a factor not fully captured in the simulation. Accounting for these steric effects would trap cells more effectively against the walls, reducing the density asymmetry between the LSW and RSW and lowering the drift velocity. This would bring the simulation results closer to the experimental observations. We have added a discussion of these limitations and effects to the revised manuscript (lines 310-314).

      (3) The simulations essentially assume that the density of motile cells is homogeneous and equal at both x=0 and x=L open ends of the channel. Is it the case in the experiments, even with the gradient, and the walls creating some cell transport?

      We thank the reviewer for pointing this out. The simulation assumption is consistent with our experimental observations. Our data were recorded within 160-μm-long lanes located in the center of the wider (400 μm) cell channel. In this central region, the cells maintain a continuous flux. Furthermore, experiments were performed within 8 min of flow, limiting the time for significant cell density gradients to establish. As illustrated in Author response image 1, the inhomogeneity in the measured cell density distribution is insignificant across the length of the observation window, indicating that the walls and gradient do not create significant heterogeneity at the boundaries of the region of interest.

      Author response image 1.

      The cell density distribution along the gradient field from the data of the 44-μm-wide lane.

      (4) Line 506: There is something strange with the definition of the bias. B cannot be the tumbling bias if k=B/0.31 s<sup>-1</sup> and the tumble-to-run rate is 5/s, because then the tumbling bias is B/0.31 / (B/0.31 + 5). Please clarify.

      We apologize for the confusion caused by the notation. In our model, B represents the CW bias of the individual flagellar motor, not the macroscopic tumbling bias of the cell. We assume the run-to-tumble rate is equivalent to the motor CCW-to-CW switching rate (k). Previous studies have shown that this rate increases linearly with the motor CW bias according to k = B/τ, where τ is a characteristic time (Ref. 50).

      Based on experimental data for wildtype cells, the average run time in the near-surface region is ~2.0 s (corresponding to a run-to-tumble rate of ~0.5 s<sup>-1</sup>) (Ref. 11), and the steady-state wildtype CW bias is ~0.15. Using these values, we determined τ ~ 0.31 s. Consequently, the switching rate is defined as k = B/0.31 s<sup>-1</sup>. Since the tumble duration is constant (0.2 s) (Ref. 51), the tumble-to-run rate is fixed at 5 s<sup>-1</sup>. We have clarified these definitions and parameter values in lines 569-573.
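
      The distinction between the motor CW bias B and the cell-level tumbling bias can be made concrete with a short sketch. It uses the parameter values quoted above and the two-state expression from the reviewer's comment, k/(k + 5 s⁻¹); the constant and function names are illustrative.

```python
# Minimal sketch: run-to-tumble rate and resulting tumbling bias.
# Assumptions: k = B / 0.31 s^-1 and a fixed tumble-to-run rate of
# 1 / 0.2 s = 5 s^-1, as stated in the response.

CHARACTERISTIC_TIME_S = 0.31   # characteristic time in k = B / time
TUMBLE_DURATION_S = 0.2        # constant tumble duration

def run_to_tumble_rate(cw_bias):
    """Run-to-tumble rate k (s^-1) for a given motor CW bias B."""
    return cw_bias / CHARACTERISTIC_TIME_S

def tumbling_bias(cw_bias):
    """Fraction of time spent tumbling in a two-state run/tumble model."""
    k_run_to_tumble = run_to_tumble_rate(cw_bias)
    k_tumble_to_run = 1.0 / TUMBLE_DURATION_S
    return k_run_to_tumble / (k_run_to_tumble + k_tumble_to_run)

steady_state_rate = run_to_tumble_rate(0.15)   # wildtype CW bias ~0.15
steady_state_tumbling_bias = tumbling_bias(0.15)
print(f"k = {steady_state_rate:.2f} s^-1, mean run time = {1.0 / steady_state_rate:.1f} s")
print(f"tumbling bias = {steady_state_tumbling_bias:.3f}")
```

      With these values the mean run time comes out close to the ~2 s quoted for near-surface swimming, while the tumbling bias is well below the motor CW bias of 0.15, which is the distinction the response draws.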

      Other minor comments:

      (1) Line 20 and lines 34-35: We think that the connection to infection is questionable here and should be toned down.

      Thank you for the suggestion. We have revised Line 20 to read: “Understanding bacterial behavior in confined environments is helpful to elucidating microbial ecology and developing strategies to manage bacterial infections.” Additionally, we modified lines 34-35 to state: “Our results may offer insights into bacterial navigation in complex biological environments such as host tissues and biofilms, providing a preliminary step toward exploring microbial ecology in confined habitats and potential strategies for controlling bacterial infections.”

      (2) Line 49: Consider highlighting the change in the sense of rotation at the air-liquid interface.

      Thank you for the suggestion. We have now highlighted the difference in chirality between trajectories at the air-liquid interface and those at the liquid-solid interface. The text has been updated to read: “For example, E. coli swim clockwise when observed from above a solid surface, whereas Caulobacter crescentus move in tight, counter-clockwise circles when viewed from the liquid side.”

      (3) Lines 58-59: The sentence should be better formulated, explaining what is CheY-P and that its concentration changes because of a change in phosphorylation (P).

      Thank you for the suggestion. We have reformulated this section to explicitly define CheY-P and explain how its concentration is regulated through phosphorylation. The revised text reads: “The transmembrane chemoreceptors detect attractants or repellents and transmit signals into the cell by modulating the autophosphorylation of the histidine kinase CheA. Attractant binding suppresses CheA autophosphorylation, while repellent binding promotes it. This modulation alters the concentration of the phosphorylated response regulator protein, CheY-P.”

      (4) Lines 63-64: CheR CheB do a bit more than "facilitating" adaptation, they mediate it. The notation CheB(p) may be confusing, since "-P" was used above for CheY.

      Thank you for pointing this out. We have corrected the notation and strengthened the description of the enzymes' roles. The revised text is: “The adaptation enzymes CheR and CheB methylate and demethylate the receptors, respectively, mediating sensory adaptation.”

      (5) Line 130: there must be a typo in the formula.

      We have replaced the ambiguous lag time variable in Fig. 1C with _n_Δt to ensure mathematical consistency.

      (6) Additionally, \Delta t is both the time between the frame here and the lag time in Figure 1.

      Thank you for highlighting this ambiguity. We have updated the notation to distinguish these two values. The lag time in Figure 1 is now explicitly denoted as _n_Δt, while Δt remains the time interval between individual frames.

      (7) Line 162: "Consistent with previous reports," a reference to said reports is missing.

      Thank you for pointing this out. We have now added the reference (Ref. 41) to support this statement.

      (8) Figure 1B: Are these tracks in the presence of a gradient? Same as used in panel C? This needs to be explained.

      Thank you for this question. We confirm that the tracks shown in Figure 1B were indeed recorded in the presence of a gradient and represent a subset of the data used in Figure 1C. We have clarified this in the figure legend as follows: "Thirty bacterial trajectories selected from the data of the 44-µm-wide lane in gradient assays. These represent a subset of the trajectories analyzed in panel C."

      (9) Simulations: the equation for x(t) should also be given for completeness.

      Thank you for the suggestion. For completeness, we have added the position updating equations for the run state to the Materials and Methods section (lines 579-580). The equations are defined as:

      (10) Figure S2: For the swimming directions that are more unstable due to the surface friction torque, RSW-DG, and LSW-UG, one would have expected that the Up-gradient motion is more persistent than the down gradient one. It seems to be the opposite. Is it significant, and what could be the reason for this?

      We apologize for the lack of clarity in our original explanation. While we would generally expect up-gradient motion to be more persistent than down-gradient motion in bulk fluid, our measurements near the surface show a different trend due to the specific contributions of run and tumble states to the escape rate. Cells swimming up-gradient (UG) in the LSW experience a higher probability of running. Consequently, they are subjected to the destabilizing surface friction torque for a greater proportion of time compared to cells swimming down-gradient (DG) in the RSW. This can be explained mathematically. The escape rates for RSW-DG and LSW-UG can be expressed as:

      Where B<sup>+</sup> and B<sup>−</sup> represent the tumble bias (probability of tumbling) when swimming up-gradient and down-gradient, respectively, and k<sub>T</sub> and k<sub>R</sub> denote the escape rates during a tumble and a run, respectively. Due to the chemotactic response, 0 ≤ B<sup>+</sup> < B<sup>−</sup> ≤ 1. Crucially, our system is characterized by k<sub>R</sub> > k<sub>T</sub> (the escape rate is higher during a run than a tumble). Therefore, the lower tumble bias during up-gradient swimming (B<sup>+</sup> < B<sup>−</sup>) increases the weight of the run-state escape term ((1−B<sup>+</sup>)k<sub>R</sub>), leading to a higher overall escape rate for LSW-UG compared to RSW-DG. We have added an intuitive understanding of k<sub>R</sub> > k<sub>T</sub> in the Supplemental text.
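
      The expressions themselves are not reproduced in this text. One form consistent with the definitions given, offered here as an assumption rather than as the authors' exact equations, treats each escape rate as the tumble-bias-weighted average of the tumble-state and run-state rates. Under that assumption the difference between the two rates factorizes, which makes the sign of the effect explicit:

```latex
\begin{align}
k_{\mathrm{LSW\text{-}UG}} &= B^{+} k_T + (1 - B^{+})\, k_R, \\
k_{\mathrm{RSW\text{-}DG}} &= B^{-} k_T + (1 - B^{-})\, k_R, \\
k_{\mathrm{LSW\text{-}UG}} - k_{\mathrm{RSW\text{-}DG}} &= (B^{-} - B^{+})\,(k_R - k_T) > 0 .
\end{align}
```

      With B<sup>+</sup> < B<sup>−</sup> and k<sub>R</sub> > k<sub>T</sub>, both factors are positive, so the LSW-UG escape rate exceeds the RSW-DG escape rate.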

    1. eLife Assessment

      This important work establishes a connection between PRMT1 and SFPQ by identifying common phenotypes downstream of their inactivation. In the resubmission, authors now include NMD as a contributor to aberrant gene expression underpinning craniofacial development. The complementary experiments help strengthen some solid conclusions. This paper describes an interesting mechanism for the regulation of RNA levels, which is of interest to the readers of eLife.

    2. Reviewer #1 (Public review):

      The current manuscript investigates a regulatory axis containing Prmt1, which methylates RNA binding proteins and alters intron splicing outcomes and expression of matrix genes. Authors test the effects of deficient Prmt1, Sfpq, and various other factors, using a combination of bioinformatic analyses and wet-lab validation approaches. Authors show that intron retention often triggers NMD, contributing to aberrant gene expression regulation and craniofacial development. The revised manuscript introduces several complementary experiments that help to strengthen conclusions. For example, authors directly investigate NMD-mediated transcript turnover to better understand how retention contributes to expression changes in genes of interest, and they assess several additional factors downstream of Prmt1 to justify a centralized interest in the PRMT1/SFPQ axis.

      Weaknesses:

      However, some points remain unaddressed or unexplored, which could bolster conclusions. For example, the transcriptome data from knockdown experiments indicate robust exon skipping, suggesting that analysis of these patterns in parallel with intron retention could provide additional insights into the responsive gene programs. Given that SFPQ is known to have multiple regulatory roles, a more thorough investigation of its possible mechanisms of action during craniofacial development would allow for definitive conclusions about the isolated impact of SFPQ-dependent splicing. Although authors employ CUT&Tag analysis of Pol II binding at the promoters and across the gene body, at the current scope, no change in Pol II association (i.e., absence of transcriptional repression) does not directly indicate a lack of transcriptional regulation by other means (pause release, elongation rate or processivity, transcription termination, etc.). Without a more thorough investigation of these mechanisms, this confounds definitive claims about their relative contributions to the gene expression landscape.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Lima et al examines the role of Prmt1 and SFPQ in craniofacial development. Specifically, the authors test the idea that Prmt1 directly methylates specific proteins, which results in intron retention in matrix proteins. The protein SFPQ is methylated by Prmt1 and functions downstream to mediate Prmt1 activity. The genes with retained introns activate the NMD pathway to reduce the RNA levels. This paper describes an interesting mechanism for the regulation of RNA levels during development.

      Strengths:

      The phenotypes support what the authors claim that Prmt1 is involved in craniofacial development and splicing. The use of state-of-the-art sequencing to determine the specific genes that have intron retention and changes in gene expression is a strength.

      Weaknesses:

      The results now support the conclusions; however, it is still unclear how direct the relationship is between Prmt1 and SFPQ.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The strength of the current study lies in their establishing the molecular mechanism through which PRMT1 could alter craniofacial development through regulation of the transcriptome, but the data presented to support the claim that a PRMT1-SFPQ axis directly regulates intron retention of the relevant gene networks should be robust and with multiple forms of clear validation. For example, elevated intron retention findings are based on the intron retention index, and according to the manuscript, are assessed considering the relative expression of exons and introns from a given transcript. However, delineating between intron retention and other forms of alternative splicing (i.e., cryptic splice site recognition) requires a more comprehensive consideration of the intron splicing defects that could be represented in the data. A certain threshold of intron read coverage (i.e., the percent of an intron that is covered by mapped reads) is needed to ascertain if those that are proximal to exons could represent alternative intron ends rather than full intron retention events. In other words, intron retention is a type of alternative splicing that can be difficult to analyze in isolation given the confounding influence of cryptic splicing and cryptic exon inclusion. If other forms of alternative splicing were assessed and not detected, more confident retention calls can be made.

      This manuscript is a mechanistic exploration that follows previous work we published on the role of Prmt1 in craniofacial development, in which genetic deletion of Prmt1 in CNCCs leads to cleft palate and mandibular hypoplasia (PMID: 29986157).

      As the reviewer pointed out, a certain threshold of intron read coverage is needed to assess intron retention events. We employed IRTools to assess the collective changes of intron retention between cell states associated with a certain biological function or pathway. IRTools incorporates considerations for intron read coverage by checking the evenness of read distribution in an intron. Specifically, each constitutive intronic region (CIR) is divided into 10 equally sized bins and the proportion of reads that map to each bin is calculated. CIRs are then ranked according to the imbalance in their bin-wise read distribution, represented by the proportion of reads in the most populated bin. Those among the top 1% are considered to contain potentially false IR events and are excluded. We further addressed this question by developing another measure of intron retention, the intron retention coefficient (IRC), which assesses IR events using junction reads (Supplemental Figure-S8). Junction reads that straddle two exons are called exon-exon junction reads (spliced reads), and those that straddle an exon and a neighboring intron are called exon-intron junction reads (retained reads). The IRC of an intron is defined as the fraction of junction reads that are exon-intron junction reads: IRC = exon-intron read-count / (exon-exon read-count + exon-intron read-count), where exon-intron read-count = (5’ exon-intron read-count + 3’ exon-intron read-count) / 2. The IRC of a gene is defined as the exon-intron fraction of all junction reads overlapping or spanning the constitutive introns of this gene. In the calculation of the IRC, only exon-intron junction reads that cover the junction point and overlap each side by at least 8 bp were counted, and only exon-exon junction reads that jump over the relevant junction points and overlap each of the respective exons by at least 8 bp were counted. In this process, the balance between 5’ and 3’ exon-intron junction reads is taken into account. As shown in the Supplemental Figure S7A and S7B, IRC analysis generated results consistent with those obtained using IRI (Figure 3A and 3I).
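
      The two measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IRTools implementation: the function names, the input layout (pre-counted junction reads that already satisfy the 8 bp overlap rule, and per-bin read counts for each CIR), and the use of a ceiling when taking the top 1% are all hypothetical choices made for the example.

```python
import math
from typing import Mapping, Sequence, Set


def intron_retention_coefficient(
    exon_exon: int, exon_intron_5p: int, exon_intron_3p: int
) -> float:
    """IRC = EI / (EE + EI), where EI is the mean of the 5' and 3' counts.

    Counts are assumed to include only junction reads that overlap each
    side of the junction by at least 8 bp (filtering is done upstream).
    """
    exon_intron = (exon_intron_5p + exon_intron_3p) / 2
    total = exon_exon + exon_intron
    if total == 0:
        return float("nan")  # no junction reads, IRC is undefined
    return exon_intron / total


def max_bin_fraction(bin_counts: Sequence[int]) -> float:
    """Imbalance score: proportion of reads in the most populated bin."""
    total = sum(bin_counts)
    if total == 0:
        return 0.0
    return max(bin_counts) / total


def flag_uneven_cirs(
    bins_by_cir: Mapping[str, Sequence[int]], top_fraction: float = 0.01
) -> Set[str]:
    """Return the CIRs ranked in the top `top_fraction` by imbalance.

    Rounding up with ceil is an assumption made for this sketch.
    """
    ranked = sorted(
        bins_by_cir,
        key=lambda cir: max_bin_fraction(bins_by_cir[cir]),
        reverse=True,
    )
    n_flag = math.ceil(len(ranked) * top_fraction)
    return set(ranked[:n_flag])
```

      For example, an intron with 90 exon-exon reads, 12 reads across the 5’ junction and 8 across the 3’ junction has an averaged exon-intron count of 10 and an IRC of 10 / 100 = 0.1. Averaging the two junction counts keeps an intron with reads piled up at only one end from scoring as highly as one retained along its full length.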

      In addition, as the reviewer pointed out, intron retention can be difficult to analyze in isolation. We followed the reviewer’s suggestion that “If other forms of alternative splicing were assessed and not detected, more confident retention calls can be made” and analyzed other forms of alternative splicing for all ECM and GAG genes with a significant IRI increase (genes highlighted in Figure-3A and 3I) using rMATS (Supplemental Figure-S9). Among these genes, only 5 genes (Cthcr1, Mmp23, Adamts10, Ccdc80 and Col25a1) showed statistically significant changes in skipped exons (SE), 1 gene (Bmp7) showed significant changes in mutually exclusive exons (MXE), and none showed significant changes in alternative 5’ or 3’ splicing. The SE and MXE changes detected were marginal (Supplemental Figure S8), while the majority of matrix genes with significant intron retention did not exhibit other forms of alternative splicing, further supporting the confidence of the intron retention calls.

      While data presented to support the PRMT1-SFPQ activation axis is quite compelling, that this is directly responsible for the elevated intron retention remains enigmatic. First, in characterizing their PRMT1 knockout model, it is unclear whether the elevated intron retention events directly correspond to downregulated genes.

      In the revised manuscript, we demonstrate IR-triggered NMD as a mechanism for transcript decay and downregulation of matrix genes. When IR-triggered NMD was blocked by the chemical inhibitor NMDI14, the intron-retaining transcripts showed significant accumulation (new Figure-4). NMD is an RNA surveillance system that degrades aberrant RNAs. Intron retention-triggered NMD in cancer has both promotive and suppressive roles, and NMD inhibitors have been tested for cancer therapy, including immunotherapy. During embryonic development, the functional significance of the NMD machinery is suggested by human genetic findings and mouse genetic models. NMD is driven by a protein complex composed of SMG and UPF proteins. Smg6, Upf1, Upf2 and Upf3a knockout mice die at early embryonic stages (E5.5-E9.5), and Smg1 gene trap mutant mice die at E12.5 (PMID: 29272451). SMG9 mutation in human patients causes malformation in the face, hand, heart and brain (PMID: 27018474).

      We show that in CNCCs NMD functions both as a physiological mechanism and as a response invoked by molecular insult. Blocking NMD in CNCCs caused significant accumulation of intron-retaining Adamts2, Alpl, Eln, Matn2, Loxl1 and Bgn transcripts, suggesting a basal role for NMD in degrading intron-retaining transcripts (Figure-4Ba-4Bf). We further demonstrated the accumulation of Adamts2 and Fbln5 using semi-quantitative PCR with the detection of a longer product from Adamts2 intron 19 and Fbln5 intron 7 (Figure-4Ca-4Ch). In CNCCs and ST2 cells, NMD is further invoked by Prmt1 and Sfpq deficiency. In Prmt1-deficient CNCCs, NMD blockage led to higher accumulation of intron-retaining Adamts2 and Alpl transcripts, suggesting that Prmt1 deficiency triggers NMD to reduce intron-containing transcripts (Figure-4Aa, 4Ab). In Sfpq-depleted ST2 cells, blocking NMD caused accumulation of the intron-retaining transcripts Col4a2, St6galnac3 and Ptk7 (Figure-9B, 9C).

      Moreover, intron splicing is a well-documented node for gene regulation during embryogenesis and in other proliferation models, and craniofacial defects are known to be associated with 'spliceosomopathies'. However, reproduction of this phenotype does not suggest that the targets of interest are inherently splicing factors, and a more robust assessment is needed to determine the exact nature of alternative splicing in this system. Because there are several known splicing factors downstream of PRMT1 and presented in the supplemental data, the specific attribution of retention to SFPQ would be additionally served by separating its splicing footprint from that of other factors that are primed to cause alternative splicing.

      We have previously shown that a group of splicing factors depends on Prmt1 for arginine methylation, including SFPQ (PMID: 31451547). We tested additional splicing factors that are highly expressed in CNCCs and depend on PRMT1 for arginine methylation: SRSF1, EWSR1, TAF15, TRA2B and G3BP1 (Figure-5, 6 and 10). Among these factors, EWSR1 and TRA2B are both methylated in CNCCs and depend on PRMT1 for methylation (Fig. 5 and Supplemental Figure-S3B, S3C). We were not able to assess TAF15 methylation because of the lack of an efficient antibody for the PLA assay. We also demonstrated that their protein expression or subcellular localization was not altered by Prmt1 deletion in CNCCs, unlike SFPQ (Supplemental Figure-S4). To define their splicing footprint, we performed siRNA-mediated knockdown in ST2 cells, followed by RNA-seq and IRI analysis to define differentially regulated genes and introns, which revealed distinct biological pathways regulated by SFPQ, EWSR1, TRA2B and TAF15, but minimal roles of EWSR1, TRA2B and TAF15 in intron retention when compared to SFPQ (Fig. 10F-10S, Supplemental Figure S7A-S7F, Supplemental Tables S4-S6). ECM genes are significantly downregulated by all four splicing factors (Fig. 10F-10I), but EWSR1, TRA2B and TAF15 function through IR-independent mechanisms, such as exon skipping, as exemplified by Postn (Fig. 10J-10S).

      Clarifying the relationship between SFPQ and splicing regulation is important given that the observed splicing defects are incongruous with published data presented by Takeuchi et al., (2018) regarding SFPQ control of neuronal apoptosis in mice. In this system, SFPQ was more specifically attributed to the regulation of transcription elongation over long introns and its knockout did not result in significant splicing changes. Thus, to establish the specificity for the SFPQ in regulating these retention events, authors would need to show that the same phenotype is not achieved by mis-regulation of other splicing factors. That the authors chose SFPQ based on its binding profile is understandable but potentially confounding given its mechanism of action in transcription of long introns (Takeuchi 2018). Because mechanisms and rates of transcription can influence splicing and exon definition interactions, the role of SFPQ as a transcription elongation factor versus a splicing factor is inadequately disentangled by authors.

      To test whether SFPQ acts as a transcription elongation factor, we performed Pol II CUT&Tag in ST2 cells and demonstrated that depletion of SFPQ caused only marginal changes in either the promoter region or gene body of ECM genes, suggesting that the role of SFPQ as a transcriptional activator or elongation factor is minimal (Fig. 7G, 7H). This finding is distinct from SFPQ function in neurons (PMID: 29719248), suggesting that the activation or recruitment of SFPQ in transcriptional regulation may involve tissue-specific factors in neurons.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Lima et al examines the role of Prmt1 and SFPQ in craniofacial development. Specifically, the authors test the idea that Prmt1 directly methylates specific proteins, which results in intron retention in matrix proteins. The protein SFPQ is methylated by Prmt1 and functions downstream to mediate Prmt1 activity. The genes with retained introns activate the NMD pathway to reduce the RNA levels. This paper describes an interesting mechanism for the regulation of RNA levels during development.

      Strengths:

      The phenotypes support what the authors claim that Prmt1 is involved in craniofacial development and splicing. The use of state-of-the-art sequencing to determine the specific genes that have intron retention and changes in gene expression is a strength.

      Weaknesses:

      Some of the data seems to contradict the conclusions. And it is unclear how direct the relationships are between Prmt1 and SFPQ.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      First, the claims regarding the effect of PRMT1 loss on splicing are unclear by the section title. In other words, does loss of PRMT1 change the incidence of baseline alternative splicing events, or does it introduce new retention events that are responsible for underwriting the craniofacial phenotype? Consistent with this idea, the narrative could benefit from more cellular and/or histological validations of the transcriptomic defects discovered in the RNAseq, which could help contextualize the bioinformatics data with the developmental defects. Moreover, the conclusions drawn about intron retention could be clarified in terms of how applicable the mechanism is likely to be outside of this tissue-specific set of responsive introns.

      Loss of Prmt1 did not cause a global shift in intron retention, as shown in Supplemental Figure S2. Instead, Prmt1 deletion caused an increase in intron retention specifically in genes enriched in cartilage development, glycosaminoglycan biology, dendrite and axon, and a decrease in intron retention in mitochondria and metabolism genes (Table S1). We also tested matrix protein expression by histology to confirm that transcriptomic defects revealed at the RNA level resulted in lower protein production. The new data are in Figure 3E-3H.

      Additionally, invoking NMD to align splicing and differential gene expression data is understandable but lacks sufficient controls to be conclusive, such as positive control genes to confirm inhibition of NMD.

      To validate the blockage of NMD, glutathione peroxidase 1 (Gpx1) intron 1, a well-documented substrate for NMD, was tested as a positive control (Fig 4Ac, 4Ad, 9B).

      Additionally, it should be clarified whether NMD is a basal mechanism for the regulation of these introns or whether it is an induced mechanism that is invoked by the molecular insult.

      In CNCCs, NMD functions both as a physiological mechanism and as a response invoked by molecular insult. Please refer to responses to Reviewer 1’s public review for detailed explanations.

      Further, authors present data downstream of two siRNAs for the same gene target, but it remains unclear how siRNAs for the same gene target produce different effects. It may be helpful for authors to clarify how many of the transcriptomic defects are shared versus unique between the siRNAs.

      To address this question, we used bioinformatic analysis of the whole genome data to assess the similarity in changes caused by the two SFPQ-targeting siRNAs. As shown in the new Fig. 7Ba & 7Bb, transcriptomic and intron changes are consistent between the two siRNAs, suggesting that genes targeted by the two siRNAs predominantly overlap. This overlap is illustrated by scatter plot analysis of RNAseq DEG and IRI data from each siRNA against SFPQ.

      Finally, we stress the importance of presenting the full conceptual basis for SFPQ's potential role in splicing and gene expression. It is significant to note that SFPQ has been previously studied as a splicing factor and was instead determined to function in support of transcription elongation rather than in splicing. Thus, if authors are confident that SFPQ manifests directly in splicing changes, they bear the burden of proof to show that neither its role in transcription nor another splicing factor is driving splicing changes.

      We demonstrated that depletion of SFPQ only caused marginal changes in either the promoter region or gene body of ECM genes, suggesting that the role of SFPQ as a transcriptional activator or elongation factor is minimal (Fig. 7G, 7H). Please refer to responses to Reviewer 1’s public review for detailed explanations.

      Reviewer #2 (Recommendations for the authors):

      (1) It is not clear why the authors focused on intron retention targets vs the other possibilities. Skipped Exon is much higher in terms of the number of changes, please clarify. For the intron retention how is this quantified? The traces are nice, but it is hard to tell which part is retained at this magnification. Also, because the focus is on extracellular matrix (ECM) and NMD it would be nice to show some of those targets here. In the tbx1 trace, some are up and some are down. What does that mean for the gene expression?

      We initially investigated SE and found that genes with significant changes in Prmt1 CKO CNCCs fall into diverse functional pathways. Among them, a few genes are critical for skeletal formation, including Postn and Fn, and the function of their exon skipping has been documented. For example, the two exons that are skipped in Postn, Exon17 and 21, have been shown to regulate craniofacial skeleton shape and mandibular condyle hypertrophic zone thickness using transgenic mouse models (PMID: 36859617). As illustrated by Figure 10, the skipped exon of Postn is regulated by multiple splicing factors that may perform overlapping functions in vivo.

      Intron retention of each gene is quantified by the ratio of the overall read density of its constitutive intronic regions (CIRs) to the overall read density of its constitutive exonic regions (CERs) and defined as the intron retention index (IRI). In the first section of Response to Reviewer 1’s comments, we explained additional bioinformatic analysis that was performed to address reviewers’ questions, support the confidence of intron retention event calls and rule out the possibility of other alternative splicing mechanisms, such as SE, MXE, A5SS or A3SS (Supplemental Figure S5, S6, Table S7).
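
      The IRI definition above reduces to a ratio of two read densities. The sketch below illustrates that arithmetic only; it assumes that "overall read density" means reads pooled over all constitutive regions of a gene divided by their combined length in base pairs, and the function and parameter names are hypothetical rather than taken from IRTools.

```python
def read_density(read_count: float, length_bp: int) -> float:
    """Reads per base pair over a pooled set of regions."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return read_count / length_bp


def intron_retention_index(
    cir_reads: float, cir_length_bp: int, cer_reads: float, cer_length_bp: int
) -> float:
    """IRI = density over constitutive introns / density over constitutive exons."""
    exon_density = read_density(cer_reads, cer_length_bp)
    if exon_density == 0:
        return float("nan")  # gene not expressed, IRI is undefined
    return read_density(cir_reads, cir_length_bp) / exon_density
```

      As a worked case, a gene with 200 reads over 10,000 bp of constitutive introns and 1,000 reads over 2,000 bp of constitutive exons has densities of 0.02 and 0.5 reads per bp, giving an IRI of 0.04. Normalizing by length matters here because the retained introns discussed in this response can be tens of kilobases long, far longer than the exons they are compared against.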

      (2) RNA-Sequencing of Prmt1 mutants nicely shows gene expression changes, including in ECM and GAG genes. While validation of the sequencing results is not necessarily required, it would be very interesting to show the expression in situ. In addition, the heat map shows both downregulated but also upregulated transcripts. This is expected since this protein regulates many genes. However, the volcano plot shows a significant number of genes upregulated. It would be interesting to show what the upregulated genes are. And what is the proposed mechanism for Prmt1 regulation of upregulated genes?

      Validation for the transcriptomic changes is shown in Fig. 3E-3H using immunostaining.

      As for upregulated genes in the Prmt1 mutant, top pathways include cytokine-mediated signaling pathway, signal transduction by p53 signaling pathway and cell morphogenesis (Figure 2E), which are consistent with our previous reports that Prmt1 deletion induces cytokine production in oral epithelium and leads to p53 accumulation in embryonic epicardium (PMID: 32521264, 29420098). Besides these pathways, Prmt1 deletion also caused upregulation of genes involved in adult behavior, postsynaptic organization and apoptotic process, which is consistent with findings from other labs on PRMT1 function in neuronal and cancer cells (PMID: 34619150, 33127433).

      (3) Specific transcripts involved in the ECM and GAG pathway were shown to have elevated intron retention. However, Figure 3D seems to show the opposite, with intronic expression decreased and exonic expression increased. This is very important to the final conclusion of the paper. In addition, is there a direct relationship between increased intron retention and downregulation of this specific gene expression? It seems a bit correlational as it could also be an indirect mechanism. One way to test this is to do in vitro translation with and without the specific intron to test if it results in lower expression.

      We apologize for the mislabeling in the previous version of Figure 3D, which is now corrected. We also tried to test the direct relationship between intron retention and downregulation of matrix genes such as Adamts2 using in vitro experiments; however, the introns of matrix genes with high retention tend to be long, many 10 to 50 kb in length, making it challenging to generate mini-gene constructs for molecular analysis. We used a different approach and demonstrated that inhibition of NMD with the chemical inhibitor NMDI14 caused dramatic accumulation of the Adamts2, Alpl, Eln, Matn2, Loxl1 and Bgn transcripts, suggesting that retained introns triggered NMD to regulate gene expression and that this mechanism acts at a physiological level in CNCCs (Fig. 4). We also blocked NMD in control and Prmt1 null CNCCs, where NMD blockage led to higher accumulation of Adamts2 and Alpl transcripts, suggesting that upon Prmt1 deficiency, NMD is further utilized to degrade intron-containing transcripts (Fig. 4). Similarly, in Sfpq-depleted ST2 cells, blocking NMD caused accumulation of the intron-retaining transcripts Col4a2, St6galnac3 and Ptk7 (Fig. 9A, 9B).

      (4) While Figure 4 nicely shows the methylation of SFPQ is reduced in Prmt1 CKO cells, it is unclear on which residue this methylation occurs. Also, the overall expression of SFPQ is down, so it is possible that the methylation is indirect, i.e., Prmt1 regulates some other methyltransferase that regulates SFPQ. Or that because the overall level of SFPQ is down, there is no protein to methylate. How do the authors differentiate between these possibilities?

      Arginine methylation of SFPQ was previously characterized using in vitro reactions and cell lines with biochemical assays by Snijders et al. in 2015 (PMID: 25605962). Among all PRMTs that catalyze asymmetric arginine dimethylation (ADMA), only PRMT1 and PRMT3 methylate SFPQ, with PRMT1 showing higher efficiency than PRMT3. Moreover, PRMT3 is mainly cytosolic, and its expression in CNCCs is about 100-fold lower than that of PRMT1 (Fig. 1). Based on this knowledge, PRMT1 is the primary arginine methyltransferase for SFPQ, a nuclear protein in CNCCs. We and others have shown in a previous publication that SFPQ methylation on arginine 7 and 9 depends on PRMT1 (PMID: 31451547).

      To investigate SFPQ protein degradation in CNCCs, we used MG132 to block proteasomal degradation and observed a partial rescue of SFPQ protein levels in Prmt1 mutant embryos, suggesting that SFPQ is degraded through a proteasome-mediated mechanism. To address the relationship between SFPQ methylation and protein expression, we assessed arginine methylation of the SFPQ that accumulated after MG132 treatment. The accumulated SFPQ was not methylated, confirming the absence of methylation even when SFPQ protein expression is restored.

      Snijders et al. also showed that citrullination induced by PADI4 regulates SFPQ stability (Snijders 2015). We considered this possibility and assessed the expression levels of PADIs. In E13.5 and E15.5 CNCCs, PADI1-4 mRNA expression levels are very low (TPM<5), suggesting that PADIs may not regulate SFPQ stability in CNCCs. A detailed mechanism as to how PRMT1-mediated SFPQ methylation controls stability awaits further investigation.

      (5) For the Sfpq-deleted experiment, it seems that the two knockdowns are not similar in their gene targets, and the GO terms differ except for Wnt signaling. This makes this data difficult to interpret. The genes identified as intron retention are different than the ones identified in Prmt1 deletion and not reduced as much. How does this fit in with the Prmt1 story? If working through Sfpq, it assumes that the targets will be similar and more than 8% would be in common.

      To address the first concern, we used bioinformatic analysis of the whole genome data to assess the similarity in changes caused by the two SFPQ-targeting siRNAs. As shown in the new Fig. 7Ba & 7Bb, transcriptomic and intron changes are consistent between the two siRNAs, suggesting that genes targeted by the two siRNAs predominantly overlap. This overlap is illustrated by scatter plot analysis of RNAseq DEG and IRI data from each siRNA against SFPQ.

      We have previously identified a group of splicing factors that depend on PRMT1 for arginine methylation, including SFPQ (PMID: 31451547). In the new data in Figures 5, 6 and 10, we tested an additional five PRMT1-dependent splicing factors that are highly expressed in CNCCs: SRSF1, EWSR1, TAF15, TRA2B and G3BP1 (Fig. 5, 6 and 10). Among these factors, SRSF1 and G3BP1 are predominantly expressed in the cytosol of NCCs at E13.5. As splicing activity in the nucleus is needed for pre-mRNA splicing, we excluded these two and focused on the other three proteins. EWSR1 and TRA2B are both methylated in CNCCs and depend on PRMT1 for methylation (Fig. 5). We were not able to assess TAF15 methylation because of the lack of an efficient antibody for the PLA assay. We also demonstrated that their protein expression or subcellular localization was not altered by Prmt1 deletion in CNCCs, unlike SFPQ (Fig. S2). To define their splicing footprint, we performed siRNA-mediated knockdown in ST2 cells, followed by RNA-seq and IRI analysis to define differentially regulated genes and introns, which revealed distinct biological pathways regulated by SFPQ, EWSR1, TRA2B and TAF15, but minimal roles of EWSR1, TRA2B and TAF15 in intron retention when compared to SFPQ (Fig. 10F-10I, Supplemental Figure S7A-S7F). ECM genes are significantly downregulated by all four splicing factors (Fig. 10J-10M), but EWSR1, TRA2B and TAF15 regulate transcription or exon skipping instead of IR, as exemplified by Alpl and Postn (Fig. 10N-10T).

      (6) The addition of an NMD mechanism is interesting but not surprising that when inhibiting the pathway broadly, there is an increase in gene expression in the mesoderm cell line. How specific is this to craniofacial development?

      NMD is driven by a protein complex composed of SMG and UPF proteins. We show in the revised manuscript that NMD is both a physiological mechanism in CNCCs and triggered by genetic disturbance (Fig. 4). These data are in line with human patient reports where SMG9 mutation causes malformation in the face, hand, heart and brain (PMID: 27018474). Mouse genetic studies also demonstrated roles of NMD components during embryonic development. Smg6, Upf1, Upf2 and Upf3a knockout mice die at early embryonic stages (E5.5-E9.5), and Smg1 gene trap mutant mice die at E12.5 (Han 2018). Additionally, intron retention-triggered NMD in cancer has both promotive and suppressive roles, and NMD inhibitors have been tested for cancer therapy and recently cancer immunotherapy. Our findings highlight matrix genes as one of the key targets for NMD during craniofacial development.

      Minor:

      (1) The supplemental figures are difficult to understand. In the first upload there are many figures and tables, some excel files that are separate uploads and some not. Please upload as separate files so it is clear. And also put them in order that they are in the manuscript.

      (2) For the heat map in Figure 2B, it would be good to show all the genes or none at all. It seems a bit like cherry-picking to highlight only a few. And they are not labeled where they are located in the graph. Are these the top lines? If so, please label.

      (3) Gene names in Figure 3A are difficult to read. I would also not consider BMP7 an ECM gene.

      (4) A summary diagram of the interactions proposed will help to make this more understandable.

      The supplemental figures are reorganized and uploaded as separate Word and Excel documents. For the heat map in Fig. 2B, we have removed the gene names. For Fig. 3A, only the most significantly changed genes are labeled in red dots with names. We didn’t label all the genes because of the large number of genes. For the new Figure 3B, we have replaced BMP7. A schematic summary is also added to Supplemental Fig. S9 to illustrate the PRMT1-SFPQ pathway.

    1. eLife Assessment

      This fundamental work reveals that the accessibility of the unstructured C-terminal tail of α-tubulin differs with the state of the microtubule lattice. Accessibility increases with the expansion of the lattice induced by GTP and certain MAPs, which can then dictate the subsequent interactions between MAPs and microtubules, and post-translational modifications of tubulin tails. The evidence supporting the conclusion is compelling, although the characterisation of the probes does not answer whether they directly affect the lattice or expose the C-terminal tail of α-tubulin. The probes can be used as tools in the future to study differences in microtubule lattice regulation under different conditions both in vitro and in vivo. This work will be of great interest to the cytoskeleton field.

    2. Reviewer #1 (Public review):

      Summary:

      This is a careful and comprehensive study demonstrating that effector-dependent conformational switching of the MT lattice from compacted to expanded deploys the alpha tubulin C-terminal tails so as to enhance their ability to bind interactors.

      Strengths:

      The authors use 3 different sensors for the exposure of the alpha CTTs. They show that all 3 sensors report exposure of the alpha CTTs when the lattice is expanded by GMPCPP, or KIF1C, or a hydrolysis-deficient tubulin. They demonstrate that expansion-dependent exposure of the alpha CTTs works in tissue culture cells as well as in vitro.

      Appraisal:

      The authors have gone to considerable lengths to test their hypothesis that microtubule expansion favours deployment of the alpha tubulin C-terminal tail, allowing its interactors, including detyrosinase enzymes, to bind. There is a real prospect that this will change thinking in the field. One very interesting possibility, touched on by the authors, is that the requirement for MAP7 to engage kinesin with the MT might include a direct effect of MAP7 on lattice expansion.

      Impact:

      The possibility that the interactions of MAPs and motors with a particular MT or region feed forward to determine its future interaction patterns is made much more real. Genuinely exciting.

    3. Reviewer #2 (Public review):

      The unstructured α- and β-tubulin C-terminal tails (CTTs), which differ between tubulin isoforms, extend from the surface of the microtubule, are post-translationally modified, and help regulate the function of MAPs and motors. Their dynamics and extent of interactions with the microtubule lattice are not well understood. Hotta et al. explore this using a set of three distinct probes that bind to the CTTs of tyrosinated (native) α-tubulin. Under normal cellular conditions, these probes associate with microtubules only to a limited extent, but this binding can be enhanced by various manipulations thought to alter the tubulin lattice conformation (expanded or compact). These include small-molecule treatment (Taxol), changes in nucleotide state, and the binding of microtubule-associated proteins and motors. Overall, the authors conclude that microtubule lattice "expanders" promote probe binding, suggesting that the CTT is generally more accessible under these conditions. Consistent with this, detyrosination is enhanced. Mechanistically, molecular dynamics simulations indicate that the CTT may interact with the microtubule lattice at several sites, and that these interactions are affected by the tubulin nucleotide state.

      Strengths and weaknesses:

      Key strengths of the work include the use of three distinct probes that yield broadly consistent findings, and a wide variety of experimental manipulations (drugs, motors, MAPs) that collectively support the authors' conclusions, alongside a careful quantitative approach.

      The challenges of studying the dynamics of a short, intrinsically disordered protein region within the complex environment of the cellular microtubule lattice, amid numerous other binders and regulators, should not be understated. While it is very plausible that the probes report on CTT accessibility as proposed, the possibility of confounding factors (e.g., effects on MAP or motor binding) cannot be ruled out. Sensitivity to the expression level clearly introduces additional complications. Likewise, for each individual "expander" or "compactor" manipulation, one must consider indirect consequences (e.g., masking of binding sites) in addition to direct effects on the lattice; however, this risk is mitigated by the collective observations all pointing in the same direction.

      The discussion does a good job of placing the findings in context and acknowledging relevant caveats and limitations. Overall, this study introduces an interesting and provocative concept, well supported by experimental data, and provides a strong foundation for future work. This will be a valuable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate how the structural state of the microtubule lattice influences the accessibility of the α-tubulin C-terminal tail (CTT). By developing and applying new biosensors, they reveal that the tyrosinated CTT is largely inaccessible under normal conditions but becomes more accessible upon changes to the tubulin conformational state induced by taxol treatment, MAP expression, or GTP-hydrolysis-deficient tubulin. The combination of live imaging, biochemical assays, and simulations suggests that the lattice conformation regulates the exposure of the CTT, providing a potential mechanism for modulating interactions with microtubule-associated proteins. The work addresses a highly topical question in the microtubule field and proposes a new conceptual link between lattice spacing and tail accessibility for tubulin post-translational modification. Future work is required to determine whether CTT exposure in the microtubule lattice is sensitive to additional factors present in vivo but not in vitro.

      Strengths:

      (1) The study targets a highly relevant and emerging topic: the structural plasticity of the microtubule lattice and its regulatory implications.

      (2) The biosensor design represents a methodological advance, enabling direct visualization of CTT accessibility in living cells.

      (3) Integration of imaging, biochemical assays, and simulations provides a multi-scale perspective on lattice regulation.

      (4) The proposed conceptual framework, in which lattice conformation is a determinant of post-translational modification accessibility, is novel and potentially impactful for understanding microtubule regulation.

      [Editors' note: the authors have responded to the reviewers and this version was assessed by the editors.]

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a careful and comprehensive study demonstrating that effector-dependent conformational switching of the MT lattice from compacted to expanded deploys the alpha tubulin C-terminal tails so as to enhance their ability to bind interactors.

      Strengths:

      The authors use 3 different sensors for the exposure of the alpha CTTs. They show that all 3 sensors report exposure of the alpha CTTs when the lattice is expanded by GMPCPP, or KIF1C, or a hydrolysis-deficient tubulin. They demonstrate that expansion-dependent exposure of the alpha CTTs works in tissue culture cells as well as in vitro.

      Weaknesses:

      There is no information on the status of the beta tubulin CTTs. The study is done with mixed isotype microtubules, both in cells and in vitro. It remains unclear whether all the alpha tubulins in a mixed isotype microtubule lattice behave equivalently, or whether the effect is tubulin isotype-dependent. It remains unclear whether local binding of effectors can locally expand the lattice and locally expose the alpha CTTs.

      Appraisal:

      The authors have gone to considerable lengths to test their hypothesis that microtubule expansion favours deployment of the alpha tubulin C-terminal tail, allowing its interactors, including detyrosinase enzymes, to bind. There is a real prospect that this will change thinking in the field. One very interesting possibility, touched on by the authors, is that the requirement for MAP7 to engage kinesin with the MT might include a direct effect of MAP7 on lattice expansion.

      Impact:

      The possibility that the interactions of MAPs and motors with a particular MT or region feed forward to determine its future interaction patterns is made much more real. Genuinely exciting.

      We thank the reviewer for their positive response to our work. We agree that it will be important to determine if the bCTT is subject to regulation similar to the aCTT. However, this will first require the development of sensors that report on the accessibility of the bCTT, which is a significant undertaking for future work.

      We also agree that it will be important to examine whether all tubulin isotypes behave equivalently in terms of exposure of the aCTT in response to conformational switching of the microtubule lattice.

      We thank the reviewer for the comment about local expansion of the microtubule lattice. We believe that Figure 3 does show that local binding of effectors can locally expand the lattice and locally expose the alpha-CTTs. We have added text to clarify this.

      Reviewer #2 (Public review):

      The unstructured α- and β-tubulin C-terminal tails (CTTs), which differ between tubulin isoforms, extend from the surface of the microtubule, are post-translationally modified, and help regulate the function of MAPs and motors. Their dynamics and extent of interactions with the microtubule lattice are not well understood. Hotta et al. explore this using a set of three distinct probes that bind to the CTTs of tyrosinated (native) α-tubulin. Under normal cellular conditions, these probes associate with microtubules only to a limited extent, but this binding can be enhanced by various manipulations thought to alter the tubulin lattice conformation (expanded or compact). These include small-molecule treatment (Taxol), changes in nucleotide state, and the binding of microtubule-associated proteins and motors. Overall, the authors conclude that microtubule lattice "expanders" promote probe binding, suggesting that the CTT is generally more accessible under these conditions. Consistent with this, detyrosination is enhanced. Mechanistically, molecular dynamics simulations indicate that the CTT may interact with the microtubule lattice at several sites, and that these interactions are affected by the tubulin nucleotide state.

      Strengths:

      Key strengths of the work include the use of three distinct probes that yield broadly consistent findings, and a wide variety of experimental manipulations (drugs, motors, MAPs) that collectively support the authors' conclusions, alongside a careful quantitative approach.

      Weaknesses:

      The challenges of studying the dynamics of a short, intrinsically disordered protein region within the complex environment of the cellular microtubule lattice, amid numerous other binders and regulators, should not be understated. While it is very plausible that the probes report on CTT accessibility as proposed, the possibility of confounding factors (e.g., effects on MAP or motor binding) cannot be ruled out. Sensitivity to the expression level clearly introduces additional complications. Likewise, for each individual "expander" or "compactor" manipulation, one must consider indirect consequences (e.g., masking of binding sites) in addition to direct effects on the lattice; however, this risk is mitigated by the collective observations all pointing in the same direction.

      The discussion does a good job of placing the findings in context and acknowledging relevant caveats and limitations. Overall, this study introduces an interesting and provocative concept, well supported by experimental data, and provides a strong foundation for future work. This will be a valuable contribution to the field.

      We thank the reviewer for their positive response to our work. We are encouraged that the reviewer feels that the Discussion section does a good job of putting the findings, challenges, and possibility of confounding factors and indirect effects in context. 

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate how the structural state of the microtubule lattice influences the accessibility of the α-tubulin C-terminal tail (CTT). By developing and applying new biosensors, they reveal that the tyrosinated CTT is largely inaccessible under normal conditions but becomes more accessible upon changes to the tubulin conformational state induced by taxol treatment, MAP expression, or GTP-hydrolysis-deficient tubulin. The combination of live imaging, biochemical assays, and simulations suggests that the lattice conformation regulates the exposure of the CTT, providing a potential mechanism for modulating interactions with microtubule-associated proteins. The work addresses a highly topical question in the microtubule field and proposes a new conceptual link between lattice spacing and tail accessibility for tubulin post-translational modification.

      Strengths:

      (1) The study targets a highly relevant and emerging topic: the structural plasticity of the microtubule lattice and its regulatory implications.

      (2) The biosensor design represents a methodological advance, enabling direct visualization of CTT accessibility in living cells.

      (3) Integration of imaging, biochemical assays, and simulations provides a multi-scale perspective on lattice regulation.

      (4) The proposed conceptual framework, in which lattice conformation is a determinant of post-translational modification accessibility, is novel and potentially impactful for understanding microtubule regulation.

      Weaknesses:

      There are a number of weaknesses in the paper, many of which can be addressed textually. Some of the supporting evidence is preliminary and would benefit from additional experimental validation and clearer presentation before the conclusions can be considered fully supported. In particular, the authors should directly test in vitro whether Taxol addition can induce lattice exchange (see comments below).

      We thank the reviewer for their positive response to our work. We have altered the text and provided additional experimental validation as requested (see below).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The resolution of the figures is insufficient.

      (2) The provision of scale bars is inconsistent and insufficient.

      (3) Figure 1E, the scale bar looks like an MT.

      (4) Figure 2C, what does the grey bar indicate?

      (5) Figure 2E, missing scale bar.

      (6) Figure 3 C, D, significance brackets misaligned.

      (7) Figure 3E, consider using the same alpha-beta tubulin / MT graphic as in Figure 1B.

      (8) Figure 5E, show cell boundaries for consistency?

      (9) Figure 6D, stray box above the y-axis.

      (11) Figure S3A, scale bar wrong unit again.

      (12) S3B "fixed" and mount missing scale bar in the inset.

      (13) S4 scale bars without scale, inconsistency in scale bars throughout all the figures.

      We apologize for issues with the figures. We have corrected all of the issues indicated by the reviewer.

      (10) Figure 6F, surprising that 300 mM KCl washes out rigor-binding kinesin.

      We thank the reviewer for this important point. To address the reviewer’s concern, we have added a new supplementary figure (new Figure 6 – Figure Supplement 1) which shows that the washing step removes strongly-bound (apo) KIF5C(1-560)-Halo<sup>554</sup> protein from the microtubules. In addition, we have made a correction to the Materials and Methods section noting that ATP was added in addition to the KCl in the wash buffer. We apologize for omitting this detail in the original submission. We also added text noting that the wash out step was based on Shima et al., 2018 where the observation chamber was washed with either 1 mM ATP and 300 mM K-Pipes or with 10 mM ATP and 500 mM K-Pipes buffer. In our case, the chamber was washed with 3 mM ATP and 300 mM KCl. It is likely that the addition of ATP facilitates the detachment of strongly-bound KIF5C.

      (14) Supplementary movie, please identify alpha and beta tubulins for clarity. Please identify residues lighting up in interaction sites 1, 2 & 3.

      Thank you for the suggestions. We have made the requested changes to the movie.

      Reviewer #2 (Recommendations for the authors):

      There appear to have been some minor issues (perhaps with .pdf conversion) that leave some text and images pixelated in the .pdf provided, alongside some slightly jarring text and image positioning (e.g., Figure 5E panels). The authors should carefully look at the figures to ensure that they are presented in the clearest way possible.

      We apologize for these issues with the figures. We have reviewed the figures carefully to ensure that they are presented in the clearest way possible.

      The authors might consider providing a more definitive structural description of compact vs expanded lattice, highlighting what specific parameters are generally thought to change and by what magnitude. Do these differ between taxol-mediated expansion or the effects of MAPs?

      Thank you for the suggestion. We have added additional information to the Introduction section.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1 should include a schematic overview of all constructs used in the study. A clear illustration showing the probe design, including the origin and function of each component (e.g., tags, domains), would improve clarity.

      Thank you for the suggestion. We have added new illustrations to Figure 1 showing the origin and design (including domains and tags) of each probe.

      (2) Add Western blot data for the 4×CAP-Gly construct to Figure 1C for completeness.

      We thank the reviewer for this suggestion. We carried out a far-western blot using the purified 4xCAPGly-mEGFP protein to probe GST-Y, GST-ΔY, and GST-ΔC2 proteins (new Figure 1 – Figure Supplement 1C). We note that some bleed-through signal can be seen in the lanes containing GST-ΔY and GST-ΔC2 protein due to the imaging requirements and exposure needed to visualize the 4xCAPGly-mEGFP protein. Nevertheless, the blot shows that the purified CAPGly sensor specifically recognizes the native (tyrosinated) CTT sequence of TUBA1A.

      (3) Essential background information on the CAP-Gly domain, SXIP motif, and EB proteins is missing from the Introduction. These concepts appear abruptly in the Results and should be properly introduced.

      Thank you for the suggestion. We have added additional information to the Introduction section about the CAP-Gly domain. However, we feel that introducing the SXIP motif and EB proteins at this point would detract from the flow of the Introduction and we have elected to retain this information in the Results section when we detail development of the 4xCAPGly probe.

      (4) In Figure 2E, it remains possible that the CAP-Gly domain displacement simply follows the displacement of EB proteins. An experiment comparing EB protein localization upon Taxol treatment would clarify this relationship.

      We thank the reviewer for raising this important point. To address the reviewer’s concern, we utilized HeLa cells stably expressing EB3-GFP. We performed live-cell imaging before and after Taxol addition (new Figure 2 – Figure Supplement 1C). EB3-EGFP was lost from the microtubule plus ends within minutes and did not localize to the now-expanded lattice.

      (5) Statements such as "significantly increased" (e.g., line 195) should be replaced with quantitative information (e.g., "1.5-fold increase").

      We have made the suggested changes to the text.

      (6) Phrases like "became accessible" should be revised to "became more accessible," as the observed changes are relative, not absolute. The current wording implies a binary shift, whereas the data show a modest (~1.5-fold) increase.

      We have made the suggested changes to the text.

      (7) Similarly, at line 209, the terms "minimally accessible" versus "accessible" should be rephrased to reflect the small relative change observed; saturation of accessibility is not demonstrated.

      We have made the suggested changes to the text.

      (8) Statements that MAP7 "expands the lattice" (line 222) should be made cautiously; to my knowledge, that has not been clearly established in the literature.

      We thank the reviewer for this important comment. We have added text indicating that MAP7’s ability to induce or presence an expanded lattice has not been clearly established.

      (9) In Figures 3 and 4, the overexpression of MAP7 results in a strikingly peripheral microtubule network. Why is there this unusual morphology?

      The reviewer raises an interesting question. We are not sure why the overexpression of MAP7 results in a strikingly peripheral microtubule network, but we suspect this is unique to the HeLa cells we are using. We have observed a more uniform MAP7 localization in other cell types [e.g. COS-7 cells (Tymanskyj et al. 2018)], consistent with the literature [e.g. BEAS-2B cells (Shen and Ori-McKenney 2024), HeLa cells (Hooikaas et al. 2019)].

      (10) In Supplementary Figure 5C, the Western blot of detyrosination levels is inconsistent with the text. Untreated cells appear to have higher detyrosination than both wild-type and E254A-overexpressing cells. Do you have any explanation?

      We thank the reviewer for this important comment. We do not have an explanation at this point but plan to revisit this experiment. Unfortunately, the authors who carried out this work recently moved to a new institution and it will be several months before they are able to get the cell lines going and repeat the experiment. We thus elected to remove what was Supp Fig 5C until we can revisit the results. We believe that the important results are in what is now Figure 5 - Figure Supplement 1A,B, which shows that the expression levels of the WT and E254A proteins are similar to each other.

      (11) The image analysis method in Figures 5B and 5D requires clarification. It appears that "density" was calculated from skeletonized probe length over total area, potentially using a strict intensity threshold. It looks like low-intensity binding has been excluded; otherwise, the density would be the same from the images. If so, this should be stated explicitly. A more appropriate analysis might skeletonize and integrate total fluorescence intensity relative to the overall microtubule network.

      We have added additional information to the Materials and Methods section to clarify the image analysis. We appreciate the reviewer’s valuable feedback and the suggestion to use the integrated total fluorescence intensity, which is a theoretically sound approach. While we agree that integrated intensity is a valid metric for specific applications, its appropriate use depends on two main preconditions:

      (1) Consistent microscopy image acquisition conditions.

      (2) Consistent probe expression levels across all cells and experiments.

      We successfully maintained consistent image acquisition conditions (e.g., exposure time) throughout the experiment. However, despite generating stably expressing sensor cell lines to minimize variation, there remains an inherent biological variability in probe expression levels between individual cells. Integrated intensity is highly susceptible to this cell-to-cell variability. Relying on it would lead to a systematic error in which differences in the total amount of expressed probe would be mistaken for differences in Y-aCTT accessibility.

      The density metric (skeletonized probe length / total cell area) was deliberately chosen as it serves as a geometric measure rather than an intensity-based normalization. The density metric quantifies the proportion of the microtubule network that is occupied by Y-aCTT-labeled structures, independent of fluorescence intensity. Thus, the density metric provides a more robust and interpretable measure of Y-aCTT accessibility under the variable expression conditions inherent to our experimental system. Therefore, we believe that this geometric approach represents the most appropriate analysis for our image dataset.
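
      The contrast between the two metrics can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names, the toy masks, and the use of pixel counts as a stand-in for skeleton length and cell area are all assumptions made for illustration, and the skeleton mask is taken as given from an upstream thresholding and skeletonization step. If that upstream threshold itself responds to expression level, the mask (and therefore the density) would change too, which is the caveat the reviewer raises.

```python
def skeleton_density(skeleton, cell_mask):
    """Skeleton pixels inside the cell divided by cell area, both in pixels.

    `skeleton` and `cell_mask` are equally sized 2D lists of 0/1 values.
    Pixel count is used as a rough proxy for skeleton length (assumption).
    """
    length = sum(
        1
        for s_row, c_row in zip(skeleton, cell_mask)
        for s, c in zip(s_row, c_row)
        if s and c
    )
    area = sum(1 for row in cell_mask for c in row if c)
    if area == 0:
        raise ValueError("cell mask is empty")
    return length / area


def integrated_intensity(image, skeleton):
    """Sum of image values over the skeleton pixels."""
    return sum(
        value
        for i_row, s_row in zip(image, skeleton)
        for value, s in zip(i_row, s_row)
        if s
    )


# Hypothetical toy data: a 4 x 5 cell with 6 skeleton pixels.
cell_mask = [[1] * 5 for _ in range(4)]
skeleton = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
# Same labeled structures, imaged in a low- and a high-expressing cell.
image_low = [[10 * s for s in row] for row in skeleton]
image_high = [[2 * v for v in row] for row in image_low]

print(skeleton_density(skeleton, cell_mask))
print(integrated_intensity(image_low, skeleton))
print(integrated_intensity(image_high, skeleton))
```

      In this toy case the integrated intensity doubles with probe expression while the density is unchanged, because the density depends only on which pixels are labeled, not on how bright they are.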

      (12) In Figure 5D, the fold-change data are difficult to interpret due to the compressed scale. Replotting is recommended. The text should also discuss the relative fold changes between E254A and Taxol conditions, Figure 2H.

      We appreciate the reviewer's insightful comment. We agree that the presence of significant outliers led to a compressed Y-axis scale in Figure 5D, obscuring the clear difference between the WT-tubulin and E254A-tubulin groups. As suggested, we have replotted Figure 5D using a broken Y-axis to effectively expand the relevant lower range of the data while still accurately representing all data points, including the outliers. We believe that the revised graph significantly enhances the clarity and interpretability of these results. For Figure 2, we have added the relative fold changes to the text as requested.

      (13) Figure 6. The authors should directly test in vitro whether Taxol addition can induce lattice exchange, for example, by adding Taxol to GDP-microtubules and monitoring probe binding. Including such an assay would provide critical mechanistic evidence and substantially strengthen the conclusions. I was waiting for this experiment since Figure 2.

      We thank the reviewer for this suggestion. As suggested, we generated GDP-MTs from HeLa tubulin and added them to two flow chambers. We then flowed the YL1/2<sup>Fab</sup>-EGFP probe into the chambers in the presence of DMSO (vehicle control) or Taxol. Static images were taken and the fluorescence intensity of the probe on microtubules in each chamber was quantified. There was a slight but not statistically significant difference in probe binding between control and Taxol-treated GDP-MTs (Author response image 1). While disappointing, these results underscore our conclusion (Discussion section) that microtubule assembly in vitro may not produce a lattice state resembling that in cells, due to differences in protofilament number, buffer conditions, and/or the lack of MAPs during polymerization.

      Author response image 1.

      References

      Hooikaas, P. J., Martin, M., Muhlethaler, T., Kuijntjes, G. J., Peeters, C. A. E., Katrukha, E. A., Ferrari, L., Stucchi, R., Verhagen, D. G. F., van Riel, W. E., Grigoriev, I., Altelaar, A. F. M., Hoogenraad, C. C., Rudiger, S. G. D., Steinmetz, M. O., Kapitein, L. C. and Akhmanova, A. (2019). MAP7 family proteins regulate kinesin-1 recruitment and activation. J Cell Biol, 218, 1298-1318.

      Shen, Y. and Ori-McKenney, K. M. (2024). Microtubule-associated protein MAP7 promotes tubulin posttranslational modifications and cargo transport to enable osmotic adaptation. Dev Cell, 59, 1553-1570.

      Tymanskyj, S. R., Yang, B. H., Verhey, K. J. and Ma, L. (2018). MAP7 regulates axon morphogenesis by recruiting kinesin-1 to microtubules and modulating organelle transport. eLife, 7.

    1. eLife Assessment

      This important study provides one mechanism that can explain the rapid diversification of poison-antidote pairs in fission yeast: recombination between existing pairs. The evidence is largely solid, but the study needs to tone down its claims (as it is not shown that the novel poison-antidote pair can serve as a meiotic driver) and to address small experimental requests. The work is of interest to scientists studying genetic incompatibilities.

    2. Reviewer #1 (Public review):

      Summary

      The authors determine the phylogenetic relation of the roughly two dozen wtf elements of 21 S. pombe isolates and show that none of them in the original S. pombe are essential for robust mitotic growth. It would be interesting to test their meiotic function by simply crossing each deletion mutant with the parent and analyzing spores for non-Mendelian inheritance. If this has been reported already, that information should be added to the MS. If not, I suggest the authors do these simple experiments and add this information.

      Strengths:

      The most interesting data (Fig. 4) show that one recombinant (wtfC4) between wtf18 and wtf23 produces in mitotic growth a poison counteracted by its own antidote but not by the parental antidotes. Again, it would be interesting to test this recombinant in a more natural setting - meiosis between it and each of the parents.

      Weaknesses:

      Some minor rewriting is needed.

      Comments on Revision:

      (1) The parameter for "maximum growth rate" in Figure 2D needs to be defined and put on the graph.

      (2) On page 8, line 182, the authors should consider testing the hybrid wtf in meiosis using strain 975 of Leupold, which is h+, or another standard h+ strain. I don't think the antidote allele is needed; rather, it seems to me it would counter the lethality of the poison protein and should be omitted to test drive of the hybrid wtf. This is a simple experiment and would add considerably to the paper.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Wang and colleagues explore factors contributing to the diversification of wtf meiotic drivers. wtf genes are autonomous, single-gene poison-antidote meiotic drivers that encode both a spore-killing poison (short isoform) and an antidote to the poison (long isoform) through alternative transcriptional initiation. There are dozens of wtf drivers present in the genomes of various yeast species, yet the evolutionary forces driving their diversification remain largely unknown. This manuscript is written in a straightforward and effective manner, and the analyses and experiments are easy to follow and interpret. While I find the research question interesting and the experiments persuasive, they do not provide any deeper mechanistic understanding of this gene family.

      Revision update:

      Having read the response to the reviewers, I believe the major issues have been addressed. However, I would strongly suggest toning down the claim regarding the chimeric WTF element in the abstract, which currently reads:

      "As proof-of-principle, we generate a novel meiotic driver through artificial recombination between wtf drivers, and its encoded poison cannot be detoxified by the antidotes encoded by their parental wtf genes but can be detoxified by its own antidote."

      As the authors report in their response, despite various attempts, it was not possible to show that this chimeric WTF element was indeed capable of meiotic drive in a natural context (as opposed to a transgenic overexpression experiment). Thus, the authors should not claim they generated "a novel meiotic driver".

      Strengths:

      (1) The authors present a comprehensive compendium and analysis of the evolutionary relationships among wtf genes across 21 strains of S. pombe.

      (2) The authors found that a synthetic chimeric wtf gene, combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves like a meiotic driver that could only be rescued by the chimeric antidote but neither of the parental antidotes. This is a very interesting observation that could account for their inception and diversification.

      Weaknesses:

      (1) Deletion strains

      The authors separately deleted all 25 wtf genes in the S. pombe reference strain. Next, the authors performed a spot assay to evaluate the effect of wtf gene knockout on yeast growth. They report no difference from the WT and conclude that the wtf genes might be largely neutral to the fitness of their carriers in the asexual life cycle, at least under normal growth conditions.

      The authors could have conducted additional quantitative growth assays in yeast, such as growth curves or competition assays, which would have allowed them to detect subtle fitness effects that cannot be quantified with a spot assay. Furthermore, the authors do not rule out simpler explanations, such as genetic redundancy. This could have been addressed by crossing mutants of closely related paralogs or editing multiple wtf genes in the same genetic background.

      Another concern is the lack of detailed information about the 25 knockout strains used in the study. There is no information provided on how these strains were generated or, more importantly, validated. Many of these wtf genes have close paralogs and are flanked by repetitive regions, which could complicate the generation of such deletion strains. As currently presented, these results would be difficult to replicate in other labs due to insufficient methodological details.

      Revision update:

      The authors measured the fitness of the deletion strains using growth curves (Fig. 2C and D) and no significant differences were found, further supporting their claims. The requested information (details on the generation of the deletion strains) is now available in the methods section.

      (2) Lack of controls

      The authors found that a synthetic chimeric wtf gene, constructed by combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves as a meiotic driver that can be rescued only by its corresponding chimeric antidote, but not by either of the parental antidotes (Figure 4F). In contrast, three other chimeric wtf genes did not display this property (Figure 4C-E). No additional experiments were conducted to explain these differences, and basic control experiments, such as verifying the expression of the chimeric constructs, were not performed to rule out trivial explanations. This should be at the very least discussed. Also, it would have been better to test additional chimeras.

      Revision update:

      The authors report that the expression of the construct was measured. However, they do not make reference to any specific figure or section of the main text. It would be very useful if the authors explicitly referenced where exactly changes were made (this is true for all changes made).

      (3) Statistical analyses

      In line 130 the authors state that: "Given complex phylogenetic mixing observed among wtf genes (Figure 1E), we tested whether recombination occurred. We detected signals of recombination in the 25 wtf genes of the S. pombe reference genome (p = 0) and in the wtf genes of the 21 S. pombe strains (p = 0) using pairwise homoplasy index (HPI) test." Reporting a p-value of 0 is not appropriate. Please report exact P-values.

      Revision update:

      This has been addressed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors determine the phylogenetic relation of the roughly two dozen wtf elements of 21 S. pombe isolates and show that none of them in the original S. pombe are essential for robust mitotic growth. It would be interesting to test their meiotic function by simply crossing each deletion mutant with the parent and analyzing spores for non-Mendelian inheritance. If this has been reported already, that information should be added to the manuscript. If not, I suggest the authors do these simple experiments and add this information.

      Thanks for the great summary! All the wtf genes have been tested for meiotic drive phenotypes previously by Bravo Nunez et al. (2020; http://doi.org/10.1371/journal.pgen.1008350). The reference was cited in our original manuscript, and we added the details in the revised manuscript.  

      Strengths:

      The most interesting data (Figure 4) show that one recombinant (wtfC4) between wtf18 and wtf23 produces in mitotic growth a poison counteracted by its own antidote but not by the parental antidotes. Again, it would be interesting to test this recombinant in a more natural setting - meiosis between it and each of the parents.

      Thanks for this insightful comment! As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that carries only the wtf23<sup>antidote</sup>. Unfortunately, despite dozens of attempts over nearly a year, we did not observe the expected meiotic drive phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins, or to the genetic background. Similarly, the drive activity of wtf13 has been shown to be specifically suppressed in certain backgrounds.

      Weaknesses:

      In the opinion of this reviewer, some minor rewriting is needed.

      We did the rewriting as this reviewer suggested.

      Reviewer #2 (Public review):

      Summary:

      This important study provides a mechanism that can explain the rapid diversification of poison-antidote pairs (wtf genes) in fission yeast: recombination between existing genes.

      Thanks!

      Strengths:

      The authors analyzed the diversity of wtf genes in S. pombe strains and found pervasive copy number variations. They further detected signals of recurrent recombination in wtf genes. To address whether recombination can generate novel wtf genes, the authors performed artificial recombination between existing wtf genes and showed that a new wtf gene can indeed be generated: its poison cannot be detoxified by the antidotes encoded by the parental wtf genes but can be detoxified by its own antidote.

      Thanks for the great summary!

      Weaknesses:

      The study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.

      Thanks for this insightful comment! As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that carries only the wtf23<sup>antidote</sup>. Unfortunately, despite dozens of attempts over nearly a year, we did not observe the expected meiotic drive phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins, or to the genetic background. Similarly, the drive activity of wtf13 has been shown to be specifically suppressed in certain backgrounds.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Wang and colleagues explore factors contributing to the diversification of wtf meiotic drivers. wtf genes are autonomous, single-gene poison-antidote meiotic drivers that encode both a spore-killing poison (short isoform) and an antidote to the poison (long isoform) through alternative transcriptional initiation. There are dozens of wtf drivers present in the genomes of various yeast species, yet the evolutionary forces driving their diversification remain largely unknown. This manuscript is written in a straightforward and effective manner, and the analyses and experiments are easy to follow and interpret. While I find the research question interesting and the experiments persuasive, they do not provide any deeper mechanistic understanding of this gene family.

      Thanks! Please see the following for our point-to-point response.

      Strengths:

      (1) The authors present a comprehensive compendium and analysis of the evolutionary relationships among wtf genes across 21 strains of S. pombe.

      (2) The authors found that a synthetic chimeric wtf gene, combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves like a meiotic driver that could only be rescued by the chimeric antidote but neither of the parental antidotes. This is a very interesting observation that could account for their inception and diversification.

      Thanks for the great summary!

      Weaknesses:

      (1) Deletion strains

      The authors separately deleted all 25 wtf genes in the S. pombe reference strain. Next, the authors performed a spot assay to evaluate the effect of wtf gene knockout on yeast growth. They report no difference from the WT and conclude that the wtf genes might be largely neutral to the fitness of their carriers in the asexual life cycle, at least under normal growth conditions.

      The authors could have conducted additional quantitative growth assays in yeast, such as growth curves or competition assays, which would have allowed them to detect subtle fitness effects that cannot be quantified with a spot assay. Furthermore, the authors do not rule out simpler explanations, such as genetic redundancy. This could have been addressed by crossing mutants of closely related paralogs or editing multiple wtf genes in the same genetic background.

      Another concern is the lack of detailed information about the 25 knockout strains used in the study. There is no information provided on how these strains were generated or, more importantly, validated. Many of these wtf genes have close paralogs and are flanked by repetitive regions, which could complicate the generation of such deletion strains. As currently presented, these results would be difficult to replicate in other labs due to insufficient methodological details.

      We generated growth curves for all 25 wtf deletion strains and have provided the details of the wtf gene knockouts. However, with 25 wtf genes there are too many pairwise combinations to test, and it is technically challenging to knock out multiple wtf genes together. Nevertheless, our results suggest that single wtf genes have little effect on host fitness under normal conditions.

      (2) Lack of controls

      The authors found that a synthetic chimeric wtf gene, constructed by combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves as a meiotic driver that can be rescued only by its corresponding chimeric antidote, but not by either of the parental antidotes (Figure 4F). In contrast, three other chimeric wtf genes did not display this property (Figure 4C-E). No additional experiments were conducted to explain these differences, and basic control experiments, such as verifying the expression of the chimeric constructs, were not performed to rule out trivial explanations. This should be at the very least discussed. Also, it would have been better to test additional chimeras.

      We verified the expression of the chimeric genes. The last exon of wtf18 is too small (128 bp) to construct additional meaningful chimeras.

      (3) Statistical analyses

      In line 130 the authors state that: "Given complex phylogenetic mixing observed among wtf genes (Figure 1E), we tested whether recombination occurred. We detected signals of recombination in the 25 wtf genes of the S. pombe reference genome (p = 0) and in the wtf genes of the 21 S. pombe strains (p = 0) using pairwise homoplasy index (HPI) test." Reporting a p-value of 0 is not appropriate. Exact P-values should be reported.

      Due to software limitations, the PHI test reports p-values of 0.0 for extremely significant results. We have therefore reported them as <0.0001 in the revised manuscript.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Regarding the synthetic chimeric wtf gene constructed by combining exons of wtf23 and wtf18, the authors did not explicitly test whether it acts as a meiotic driver in the natural context of a cross. Instead, they examined this possibility only through transgenic overexpression experiments. Given that this is arguably the most important claim of the paper, it is critical that the authors perform, report, and discuss such an experiment in a natural context, regardless of the outcome. It is not necessary to test other recombinants or other wtf loci.

      Thanks for this insightful comment! As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that only carries the wtf23<sup>antidote</sup>. Unfortunately, despite of tens of attempts over nearly a year, we did not observe meiotic driver phenotype as expected. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins or due to the genetic background. Similarly, the drive activity of wtf13 has been shown to be specifically suppressed in certain backgrounds.

      Reviewer #1 (Recommendations for the authors):

      The paper is very well written, but some minor points should be corrected or checked.

      (1) Line 95: Why "Putative"? Is it not clear what a wtf pseudogene is?

      “Putative” was removed.

      (2) Line 105: Does "known functional" mean they are active (i.e., have been tested and shown to be active)? If so, a reference should be added.

      We used “known meiotic drivers” and added a reference here.

      (3) Line 135: "no recombination signal was tested". Do the authors mean no signal was inferred? 

      We changed “tested” to “detected”.

      (4) Line 147: References for "known functional meiotic drivers (wtf23) and artificially generated meiotic driver (wtf18)" should be given. A statement of how wtf18 was "artificially generated" is essential so the reader knows how that element differs from the wtfC4 generated here.

      We added a reference for wtf23. As for wtf18, we have specified this in the following text, namely “we artificially introduced an in-frame ATG codon right before the start of exon 2, generating wtf18poison/-0M.”

      (5) Lines 154 and 424 say an ATG codon was introduced "right before the start of exon 2," but Figure 4B shows it before exon 1.

      We thank the reviewer. The introduced ATG is the second start codon in the long transcript and the first in the short transcript. The right panel of Figure 4B shows the short transcript, so the text and figure are consistent.

      (6) Line 159: The wtf18 mutant with this additional ATG codon should be tested in meiosis, to see if "putative" is correct.

      Thanks. As with wtfC4, we encountered technical challenges in demonstrating the drive phenotype in a natural setting, and have therefore removed this statement.

      (7) Line 181: change "driver" to "drive".

      Driver is correct.

      (8) Line 184: insert to read "wtf genes tested". Also, what is the basis for proposing that "the last exon might be crucial for antidote function"?

      We added “tested” and removed the statement.

      (9) Line 198: change to read "detects only large differences".

      Done as suggested.

      (10) Line 204: change "removed" to "removal".

      Done as suggested.

      (11) Lines 242 and 243: Are "Splittree4" and "SplitsTree4" different, or is this a misprint?

      Corrected!

      (12) Lines 274-5 and 412 -3 would read better as "strains were diluted in five 10-fold steps and ... μL of each dilution spotted on .... to assay for ..."

      Done as suggested.

      (13) Line 284 says "No new data were generated." This is clearly wrong. Perhaps the authors mean there are no supplementary data files.

      Corrected!

      (14) Line 406: Change "is" to "are".

      Corrected!

      (15) Line 413: Surely, they were spotted onto YE agar medium, not liquid medium.

      Corrected!

      (16) Figure 3C: Define "Rho" and the scale used.

      The definition of Rho has been added to the Methods section in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The evidence is largely solid, but the study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.

      As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that carries the wtf23<sup>antidote</sup>. Unfortunately, despite dozens of attempts over nearly a year, we did not observe the expected meiotic drive phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins.

      Reviewer #3 (Recommendations for the authors):

      I strongly recommend the authors provide all the details concerning the generation of the knock-out strains, including specific primers used (for both the deletion and validation), the result of these validations, and the specific genotype (and ID) of the strains generated.

      These details are now included in the Materials and Methods section and in the supplementary materials.

      Please also provide exact P-values (see point 3).

      Due to software limitations, the PHI test reports p-values of 0.0 for extremely significant results. We have therefore reported them as <0.0001 in the revised manuscript.

    1. eLife Assessment

      This fundamentally significant study provides the first genome-wide characterization of H3K115 acetylation and identifies a striking and previously unappreciated association of this globular-domain histone modification with fragile nucleosomes at CpG island promoters, active enhancers, and CTCF binding sites. While the work is largely descriptive and correlative in nature, the evidence is compelling. The authors present multiple, orthogonal genomic and biochemical analyses that consistently support their central conclusions.

    2. Reviewer #1 (Public review):

      Summary

      The authors set out to define the genomic distribution and potential functional associations of acetylation of histone H3 lysine 115 (H3K115ac), a poorly characterized modification located in the globular domain of histone H3. Using native ChIP-seq and complementary genomic approaches in mouse embryonic stem cells and during differentiation to neural progenitor cells, they report that H3K115ac is enriched at CpG island promoters, active enhancers, and CTCF binding sites, where it preferentially localizes to regions containing fragile or subnucleosomal particles. These observations suggest that H3K115ac marks destabilized nucleosomes at key regulatory elements and may serve as an informative indicator of chromatin accessibility and regulatory activity.

      Strengths

      A major strength of this study is its focus on a histone post-translational modification in the globular domain, an area that has received far less attention than histone tail modifications despite strong evidence from structural and in vitro studies that such marks can directly influence nucleosome stability. The authors employ a wide range of complementary genomic analyses, including paired-end ChIP-seq, fragment size-resolved analyses, contour (V-) plots, and sucrose gradient fractionation, to consistently support the association of H3K115ac with fragile nucleosomes across promoters, enhancers, and architectural elements. The revised manuscript is careful in its interpretation and provides a coherent and internally consistent picture of how H3K115ac differs from other acetylation marks such as H3K27ac and H3K122ac. The datasets generated will be valuable to the chromatin community as a resource for further exploration of nucleosome dynamics at regulatory elements.

      Weaknesses

      The conclusions are largely correlative. While the authors provide strong evidence for the localization of H3K115ac to fragile nucleosomes, the work does not directly test whether this modification causally contributes to nucleosome destabilization or regulatory function in vivo. Questions regarding the enzymes responsible for depositing or removing H3K115ac and its direct functional consequences therefore remain open.

      Overall assessment and impact

      Overall, the authors largely achieve their stated aims by providing a detailed and well-supported characterization of H3K115ac distribution in mammalian chromatin and its association with fragile nucleosomes at regulatory elements. While mechanistic insight remains to be established, the study introduces a compelling new perspective on globular-domain histone acetylation and highlights H3K115ac as a potentially useful marker for identifying functionally important regulatory regions. The work is likely to stimulate further mechanistic studies and will be of broad interest to researchers studying chromatin structure, transcriptional regulation, and genome organization.

    3. Reviewer #2 (Public review):

      Summary:

      Kumar et al. aimed to assess the role of the understudied H3K115 acetylation mark, which is located in the nucleosomal core. To this end, the authors performed ChIP-seq experiments of H3K115ac in mouse embryonic stem cells as well as during differentiation into neuronal progenitor cells. Subsequent bioinformatic analyses revealed an association of H3K115ac with fragile nucleosomes at CpG island promoters, as well as with enhancers and CTCF binding sites. This is an interesting study, which provides important novel insights into the potential function of H3K115ac. However, the study is mainly descriptive, and functional experiments are missing.

      Strengths:

      (1) The authors present the first genome-wide profiling of H3K115ac and link this poorly characterized modification to fragile nucleosomes, CpG island promoters, enhancers, and CTCF binding sites.

      (2) The study provides a valuable descriptive resource and raises intriguing hypotheses about the role of H3K115ac in chromatin regulation.

      (3) The breadth of the bioinformatic analyses adds to the value of the dataset.

      Comments on revisions:

      The authors sufficiently addressed my concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Kumar et al. examine the H3K115 epigenetic mark located on the lateral surface of the histone core domain and present evidence that it may serve as a marker enriched at transcription start sites (TSSs) of active CpG island promoters and at polycomb-repressed promoters. They also note that the H3K115ac mark is enriched on fragile nucleosomes within nucleosome-depleted regions, on active enhancers, and at CTCF-bound sites. They propose that these observations suggest that H3K115ac contributes to nucleosome destabilization and so may serve as a marker of functionally important regulatory elements in mammalian genomes.

      Strengths:

      The authors present novel observations suggesting that acetylation of a histone residue in a core domain (versus on a histone tail) may serve a functional role in promoting transcription at CpG islands and polycomb-repressed promoters. They present a solid amount of confirmatory in silico data, using appropriate methodology, that supports the idea that the H3K115ac mark may function to destabilize nucleosomes and contribute to regulating ESC differentiation. These findings are quite novel.

      Weaknesses:

      Additional experiments to confirm specificity of the antibodies used have been done, improving confidence in the study.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) The absence of replicate paired-end datasets limits confidence in peak localization.

      The reviewer was under the impression that we did not perform biological replicates of our ChIP-seq experiments. All ChIP-seq (and ATAC-seq) experiments were performed with biological replicates and the Pearson’s correlations (all >0.9) between replicates were provided in Supplementary Table 1. We had indicated this in the text and methods but will try to make this even clearer.

      (2) The analyses are primarily correlative, making it difficult to fully assess robustness or to support strong mechanistic conclusions.

      Histone modifications are difficult to alter genetically because of the high copy number of histone genes, and general inhibition of HATs/HDACs leads to alterations in other histone modifications. This is an inherent challenge in establishing causality for histone modifications, especially histone acetylation marks.

      (3) Some claims (e.g., specificity for CpG islands, "dynamic" regulation during differentiation) are not fully supported by the analyses as presented.

      We have modified the text in response to this point. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) quartile of expression (Figure 1G). CGI promoters on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) Overall, the study introduces an intriguing new angle on globular PTMs, but additional rigor and mechanistic evidence are needed to substantiate the conclusions.

      We agree that the paper does not provide mechanistic details or solid causality for H3K115ac. We have only emphasized the potential role of H3K115ac in nucleosome fragility based on our in vivo data and previously published in vitro experiments (Manohar et al., 2009; Chatterjee et al., 2015). We do provide evidence that H3K115ac is enriched on subnucleosomal particles via sucrose gradient sedimentation of MNase-digested chromatin (Figure 3C-D).

      Reviewer #2 (Public review):

      (1) I am not fully convinced about the specificity of the antibody. Although the experiment in Figure S1A shows a specific binding to H3K115ac-modified peptides compared to unmodified peptides, the authors do not show any experiment that shows that the antibody does not bind to unrelated proteins. Thus, a Western of a nuclear extract or the chromatin fraction would be critical to show. Also, peptide competition using the H3K115ac peptide to block the antibody may be good to further support the specificity of the antibody. Also, I don't understand the experiment in Figure S1B. What does it tell us when the H3K115ac histone mark itself is missing? The KLF4 promoter does not appear to be a suitable positive control, given that hundreds of proteins/histone modifications are likely present at this region. It is important to clearly demonstrate that the antibody exclusively recognizes H3K115ac, given that the conclusion of the manuscript strongly depends on the reliability of the obtained ChIP-Seq data.

      ChIP-qPCR in S1B includes competition from native chromatin and shows high specificity to its target. We have provided antibody validation in three ways:

      - Western blot with dot-blot of synthetic peptides (Figure S1A).

      - Western blots with whole-cell extracts (Figure 4D).

      - ChIP-qPCR on native chromatin spiked with a cocktail of synthetic mono-nucleosomes, each carrying a single acetylation and a specific barcode (SNAP-ChIP K-AcylStat Panel).

      We could not include H3K115ac-marked nucleosomes as they are not available in the panel. Figure S1B shows that the H3K115ac antibody exhibits negligible binding to known K-acyl marks, comparable to an unmodified nucleosome. Because of the absence of an H3K115ac-modified barcoded nucleosome, we used the KLF4 promoter from mESCs as a positive control. In agreement with the ChIP-seq signal shown in the genome browser profile (Figure 1E), the KLF4 promoter shows a significantly higher signal than the gene body.

      (2) The association of H3K115ac with fragile nucleosomes is based on MNase-sensitivity and fragment length, which are indirect methods and can have technical bias. Experiments that support that the H3K115ac modified nucleosomes are indeed more fragile are missing.

      We have performed ChIP-seq on MNase-digested mESC chromatin fractionated on sucrose gradients, and this shows that H3K115ac is enriched in fractions containing sub-nucleosomal particles and fragile nucleosomes but depleted in fractions containing stable nucleosomes (Figure 3D).

      (3) The comparison of H3K115ac with H3K122ac and H3K64ac relies on publicly available datasets. Since the authors argue that these marks are distinct, data generated under identical experimental conditions would be more convincing. At a minimum, the limitations of using external datasets should be discussed.

      H3K64ac and H3K122ac datasets were generated by us in a previous publication (Pradeepa et al., 2016) using the same native MNase ChIP protocol as used here. The ChIP-seq datasets for H3K122ac and H3K27ac were processed in an identical manner, with the same computational pipelines, to the H3K115ac datasets generated in this paper.

      (4) The enrichment of H3K115ac at enhancers and CTCF binding sites is notable but remains descriptive. It would be interesting to clarify whether H3K115ac actively influences transcription factor/CTCF binding or is a downstream correlate.

      We agree with the reviewer’s comment, but we have not claimed causality.

      (5) No information is provided about how H3K115ac may be deposited/removed. Without this information, it is difficult to place this modification into established chromatin regulatory pathways.

      Due to broad target specificity, redundancies and crosstalk among different classes of HATs and HDACs, it is not tractable to answer this question in the current manuscript.

      Reviewer #3 (Public reviews):

      Reviewer 3 is mistaken in thinking our ChIP experiments are performed under cross-linked conditions. As clearly stated in the main text and methods, all our ChIP-seq for histone modifications is done on native MNase-digested chromatin – with no cross-linking. This includes the spike-in experiment shown in Fig S1B to test H3K115ac antibody specificity against the bar-coded SNAP-ChIP® K-AcylStat Panel from Epicypher. We could not include H3K115ac bar-coded nucleosomes in that experiment since they are not available in the panel.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I have two primary concerns that resound through the entire paper:

      (a) Overall, the manuscript is making strong claims based on entirely correlative datasets. No quantitative analyses are performed to demonstrate co-occupancy/localization. Please see more detailed descriptions below.

      Our responses to specific points are provided against each comment below.

      (b) Lack of paired-end replicates for H3K115ac ChIP-seq. While the reviewer token for the deposited data was not made accessible to me, looking at Supplementary Table 1, it appears there are two H3K115ac ChIP-seq datasets. One is paired-end and one is single-read. So are peaks called with only one replicate of PE? Or are inaccurate peaks called with SR datasets? Either way, this is not a rigorous way to evaluate H3K115ac localization.

      We are sorry that this reviewer was not able to access the data – the token for the GEO accession was provided for reviewers at the journal’s request. All ChIP-seq (and ATAC-seq) experiments (paired and single-end) were performed with two biological replicates and the Pearson’s correlations (all >0.9) between replicates were provided in Supplementary Table 1. This was indicated in both the main text and in the methods. In the revised manuscript we have tried to make this even clearer and have put the relevant Pearson’s coefficient (r) into the text at the appropriate places. For the reviewer’s information, here is the complete list of data samples in the GEO Accession:

      Author response image 1.

      While I agree that H3K115ac occupancy is high at +CGIs, the authors downplay that H3K122ac and H3K27ac is also more highly enriched at these locations (page 7, last sentence of first paragraph). I imagine this is all due to the more highly transcribed nature of these genes. Sub-stratifying the K27ac and K122ac by transcription (as in Figure 1G) would help to demonstrate a unique nature of H3K115ac. But even better would be to do an analysis that plots H3K115ac enrichment vs transcription for every individual gene rather than aggregate analyses that are biased by single locations. For example, make an XY scatterplot of RNAPII occupancy or 4SU-seq signal vs H3K115ac level, where each point represents a single gene. Because the interpretation that it is CGI-based and not transcription is confounded with the fact that -CGI are more lowly transcribed. So, looking at Figure 1G, even the -CGI occupancy of H3K115ac is correlated with transcription, but it is just more lowly transcribed.

      We thank the reviewer for these suggestions but point out that Figure 1G shows H3K115ac signal for CGI+ and CGI– TSS that are matched for expression levels (quartiles of 4SU-seq). Fig 1F shows that H3K115ac is much more of a discriminator between CGI+ and CGI– than H3K27ac or H3K122ac.

      (2) H3K115ac, H3K27ac, and H3K122ac are all more enriched (in aggregate) at +CGI locations (Fig 1F); so do these locations just have more positioned nucleosomes? More H3.3? So that these PTMs are just more enriched due to the opportunity?

      Positioned nucleosomes are generally found downstream of the TSS of active CpG island promoters, so what the reviewer suggests may well account for the relative enrichment of H3K27ac and H3K122ac at CGI+ vs CGI- promoters in Fig. 1F. But H3K115ac localisation is distinct, with the peak at the nucleosome-depleted region, not the +1 nucleosome. This is also confirmed by the contour plots in Fig 3. Our observation is also not explained by an enrichment of H3.3 at CGI promoters, since we show that H3K115ac is not specific to H3.3 (Fig 4D).

      (3) The authors note in paragraph 2 of page 7 that "H3K115ac does not scale linearly with gene expression..." but the authors never show a quantification of this; stratification in four clusters is not able to make a linear correlation. Furthermore, in the second line of page 7, the authors state that the levels do generally correlate with transcription. To claim it is a specific CGI link and not transcription is tricky, but I encourage the authors to consider more quantifiable ways, rather than correlations, to demonstrate this point, if it is observed.

      We thank the reviewer for this comment, and taking it into consideration, we have decided to re-phrase this paragraph. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) quartile of expression (Figure 1G). CGI promoters on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) The authors claim on page 7 that "on average, transcription increased from TSS that also gained H3K115ac but to a modest extent, compared with the more substantial loss of H3K115ac from downregulated TSS". However, both upregulated and downregulated are significant; the difference in magnitude could simply be due to more highly or more lowly transcribed locations, meaning that fold change could be more robustly detected. I caution the authors to substantiate claims like this rather than stating a correlation.

      We thank the reviewer for this comment, which relates to the data in Fig 2A. Fig. 2B shows that the association of H3K115ac loss with downregulation is statistically stronger than that of H3K115ac gain with upregulation, but only for CGI promoters. With regard to the text on the original pg 7 that is referred to, we have now reworded this to read “Average levels of transcription increased from TSS that also gained H3K115ac, and there was loss of H3K115ac from downregulated TSS (Figure 2A).”

      (5) For Figure 2C, the authors argue that H3K115ac correlate with bivalent locations. So this is all qualitative and aggregate localization; please quantitatively demonstrate this claim.

      Figure S2D provides statistics for this (observed/expected and Fisher’s exact test).
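      For readers unfamiliar with the two statistics named here, the following is a minimal sketch of how an observed/expected ratio and a one-sided Fisher's exact test can be computed for a 2x2 overlap table. It is not the authors' pipeline: the function names and the counts are hypothetical, and the choice of a one-sided (enrichment) test is an assumption.

```python
from math import comb


def observed_over_expected(a, b, c, d):
    """Observed/expected overlap for a 2x2 table.

    a = TSS with both features, b = first feature only,
    c = second feature only, d = neither.
    """
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # overlap expected under independence
    return a / expected


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(overlap >= a) under the hypergeometric null."""
    row1, col1, n = a + b, a + c, a + b + c + d
    max_overlap = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, max_overlap + 1)
    )
    return tail / comb(n, col1)


# Hypothetical counts, for illustration only (not taken from Figure S2D).
both, mark_only, bivalent_only, neither = 30, 70, 20, 880
print(observed_over_expected(both, mark_only, bivalent_only, neither))
print(fisher_exact_greater(both, mark_only, bivalent_only, neither))
```

      A ratio above 1 together with a small tail probability indicates more overlap than expected by chance; a two-sided test would need both tails of the same distribution.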

      (6) The authors claim in Figure 2 that H3K115ac is dynamic during differentiation (title of Figure 2). However, there are locations that gain and lose, or maintain H3K115ac. In fact, the most discussed locations are H3K115ac with no change (2C); which means it is NOT dynamic during differentiation. So what is the message for the role during differentiation? From Supplemental Table 1, it appears there is a single ChIP experiment for H3K115ac in NPC, and it is a single read. So this is also a difficult claim with one replicate. Related to this, in S2A, the authors show K115ac where there is no change in transcription; so what is the role of H3K115ac at TSSs relevant to differentiation - it is at both locations changed and unchanged in transcription, but H3K115ac levels itself do not change at these subsets. So, how is this dynamic? This is very confusing, and clearer analyses and descriptions are necessary to deconvolute these data.

      We apologise for the misleading title for Figure 2. This has now been amended to “Changes in H3K115ac during differentiation”. The message of this figure is that whilst changes in H3K115ac at TSS are small (panels A-C), at enhancers the changes are much more dramatic (panel D). The reviewer is incorrect about the number of replicates for NPCs – there are two biological replicates (see response to point 1b).

      (7) The authors go on to examine H3K115ac enrichment on fragile nucleosomes through sucrose gradient sedimentation. A control for H3K27ac or H3K122ac would be nice for comparison.

      We do not have the material available to perform these experiments.

      (8) When discussing Figures 3 and SF3, the authors mention performing a different MNase for a second ChIP. Showing the MNase distribution for both the more highly digested and the lowly digested would be nice. a) Related to the above, the authors show input in SF3E to argue that the difference in H3K115ac vs H3K27ac is not due to the library, but they do not show the MNase digestion patterns, which is more important for this argument.

      Input libraries (first two graphs of Fig S3E) are the MNase-digested chromatin. Comparison of nucleotide frequencies from millions of reads is a more robust method than the fragment length patterns.

      (9) The authors move on to examine H3K115ac at enhancers. Just out of curiosity, given what was found at promoters, is H3K115ac enriched at +CGI enhancers? And what is the correlation with enhancer transcription?

      This is an interesting point, but the number of enhancers associated with CGI is not very high and so we did not focus on this. We have not analysed a correlation with eRNAs in this paper.

      (10) The authors state on page 14 that the most frequent changes in H3K115ac during differentiation are at these enhancers. So do these changes connect with differentiation-specific genes, and/or genes that have altered transcription during differentiation? Just trying to understand the functional role.

      Given the challenges of connecting enhancers with target genes, we have not addressed this question quantitatively. However, we draw the reviewer’s attention to the Genome Browser shots in Figures 2D and S2C, which show clear gain of H3K115ac (and ATAC-seq peaks) at intra and intergenic regions close to genes whose transcription is activated during the differentiation to NPCs.

      (11) Related, at the end of page 14, the authors state that the changes in H3K115ac correlate with changes in ATAC-seq; I imagine this dynamic is not unique for H3K115ac and this is observed for other PTMs (H3K27ac), so assessing and clarifying this, to again get to the specific interest of H3K115ac, would be ideal.

      We have not claimed that chromatin accessibility is unique to H3K115ac. It is the location of H3K115ac which is found inside the ATAC-seq peak region while H3K27ac is found only upstream/downstream of the ATAC peak that is so striking. This is apparent in Fig 4C.

      (12) The authors examine levels of H3K115ac in H3.3 KO cell lines via western blot (Figure 4D), but no replicates and/or quantification are shown.

      We now provide a biological replicate for the Western blot (new Fig S4H) together with an image of the whole gel for the data in Fig 4D.

      (13) In Figure S4 and at the end of page 17, the authors are arguing that there is a link to pioneer TF complexes, based on Oct4 binding. First, while Oct4 has pioneering activity, not all Oct4 sites (or motifs) are pioneering; this has been established. So if you want to use Oct4, substratifying by pioneer vs no pioneer is necessary. Second, demonstrating this is unique to pioneer and not to non-pioneer TFs would be an important control.

      In response to the reviewer’s comment, we have removed the term “pioneer” from the manuscript.

      (14) Minor point: Figure 4 A and B, there are some formatting issues with the scale bars.

      We thank the reviewer for pointing this out, and the errors have been corrected in the revised figure.

      (15) Minor point is that it should be clear when single replicates of data are used and when PE/SR sequences are combined or which one is used in each analysis, as this was hard to discern when reading the paper and figure legends.

      We have clearly stated in the text that, after Figure 2, we repeated all experiments in paired-end mode. All processing steps are defined separately for single-end and paired-end datasets in the Methods section. Details of biological replicates are provided in Sup. Table 1. These concerns are also addressed in our response to Reviewer’s public comment-1.

      (16) Minor point: it is surprising that different MNase and different units were used in the ChIP vs sucrose sedimentation. Could the authors clarify why?

      Chromatin prep for sucrose gradients were done on a much larger scale than for ChIP-seq and required different setups to obtain the right level of MNase digestion.

      (17) The authors note that fragile nucleosomes contain H2A.Z and H3.3, but they never perform an analysis of available data to demonstrate a correlation (or better a quantifiable correlation) between H3K115ac occupancy and these marks at the locations they identify H3K115ac.

      Since we have shown (Fig. 4) that depletion of H3.3 does not affect overall levels of H3K115ac, we do not think there is value in further quantitative correlative analyses of H3K115ac and variant histones.

      (18) Minor point: What is the overlap in peaks for H3K115ac, H3K122ac, and H3K27ac (Figure 1C)?

      Nearly all H3K115ac peaks overlap with H3K122ac and/or H3K27ac. Its most distinct properties are its association with CGI promoters, fragile nucleosomes and its unique localisation within the NDRs, three points that the manuscript is focussed on.

      Reviewer #3 (Recommendations for the authors):

      (1) The western blot results in Figure 4D probing for H3, H3.3, and H3K115ac use Ponceau S staining, presumably of an area of the membrane where histones might be expected to migrate, as a measure of loading. However, the Ponceau S bands appear uniformly weaker in the H3.3KO lanes, yet despite this, blotting with H3.3 antibody detects a band in H3.3 knockout ESCs, suggesting that the antibody does not have a high degree of specificity. Again, a blocking experiment with appropriate peptides would instill more confidence in the specificity of these reagents, and/or the authors could provide independent validation of the knockout model to differentiate between a partial knockout or antibody cross-reactivity (e.g., by Sanger sequencing).

      In a revised Fig. S4H we now show the whole gel corresponding to this blot but including co-staining with an antibody for H4 to provide a better loading control. We also provide a biological replicate of this Western blot in the lower panel of Fig. S4H.

      (2) The manuscript would benefit from in vitro follow-up and validation, but if the authors intend to keep the manuscript primarily in silico, I suggest dedicating a few lines in each section to explain the plots, their axes, and their purpose, as well as to assist with interpretation, rather than directly discussing the results. This would make the manuscript more accessible and understandable for a broader audience in the field of epigenetics.

      In the revised version, we have tried to improve the text to make the data more accessible to a broad audience.

    1. eLife Assessment

      In this important study, the authors conducted extensive sets of computational and experimental investigations of the mechanism of cholesterol transport in the smoothened (SMO) protein. The computational component integrated multiple state-of-the-art approaches such as adaptive sampling, free energy simulations, and Markov state modeling, providing compelling support for the proposed mechanistic model, which is further validated with solid experimental mutagenesis data.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript uses primarily simulation tools to probe the pathway of cholesterol transport with the smoothened (SMO) protein. The pathway to the protein and within SMO is clearly discovered and interactions deemed important are tested experimentally to validate the model predictions.

      Strengths:

      The authors have clearly demonstrated how cholesterol might go from the membrane through SMO for the inner and outer leaflets of a symmetrical membrane model. The free energy profiles, structural conformations and cholesterol-residue interactions are clearly described.

      Weaknesses:

      None. I find the revised manuscript strong and the work should be published.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors applied a range of computational methods to probe the translocation of cholesterol through the Smoothened receptor. They test whether cholesterol is more likely to enter the receptor straight from the outer leaflet of the membrane or via a binding pathway in the inner leaflet first. Their data reveal that both pathways are plausible but that the free energy barriers of pathway 1 are lower, suggesting this route is preferable. They also probe the pathway of cholesterol transport from the transmembrane region to the cysteine-rich domain (CRD).

      Strengths:

      A wide range of computational techniques are used, including potential of mean force calculations, adaptive sampling, dimensionality reduction using tICA, and MSM modelling. These are all applied in a rigorous manner and the data are very convincing. The computational work is an exemplar of a well-carried out study.

      Their computational predictions are experimentally supported using mutagenesis, with an excellent agreement between their PMF and mRNA fold change data.

      The data are described clearly and coherently, with excellent use of figures. They combine their findings into a mechanism for cholesterol transport, which on the whole seems sound.

      Their methods are described well, and much of their analysis methods have been made available via GitHub, which is an additional strength.

    4. Reviewer #3 (Public review):

      This manuscript presents a study combining molecular dynamics simulations and Hedgehog (Hh) pathway assays to investigate cholesterol translocation pathways to Smoothened (SMO), a G protein-coupled receptor central to Hedgehog signal transduction. The authors identify and characterize two putative cholesterol access routes to the transmembrane domain (TMD) of SMO and propose a model whereby cholesterol traverses through the TMD to the cysteine-rich domain (CRD), which is presented as the primary site of SMO activation.

      The MD simulations and biochemical experiments are carefully executed and provide useful data.

      Comments on revisions:

      I appreciate the authors' detailed response and the substantial revisions made to the manuscript. The changes addressing Comments 3.1-3.5 have significantly improved the balance and framing of the work, and my primary concerns regarding overstatement and selective interpretation have been satisfactorily addressed.

      The authors' rebuttal to my initial review includes extended argumentation regarding specific interpretations of prior studies and broader models of SMO regulation. These issues represent longstanding differences in interpretation that have already been discussed extensively in the literature and are not essential to evaluating the quality or conclusions of the present study.

      For readers seeking a comprehensive and balanced overview of cholesterol-dependent SMO activation that integrates both CRD- and TMD-centered models, I would point to recent review articles (e.g., Zhang and Beachy, Nat Rev Mol Cell Biol 2023). I do not feel it is productive to rehash these debates further in the context of this review, and I have no additional substantive concerns with the revised manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses primarily simulation tools to probe the pathway of cholesterol transport with the smoothened (SMO) protein. The pathway to the protein and within SMO is clearly discovered, and interactions deemed important are tested experimentally to validate the model predictions.

      Strengths:

      The authors have clearly demonstrated how cholesterol might go from the membrane through SMO for the inner and outer leaflets of a symmetrical membrane model. The free energy profiles, structural conformations, and cholesterol-residue interactions are clearly described.

      We thank the reviewer for their kind words.

      (1) Membrane Model: The authors decided to use a rather simple symmetric membrane with just cholesterol, POPC, and PSM at the same concentration for the inner and outer leaflets. This is not representative of asymmetry known to exist in plasma membranes (SM only in the outer leaflet and more cholesterol in this leaflet). This may also be important to the free energy pathway into SMO. Moreover, PE and anionic lipids are present in the inner leaflet and are ignored. While I am not requesting new simulations, I would suggest that the authors should clearly state that their model does not consider lipid concentration leaflet asymmetry, which might play an important role.

      We thank the reviewer for their comment. Membrane asymmetry is inherent in endogenous systems; we acknowledge that as a limitation of our current model. We have addressed the comment by adding this limitation to our discussion in the manuscript.

      Added lines: (End of paragraph 6, Results subsection 2):

      “One possibility that might alter the thermodynamic barriers is native membrane asymmetry, particularly the anionic lipid-rich inner leaflet. This presents as a limitation of our current model.”

      (2) Statistical comparison of barriers: The barriers for pathways 1 and 2 are compared in the text, suggesting that pathway 2 has a slightly higher barrier than pathway 1. However, are these statistically different? If so, the authors should state the p-value. If not, then the text in the manuscript should not state that one pathway is preferred over the other.

      We thank the reviewer for their comment. We have added statistical t-tests for the barriers.

      Changes made: (Paragraph 6, Results subsection 2)

      “However, we also observe that pathway 1 shows a lower thermodynamic barrier (5.8 ± 0.7 kcal/mol v/s 6.5 ± 0.8 kcal/mol, p = 0.0013)”
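      As an illustration of the kind of comparison the reviewer asks for, the sketch below computes a Welch t-statistic and its degrees of freedom from summary statistics. It rests on assumptions not stated in the response: that the ± values are standard deviations, that the test is Welch's unequal-variance form, and that the number of independent estimates per pathway (`n_estimates`) is 10. That number is a placeholder, so the output is not expected to reproduce the reported p-value.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


n_estimates = 10  # hypothetical; the response does not state the sample size

# Barriers in kcal/mol as quoted in the response (pathway 1 vs pathway 2).
t_stat, dof = welch_t(5.8, 0.7, n_estimates, 6.5, 0.8, n_estimates)
print(t_stat, dof)
```

      The p-value follows from Student's t distribution with `dof` degrees of freedom, and it depends strongly on the sample size, which is why reporting n alongside the p-value matters.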

      (3) Barrier of cholesterol (reasoning): The authors on page 7 argue that there is an enthalpy barrier between the membrane and SMO due to the change in environment. However, cholesterol lies in the membrane with its hydroxyl interacting with the hydrophilic part of the membrane and the other parts in the hydrophobic part. How is the SMO surface any different? It has both characteristics and is likely balanced similarly to uptake cholesterol. Unless this can be better quantified, I would suggest that this logic be removed.

      We thank the reviewer for this suggestion. We have removed the line to avoid confusion.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors applied a range of computational methods to probe the translocation of cholesterol through the Smoothened receptor. They test whether cholesterol is more likely to enter the receptor straight from the outer leaflet of the membrane or via a binding pathway in the inner leaflet first. Their data reveal that both pathways are plausible but that the free energy barriers of pathway 1 are lower, suggesting this route is preferable. They also probe the pathway of cholesterol transport from the transmembrane region to the cysteine-rich domain (CRD).

      Strengths:

      (1) A wide range of computational techniques is used, including potential of mean force calculations, adaptive sampling, dimensionality reduction using tICA, and MSM modelling. These are all applied rigorously, and the data are very convincing. The computational work is an exemplar of a well-carried out study.

      (2) The computational predictions are experimentally supported using mutagenesis, with an excellent agreement between their PMF and mRNA fold change data.

      (3) The data are described clearly and coherently, with excellent use of figures. They combine their findings into a mechanism for cholesterol transport, which on the whole seems sound.

      (4) The methods are described well, and many of their analysis methods have been made available via GitHub, which is an additional strength.

      Weaknesses:

      (1) Some of the data could be presented a little more clearly. In particular, Figure 7 needs additional annotation to be interpretable. Can the position of the cholesterol be shown on the graph so that we can see the diameter change more clearly?

      We thank the reviewer for this suggestion. We have added the cholesterol positions as requested.

      Changes made: (Caption, Figure 7)

      “The tunnel profile during cholesterol translocation in SMO. (a) Free energy plot of the z-coordinate v/s the tunnel diameter when cholesterol is present in the core TMD. The tunnel shows a spike in the radius in the TMD domain, indicating the presence of a cholesterol-accommodating cavity. (b) Representative figure for the tunnel when a cholesterol molecule is in the TMD. (c) Same as (a), when cholesterol is at the TMD-CRD interface. (d) Same as (b), when cholesterol is at the TMD-CRD interface. (e) Same as (a), when cholesterol is at the CRD binding site. (f) Same as (b), when cholesterol is at the CRD binding site. Tunnel diameters shown as spheres. Cholesterol positions marked on plots using dotted lines. All snapshots presented are frames taken from MD simulations.”

      (2) In Figure 3C, it doesn’t look like the Met is constricting the tunnel at all. What residue is constricting the tunnel here? Can we see the Ala and Met panels from the same angle to compare the landscapes? Or does the mutation significantly change the tunnel? Why not A283 to a bulkier residue? Finally, the legend says that the figure shows that cholesterol can still pass this residue, but it doesn’t really show this. Perhaps if the HOLE graph was plotted, we could see the narrowest point of the tunnel and compare it to the size of cholesterol.

      We thank the reviewer for this suggestion. A283 was mutated to methionine as it presents with a longer heavy tail containing sulfur. We have plotted the tunnel radii for both WT and A283M mutants and added them as a supplemental figure. As shown in the figure, the presence of methionine doesn’t completely block the tunnel, but occludes it, thereby increasing the barrier for cholesterol transport slightly.

      Changes made: (End of Results subsection 1)

      “When we calculated the PMF for cholesterol entry, A<sup>2.60f</sup>M mutant showed restricted tunnel but it did not fully block the tunnel (Figure 3—figure Supplement 3).”

      (3) The PMF axis in 3b and d confused me for a bit. Looking at the Supplementary data, it’s clear that, e.g., the F455I change increases the energy barrier for chol entering the receptor. But in 3d this is shown as a -ve change, i.e., favourable. This seems the wrong way around for me. Either switch the sign or make this clearer in the legend, please.

      We thank the reviewer for this suggestion. We measured ∆PMF as PMF<sub>WT</sub> - PMF<sub>mutant</sub>, hence the negative values. We have added additional text to the legend to clarify this.

      Changes made: (Caption, Figure 3)

      “(b) ∆Gli1 mRNA fold change (high SHH vs untreated) and ∆ PMF (difference of peak PMF, calculated as PMF<sub>WT</sub> - PMF<sub>mutant</sub>) plotted for the mutants in Pathway 1. (c) Example mutant A<sup>2.60f</sup>M shows that cholesterol can enter SMO through Pathway 1 even on a bulky mutation. (d) Same as (b) but for Pathway 2. (e) Example mutant L<sup>5.62f</sup>A shows that cholesterol can enter SMO through Pathway 2 due to lesser steric hindrance. All snapshots presented are frames taken from MD simulations.”

      Changes made: (Caption, Figure 6)

      “(b) ∆Gli1 mRNA fold change (high SHH vs untreated) and ∆ PMF (difference of peak PMF, calculated as PMF<sub>WT</sub> - PMF<sub>mutant</sub>) plotted for mutants along the TMD-CRD pathway. (c, d) Example mutants Y<sup>LD</sup>A and F<sup>5.65f</sup>A show that cholesterol is unable to translocate through this pathway because of the loss of crucial hydrophobic contacts provided by Y207 and F484 and along the solvent-exposed pathway.”
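      The sign convention in these captions can be illustrated with a short sketch. The wild-type barrier below is the pathway 1 value quoted in this response; the mutant barrier and the function name are hypothetical placeholders, not values from the manuscript.

```python
def delta_pmf(pmf_wt, pmf_mutant):
    """Delta PMF as defined in the captions: PMF_WT - PMF_mutant (kcal/mol)."""
    return pmf_wt - pmf_mutant


wt_barrier = 5.8       # kcal/mol, pathway 1 barrier quoted in the response
mutant_barrier = 7.0   # kcal/mol, hypothetical mutant with a higher barrier

change = delta_pmf(wt_barrier, mutant_barrier)

# Under this convention, a mutation that raises the barrier gives a negative
# value, so "negative" means cholesterol entry has become less favourable.
print("barrier raised" if change < 0 else "barrier lowered or unchanged")
```

      Reading the plots therefore requires keeping the subtraction order in mind: a negative ∆PMF corresponds to a higher barrier in the mutant than in the wild type.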

      (4) The impact of G280V is put down to a decrease in flexibility, but it could also be a steric hindrance. This should be discussed.

      We thank the reviewer for this suggestion. We have added it as a possible mechanism of the decrease in activity of SMO.

      Changes made: (Paragraph 5, Results subsection 1)

      “We mutated G280<sup>2.57f</sup> to valine - G<sup>2.57f</sup>V to test whether reducing the flexibility of TM2 prevents cholesterol entry into the TMD. Consequently, the activity of mSMO showed a decrease. However, this decrease could also be attributed to steric hindrance added by the presence of a bulky propyl group in valine.”

      (5) Are the reported energy barriers of the two pathways (5.8 ± 0.7 and 6.5 ± 0.8 kcal/mol) significantly and/or substantially different enough to favour one over the other? This could be discussed in the manuscript.

      We thank the reviewer for this suggestion. We have added statistical t-tests for the barriers.

      Changes made: (Paragraph 6, Results subsection 2)

      “However, we also observe that pathway 1 shows a lower thermodynamic barrier (5.8 ± 0.7 kcal/mol v/s 6.5 ± 0.8 kcal/mol, p = 0.001)”

      (6) Are the energy barriers consistent with a passive diffusion-driven process? It feels like, without a source of free energy input (e.g., ion or ATP), these barriers would be difficult to overcome. This could be discussed.

      We thank the reviewer for this suggestion. We have added a discussion to further clarify this point.

      Discussion: (Paragraph 6, Results subsection 2)

      “These values are comparable to ATP-Binding Cassette (ABC) transporters of membrane lipids, which use ATP hydrolysis (-7.54 ± 0.3 kcal/mol) (Meurer et al., 2017) to drive lipid transport from the membrane to an extracellular acceptor. Some of these transporters share the same mechanism as SMO, where the lipid from the inner leaflet is flipped and transported to the extracellular acceptor protein (Tarling et al., 2013). Additionally, for secondary active transporters that do not use ATP for the transport of substrates, a thermodynamic barrier of 5-6 kcal/mol has been reported in literature. (Chan et al., 2022; Selvam et al., 2019; McComas et al., 2023; Thangapandian et al., 2025).”
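      To give a rough sense of scale for barriers of this size, the sketch below evaluates the Boltzmann factor exp(-ΔG/RT) for the two quoted barriers. The temperature of 310 K is an assumption, and the function name is illustrative. The factor is only a relative thermodynamic weight: converting it to a rate would require a kinetic prefactor that the response does not provide.

```python
from math import exp

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_factor(barrier_kcal, temperature_k=310.0):
    """Relative weight exp(-dG / RT) of reaching the top of a barrier."""
    return exp(-barrier_kcal / (R_KCAL * temperature_k))


pathway1 = boltzmann_factor(5.8)  # barrier quoted for pathway 1
pathway2 = boltzmann_factor(6.5)  # barrier quoted for pathway 2

# Ratio of the two weights; a 0.7 kcal/mol difference corresponds to a
# severalfold preference, ignoring the quoted uncertainties.
print(pathway1 / pathway2)
```

      Because the quoted uncertainties (0.7 and 0.8 kcal/mol) are as large as the difference between the barriers, this ratio should be read as indicative only.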

      (7) Regarding the kinetics from MSM, it is stated that the values seen here are similar to MFS transporters, but this then references another MSM study. A comparison to experimental values would support this section a lot.

      We thank the reviewer for this suggestion. We have added a discussion of the millisecond-scale timescales measured for MFS transporters.

      Changes made: (Paragraph 2, Results subsection 5)

      “These timescales are comparable to the substrate transport timescales of Major Facilitator Superfamily (MFS) transporters (Chan et al., 2022). Furthermore, several experimental studies have also resolved the millisecond-scale kinetics of MFS transporters (Blodgett and Carruthers, 2005; Körner et al., 2024; Bazzone et al., 2022; Smirnova et al., 2014; Zhu et al., 2019), further corroborating the results from our study.”

      Reviewer #2 (Recommendations for the authors):

      (1) The heatmaps in Figures 2a and 4a are great. On these, an arrow denotes what looks like a minimum energy path. Is it possible to see this plotted, as this might show the height of the energy barriers more clearly?

      We thank the reviewer for this suggestion. We have computed the minimum energy paths for both pathways and presented them in a supplementary figure.

      Added lines: (Paragraph 4, Results subsection 1):

      For further clarity, we have plotted the minimum energy path taken by cholesterol as it translocates along this pathway (Figure 2—figure Supplement 3a,b).

      Added lines: (Paragraph 4, Results subsection 2):

      For further clarity, we have plotted the minimum energy path taken by cholesterol as it translocates along this pathway (Figure 2—figure Supplement 3c,d).

      (2) The tICA data in S15 is first referred to on line 137, but the technique isn’t introduced until line 222. This makes understanding the data a little confusing. Reordering this might improve readability.

      We thank the reviewer for this suggestion. We have reordered the text to make it clearer.

      Changes made: (Paragraph 2, Results subsection 1) This provides evidence for multiple stable poses along the pathway as observed in the multiple stable poses of cholesterol in Cryo-EM structures of SMO bound to sterols (Deshpande et al., 2019; Qi et al., 2019b, 2020). A reliable estimate of the barriers comes from using the time-lagged Independent Components (tICs), which project the entire dataset along the slowest kinetic degrees of freedom. Overall, the highest barrier along Pathway 1 is 5.8 ± 0.7 kcal/mol, and it is associated with the entry of cholesterol into the TMD (Figure 2—Figure Supplement 2).

      Changes made: (Paragraph 3, Results subsection 2)

      “On plotting the first two tICs (Figure 2—Figure Supplement 2), we observe that the energetic barrier between η and θ is ∼6.5 ± 0.8 kcal/mol.”

      (3) Missing bracket on line 577.

      We thank the reviewer for this suggestion. The typo has been fixed.

      (4) Line 577: Fig. S2nd?

      We thank the reviewer for this suggestion. This typo has been fixed.

      Reviewer #3 (Public review):

      Summary:

      This manuscript presents a study combining molecular dynamics simulations and Hedgehog (Hh) pathway assays to investigate cholesterol translocation pathways to Smoothened (SMO), a G protein-coupled receptor central to Hedgehog signal transduction. The authors identify and characterize two putative cholesterol access routes to the transmembrane domain (TMD) of SMO and propose a model whereby cholesterol traverses through the TMD to the cysteine-rich domain (CRD), which is presented as the primary site of SMO activation. The MD simulations and biochemical experiments are carefully executed and provide useful data.

      Weaknesses:

      However, the manuscript is significantly weakened by a narrow and selective interpretation of the literature, overstatement of certain conclusions, and a lack of appropriate engagement with alternative models that are well-supported by published data, including data from prior work by several of the coauthors of this manuscript. In its current form, the manuscript gives a biased impression of the field and overemphasizes the role of the CRD in cholesterol-mediated SMO activation. Below, I provide specific points where revisions are needed to ensure a more accurate and comprehensive treatment of the biology.

      (1) Overstatement of the CRD as the Orthosteric Site of SMO Activation

      The manuscript repeatedly implies or states that the CRD is the orthosteric site of SMO activation, without adequate acknowledgment of alternative models. To give just a few examples (of many in this manuscript):

      (a) “PTCH is proposed to modulate the Hh signal by decreasing the ability of membrane cholesterol to access SMO’s extracellular cysteine-rich domain (CRD)” (p. 3).

      (b) “In recent years, there has been a vigorous debate on the orthosteric site of SMO” (p. 3).

      (c) “cholesterol must travel through the SMO TMD to reach the orthosteric site in the CRD” (p. 4).

      (d) “we observe cholesterol moving along TM6 to the TMD-CRD interface (common pathway, Fig. 1d) to access the orthosteric binding site in the CRD” (p. 6).

      While the second quote in this list at least acknowledges a debate, the surrounding text suggests that this debate has been entirely resolved in favor of the CRD model. This is misleading and not reflective of the views of other investigators in the field (see, for example, a recent comprehensive review from Zhang and Beachy, Nature Reviews Molecular Cell Biology 2023, which makes the point that both the CRD and 7TM sites are critical for cholesterol activation of SMO as well as PTCH-mediated regulation of SMO-cholesterol interactions).

      In contrast, a large body of literature supports a dual-site model in which both the CRD and the TMD are bona fide cholesterol-binding sites essential for SMO activation. Examples include:

      (a) Byrne et al., Nature 2016: point mutation of the CRD cholesterol binding site impairs, but does not abolish, SMO activation by cholesterol (SMO D99A, Y134F, and combination mutants - Fig 3 of the 2016 study).

      (b) Myers et al., Dev Cell 2013 and PNAS 2017: CRD deletion mutants retain responsiveness to PTCH regulation and cholesterol mimetics (similar Hh responsiveness of a CRD deletion mutant is also observed in Fig. 4 Byrne et al, Nature 2016).

      (c) Deshpande et al., Nature 2019: mutation of residues in the TMD cholesterol binding site blocks SMO activation entirely, strongly implicating the TMD as a required site, in contrast to the partial effects of mutating or deleting the CRD site.

      Qi et al., Nature 2019, and Deshpande et al., Nature 2019, both reported cholesterol binding at the TMD site based on high-resolution structural data. Oddly, Deshpande et al., Nature 2019, is not cited in the discussion of TMD binding on p. 3, despite being one of the first papers to describe cholesterol in the TMD site and its necessity for activation (the authors only cite it regarding activation of SMO by synthetic small molecules).

      Kinnebrew et al., Sci Adv 2022 report that CRD deletion abolished PTCH regulation, which is seemingly at odds with several studies above (e.g., Byrne et al, Nature 2016; Myers et al, Dev Cell 2013); but this difference may reflect the use of an N-terminal GFP fusion to SMO in the Kinnebrew et al 2022, which could alter SMO activation properties by sterically hindering activation at the TMD site by cholesterol (but not synthetic SMO agonists like SAG); in contrast, the earlier work by Byrne et al is not subject to this caveat because it used an untagged, unmodified form of SMO.

      Although overexpression of PTCH1 and SMO (wild-type or mutant) has been noted as a caveat in studies of CRD-independent SMO activation by cholesterol, this reviewer points out that several of the studies listed above include experiments with endogenous PTCH1 and low-level SMO expression, demonstrating that SMO can clearly undergo activation by cholesterol (as well as regulation by PTCH1) in a manner that does not require the CRD.

      Recommendation: The authors should revise the manuscript to provide a more balanced overview of the field and explicitly acknowledge that the CRD is not the sole activation site. Instead, a dual-site model is more consistent with available structural, mutational, and functional data. In addition, the authors should reframe their interpretation of their MD studies to reflect this broader and more accurate view of how cholesterol binds and activates SMO.

      We thank the reviewer for this comprehensive overview of the existing literature. We agree that cholesterol binding to both the TMD and CRD sites is required for full activation of SMO. As described below in our responses to individual comments, we have made changes to the manuscript to make this point clear. For instance, in the revised manuscript, we refrain from calling the CRD cholesterol binding site the “orthosteric site”. Instead, we highlight that the goal of the manuscript is not to resolve the debate over whether the TMD or CRD site is more important for the regulation of SMO by PTCH1 but rather to use molecular dynamics to understand the fascinating question of how cholesterol in the membrane can reach the CRD, located at a significant distance above the outer leaflet of the membrane. We believe that this is an important goal since there is an abundance of evidence that supports the view that PTCH1 inhibits SMO by reducing cholesterol access to the CRD. This evidence is now summarized succinctly in the introduction:

      Changes made: (Paragraph 4, Introduction)

      “While cholesterol binding to both the TMD and CRD sites is required for full SMO activation, our work focuses on how cholesterol gains access to the CRD site, perched above the outer leaflet of the membrane (Luchetti et al., 2016; Kinnebrew et al., 2022). Multiple lines of evidence suggest that PTCH1-regulated cholesterol binding to the CRD plays an instructive role in SMO regulation both in cells and animals. Mutations in residues predicted to make hydrogen bonds with the hydroxyl group of cholesterol bound to the CRD reduced both the potency and efficacy of SHH in cellular signaling assays (Kinnebrew et al., 2022; Byrne et al., 2016) and, more importantly, eliminated HH signaling in mouse embryos (Xiao et al., 2017). Experiments using both covalent and photocrosslinkable sterol probes in live cells directly show that PTCH1 activity reduces sterol access to the CRD (Kinnebrew et al., 2022; Xiao et al., 2017). Notably, our simulations evaluate a path of cholesterol translocation that includes both the TMD and CRD sites: cholesterol first enters the 7-transmembrane domain bundle from the membrane; it then engages the TMD site before continuing along a conduit to the CRD site. Thus, we analyze translocation energetics and residue-level contacts along a path that includes both the TMD and the CRD.”

      However, Reviewer 3 makes several comments below that are biased, inaccurate, or selective. We feel it is important to address these so readers can approach the literature from a balanced perspective. Indeed, the eLife review forum provides an ideal venue to present contrasting views on a scientific model. We encourage the editors to publish both Reviewer 3’s comments and our response in full so readers can read the original papers and reach their own conclusions. It is important to note these issues are not relevant to the quality of the computational and experimental data presented in this paper.

      We have now removed the term “orthosteric” to describe the CRD site throughout the paper and clearly state in the introduction that “both the CRD and TMD sites are required for SMO activation” but that our focus is on how cholesterol moves from the membrane to the CRD site. There is no doubt that cholesterol binding to the CRD plays a key role in SMO activation; our focus on this path is justified and does not devalue the importance of the TMD site. Our prior models (see Figure 7 of Kinnebrew et al., 2022) explicitly include contributions of both sites.

      Now we respond to some of the concerns outlined, individually:

      (1) Byrne et al., Nature 2016: point mutation of the CRD cholesterol binding site impairs, but does not abolish, SMO activation by cholesterol (SMO D99A, Y134F, and combination mutants - Fig 3 of the 2016 study)

      The fact that a point mutation dramatically diminishes (but does not abolish) signaling does not mean that the CRD cholesterol binding site is not important for SMO regulation. Indeed, the reviewer fails to mention that Song et al. (Molecular Cell, 2017) found that a subtle mutation at D99 of SMO (D95/99N, a residue that makes a hydrogen bond with the cholesterol hydroxyl) completely abolishes SMO signaling in mouse embryos. Thus, the CRD site is critical for SMO activation in an intact animal, justifying our focus on evaluating the path of cholesterol translocation to the CRD site.

      (2) Myers et al., Dev Cell 2013 and PNAS 2017: CRD deletion mutants retain responsiveness to PTCH regulation and cholesterol mimetics (similar Hh responsiveness of a CRD deletion mutant is also observed in Fig 4 Byrne et al, Nature 2016).

      The reviewer fails to note that CRD-deleted versions of SMO have markedly (>10-fold) higher basal (i.e. ligand-independent) activity compared to full-length SMO. The response to SHH is minimal (∼2-fold), compared to >50-100-fold with full-length SMO. Thus, CRD-deleted SMO is likely in a non-native conformation. Local changes in cholesterol accessibility caused by PTCH1 inactivation or cholesterol loading can cause small fluctuations in delta-CRD activity, but this cannot be used to infer meaningful insights about how native, full-length SMO (with >10-fold lower basal activity) is regulated. We encourage the reviewer to read our previous paper (Kinnebrew et al. 2022), which presents a unified view of how the TMD and CRD sites together regulate SMO activation.

      A more physiological experiment, reported in Kinnebrew et al. 2022, tested mutations in residues that make hydrogen bonds with cholesterol at the CRD and TMD sites in the context of full-length SMO. These mutants were stably expressed at moderate levels in Smo<sup>−/−</sup> cells. Mutations at the CRD site reduced the fold-increase in signaling output in response to SHH, as would be expected for a PTCH1-regulated site. In contrast, analogous mutations in the TMD site reduced the magnitude of both basal and maximal signaling, without affecting the fold-change in response to SHH. In signaling assays, the key parameter in evaluating the impact of a mutation is whether it impacts the change in output in response to a signal (in this case PTCH1 inactivation by SHH). A mutation in SMO that affects PTCH1 regulation is expected to decrease the fold-change in signaling in response to SHH, a criterion that is fulfilled by mutations in the CRD site. Accordingly, mutations in the CRD site abolish SMO signaling in mouse embryos (Xiao et al., 2017).
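
      The distinction drawn above between the magnitude of signaling and the fold-change in response to SHH can be made concrete with a small calculation. The numbers below are arbitrary illustrative reporter values, not measured data from any of the cited studies, and the variable names are hypothetical. They only show that scaling basal and stimulated output by the same factor leaves the fold-change untouched, whereas lowering the stimulated output alone reduces it.

```python
def fold_change(basal, stimulated):
    """Ratio of stimulated to basal signaling output."""
    return stimulated / basal

# Arbitrary illustrative reporter values (NOT measured data).
wild_type = {"basal": 1.0, "stimulated": 50.0}

# Case A: basal and stimulated output both reduced by the same factor.
scaled = {"basal": 0.2, "stimulated": 10.0}

# Case B: only the stimulated output reduced.
blunted = {"basal": 1.0, "stimulated": 5.0}

fc_wild_type = fold_change(**wild_type)
fc_scaled = fold_change(**scaled)    # same fold-change as wild type
fc_blunted = fold_change(**blunted)  # fold-change reduced 10-fold
```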

      (3) Deshpande et al., Nature 2019: mutation of residues in the TMD cholesterol binding site blocks SMO activation entirely, strongly implicating the TMD as a required site, in contrast to the partial effects of mutating or deleting the CRD site.

      Bulky mutations at the TMD site (V333F) that abolish SMO activity were first reported by Byrne et al. 2016 and were used to markedly increase the stability of SMO for protein expression. These mutations indeed stabilize the inactive state of SMO, increasing protein abundance and completely preventing its localization at primary cilia. SMO variants carrying such bulky mutations cannot be used to infer the importance of the TMD site since they do not distinguish between the following possibilities: (1) SMO is inactive because the sterol cannot bind, or (2) SMO is inactive because it is locked in an inactive conformation, or (3) SMO is inactive because it cannot localize to primary cilia (where it must be localized to activate downstream signaling).

      As described in Response 3.3, a better way to evaluate the importance of the TMD site is to use mutations in residues that make hydrogen bonds with the hydroxyl group of TMD cholesterol. These mutations do not markedly increase protein stability or prevent ciliary localization (Kinnebrew 2022, Fig. S2). While a TMD site mutation decreases the magnitude of maximal (and basal) SMO signaling, it does not impact the fold-increase in signal output in response to Hh ligands (the key parameter that should be used to evaluate PTCH1 activity).

      (4) Qi et al., Nature 2019, and Deshpande et al., Nature 2019, both reported cholesterol binding at the TMD site based on high-resolution structural data. Oddly, Deshpande et al., Nature 2019, is not cited in the discussion of TMD binding on p. 3, despite being one of the first papers to describe cholesterol in the TMD site and its necessity for activation (the authors only cite it regarding activation of SMO by synthetic small molecules)

      The reference has now been added at this location in the manuscript.

      (5) Kinnebrew et al., Sci Adv 2022 report that CRD deletion abolished PTCH regulation, which is seemingly at odds with several studies above (e.g., Byrne et al, Nature 2016; Myers et al, Dev Cell 2013); but this difference may reflect the use of an N-terminal GFP fusion to SMO in the Kinnebrew et al 2022, which could alter SMO activation properties by sterically hindering activation at the TMD site by cholesterol (but not synthetic SMO agonists like SAG); in contrast, the earlier work by Byrne et al is not subject to this caveat because it used an untagged, unmodified form of SMO.

      The reviewer fails to note that CRD-deleted versions of SMO have markedly (>10-fold) higher basal activity than full-length SMO. The response to SHH is minimal (∼2-fold), compared to >50-fold with full-length SMO. Thus, CRD-deleted SMO is likely in a non-native conformation. Local changes in cholesterol accessibility caused by PTCH1 inactivation or cholesterol loading can cause small fluctuations in delta-CRD activity, but this cannot be used to infer meaningful insights about how native, full-length SMO (with >10-fold lower basal activity) is regulated. Please see Response 3.3 for further details.

      Reviewer 3 presents an incomplete picture of the extensive experiments reported in Kinnebrew et al. to establish the functionality of YFP-tagged delta-CRD SMO. Most importantly, a TMD-selective sterol analog (KK174) can fully activate YFP-tagged delta-CRD, showing conclusively that the YFP fusion does not block sterol access to the TMD site. The fact that this protein is nearly unresponsive to SHH highlights the critical role of the CRD-bound cholesterol in SMO regulation by PTCH1. Indeed, the YFP-tagged, CRD-deleted SMO was made purposefully to test the requirement of the CRD in a construct that had normal basal activity. Again, this data justifies the value of investigating the path of cholesterol movement from the membrane via the TMD site to the CRD.

      (6) Although overexpression of PTCH1 and SMO (wild-type or mutant) has been noted as a caveat in studies of CRD-independent SMO activation by cholesterol, this reviewer points out that several of the studies listed above include experiments with endogenous PTCH1 and low-level SMO expression, demonstrating that SMO can clearly undergo activation by cholesterol (as well as regulation by PTCH1) in a manner that does not require the CRD.

      This comment is inaccurate. The data presented in Deshpande et al. (and prior work in Myers et al.) used transient transfection to overexpress SMO in Smo<sup>−/−</sup> cells. At the individual cell level, transient transfection produces expression levels that are markedly higher (10-1000-fold) than stable expression (in addition to being more variable). Most scientists would agree that stable expression (as used in Kinnebrew 2022) at a moderate expression level is a better system to compare mutant phenotypes, assess basal and activated signaling, and provide an accurate measure of the fold-change in signal output in response to SHH. Notably, introduction of a mutation in the CRD cholesterol binding site at the endogenous mouse Smo locus (an even better experiment than stable expression) leads to complete loss of SMO activity (PMID 28344083). This result again justifies our investigation of the pathway of cholesterol movement from the membrane to the CRD site.

      We have changed the initial discussion to reflect a more general outlook.

      Changes made: (Paragraph 1, Introduction)

      “PTCH modulates the availability of accessible cholesterol at the primary cilium and thereby regulates SMO, with models invoking effects on both the CRD and 7TM pockets.”

      Changes made: (Results subsection 3, paragraph 1)

      “According to the dual-site model, to reach the binding site in the CRD (ζ), cholesterol must translocate along the TMD-CRD interface from the TM binding site (α∗).”

      Added lines: (Paragraph 5, Results subsection 3):

      “The computational investigation shown here covers the dual-site model, where cholesterol reaches the CRD site via binding to the TM binding site first. In comparison to the CRD site, the TM site is more stable by ∼2 kcal/mol (Figure 2—Figure Supplement 3b, d).”

      Added lines: (Paragraph 2, Conclusions):

      “Here we have explored the role the CRD-site plays in SMO activation. In addition, through simulating the CRD site-dependent SMO activation hypothesis, we have also simulated the TMD site-dependent activation. We show that cholesterol is more stable at the TMD site than at the CRD site by ∼2 kcal/mol.”

      (2) Bias in Presentation of Translocation Pathways

      The manuscript presents the model of cholesterol translocation through SMO to the CRD as the predominant (if not sole) mechanism of activation. Statements such as: "Cholesterol traverses SMO to ultimately reach the CRD binding site" (p. 6) suggest an exclusivity that is not supported by prior literature in the field. Indeed, the authors’ own MD data presented here demonstrate more stable cholesterol binding at the TMD than at the CRD (p 17), and binding of cholesterol to the TMD site is essential for SMO activation. As such, it is appropriate to acknowledge that cholesterol may activate SMO by translocating through the TM5/6 tunnel, then binding to the TMD site, as this is a likely route of SMO activation in addition to the CRD translocation route they highlight in their discussion.

      The authors describe two possible translocation pathways (Pathway 1: TM2/3 entry to TMD; Pathway 2: TM5/6 entry and direct CRD transfer), but do not sufficiently acknowledge that their own empirical data support Pathway 2 as more relevant. Indeed, because their experimental data suggest Pathway 2 is more strongly linked to SMO activation, this pathway should be weighted more heavily in the authors’ discussion. In addition, Pathway 2 is linked to cholesterol binding to both the TMD and CRD sites (the former because the TMD binding site is at the terminus of the hydrophobic tunnel, the latter via the translocation pathway described in the present manuscript), so it is appropriate that Pathway 2 figures more prominently than Pathway 1 in the authors’ discussion.

      The authors also claim that "there is no experimental structure with cholesterol in the inner leaflet region of SMO TMD" (p 16). However, a structural study of apo-SMO from the Manglik and Cheng labs (Zhang et al., Nat Comm, 2022) identified a cholesterol molecule docked at the TM5/6 interface and also proposed a "squeezing" mechanism by which cholesterol could enter the TM5/6 pocket from the membrane. The authors do not consider this SMO conformation in their models, nor do they discuss the possibility that conformational dynamics at the TM5/6 interface could facilitate cholesterol flipping and translocation into the hydrophobic conduit, despite both possibilities having precedent in the 2022 empirical cryoEM structural analysis.

      Recommendation: The authors should avoid oversimplifying the SMO cholesterol activation process, either by tempering these claims or broadening their discussion to better reflect the complexity and multiplicity of cholesterol access and activation routes for SMO. They should also consider the 2022 apo-SMO cryoEM structure in their analysis of the TM5/6 translocation pathway.

      We thank the reviewer for this comprehensive overview of the existing literature and for pointing out the parts we had omitted from the discussion. We agree with the reviewer, since our data show that both pathways are probable. Throughout our manuscript, we have avoided using a competitive approach (that one pathway dominates over the other). Instead, we have evaluated both pathways independently and presented a comparative rather than competitive overview of both pathways from our observations. While we agree that experimental evidence suggests the inner leaflet pathway is possible, we cannot discount the observations made in previous studies that support the outer leaflet pathway, particularly Hedger et al. (2019), Bansal et al. (2023), and Kinnebrew et al. (2021). Therefore, considering the reviewer’s comments, we have made the following changes:

      (1) Added lines: (Paragraph 3, Conclusions):

      “We show that the barriers associated with the pathway starting from the outer leaflet are lower by ∼0.7 kcal/mol (p = 0.0013). We also provide evidence that cholesterol can enter SMO via both leaflets, considering that multiple computational and experimental studies have found cholesterol entry sites and activation modulation via the outer leaflet, between TM2-TM3. This is countered by evidence from multiple experimental and computational studies corroborating entry via the inner leaflet, between TM5-TM6, including this study. Overall, we posit that cholesterol translocation from either pathway is feasible.”

      (2) Changes made: (Paragraph 6, Results subsection 2)

      “Based on our experimental and computational data, we conclude that cholesterol translocation can happen via either pathway. This is supported on the basis of the following observations: mutations along pathway 2 affect SMO activity more significantly, and the presence of a direct conduit that connects the inner leaflet to the TMD binding site. In addition, a resolved structure of SMO in the presence of cholesterol shows a cholesterol situated at the entry point from the membrane into the protein between TM5 and TM6, in the inner leaflet. However, we also observe that pathway 1 shows a lower thermodynamic barrier (5.8 ± 0.7 kcal/mol vs. 6.5 ± 0.8 kcal/mol, p = 0.0013). Additionally, PTCH1 controls cholesterol accessibility in the outer leaflet. This shows that there is a possibility for transport from both leaflets. One possibility that might alter the thermodynamic barriers is native membrane asymmetry, particularly the anionic lipid-rich inner leaflet. This presents as a limitation of our current model.”
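
      To give a sense of scale for the difference between the two barriers quoted above, a barrier difference can be converted into an approximate ratio of crossing rates with an Arrhenius-type factor, exp(ΔΔG/RT). The calculation below is illustrative only. It assumes equal prefactors for both pathways and a temperature of 300 K, neither of which is stated in the response, and the function name is hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def rate_ratio(barrier_low, barrier_high, temperature=300.0):
    """Approximate rate over the lower barrier divided by the rate over the
    higher barrier, assuming identical prefactors (an assumption)."""
    return math.exp((barrier_high - barrier_low) / (R_KCAL * temperature))

# Barriers quoted in the response: pathway 1 = 5.8, pathway 2 = 6.5 kcal/mol.
ratio = rate_ratio(5.8, 6.5)  # roughly a 3-fold difference at 300 K
```

      Because the quoted uncertainties (± 0.7 and ± 0.8 kcal/mol) are as large as the difference itself, this ratio is indicative at best, which is consistent with the authors' conclusion that translocation via either pathway is feasible.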

      (3) Changes made: (Paragraph 1, Results subsection 2)

      “In a structure resolved in 2022, cholesterol was observed at the interface between the protein and the membrane, in the inner leaflet, between TMs 5 and 6. However, cholesterol in the inner leaflet has a downward orientation, with the polar hydroxyl group pointing intracellularly (η). A striking observation is that this cholesterol binding site pose was never used as a starting point for simulations and was discovered independent of the pose described in Zhang et al. (2022) (Figure 4—Figure Supplement 1).”

      (3) Alternative Possibility: Direct Membrane Access to CRD

      The possibility that the CRD extracts cholesterol directly from the membrane outer leaflet is not considered. While the crystal structures place the CRD in a stable pose above the membrane, multiple cryo-EM studies suggest that the CRD is dynamic and adopts a variety of conformations, raising the possibility that the stability of the CRD in the crystal structures is a result of crystal packing and that the CRD may be far more dynamic under more physiological conditions.

      Recommendation: The authors should explicitly acknowledge and evaluate this potential mechanism and, if feasible, assess its plausibility through MD simulations.

      We thank the reviewer for the suggestion. We have addressed this comment by calculating the distance from the lipid headgroups for each lipid in the membrane to the cholesterol binding site. We show that in our study, we do not observe any bending of the CRD over the membrane, precluding any cholesterol from being extracted from the membrane directly.

      Added lines: (Paragraph 3, Conclusions):

      “An alternative possibility states that the flexibility associated with the CRD would allow it to directly access the membrane, and consequently, cholesterol. In the extensive simulations reported in this study, the binding site of cholesterol in the CRD remains at least 20 Å away from the nearest lipid head group in the membrane, suggesting that such direct extraction and the bending of the CRD do not occur within the timescales sampled (Appendix 2 – Figure 6).

      The mechanistic details of this process are still unexplored and form the basis of future work.”
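
      The distance check described in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis script. It assumes the coordinates are already available as NumPy arrays (one binding-site centre and a set of lipid head-group positions per frame, in Å), ignores periodic boundary conditions, and runs on synthetic coordinates. The function and array names are hypothetical.

```python
import numpy as np

def min_headgroup_distance(site_xyz, headgroup_xyz):
    """Nearest head-group distance per frame.

    site_xyz:      (n_frames, 3) binding-site centre in each frame
    headgroup_xyz: (n_frames, n_lipids, 3) head-group positions in each frame
    Returns an (n_frames,) array of minimum distances.
    """
    deltas = headgroup_xyz - site_xyz[:, None, :]
    return np.linalg.norm(deltas, axis=-1).min(axis=1)

# Synthetic example: a flat leaflet near z = 0 and a site held 30 Å above it.
rng = np.random.default_rng(0)
n_frames, n_lipids = 5, 50
headgroups = np.empty((n_frames, n_lipids, 3))
headgroups[..., :2] = rng.uniform(-40.0, 40.0, size=(n_frames, n_lipids, 2))
headgroups[..., 2] = rng.uniform(-2.0, 2.0, size=(n_frames, n_lipids))
site = np.tile([0.0, 0.0, 30.0], (n_frames, 1))

d_min = min_headgroup_distance(site, headgroups)
always_far = bool((d_min >= 20.0).all())  # True if the site never comes within 20 Å
```

      A real trajectory would also need the minimum-image convention for periodic boxes before a threshold such as 20 Å is applied.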

      (4) Inconsistent Framing of Study Scope and Limitations

      The discussion contains some contradictory and misleading language. For example, the authors state that "In this study we only focused on the cholesterol movement from the membrane to the CRD binding site," and then several sentences later state that "We outline the entire translocation mechanism from a kinetic and thermodynamic perspective." These statements are at odds. The former appropriately (albeit briefly) notes the limited scope of the modeling, while the latter overstates the generality of the findings.

      In addition, the authors’ narrow focus on the CRD site constitutes a major caveat to the entire work. It should be acknowledged much earlier in the manuscript, preferably in the introduction, rather than mentioned as an aside in the penultimate paragraph of the conclusion.

      Recommendation: The authors should clarify the scope of the study and expand the discussion of its limitations. They should explicitly acknowledge that the study models one of several cholesterol access routes and that the findings do not rule out alternative pathways.

      We thank the reviewer for the suggestion. We have addressed this comment by explicitly mentioning the scope of the study.

      Changes made: (Paragraph 3, Conclusions)

      “We outline the entire translocation mechanism from a kinetic and thermodynamic perspective for one of the leading hypotheses for the activation mechanism of SMO.”

      (5) Summary:

      This study has the potential to make a useful contribution to our understanding of cholesterol translocation and SMO activation. However, in its current form, the manuscript presents an overly narrow and, at times, misleading view of the literature and biological models; as such, it is not nearly as impactful as it could be. I strongly encourage the authors to revise the manuscript to include:

      (1) A more balanced discussion of the CRD vs. TMD binding sites.

      (2) Acknowledgment of alternative cholesterol access pathways.

      (3) More comprehensive citation of prior structural and functional studies.

      (4) Clarification of assumptions and scope.

      Of note, the above suggestions require little to no additional MD simulations or experimental studies, but would significantly enhance the rigor and impact of the work.

      We thank the reviewer for the suggestions. We have taken into account the literature and diverse viewpoints. We have changed the initial discussion to reflect a more general outlook. In the revised version of the manuscript, we have refrained from referring to the CRD site as the orthosteric site. Instead, we refer to it as the CRD sterol-binding site. To better represent the dual-site model, we add further discussion in the Introduction. Throughout our manuscript, we have avoided using a competitive approach (that one pathway dominates over the other). Instead, we have evaluated both pathways independently and presented a comparative rather than competitive overview of both pathways from our observations. We explicitly mention the scope of the study.

    1. eLife Assessment

      This valuable study uses tools of population and functional genomics to examine long non-coding RNAs (lncRNAs) in the context of human evolution. Analyses of computationally predicted human-specific lncRNAs and their genomic targets lead to the development of hypotheses regarding the potential roles of these genetic elements in human biology. Compared to previous versions, the conclusions regarding evolutionary acceleration and adaptation have become more solid by more fully taking data and literature on human/chimpanzee genetics and functional genomics into account.

    2. Joint Public Review:

      While DNA sequence divergence, differential expression and differential methylation analysis have been conducted between humans and the great apes to study changes that "make us human", the role of lncRNAs and their impact on the human genome and biology has not been fully explored. In this study the authors computationally predict HSlncRNAs as well as their DNA binding sites using a method they have developed previously and then examine these predicted regions with different types of enrichment analyses. Broadly, the analyses are straightforward, and after identifying these regions/HSlncRNAs the authors examined their effects using different external datasets.

      Comments on the latest version from Reviewer #2:

      I think this is as good as it is going to get, and I do appreciate that the authors are still engaging in good faith after all these rounds of revision, so I am happy to stop here! I do think the paper is significantly improved from the last time around, and the conclusions have been tempered significantly.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      In this valuable manuscript, Lin et al attempt to examine the role of long non coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are incomplete and at times inadequate, the results nonetheless point towards a possible contribution of long non coding RNAs to shaping humans, and suggest clear directions for future, more rigorous study.

      Comments on revisions:

      I thank the authors for their revision and changes in response to previous rounds of comments. As before, I appreciate the changes made in response to my comments, and I think everyone is approaching this in the spirit of arriving at the best possible manuscript, but we still have some deep disagreements on the nature of the relevant statistical approach and defining adequate controls. I highlight a couple of places that I think are particularly relevant, but note that given the authors disagree with my interpretation, they should feel free to not respond!

      (1) On the subject of the 0.034 threshold, I had previously stated: "I do not agree with the rationale for this claim, and do not agree that it supports the cutoff of 0.034 used below."

      In their reply to me, the authors state:

      "What we need is a gene number, which (a) indicates genes that effectively differentiate humans from chimpanzees, (b) can be used to set a DBS sequence distance cutoff. Since this study is the first to systematically examine DBSs in humans and chimpanzees, we must estimate this gene number based on studies that identify differentially expressed genes in humans and chimpanzees. We choose Song et al. 2021 (Song et al. Genetic studies of human-chimpanzee divergence using stem cell fusions. PNAS 2021), which identified 5984 differentially expressed genes, including 4377 genes whose differential expression is due to trans-acting differences between humans and chimpanzees. To the best of our knowledge, this is the only published data on trans-acting differences between humans and chimpanzees, and most HS lncRNAs and their DBSs/targets have trans-acting relationships (see Supplementary Table 2). Based on these numbers, we chose a DBS sequence distance cutoff of 0.034, which corresponds to 4248 genes (the top 20%), slightly fewer than 4377."

      I have some notes here. First, Agoglia et al, Nature, 2021, also examined the nature of cis vs trans regulatory differences between human and chimps using a very similar set up to Song et al; their Supplementary Table 4 enables the discovery of genes with cis vs trans effects although admittedly this is less straightforward than the Song et al data. Second, I can't actually tell how the 4377 number is arrived at. From Song et al, "Of 4,671 genes with regulatory changes between human-only and chimpanzee-only iPSC lines, 44.4% (2,073 genes) were regulated primarily in cis, 31.4% (1,465 genes) were regulated primarily in trans, and the remaining 1,133 genes were regulated both in cis and in trans (Fig. 2C). This final category was further broken down into a cis+trans category (cis- and transregulatory changes acting in the same direction) and a cis-trans category (cis- and trans-regulatory changes acting in opposite directions)." Even when combining trans-only and cis&trans genes that gives 2,598 genes with evidence for some trans regulation. I cannot find 4,377 in the main text of the Song et al paper.

      Elsewhere in their response, the authors respond to my comment that 0.034 is an arbitrary threshold by repeating the analyses using a cutoff of 0.035. I appreciate the sentiment here, but I would not expect this to make any great difference, given how similar those numbers are! A better approach, and what I had in mind when I mentioned this, would be to test multiple thresholds, ranging from, eg, 0.05 to 0.01 <DBS dist =0.01 -> 0.034 -> 0.05> at some well-defined step size.

      (1) We sincerely thank the reviewer for this critical point. Our initial purpose, based on DBS distances from the human genome to chimpanzee genome and archaic genomes, was that genes with large DBS distances may have contributed more to human evolution. However, our ORA (overrepresentation analysis) explored only genes with large DBS distances (the legend of old Figure 2 was “1256 target genes whose DBSs have the largest distances from modern humans to chimpanzees and Altai Neanderthals are enriched in different Biological Processes GO terms”), with the use of the cutoff (threshold) of 0.034 for defining large distance. The cutoff is not totally unreasonable (as our new results and the following sensitivity analysis indicate), but this approach was indirect and flawed.

      (2) We have now performed ORA using two methods. The first uses only DBS distances. Instead of using a cutoff, we now sort genes by DBS distance (human-chimpanzee distances and human-Altai Neanderthal distances, respectively; see Supplementary Table 5) and use the top 25% and bottom 25% of genes to perform ORA. This directly examines whether DBS distances alone indicate that genes with large DBS distances contribute more to human evolution than genes with small DBS distances. The second also explores the ASE genes (allele-specific expression, genes undergoing human/chimpanzee-specific regulation in the tetraploid human–chimpanzee hybrid iPS) reported by Agoglia et al. 2021. We select the top 50% and bottom 50% of genes with large and small DBS distances, intersect them with ASE genes from Agoglia et al. 2021 (their Supplementary Table 4), and apply ORA to the intersections. Both analyses show that (a) more GO terms are obtained from genes with large DBS distances, and (b) more human evolution-related GO terms are obtained from genes with large DBS distances (Supplementary Tables 5, 6, 7; Figure 2; Supplementary Fig. 15). These results directly suggest that genes with large DBS distances contribute more to human evolution than genes with small DBS distances, which is a key theme of the study.

      (3) Regarding Song et al. 2021, the statement of “we differentiated…allotetraploid (H1C1a, H1C1b, H2C2a, H2C2b) lines into ectoderm, mesoderm, and endoderm” made us assume that their differentiated hybrid cell lines cover more tissue types than those of Agoglia et al. 2021. Now, upon re-examining Supplementary Table 5 of Song et al. and Supplementary Table 4 of Agoglia et al. 2021, we find that the latter more clearly indicates significant ASE genes (p-adj<0.01 and |LFC|>0.5 in GRCh38 and PanTro5).

      (4) We have also performed two additional analyses in response to the suggestion of “test multiple thresholds, ranging from, eg, 0.05 to 0.01 <DBS dist =0.01 -> 0.034 -> 0.05> at some well-defined step size”. First, we performed a multi-threshold sensitivity analysis using a spectrum of cutoffs (0.03, 0.034, 0.04, 0.05), and tracked the number of genes identified and the enrichment significance of key GO terms (e.g., "neuron projection development," "behavior") across these thresholds. The result confirms that while the absolute number of genes varies with the cutoffs, the core biological conclusion (specifically, the significant enrichment of target genes in neurodevelopmental and cognitive functions) remains stable and significant. For instance, "behavior" maintains strong statistical significance (FDR<0.01) in both the human-chimpanzee and human-Altai Neanderthal comparisons across all tested cutoffs, and "neuron projection development" also remains significant across three (0.03, 0.034, 0.04) of the four cutoffs in the Altai comparison. This pattern suggests that our core findings regarding neurodevelopmental functions are robust across a range of cutoffs. Nevertheless, we did not extend the analysis to smaller cutoffs (e.g., 0.01 or 0.02) because such values would identify an excessively large number of genes (>10000) for ORA, which would render the GO-term enrichment analysis less meaningful due to a loss of specificity.

      Second, we have performed an additional validation to directly evaluate whether the 0.034 cutoff itself represents a stringent and biologically meaningful value. We sought to empirically determine how often a DBS sequence distance of 0.034 or greater might occur by chance in promoter regions, thereby testing its significance as a marker of potential evolutionary divergence. We randomly sampled 10,000 windows from annotated promoter regions across the hg38 genome, each with a size matching the average length of DBSs (147 bp). We then calculated the per-base sequence distances for these random windows between modern humans and chimpanzees, as well as between modern humans and the three archaic humans (Altai, Denisovan, Vindija). The analysis reveals that a distance of ≥0.034 is a rare event in random promoter sequences: for Human-Chimp, Human-Altai, Human-Denisovan, and Human-Vindija, 5.49% (549/10000), 0.31% (31/10000), 4.47% (447/10000), and 0.03% (3/10000) of random windows reach this distance, respectively. This empirical evidence suggests that 0.034 is a sufficiently strong cutoff for defining large DBS distance: such a distance is very unlikely to occur in a random genomic background (P<0.1 for chimpanzee and P<0.05 for the archaic humans), and DBSs exceeding this cutoff are significantly enriched for sequences that have undergone substantial evolutionary change instead of being random neutral variations.

      (5) We present new Figure 2, Supplementary Table 5,6,7, and Supplementary Fig. 15. We have substantially revised section 2.3, related sections in Results, Supplementary Note 3, and Supplementary Table 8. We have removed related descriptions and explanations in the main text and Supplementary Notes. The results of the above two analyses are presented here as two Author response images.
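
      The rank-based split described in point (2) above (top 25% versus bottom 25% of genes by DBS distance) can be illustrated with a minimal sketch. The gene names, distance values, and the function name `split_by_rank` are hypothetical and are not taken from the manuscript; the actual gene lists are those in the cited Supplementary Table 5.

```python
# Hypothetical per-gene DBS distances; names and values are illustrative only.
dbs_distance = {
    "GENE_A": 0.051, "GENE_B": 0.002, "GENE_C": 0.034, "GENE_D": 0.010,
    "GENE_E": 0.047, "GENE_F": 0.000, "GENE_G": 0.022, "GENE_H": 0.039,
}


def split_by_rank(distances, fraction=0.25):
    """Return (top, bottom) gene lists, each holding `fraction` of all genes."""
    # Sort gene names from largest to smallest distance.
    ranked = sorted(distances, key=distances.get, reverse=True)
    n = max(1, int(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


top_genes, bottom_genes = split_by_rank(dbs_distance)
print("largest distances:", top_genes)
print("smallest distances:", bottom_genes)
```

      Each of the two lists would then be passed separately to an overrepresentation analysis, so that no fixed distance cutoff is needed.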

      Author response image 1.

      Sensitivity analysis of GO-term enrichment across different DBS sequence distance cutoffs. The table shows the numbers of target genes identified and the false discovery rates (FDR) for the enrichment of three selected GO terms at four different distance cutoffs. Note that, unlike in the old Figure 2, the results for chimpanzees and Altai Neanderthals are not directly comparable here, as the numbers of target genes used for the enrichment analysis differ between them at each cutoff.
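
      A threshold sweep of the kind summarized in this table can be sketched as follows. This is only an illustration under stated assumptions: the distances are toy values, `sweep_cutoffs` is a hypothetical helper, and the 0.005 step of the regular grid is an arbitrary choice standing in for the reviewer's "well-defined step size".

```python
def sweep_cutoffs(distances, cutoffs):
    """Map each cutoff to the number of genes whose DBS distance is >= cutoff."""
    return {c: sum(1 for d in distances.values() if d >= c) for c in cutoffs}


# Toy distances (hypothetical); real values would come from the DBS predictions.
toy_distances = {"G1": 0.005, "G2": 0.031, "G3": 0.034, "G4": 0.042, "G5": 0.060}

# The four cutoffs named in the author response.
cutoffs = (0.03, 0.034, 0.04, 0.05)
counts = sweep_cutoffs(toy_distances, cutoffs)
for cutoff, n_genes in counts.items():
    print(f"cutoff {cutoff}: {n_genes} genes")

# A regular grid from 0.01 to 0.05; the 0.005 step is an assumption.
grid = [round(0.01 + 0.005 * i, 3) for i in range(9)]
print(grid)
```

      In a full analysis, the gene set selected at each cutoff would be fed to the enrichment test and the resulting FDR recorded per GO term.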

      Author response image 2.

      Distribution of per-base sequence distances for DBS size-matched random genomic windows in Ensembl-annotated promoter regions, calculated between modern humans and (A) chimpanzee, (B) Altai Neanderthal, (C) Denisovan, and (D) Vindija Neanderthal genomes.
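
      The tail fractions quoted in the response can be reproduced from the reported counts with a short sketch. It assumes, since the response does not state the formula, that the per-base distance is the plain proportion of mismatched positions between two aligned windows; the function names are hypothetical.

```python
def per_base_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length sequences differ.

    Assumption: a simple mismatch proportion; the authors' exact metric may differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def tail_fraction(distances, cutoff=0.034):
    """Share of windows whose distance reaches or exceeds the cutoff."""
    return sum(1 for d in distances if d >= cutoff) / len(distances)


# Counts of random windows (out of 10,000) reaching >= 0.034, as reported above.
reported_hits = {
    "Human-Chimp": 549,
    "Human-Altai": 31,
    "Human-Denisovan": 447,
    "Human-Vindija": 3,
}
for pair, hits in reported_hits.items():
    print(f"{pair}: {hits / 10000:.2%}")

# Under the mismatch-proportion assumption, a 147-bp window needs 5 mismatches
# to reach the cutoff (5/147 is about 0.0340, whereas 4/147 is about 0.0272).
window_a = "A" * 147
window_b = "C" * 5 + "A" * 142
print(per_base_distance(window_a, window_b) >= 0.034)
```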

      (2) The authors have introduced a new TFBS section, as a control for their lncRNAs - this is welcome, though again I would ask for caution when interpreting results. For instance, in their reply to me the authors state: "The number of HS TFs and HS lncRNAs (5 vs 66) <HS TF vs all HS lncRNAs> alone lends strong evidence suggesting that HS lncRNAs have contributed more significantly to human evolution than HS TFs (note that 5 is the union of three intersections between <many2zero + one2zero> and the three <human TF list>)."

      But this assumes the denominator is the same! There are 35899 lncRNAs according to the current GENCODE build; 66/35899 = 0.0018, so, 0.18% of lncRNAs are HS. The authors compare this to 5 TFs. There are 19433 protein coding genes in the current GENCODE build, which naively (5/19433) gives a big depletion (0.026%) relative to the lnc number. However, this assumes all protein coding genes are TFs, which is not the case. A quick search suggests that ~2000 protein coding genes are TFs (see, eg, https://pubmed.ncbi.nlm.nih.gov/34755879/); which gives an enrichment (although I doubt it is a statistically significant one!) of HS TFs over HS lncRNAs (5/2000 = 0.0025). Hence my emphasis on needing to be sure the controls are robust and valid throughout!
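
      The denominator argument above is simple arithmetic, and a minimal sketch makes the three proportions explicit. All counts are the figures quoted in this comment; the TF total of about 2000 is a rough estimate, and the variable names are illustrative.

```python
# Figures quoted in the comment above.
hs_lncrnas, all_lncrnas = 66, 35899
hs_tfs = 5
all_protein_coding = 19433
all_tfs = 2000  # rough estimate of the number of TF genes, not an exact count

frac_lnc = hs_lncrnas / all_lncrnas          # share of lncRNAs that are HS
frac_naive = hs_tfs / all_protein_coding     # naive: every coding gene is a TF
frac_tf = hs_tfs / all_tfs                   # share of TFs that are HS

print(f"HS lncRNAs: {frac_lnc:.2%}")
print(f"HS TFs over all protein-coding genes: {frac_naive:.3%}")
print(f"HS TFs over ~2000 TFs: {frac_tf:.2%}")
print("TF share exceeds lncRNA share:", frac_tf > frac_lnc)
```

      Whether the difference between the last two shares is statistically meaningful would need a formal test on the full counts, which this sketch does not attempt.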

      We thank the reviewer for this comment. While 5 vs 66 reveals a difference, a direct comparison is too simplified. The real take-home message of the new TFBS section is not the numbers but the distributions of HS TFs’ targets and HS lncRNAs’ targets across GTEx organs and tissues (Figure 3 and Supplementary Figures 24, 25) - correlated HS lncRNA-target transcript pairs are highly enriched in brain regions, but correlated HS TF-target transcript pairs are distributed broadly across GTEx tissues and organs. We have now removed the simple comparison of “5 vs 66” and more carefully explained our comparison in section 2.6.

      (3) In my original review I said: line 187: "Notably, 97.81% of the 105141 strong DBSs have counterparts in chimpanzees, suggesting that these DBSs are similar to HARs in evolution and have undergone human-specific evolution." I do not see any support for the inference here. Identifying HARs and acceleration relies on a far more thorough methodology than what's being presented here. Even generously, pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee.

      In their reply to me, the authors state:

      Here, we actually made an analogy but not an inference; therefore, we used such words as "suggesting" and "similar" instead of using more confirmatory words. We have revised the latter half sentence, saying "raising the possibility that these sequences have evolved considerably during human evolution".

      Is the aim here to draw attention to the ~2.2% of DBS that do not have a counterpart? In that case, it would be better to rewrite the sentence to emphasise those, not the ones that are shared between the two species? I do appreciate the revised wording, though.

      (1) Our original phrasing may be misleading, and we agree entirely that “pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee”. As explained in that reply, we know and think that DBSs and HARs are two different classes of sequences, and indeed, identifying HARs and acceleration relies on a far more thorough methodology. Yet, three factors prompted us to compare them. First, both suggest the importance of sequences outside genes. Second, both are quite “old” sequences and have undergone considerable evolution recently (although the references are different). Third, both have contributed greatly to human brain evolution.  

      (2) Here, our stress is 97.81% but not 2.2%, and we have made this analogy more clearly and cautiously. Relevant revisions have been made in the Results, Discussion, and Methods sections.   

      (3) We have also determined whether the 2.2% DBSs are human-specific gains by analyzing them using the UCSC Multiz Alignments of 100 Vertebrates. The result confirms that all 2248 DBSs are present in the human genome but are absent from the chimpanzee genome and all other aligned vertebrate genomes. We have added this result to the manuscript.

      (4) Finally, Line 408: "Ensembl-annotated transcripts (release 79)" Release 79 is dated to March 2015, which is quite a few releases and genome builds ago. Is this a typo? Both the human and the chimpanzee genome have been significantly improved since then!

      (1) We thank the reviewer for this comment, which prompts us to provide further explanation and additional data. First, we began predicting HS lncRNAs’ DBSs when Ensembl release 79 was available, but did not re-predict DBSs when new Ensembl releases were published because (a) these new Ensembl releases are also based on hg38, (b) we did not find any fault in the LongTarget program during our use, nor did we receive any fault reports from users, and (c) predicting lncRNAs’ DBSs using the LongTarget program is highly time-consuming.

      (2) Second, to assess the influence of newer Ensembl releases, we compared the promoters annotated in release 79 and in release 115. We found that the vast majority (87.3%) of promoters newly annotated in release 115 belong to non-coding genes. Thus, using release 115 may predict more DBSs in non-coding genes, but downstream analyses based on protein-coding genes would be essentially the same (meaning that all figures and tables would be the same).

      (3) Third, a key element of this study is GTEx data analysis, and these data were also published years ago.  

      (4) Finally, some lncRNA genes have new gene symbols in new Ensembl releases. To allow researchers to use our data conveniently, we have added a new column titled "Gene symbol (Ensembl release115)" to Supplementary Tables 2A and 2B.  

      Summary:

      Major changes based on Reviewer’s comments:

      (1) The following revisions are made to address the comment on “the 0.034 threshold”: (a) Section 2.3, section 2.4, Supplementary Note 3, and related contents in Discussion and Methods are revised, (b) new Figure 2, Supplementary Figure 15, new Supplementary Table 5,6,7, (c) Table 2 and Supplementary Table 8 are revised.

      (2) To address the comment on “new TFBS section”, section 2.6 and section 4.13 are revised.  

      (3) To address the comment on “97.81% and 2.2% of DBSs”, section 2.3 is revised.

      (4) The following revisions are made to address the comment on “release 79”: (a) the old Supplementary Table 2, 3 are merged to Supplementary Table 2AB, and the new column "Gene symbol (Ensembl release115)" is added to Supplementary Table 2AB, (b) accordingly, Supplementary Table 4,5 are renamed to Supplementary Table 3,4.

      Additional revisions:

      (1) Section 2.5 “Young weak DBSs may have greatly promoted recent human evolution” is moved into Supplementary Note 3 (which now has the subtitle “Target genes with specific DBS features are enriched in specific functions”), because this section is short and lacks sufficient cross-validation.

      (2) Considerable minor revisions of sentences have been made.

      (3) Since there are many supplementary figures, the main text now cites only Supplementary Notes, as the reader can easily access supplementary figures in Supplementary Notes.

    1. eLife Assessment

      This study addresses a key, long-standing question about how visual feature selectivity is organized in mid-level visual cortex, using an ambitious combination of large-scale neural recordings and image synthesis. It provides important insights into the complexity of single-neuron selectivity and suggests a structured organization across cortical depth. While the evidence is generally solid and technically impressive, several key claims would be strengthened by additional controls, particularly regarding the sources of similarity across neurons and the dependence of the results on modeling choices.

    2. Reviewer #1 (Public review):

      Willeke et al. hypothesize that macaque V4, like other visual areas, may exhibit a topographic functional organization. One challenge to studying the functional (tuning) organization of V4 is that neurons in V4 are selective for complex visual stimuli that are hard to parameterize. Thus, the authors leverage an approach comprising digital twins and most exciting images (MEIs) that they have pioneered. This data-driven, deep-learning framework can effectively handle the difficulty of parametrizing relevant stimuli. They verify that the model-synthesized MEIs indeed drive V4 neurons more effectively than matched natural image controls. They then performed psychophysics experiments (on humans) along with the application of contrastive learning to illustrate that anatomically neighboring neurons often care about similar stimuli. Importantly, the weaknesses of the approach are clearly appreciated and discussed.

      Comments:

      (1) The correlation between predictions and data is 0.43. I'd agree with the authors that this is "reliable" and would recommend that they discuss how the fact that performance is not saturated influences the results.

      (2) Modeling V4 using a CNN and claiming that the identified functional groups look like those found in artificial vision systems may be a bit circular.

      (3) No architecture other than ResNet-50 was tested. This might be a major drawback, since the MEIs could very well be reflections of the architecture and also the statistics of the dataset, rather than intrinsic biological properties. Do the authors find the same result with different architectures as the basis of the goal-driven model?

      (4) The closed-loop analysis seems to be using a much smaller sample of the recorded neurons - "resulting in n=55 neurons for the analysis of the closed-loop paradigm".

      (5) A discussion on adversarial machine learning and the adversarial training that was used is lacking.

    3. Reviewer #2 (Public review):

      This is an ambitious and technically powerful study, investigating a long-standing question about the functional organization of area V4. The project combined large-scale single-unit electrophysiology in macaque V4 with deep learning-based activation maximization to characterize neuronal tuning in natural image space. The authors built predictive encoding models for V4 neurons and used these models to synthesize most exciting images (MEIs), which are subsequently validated in vivo using a closed-loop experimental paradigm.

      Overall, the manuscript advances three main claims:

      (1) Individual V4 neurons showed complex and highly structured selectivity for naturalistic visual features, including textures, curvatures, repeating patterns, and apparently eye-like motifs.

      (2) Neurons recorded along the same linear probe penetration tended to have more similar MEIs than neurons recorded at different cortical locations (this similarity was supported by human psychophysics and by distances in a learned, contrastive image embedding space).

      (3) MEIs clustered into a limited number of functional groups that resembled feature visualizations observed in deep convolutional neural networks.

      Strengths:

      (1) The study is important in that it is the first to apply activation maximization to neurons sampled at such fine spatial resolution. The authors used 32-channel linear silicon probes, spanning approximately 2 mm of cortical depth, with inter-contact spacing of roughly 60 µm. This enabled fine sampling across most of the cortical thickness of V4, substantially finer resolution than prior Utah-array or surface-biased approaches.

      (2) A key strength is the direct in vivo validation of model-derived synthetic images by stimulating the same neurons used to build the models, a critical step often absent in other neural network-based encoding studies.

      (3) More broadly, the study highlights the value of probing neuronal selectivity with rich, naturalistic stimulus spaces rather than relying exclusively on oversimplified stimuli such as Gabors.

      Weaknesses:

      (1) A central claim is that neurons sampled within the same penetration shared MEI tuning properties, compared to neurons sampled in different penetrations, because of functional organization. I am concerned that correlations in activity could arise from technical or methodological factors (for example, shared reference or grounding) instead of functional organization alone. These recordings were obtained with linear silicon probes, and there have been observations that neuronal activity along this type of probe (including Neuropixels probes) may be more correlated than prior work with manually advanced single electrodes showed. For example, Fujita et al. (1992) showed finer micro-domains and systematic changes in selectivity along a cortical penetration, and it is not clear if that is true or detectable here. I think that the manuscript would be strengthened by a more thorough and explicit characterization of lower-level response correlations (at the neuronal electrophysiology level) before fitting models. In particular, the authors could examine noise correlations along the electrode shaft (using the repeated test images, for example), as well as signal correlations in tuning, both within and across sessions. It would also be helpful to clarify whether these correlations depended on penetration day, recording chamber hole (how many were used?), or spatial separation between penetrations, and whether repeated use of the same hole yielded stable or changing correlations. Illustrations of the peristimulus time histogram changes across the shaft and across penetrations would also help. All of this would help us understand if the reports of clustering were technically inevitable due to the technique.

      (2) It is difficult to understand a story of visual cortex neurons without more information about their receptive field locations and widths, particularly given that the stimulus was full-screen. I understand that there was a sparse random dot stimulus used to find the population RF, so it should be possible to visualize the individual and population RFs. Also, the investigators inferred the locations of the important patches using a masking algorithm, but where were those masks relative to the retinal image, and how distributed were they as a function of the shaft location? This would help us understand how similar each contact was.

      (3) A major claim is that V4 MEIs formed groups that were comparable to those produced by artificial vision systems, "suggesting potential shared encoding strategies." The issue is that the "shared encoding strategy" might be the authors' use of this same class of models in the first place. It would be useful to know if different functional groups arise as a function of other encoding neural network models, beyond the robust-trained ResNet-50. I am unsure to what extent the reported clustering, depth-wise similarity, and correspondence to artificial features depended on architectural and training bias. It would substantially strengthen the manuscript to test whether a similar organizational structure would emerge using alternative encoding models, such as attention-based vision transformers, self-supervised visual representations, or other non-convolutional architectures. Another important point of contrast would be to examine the functional groups encoded by the ResNet architecture before its activations were fit to V4 neuronal activity: put simply, is ResNet just re-stating what it already knows?

      (4) Several comparisons to prior work are presented largely at a qualitative level, without quantitative support. For example, the authors state that their MEIs are consistent with known tuning properties of macaque V4, such as selectivity for shape, curvature, and texture. However, this claim is not supported by explicit image analyses or metrics that would substantiate these correspondences beyond appeal to visual inspection. Incorporating quantitative analyses, for instance, measures of curvature, texture statistics, or comparisons to established stimulus sets, would strengthen these links to prior literature and clarify the relationship between the synthesized MEIs and previously characterized V4 tuning properties.
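
      The signal- and noise-correlation check suggested in weakness (1) can be illustrated with a minimal sketch. It assumes two simultaneously recorded units with spike counts for the same images and repeats, and it uses mean-subtracted residuals pooled across images as a simplified noise-correlation estimate; the data layout, toy values, and function names are hypothetical and do not describe the authors' pipeline.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def signal_and_noise_correlation(resp_a, resp_b):
    """resp[i][r] is the spike count of one unit to image i on repeat r."""
    mean_a = [sum(trials) / len(trials) for trials in resp_a]
    mean_b = [sum(trials) / len(trials) for trials in resp_b]
    # Signal correlation: similarity of trial-averaged tuning across images.
    signal = pearson(mean_a, mean_b)
    # Noise correlation: shared trial-to-trial deviations around each mean.
    resid_a = [r - m for trials, m in zip(resp_a, mean_a) for r in trials]
    resid_b = [r - m for trials, m in zip(resp_b, mean_b) for r in trials]
    noise = pearson(resid_a, resid_b)
    return signal, noise


# Toy data: 3 images x 2 repeats per unit (hypothetical spike counts).
unit_a = [[2, 4], [6, 8], [1, 3]]
unit_b = [[5, 3], [9, 7], [4, 2]]
signal_corr, noise_corr = signal_and_noise_correlation(unit_a, unit_b)
print(f"signal correlation: {signal_corr:.2f}")
print(f"noise correlation: {noise_corr:.2f}")
```

      In this toy case the two units share their tuning but have opposite trial-to-trial fluctuations, which shows why the two measures need to be reported separately along the probe shaft.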

    4. Author response:

      We thank the reviewers for their careful reading and constructive feedback. We were glad to see that they recognized both the technical scope of the study and its contribution as the first to apply activation maximization with such fine spatial sampling. Their appreciation for the critical in vivo validation of model-derived stimuli is very encouraging.

      The reviewers raised several important points that we plan to address in the revised manuscript. These center on:

      Model Architecture and Potential Circularity:

      Both reviewers raised the concern that using a CNN-based model could introduce circularity when comparing V4 functional groups to artificial vision systems, and questioned whether similar results would emerge with alternative architectures. We believe that the in vivo verification provides a critical control for this concern: the MEIs synthesized by our model were empirically validated to elicit significantly higher responses than matched natural image controls, demonstrating that the model captures genuine biological tuning properties rather than architectural artifacts. This means that even if these features emerged from the particular architectural choice, the biological neurons seem to prefer the same features. We will clarify this point in the respective section in the revised manuscript.

      Recording locations and spike sorting contamination:

      Reviewer #2 raised concerns about potential correlation artefacts along the silicon probe. Unfortunately, assessing functional correlations across sessions proved challenging because neurons recorded at different penetration sites had non-overlapping receptive fields, precluding direct comparison of responses to identical stimuli across recording sites. We will make this limitation explicit in the manuscript. Furthermore, we maintain conservative standards for spike sorting to minimize the risk of multi-unit activity (MUA) "smearing" across unit definitions. Our primary analyses are restricted to well-isolated single units that meet all isolation metrics. Due to our low-impedance ground placed on the bone, shared-reference contamination as a source of tuning similarity is also mitigated.

      Quantitative Comparisons to Prior Literature:

      Reviewer #2 also noted that our comparisons between MEIs and known V4 tuning properties (e.g., shape, curvature, texture selectivity) were presented qualitatively, and suggested that explicit image analyses or metrics would strengthen these links to prior literature. We will revise the text to more carefully frame these comparisons as qualitative observations consistent with prior findings.

      Alternative Similarity Metrics:

      We will expand our justification for the Böhm et al. contrastive embedding approach in the Methods section. However, we believe that a systematic comparison of multiple clustering and similarity methods is beyond the scope of the current study.

      In the revised manuscript, we will address these points primarily through clarifications and expanded discussion. Specifically, we will: (1) strengthen our discussion of model architecture choice emphasizing that in vivo verification serves as a critical control against architectural artifacts; (2) clarify the stringent matching criteria underlying our closed-loop sample size and its consistency with the larger population analyses; (3) explicitly describe the recording geometry, including the use of multiple grid holes, and explain why direct functional comparisons across penetrations were precluded by non-overlapping receptive fields; (4) better characterize the spatial relationship between receptive fields and MEI masks; (5) reframe comparisons to prior V4 literature as qualitative observations rather than quantitative validations; and (6) expand our justification for the contrastive embedding approach. We believe these revisions will improve the clarity and rigor of the manuscript while appropriately scoping the claims to what the current data support.

    1. eLife Assessment

      This valuable study introduces miRTarDS, a novel computational framework that predicts microRNA-target interactions based on a publicly available pretrained Sentence-BERT language model and downstream classification analysis. The strength of the evidence is incomplete, as the evaluation framework relies on unreliable ground-truth and false sets. Furthermore, the analysis fails to compare miRTarDS against existing state-of-the-art biomedical language models.

    2. Reviewer #1 (Public review):

      The author presents a new method for microRNA target prediction based on (1) a publicly available pretrained Sentence-BERT language model that the author fine-tunes using MeSH information and (2) downstream classification analysis for microRNA target prediction. In particular, the author's approach, named "miRTarDS", attempts to solve the microRNA target prediction problem by utilizing disease information (i.e., semantic similarity scores) from their language model. The author then compares the prediction performance with other sequence- and disease-based methods and attempts to show that miRTarDS is superior or at least comparable to existing methods. The author's general approach to this microRNA target prediction problem seems promising, but fails to demonstrate concrete computational evidence that miRTarDS outperforms other existing methods. The author's claim that disease information-based language models are sufficient is unfounded. The manuscript requires substantial rewriting and reorganization for readers with a strong background in biomedical research.

      A major issue related to the author's claim of computational advance of miRTarDS: The author does not introduce existing biomedical-specific language models, and does not compare them against miRTarDS's fine-tuned model. The performance of miRTarDS is largely dependent on the semantic embedding of disease terms. The author shows in Figure 5 that MeSH-based fine-tuning leads to a substantial improvement in MeSH-based correlation compared to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1" without sacrificing a large amount of BIOSSES-based correlation. However, the author does not compare the performance of MeSH- and BIOSSES-based correlation with existing language models such as ChatGPT, BioBERT, PubMedBERT, and more. Also, the substantial improvement in MeSH-based correlation is a mere indication that the MeSH-based fine-tuning strategy was reasonable and not that it's superior to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1".

      Another major issue is the author's claim that disease information from miRTarDS's language model is "sufficient" for accurate microRNA target prediction. Available microRNA targets with experimental evidence are largely biased toward those with disease implications that have been reported in the biomedical literature. It's possible that the language model is biased by existing literature that has also been used to build microRNA target databases. Therefore, it is important that the author provides strong evidence that excludes the possibility of data leakage and circularity. Similar concerns are prevalent across the manuscript, and so I highly recommend that the author reassess the evaluation frameworks and account for inflated performance, biased conclusions, and self-confirming results.

      Last but not least, the manuscript requires a deeper and more careful description and computational encoding of microRNA biology. I'd advise the author to include an expert in microRNA biology to improve the quality of this manuscript. For example, the author uses the pre-miRNA notation in place of the mature miRNA notation to maintain computational encoding consistency across databases. However, the mature microRNA notation (the "-3p" or "-5p" suffix) is critical, as the 3p and 5p mature microRNAs have different seed sequences and thus different mRNA targets. The 3p mature microRNA would most likely not target an mRNA targeted by the 5p mature microRNA.
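
      To make the notation concern concrete, here is a minimal Python sketch. It is not taken from the manuscript: the helper name `to_precursor_name` and the purely string-based mapping are assumptions for illustration only. It shows how two mature strands with different seed sequences become indistinguishable once the arm suffix is dropped.

```python
import re

# Hypothetical helper (not from the manuscript): collapse a mature miRNA name
# to a precursor-style name by dropping the arm suffix and switching the
# "miR" prefix to the lowercase "mir" used for precursors.
_ARM_SUFFIX = re.compile(r"-(3p|5p)$")


def to_precursor_name(mature_name: str) -> str:
    """Return a precursor-style name for a mature miRNA name."""
    stem = _ARM_SUFFIX.sub("", mature_name)
    return stem.replace("-miR-", "-mir-")


# Two mature strands processed from the same hairpin.
mature = ["hsa-miR-21-5p", "hsa-miR-21-3p"]
collapsed = {name: to_precursor_name(name) for name in mature}

for name, precursor in collapsed.items():
    print(f"{name} -> {precursor}")

# After collapsing, both strands share one key, so any strand-specific
# target information can no longer be told apart.
print("distinct keys after collapsing:", len(set(collapsed.values())))
```

      A real pipeline would take the mature-to-precursor mapping from miRBase annotations rather than from string editing, since one mature miRNA can derive from more than one precursor.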

    3. Reviewer #2 (Public review):

      Summary:

      This study introduces a novel knowledge-driven approach, miRTarDS, which enables microRNA-Target Interaction (MTI) prediction by leveraging the disease association degree between a miRNA and its target gene. The core hypothesis is that this single feature is sufficient to distinguish experimentally validated functional MTIs from computationally predicted MTIs in a binary classification setting. To quantify the disease association, the authors fine-tuned a Sentence-BERT (SBERT) model to generate embeddings of disease descriptions and compute their semantic similarity. Using only this disease association feature, miRTarDS achieved an F1 score of 0.88 on the test set.

      Strengths:

      The primary strength is the innovative use of the disease association degree as an independent feature for MTI classification. In addition, this study successfully adapts and fine-tunes the Sentence-BERT (SBERT) model to quantify the semantic similarity between biomedical texts (disease descriptions). This approach establishes a critical pathway for integrating powerful language models and the vast growth in clinical/disease data into biochemical discovery, like MTI prediction.

      Weaknesses:

      The main weakness lies in its definition of the ground-truth dataset, which serves as a foundation for methodological evaluation. The study defines the Negative Set as computationally predicted MTIs that lack experimental evidence. However, the absence of experimental validation does not equate to non-functionality. Similarly, the miRAW sets are classified by whether the target and miRNA could form a stable duplex structure according to RNA structure prediction. This definition is biologically irrelevant, as duplex stability does not fully encapsulate the complex in vivo binding of miRNAs within the AGO protein complex.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The author presents a new method for microRNA target prediction based on (1) a publicly available pretrained Sentence-BERT language model that the author fine-tunes using MeSH information and (2) downstream classification analysis for microRNA target prediction. In particular, the author's approach, named "miRTarDS", attempts to solve the microRNA target prediction problem by utilizing disease information (i.e., semantic similarity scores) from their language model. The author then compares the prediction performance with other sequence- and disease-based methods and attempts to show that miRTarDS is superior or at least comparable to existing methods. The author's general approach to this microRNA target prediction problem seems promising, but fails to demonstrate concrete computational evidence that miRTarDS outperforms other existing methods. The author's claim that disease information-based language models are sufficient is unfounded. The manuscript requires substantial rewriting and reorganization for readers with a strong background in biomedical research.

      We appreciate the reviewer’s careful examination of modeling, benchmarking, and interpretation, and we are particularly encouraged that they found the proposed method promising. We will make corresponding revisions to the manuscript based on the reviewer’s comments.

      A major issue related to the author's claim of computational advance of miRTarDS: The author does not introduce existing biomedical-specific language models, and does not compare them against miRTarDS's fine-tuned model. The performance of miRTarDS is largely dependent on the semantic embedding of disease terms. The author shows in Figure 5 that MeSH-based fine-tuning leads to a substantial improvement in MeSH-based correlation compared to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1" without sacrificing a large amount of BIOSSES-based correlation. However, the author does not compare the performance of MeSH- and BIOSSES-based correlation with existing language models such as ChatGPT, BioBERT, PubMedBERT, and more. Also, the substantial improvement in MeSH-based correlation is a mere indication that the MeSH-based fine-tuning strategy was reasonable and not that it's superior to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1".

      We thank the reviewer for the constructive suggestions regarding the benchmarking of language models. We acknowledge that the performance of miRTarDS largely depends on the semantic embeddings of disease terms. In the revision, we will therefore: 1) conduct a literature review to introduce existing biomedical-specific language models, and 2) perform a side-by-side comparison between our fine-tuned model and these existing models, to evaluate the model’s capabilities more comprehensively.

      Another major issue is the author's claim that disease information from miRTarDS's language model is "sufficient" for accurate microRNA target prediction. Available microRNA targets with experimental evidence are largely biased toward those with disease implications that have been reported in the biomedical literature. It's possible that the language model is biased by existing literature that has also been used to build microRNA target databases. Therefore, it is important that the author provides strong evidence that excludes the possibility of data leakage and circularity. Similar concerns are prevalent across the manuscript, and so I highly recommend that the author reassess the evaluation frameworks and account for inflated performance, biased conclusions, and self-confirming results.

      We thank the reviewer for the comment. We recognize that existing experimentally validated microRNA targets may be biased toward those reported in the biomedical literature as disease-related. To mitigate this bias, we used the K-Nearest Neighbors (KNN) method to extract predicted microRNA targets whose numbers of miRNA–disease and gene–disease entries closely match those of the experimentally validated microRNA targets. We then applied Positive-Unlabeled (PU) Learning to classify the two groups. PU Learning is designed for scenarios where only a subset of the training data is explicitly labeled as positive, while the remaining data are unlabeled and contain both potential positives and true negatives, which makes it well suited to the application context of this manuscript [1]. Preliminary results show that after applying the new data extraction and classification approach, model performance drops to around F1=0.73 (the MISIM method also shows a decline, with F1 around 0.58; detailed code is available on GitHub). The specific reasons for this drop require further investigation.
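
      The two steps described above can be sketched as follows. This is an illustrative sketch only, not the code in the authors' repository: the function names, the toy counts, and the choice of the Elkan and Noto calibration as the PU variant are all assumptions, since the response does not state which PU algorithm was used.

```python
from math import dist


def match_unlabeled(positives, candidates, k=1):
    """Greedy nearest-neighbour matching without replacement.

    Each item is a (mirna_disease_count, gene_disease_count) pair. For every
    validated (positive) pair, pick the k closest still-unused predicted
    pairs by Euclidean distance and return their indices.
    """
    available = list(range(len(candidates)))
    matched = []
    for p in positives:
        # Re-rank the remaining candidates by distance to this positive.
        available.sort(key=lambda i: dist(p, candidates[i]))
        chosen, available = available[:k], available[k:]
        matched.extend(chosen)
    return matched


def elkan_noto_posterior(scores, heldout_positive_scores):
    """Rescale P(labeled | x) into an estimate of P(functional | x).

    Assumes labeled positives are a random sample of all true positives,
    so c = P(labeled | functional) can be estimated as the mean classifier
    score on held-out labeled positives.
    """
    c = sum(heldout_positive_scores) / len(heldout_positive_scores)
    return [min(1.0, s / c) for s in scores]


# Toy data (hypothetical counts, for illustration only).
validated = [(12, 40), (3, 7)]
predicted = [(100, 5), (11, 42), (3, 8), (50, 50)]
matched_idx = match_unlabeled(validated, predicted)
print("matched predicted pairs:", [predicted[i] for i in matched_idx])

# Scores from any classifier trained to separate labeled from unlabeled pairs.
posterior = elkan_noto_posterior([0.2, 0.4, 0.9], heldout_positive_scores=[0.8, 0.8])
print("calibrated posteriors:", posterior)
```

      Matching on entry counts controls for how well studied a miRNA or gene is, so the classifier cannot separate the two groups simply by annotation depth. That may be one reason the reported F1 drops under this protocol.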

      Last but not least, the manuscript requires a deeper and more careful description and computational encoding of microRNA biology. I'd advise the author to include an expert in microRNA biology to improve the quality of this manuscript. For example, the author uses the pre-miRNA notation in place of the mature miRNA notation to maintain computational encoding consistency across databases. However, the mature microRNA notation (the "-3p" or "-5p" suffix) is critical, as the 3p and 5p mature microRNAs have different seed sequences and thus different mRNA targets. The 3p mature microRNA would most likely not target an mRNA targeted by the 5p mature microRNA.

      We thank the reviewer for the critique and suggestion. We fully agree with the reviewer that the distinction between the 3p and 5p mature strands is critical for determining mRNA targeting, as they possess distinct seed sequences. In our study, we relied on the miRNA–disease associations provided by the HMDD database, which annotates interactions at the pre-miRNA level: “… the enriched functions of each mature miRNA are aggregated to the corresponding miRNA precursor.” [2] Furthermore, existing literature suggests that the pre-miRNA level can be appropriate and informative for disease association analyses: “Compared with the mature miRNA method, the pre-miRNA method is more useful for studying disease association.” [3] We also find that, in some cases, both strands cooperate to regulate the same or complementary pathways [4]. We acknowledge the reviewer’s point as an important consideration for the revision, and we plan to consult or collaborate with biologists to strengthen the biological content of the manuscript.

      Reviewer #2 (Public review):

      This study introduces a novel knowledge-driven approach, miRTarDS, which enables microRNA-Target Interaction (MTI) prediction by leveraging the disease association degree between a miRNA and its target gene. The core hypothesis is that this single feature is sufficient to distinguish experimentally validated functional MTIs from computationally predicted MTIs in a binary classification setting. To quantify the disease association, the authors fine-tuned a Sentence-BERT (SBERT) model to generate embeddings of disease descriptions and compute their semantic similarity. Using only this disease association feature, miRTarDS achieved an F1 score of 0.88 on the test set.

      We thank the reviewer for the positive feedback, especially for recognizing the novelty of this manuscript.

      Strengths:

      The primary strength is the innovative use of the disease association degree as an independent feature for MTI classification. In addition, this study successfully adapts and fine-tunes the Sentence-BERT (SBERT) model to quantify the semantic similarity between biomedical texts (disease descriptions). This approach establishes a critical pathway for integrating powerful language models and the vast growth in clinical/disease data into biochemical discovery, like MTI prediction.

      We would like to thank the reviewer again for their positive feedback. We appreciate their recognition of the novelty of our work, as well as their acknowledgment that the proposed method paves the way for integrating language models with clinical/disease data into biochemical discovery.

      Weaknesses:

      The main weakness lies in its definition of the ground-truth dataset, which serves as a foundation for methodological evaluation. The study defines the Negative Set as computationally predicted MTIs that lack experimental evidence. However, the absence of experimental validation does not equate to non-functionality. Similarly, the miRAW sets are classified by whether the target and miRNA could form a stable duplex structure according to RNA structure prediction. This definition is biologically irrelevant, as duplex stability does not fully encapsulate the complex in vivo binding of miRNAs within the AGO protein complex.

      We thank the reviewer for the constructive feedback. We recognize that treating predicted MTIs as a negative class is problematic. We have therefore decided to adopt Positive-Unlabeled (PU) Learning in subsequent updates. This classification method can be applied to datasets such as ours, which contain only positive labels and lack confirmed negatives [1]. We used the miRAW dataset to enable a side-by-side comparison of our method with traditional sequence-based prediction approaches. We acknowledge that miRAW may overlook some biological insights, and we plan to improve the construction of test datasets in the future. Some preliminary explorations have already been conducted, and the relevant code is available on GitHub.

      Furthermore, we will make the following revisions: 1) clearly specify the version of miRBase and incorporate more miRNA-related databases; 2) conduct a further literature review on miRNA biological mechanisms to strengthen the biological content of the manuscript; 3) perform a more comprehensive evaluation of the model’s performance; and 4) attempt to identify representative MTIs that have been overlooked by existing prediction tools but can be predicted by our proposed method.

      References

      (1) Li, F., Dong, S., Leier, A., Han, M., Guo, X., Xu, J., ... & Song, J. (2022). Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Briefings in Bioinformatics, 23(1), bbab461.

      (2) Huang, Z., Shi, J., Gao, Y., Cui, C., Zhang, S., Li, J., ... & Cui, Q. (2019). HMDD v3.0: a database for experimentally supported human microRNA–disease associations. Nucleic Acids Research, 47(D1), D1013-D1017.

      (3) Wang, H., & Ho, C. (2023). The human pre-miRNA distance distribution for exploring disease association. International Journal of Molecular Sciences, 24(2), 1009.

      (4) Mitra, R., Adams, C. M., Jiang, W., Greenawalt, E., & Eischen, C. M. (2020). Pan-cancer analysis reveals cooperativity of both strands of microRNA that regulate tumorigenesis and patient survival. Nature Communications, 11(1), 968.

    1. eLife Assessment

      There is a perennial question in the field of birdsong: the contribution of the cerebellum to singing and processing song-related information. This study provides a valuable first step into this discussion, using electrophysiology of cerebellar neurons during a battery of assays, including singing and song playback. While the electrophysiological dataset here is novel and could shed light on key aspects of the neural control of vocal behavior, the evidence provided for the conclusions reached by the authors is currently incomplete.

    2. Reviewer #1 (Public review):

      In this study, Ursu, Centeno, and Leblois record from the cerebellum of zebra finches and analyze neurons for auditory and song-related activity. The paper covers a lot of ground, ranging from lesions of the deep nuclei to song and white noise playback inside and outside of singing, and some level of survey of response types across cerebellar lobules, to provide foundational information on cerebellar relationships with song. There are a number of interesting observations in the study, to me most notably, the lack of responsivity of song-related activity in lobule IV to distorted auditory feedback. This observation is interesting in light of the perennial idea that the cerebellum may participate in rapid error corrections in other somatic control domains. If such a role were relevant for song, it stands to reason that some alteration of activity could be found there. Of course, on the other hand, zebra finches do not show rapid corrections during DAF, so perhaps the null result does not resolve much. Nevertheless, these data are important steps forward in establishing the involvement or lack of involvement in a broader set of brain structures beyond the song control system typically studied. While the study presents some interesting and important inroads, in my opinion, there was a general lack of 'polish' to the study that led to ambiguity in the report and confusing displays. This detracted from rigorous reporting of the findings.

    3. Reviewer #2 (Public review):

      In this paper, the authors investigate the role of the cerebellum in song production in the zebra finch. First, they replicate prior studies to show that lesions of the lateral deep cerebellar nuclei (latDCN, primarily lobules IV-VII and IX) result in shorter duration syllables and song motifs than sham controls. The authors then record neural activity from the cerebellum during both passive auditory exposure in anesthetized birds and in freely singing animals. The authors claim that across multiple lobules, the cerebellum receives "non-selective" auditory inputs locked to syllable boundaries (based on acute recordings) and that cerebellar neurons display song-locked responses that are unaffected by auditory feedback perturbations (in chronic recordings). Moreover, the authors emphasized the distinct properties of lobule IV, which they argue is tightly locked to the onset and offset of syllables, and conclude that the cerebellum might contribute to the duration of song elements.

      This paper presents novel and useful descriptions of song-related neural activity in the cerebellum. However, there are multiple serious issues. First, there are major issues with the design and presentation of the analysis of the electrophysiological data; based on these, it is unclear whether the authors are justified in some of their conclusions about neural tuning or are entitled to any of their claims about the specific tuning or function of neurons in particular lobules. Second, because the authors' conceptual framework seems to ignore possible non-auditory inputs to the cerebellum, their results on (minimal) effects of auditory manipulation during singing are over-interpreted with respect to providing evidence of a forward model. Third, the paper's central assertion - that the songbird cerebellum may contribute to the duration of vocal events during song - was firmly established by a prior lesion study (Radic et al., 2024). Although the authors do cite this prior study with respect to longer-term postlesion changes after cerebellar lesions, this paper also showed a large change in syllable duration immediately after cerebellar lesion (Figure 5 in Radic et al.). The electrophysiological results in the present paper could provide valuable insights into the neural mechanisms underlying this already-described role of the songbird cerebellum; however, given the other concerns above, it is not clear that the authors have done so.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Ursu, Centeno, and Leblois record from the cerebellum of zebra finches and analyze neurons for auditory and song-related activity. The paper covers a lot of ground, ranging from lesions of the deep nuclei to song and white noise playback inside and outside of singing, and some level of survey of response types across cerebellar lobules, to provide foundational information on cerebellar relationships with song. There are a number of interesting observations in the study, to me most notably, the lack of responsivity of song-related activity in lobule IV to distorted auditory feedback. This observation is interesting in light of the perennial idea that the cerebellum may participate in rapid error corrections in other somatic control domains. If such a role were relevant for song, it stands to reason that some alteration of activity could be found there. Of course, on the other hand, zebra finches do not show rapid corrections during DAF, so perhaps the null result does not resolve much. Nevertheless, these data are important steps forward in establishing the involvement or lack of involvement in a broader set of brain structures beyond the song control system typically studied. While the study presents some interesting and important inroads, in my opinion, there was a general lack of 'polish' to the study that led to ambiguity in the report and confusing displays. This detracted from rigorous reporting of the findings.

      We thank reviewer #1 for their comments. We will clarify the potentially misleading or ambiguous claims and interpretations in the present manuscript and polish the presentation of the results. We will also modify the discussion to better place our results within the current knowledge on the cerebellum and songbirds, and in particular address the link between our findings and the low sensitivity to auditory feedback in zebra finches.

      Reviewer #2 (Public review):

      In this paper, the authors investigate the role of the cerebellum in song production in the zebra finch. First, they replicate prior studies to show that lesions of the lateral deep cerebellar nuclei (latDCN, primarily lobules IV-VII and IX) result in shorter duration syllables and song motifs than sham controls. The authors then record neural activity from the cerebellum during both passive auditory exposure in anesthetized birds and in freely singing animals. The authors claim that across multiple lobules, the cerebellum receives "non-selective" auditory inputs locked to syllable boundaries (based on acute recordings) and that cerebellar neurons display song-locked responses that are unaffected by auditory feedback perturbations (in chronic recordings). Moreover, the authors emphasized the distinct properties of lobule IV, which they argue is tightly locked to the onset and offset of syllables, and conclude that the cerebellum might contribute to the duration of song elements.

      This paper presents novel and useful descriptions of song-related neural activity in the cerebellum. However, there are multiple serious issues. First, there are major issues with the design and presentation of the analysis of the electrophysiological data; based on these, it is unclear whether the authors are justified in some of their conclusions about neural tuning or are entitled to any of their claims about the specific tuning or function of neurons in particular lobules. Second, because the authors' conceptual framework seems to ignore possible non-auditory inputs to the cerebellum, their results on (minimal) effects of auditory manipulation during singing are over-interpreted with respect to providing evidence of a forward model. Third, the paper's central assertion - that the songbird cerebellum may contribute to the duration of vocal events during song - was firmly established by a prior lesion study (Radic et al., 2024). Although the authors do cite this prior study with respect to longer-term postlesion changes after cerebellar lesions, this paper also showed a large change in syllable duration immediately after cerebellar lesion (Figure 5 in Radic et al.). The electrophysiological results in the present paper could provide valuable insights into the neural mechanisms underlying this already-described role of the songbird cerebellum; however, given the other concerns above, it is not clear that the authors have done so.

      We thank reviewer #2 for these comments. We will improve the presentation of the results, in particular our cell-type classification of the electrophysiology recordings based on the latest literature and the statistics of the tuning differences between lobules. We will also modify the discussion regarding singing-related internal models and consider non-auditory feedback. Finally, we will clarify the position of our work within the existing songbird literature and state what the specific contributions of this work are. We fully agree that prior studies have already shown the behavioural effects of lesions, as already clearly mentioned in the introduction and discussion; we aimed instead to partially reproduce these results before diving into neural mechanisms. We will clarify this point in our revision.

    1. eLife Assessment

      This study makes a valuable contribution to our understanding of how the lipase regulator ABHD5 may control lipase activity through interactions with lipid droplets and cellular membranes. By combining multiscale molecular dynamics simulations with experimental approaches, the authors provide novel molecular insights into this membrane-protein interaction and present evidence suggesting that the regulatory mechanism depends on protein conformational changes and local membrane remodeling. While much of the evidence supporting the main conclusions is convincing, several aspects of the analysis, interpretation, and discussion remain incomplete. Overall, this work will be of interest to structural and molecular biologists working on lipid metabolism and membrane biophysics.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigated the detailed structural mechanism of activation of ABHD5 upon interaction with lipid structures (bilayer and LD). The authors used an elaborate multiscale computational workflow, incorporating coarse-grained, all-atom, and enhanced-sampling molecular dynamics simulations, to propose a structural mechanism for the interaction and activation of ABHD5, as well as its specific interaction with TAG in LD. The authors then corroborated these observations with experimental studies involving hydrogen-deuterium exchange coupled with mass spectrometry of wild-type ABHD5 to assess the structural and conformational changes in ABHD5 upon binding, as well as mutagenesis with cell-based and in vitro assays monitoring membrane association, identifying specific interactions that direct ABHD5 to localize to LD.

      Strengths:

      The manuscript is well-written, and the data are reported in high-quality figures. The experimental design and data analysis are rigorous and support the conclusion. One major strength is the multiscale computational work that reveals a mechanism for the insertion of ABHD5 into lipid bilayers and LD involving the insertion of the N-term portion and the lid helix motif. The design of the computational workflow was very elaborate, and the undertaking was quite extensive, with multiple strategies (CG, all-atom MD, and GaMD). The authors then elegantly generate a hypothesis from these observations to experimentally corroborate the proposed mechanism. Particularly, the HDX-MS data support the engagement of the two regions upon binding, and the fluorescence microscopy data show the role of specific residues in localization/specificity to LD.

      Weaknesses:

      The following limitation is noted. Central to this manuscript is the model, as observed computationally, that initial lipid interaction by the N-term insertion is followed by the insertion of lid-helix in the membrane, which undergoes a conformational switch in the process. However, HDX-MS reveals that, in the unbound form, the lid helix region displays a bimodal isotopic envelope, revealing two species, one with low uptake, suggesting a structured species and one with high uptake, suggesting a less structured species. It is unclear from the manuscript whether the authors think the bimodality fits EX1 regime kinetics or not. Regardless, the model of unbound ABHD5 shows a lid-helix region devoid of secondary structure (Figure 5A), which is more consistent with the unprotected species. The authors also mention that previous modeling had pointed to the high flexibility of the insertion domain. Upon binding, the lid-helix region seems to be ordered from computational observations and loses bimodality by HDX-MS with a deuterium uptake consistent with the protected species of the bimodal envelope in the unbound form. The authors fall short of interpreting or even discussing what the bimodality of the lid-helix represents in the unbound form. What does the protected species in the bimodal envelope represent? Is it a transition representing lid-helix formation and unfolding? Does it imply that interaction and insertion into the lipid structures are governed by conformational selection? This issue should be at the very least acknowledged and discussed, or optimally investigated by performing more integrative studies of the HDX-MS data with the extensive computational data at hand, using existing protection factor calculations or HDX-guided ensemble refinement methods.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript describes a combined computational and experimental approach to investigate the ABHD5 binding to and insertion into membranes.

      Strengths:

      Mutational experiments support computational findings obtained on ABHD5 membrane insertion with enhanced-sampling atomistic simulations.

      Weaknesses:

      While the addressed problem is interesting, I have several concerns, which fall into two categories:

      (A) I see statements throughout the manuscript, e.g. on PNPLA activation, that are not supported by the results.

      (B) The presentation of the computational and experimental results partly lacks clarity and detail.

      Comments and questions on (A):

      (1) I think the following statements in the abstract, which go beyond ABHD5 membrane binding, are not supported by the presented data:

      the addition "to control lipolytic activation" in the 3rd sentence of the abstract.

      further below ".... transforming ABHD5 into an active and membrane-localized regulator".

      (2) The authors state in the Introduction (the manuscript lacks page and line numbers, so I cannot be more specific):

      "We hypothesize that binding of ABHD5 alters the nanoscale chemical and biophysical properties of the LD monolayer, which, combined with direct protein-protein interactions, enables PNPLA paralogs to access membrane-restricted substrates. This regulatory mechanism represents a paradigm shift from conventional enzyme-substrate interactions to sophisticated allosteric control systems that operate at membrane interfaces."

      This hypothesis and the suggested paradigm shift are not supported by the data. Protein-protein interactions are not considered. What is meant by "sophisticated allosteric control"?

      (3) The authors state in the Results section:

      "We hypothesize that this TAG nanodomain is critical for ABHD5-activated TAG hydrolysis by PNPLA2." In previous pages, the authors state the location of the nanodomain: "TAG nanodomain under ABHD5".

      If the nanodomain is located under ABHD5, how can it be accessible to PNPLA2? To my understanding, ABHD5 then sterically blocks access of PNPLA2 to the TAG nanodomain.

      (4) Another statement: "Our findings suggest that ABHD5-mediated membrane remodeling regulates lipolysis in part by regulating PNPLA2 access to its TAG substrate."

      I don't see how the reported results support this statement (see point 3 above).

      Comments and questions on (B):

      (1) The authors state that the GaMD simulations started "from varying conformations observed during CGMD".

      What is missing is a clear description of the CGMD simulation conformations, and the CG simulations as a whole, prior to the results section on GaMD. The authors use standard secondary and tertiary constraints in the Martini CG simulations. Do the authors observe some (constrained) conformational changes of ABHD5 already in the CG simulations (depending on the strength of the constraints)? Or do the conformational changes occur exclusively in the GaMD simulations? Both are fine, but this needs to be described.

      (2) The authors write: "Three replicas of GaMD were performed."

      Do these replicas lead to similar, or statistically identical, membrane-bound ABHD5 conformations? Is this information, i.e. a statistical analysis of differences in the replica runs, already included in the manuscript?

      (3) The authors state on the hydrogen exchange results:

      "HDX-MS provided orthogonal experimental evidence for the dynamics of the lid. In solution, a peptide (residues 200-226) spanning the lid helix displayed a bimodal isotopic distribution (Fig. S4), indicating the coexistence of different conformations. Upon LD binding, this distribution shifted to a single, low-exchange peak, demonstrating stabilization of the membrane-bound conformation with reduced solvent accessibility. These experimental observations corroborate our MD simulations."

      I find this far too short to be understandable. Also, there are no computational results of ABHD5 in solution that show a bimodal conformational distribution of the lid helix, which is observed in the hydrogen exchange experiments. Which aspects of the MD simulations are corroborated?

    1. eLife Assessment

      This important study introduces NoSeMaze, a semi-naturalistic platform for continuous, high-dimensional tracking of social and cognitive behaviors in group-housed mice, and uses it to show that individual social rank is stable across changing social contexts. By integrating automated dominance measures, proactive social behaviors, and reinforcement-learning-based profiles, the authors demonstrate a novel framework for examining how stable individual differences shape social structure. The strength of evidence is solid, advancing our understanding that social hierarchy can be viewed as a trait-like dimension of individuality. Yet, the interpretation of dominance in this paradigm and its broader ecological meaning remains somewhat ambiguous. This work will be of broad relevance for behavioral neuroscience and social behavior research.

    2. Reviewer #1 (Public review):

      Summary:

      The goal of the study was to address the question of the degree to which social position in a group is a stable trait that persists across conditions. Reinwald et al. use a custom-built cage system with automated tracking and continuous testing for social dominance that does not require intervention by the experimenter. Remixing of individuals from different groups revealed that social position was rather stable and not really predictable from other measures that were taken. The authors conclude that social position is multifaceted but dependent on characteristics like personality traits.

      Strengths:

      (1) Reductionistic, highly controlled setting that allows for the control of many confounding variables.

      (2) Very interesting and important question.

      (3) Confirms the emergence of inter-individual behavior-driven differences in inbred mice in a shared environment.

      (4) Innovative paradigm and experimental setup.

      (5) Fresh perspective on an old question that makes the best use of modern technology.

      (6) Intelligent use of behavioral and cognitive covariables to generate a non-social context.

      (7) Bold and almost provocative conclusion, inviting discussion and further elaboration.

      Weaknesses:

      (1) Reductionistic, highly controlled setting that blends out much of the complexity of social behavior in a community.

      (2) The motivation to enter the test tube is not "trait" (or at least not solely a trait) but the basic need to reach food and water; chasing behavior would be less dependent on this stimulus.

      (3) Dominance is only one aspect of sociality; social structure is reduced to rank. The information that might lie in the chasing behavior is not optimally used to explain social behavior beyond the rank measure.

      (4) Focus on rank bears the risk of overgeneralization for readers not familiar with the context.

      (5) Conclusion only valid for the reductionistic setting, in which the environment, social and non-social, changes only within narrow limits, and in which the mouse population does not face challenges.

      (6) Animals are not naive at the beginning of the experiment, but are already several weeks old.

      In summary, this is a wonderful study, but not one that is easy to interpret. The bold conclusion is valid only within the constraints of the study, but nevertheless points in an important direction. The paradigm is clever and could be used for many interesting follow-ups.

      To define social position as a personality trait will elicit strong opposition and much debate; the nuances of the paper might be lost on many readers and call for the (re)-consideration of many concepts that are touched. I find this attitude a strength of the paper, but the approach bears the risk of misunderstanding.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents the "NoSeMaze", a novel automated platform for studying social behavior and cognitive performance in group-housed male mice. The authors report that mice form robust, transitive dominance hierarchies in this environment and that individual social rank remains largely stable across multiple group compositions. They further demonstrate that social dominance and aggressive behaviors, like chasing, are partially dissociable and that dominance traits are independent of non-social cognitive performance. The study includes a genetic manipulation of oxytocin receptor expression in the anterior olfactory nucleus, which showed only transient effects on social rank.

      Strengths:

      (1) Innovative Methodology: The NoSeMaze platform is a technically elegant and conceptually well-integrated system that enables fully automated, long-term monitoring of both social and cognitive behaviors in large groups of group-housed mice. It combines tube-test-like dominance contests, voluntary chase-escape interactions, and an embedded operant olfactory discrimination task within a single, ethologically relevant environment. This modular design allows for high-throughput, minimally invasive behavioral assessment without the need for repeated handling or artificial isolation.

      (2) Experimental Scale and Rigor: The study includes 79 male mice and over 4,000 mouse-days of observation across multiple group reshufflings. The use of RFID-based identification, automated data logging, and longitudinal design enables robust quantification of individual trait stability and group-level social structure.

      (3) Multidimensional Behavioral Profiling: The integration of social (tube dominance, proactive chasing), physical (body weight), and cognitive (olfactory learning task) measures offers a rich, multi-dimensional profile of each individual mouse. The authors' finding that social dominance traits and non-social cognitive performance are largely uncorrelated reinforces emerging models of orthogonal behavioral trait axes or "animal personalities".

      (4) Clarity and Data Analysis: The analytical framework is well-suited to the study's complexity, with appropriate use of dominance metrics, mixed-effects models, and permutation tests. The analyses are clearly explained, statistically rigorous, and supported by transparent supplementary materials.

      Weaknesses:

      (1) Conceptual Novelty and Prior Work: While the study is carefully executed and methodologically innovative, several of its core findings reaffirm concepts already established in the literature. The emergence of stable, transitive social hierarchies, the persistence of individual differences in social behavior, and the presence of non-despotic social structures have all been previously reported in mice, including under semi-naturalistic conditions (e.g., Fan et al., 2019; Forkosh et al., 2019). Although this work extends those findings with greater behavioral resolution and scale, the manuscript would benefit from a clearer articulation of what is genuinely novel at the conceptual level, beyond the technological advance.

      (2) Role of OXTR Deletion: The inclusion of the OXTR manipulation feels somewhat disconnected from the manuscript's central aims. The effects were minimal and transient, and the authors defer full interpretation to a separate study.

      (3) Scope Limitations (Sex and Age): The study is limited to male mice, and although this is acknowledged, the title and overall framing imply broader generalizability. This sex-specific focus represents a common but problematic bias. Additionally, results from the older mouse cohort are under-discussed; if age had no effect, this should be explicitly stated.

      (4) Ambiguity of Dominance as a Construct: While the study robustly quantifies social rank and hierarchy structure, the broader functional meaning of "dominance" remains unclear. As in prior work (e.g., Varholick et al., 2019), dominance rank here shows only weak associations with physical attributes (e.g., body weight), cognitive strategy, or neuromodulatory manipulation (OXTR deletion). This recurring pattern, where rank metrics are reliably established yet poorly predictive of other behavioral or biological traits, raises important questions about what such measures actually capture. In particular, it challenges the assumption that outcomes in paradigms like the tube test or chase frequency necessarily reflect dominance per se, rather than other constructs.

    4. Reviewer #3 (Public review):

      Reinwald et al. present the NoSeMaze, a semi-natural behavioral system designed to track social behaviors alongside reinforcement-learning in large groups of mice. Accumulating more than 4,000 days of behavioral monitoring, the authors demonstrate that social rank (determined by tube competitions) is a stable trait across shuffled cohorts and correlated with active chasing behaviors. The system also provides a solid platform for long-term measurements of reinforcement learning, including flexibility, response adaptation, and impulsiveness. Yet, the authors show that social ranking and chasing are mostly independent of these cognitive traits, and both seem mostly independent of oxytocin signaling in the AON.

      Strengths:

      (1) The neuroethological approach for automated tracking of several mice under semi-natural conditions is still rare in social behavioral research and should be encouraged.

      (2) The assessment of dominance by two independent measures, i.e., spontaneous tube competitions and proactive chasing, is innovative and valuable.

      (3) The integration of a long-term reinforcement-learning module into the semi-natural system provides novel opportunities to combine cognitive traits into social personality assessments.

      (4) The open-source system provides a valuable resource for the scientific community.

      Limitations:

      (1) Apparent ambiguity and inconsistency in age structure and cohort participation across rounds, raising concerns about uncontrolled confounds.

      (2) Chasing behavior appears more stable than tube-test competitions (Figure 4D vs. Figure 3D), which challenges the authors' decision to treat tube competitions as the primary basis for hierarchy determination.

      Major concerns:

      (1) Unclear and inconsistent handling of age groups and repeated sampling. The manuscript repeatedly refers to "younger" and "older" adults, but it is unclear whether age was ever controlled for or included in models. Some mice completed only one round, others 2-5 rounds, without explanation of the criteria or balancing.

      (2) Stability of chasing appears stronger than the stability of tube competitions. Figure 4D shows highly consistent chasing behavior across weeks, while Figure 3D shows weaker and more variable correlations for tube-based David scores. This is also evident from Figure 5A-B,D. Thus, it appears that chasing, which serves to quantify dominance in similar semi-natural setups, may be a more reliable and behaviorally meaningful measure of dominance than the incidental tube competitions.

      (3) Unbalanced participation across rounds compromises stability analyses. Stability analyses (e.g., ICCs, round-to-round correlations) assume comparable sampling across individuals. However, some mice contribute 1 round, others 2, 3, 4, and even 5 rounds. This imbalance may inflate stability estimates or confound group reshuffling effects, and the rationale for variable participation is not explained.

    1. eLife Assessment

      This valuable study highlights the challenge of identifying the role of immune imprinting in influenza immunity. The manuscript provides solid evidence that statistical support for imprinting depends heavily on model choice and can be found in the absence of imprinting due to age-related processes. However, the results are incomplete in that the impact of incorrectly modeling imprinting is not clear. The work will be of interest to researchers who study adaptive immunity in any system where imprinting may be observed.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript uses serological data to quantify the effects of imprinting on subsequent influenza antibody responses. While this is an admirable goal, the HI dataset sounds impressive, and the authors developed a number of models, the manuscript came off as very dense and technical. One of the biggest pitfalls is that it is not easy to understand the lessons learned. The two Results section headers make clear statements - there was an imprinting signal in the HI titers, but much of this signal could also be seen in an imprinting-free simulation - and then the Discussion states a number of limitations. This is fine, but it leaves the reader wondering exactly how large an error would be introduced by ignoring imprinting effects altogether; alternatively, if imprinting is purposefully added, what would the expected effect size be? The comments below will provide some concrete steps to help clarify these points.

      Major comments:

      (1) Lines 107-133: The first Results section is a dense slog of information, and the reader is never given a good overview of what the imprinting coefficients exactly are. As the paper currently stands, if you do not start by reading the Methods, you will take away very little. I suggest adding a schematic for any of your models, showing what HI titers would be expected with/without imprinting effects, or age effects, or both, to tie in your modeling coefficients with quantities that all readers are familiar with.

      (1.1) Clarify what the imprinting coefficient (y-axis in Figure 1A) looks like in this schematic.

      (1.2) Another aspect that I missed: In addition to stating which models were best by BIC, what is the absolute effect size in the HI titers? During my initial reading, I had hoped that Figure 3 would answer this question, but it turned out to be just an overview of the dataset. I strongly suggest having such a figure to show the imprinting effect inferred by different models. What would the expected effect be if you kept someone's birth year constant but tuned their age? What if you kept their age at collection constant but tuned their birth year?

      (1.3) It would also help to explain in your schematic what the x-axis labels (H1, H2, H1/H3) would look like in these scenarios, and what imprinting relative to H3 means.

      (2) As mentioned above, it was hard to understand the takeaway messages, such as:

      (2.1) A similar question would be: If you model antibody titers without imprinting, how far off would you be from the actual measurements (2x off, 4x off...)? If you add the imprinting effect, how much closer do you get?

      (2.2) Are there specific age groups that require imprinting to be taken into consideration, since otherwise HI analyses will be markedly off?

      (2.3) Are there age groups where imprinting can be safely ignored?

      (3) HI titers against multiple H1 and H3 variants were measured, but it is unclear how these are used, and why titers against a single variant each season would not have worked equally well.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors were testing the hypothesis that hemagglutination inhibition antibody titers, measured later in life, might be higher against influenza viruses that belong to the same hemagglutinin classification group as the influenza virus that a person was likely first exposed to early in life. This is one conceptualization of a phenomenon termed immune imprinting, which may explain previously observed differences in susceptibility to severe influenza infection between cohorts that were likely first exposed to different hemagglutinin groups. The results of the analysis provide some support for this hypothesis. However, support for the hypothesis is not consistently observed across sensitivity analyses, and a simulation study finds that antibody patterns consistent with immune imprinting may arise due to other factors in the absence of true imprinting effects. Therefore, overall support for the hypothesis is weak. Nonetheless, this study is important in that it provides guidance and has developed an analytic methodology for additional studies in this area of research. These findings and methods may also be useful for other infectious diseases for which patterns consistent with immune imprinting have been observed.

      Strengths:

      The strengths of this study include the relatively large cohort data source with broad age representation, rigorous statistical methods, and the use of sensitivity and simulation analyses to assess the robustness of the results.

      Weaknesses:

      The model outcome includes antibody titers measured against many different viruses, and the imprinting parameter was defined at the subtype level. This may obscure specific imprinting effects related to finer structural similarities between first and subsequent virus exposures. This analysis focuses only on one component of the immune response to influenza; immune imprinting may also involve other immune mechanisms. The analysis was carried out in a Chinese cohort, and vaccination status of the cohort is not discussed; the results may not be generalizable to other populations, particularly if vaccination patterns differ.

    1. eLife Assessment

      This study makes an important contribution by revealing how saccades selectively disrupt spatial working memory while sparing other object features, and by demonstrating how this mechanism is altered in aging and neurodegeneration. The findings are supported by convincing evidence derived from well-controlled eye-tracking experiments and systematic generative model comparisons. Together, the work provides a computationally grounded framework that is of importance for understanding trans-saccadic memory and its clinical relevance.

    2. Reviewer #1 (Public review):

      Summary:

      This study employed a saccade-shifting sequential working memory paradigm, manipulating whether a saccade occurred after each memory array to directly compare retinotopic and transsaccadic working memory for both spatial location and color. Across four participant groups (young and older healthy adults, and patients with Parkinson's disease and Alzheimer's disease), the authors found a consistent saccade-related cost specifically for spatial memory - but not for color - regardless of differences in memory precision. Using computational modeling, they demonstrate that data from healthy participants are best explained by a complex saccade-based updating model that incorporates distractor interference. Applying this model to the patient groups further elucidates the sources of spatial memory deficits in PD and AD. The authors then extend the model to explain copying deficits in these patient groups, providing evidence for the ecological validity of the proposed saccade-updating retinotopic mechanism.

      Strengths:

      Overall, the manuscript is well written, and the experimental design is both novel and appropriate for addressing the authors' key research questions. I found the study to be particularly comprehensive: it first characterizes saccade-related costs in healthy young adults, then replicates these findings in healthy older adults, demonstrating how this "remapping" cost in spatial working memory is age-independent. After establishing and validating the best-fitting model using data from both healthy groups, the authors apply this model to clinical populations to identify potential mechanisms underlying their spatial memory impairments. The computational modeling results offer a clearer framework for interpreting ambiguities between allocentric and retinotopic spatial representations, providing valuable insight into how the brain represents and updates visual information across saccades. Moreover, the findings from the older adult and patient groups highlight factors that may contribute to spatial working memory deficits in aging and neurological disease, underscoring the broader translational significance of this work.

      Comments on revisions:

      The authors have addressed my earlier concerns.

    3. Reviewer #2 (Public review):

      Summary:

      Zhao et al. investigate how object location and colour are degraded across saccadic eye movements. They employ an eye-tracking task that requires participants to remember two sequentially presented items and subsequently report the colour and position of either one of these. Through counterbalancing of the presence or absence of saccades across items, the authors endeavour to dissect the impact of saccades independently on item location or colour. These behavioural findings form the basis of generative models designed to test competing, nested accounts of how information is stored and updated across saccades.

      Strengths:

      The combination of eye-tracking and generative modelling is a strength of the paper, which opens new perspectives into the impact of Alzheimer's and Parkinson's disease on the performance of visuospatial cognitive tests. The finding that the model parameters covary with clinical performance on the ROCF test is a nice example of a "computational assay" of disease.

      Comments on revisions:

      I thank the authors for their detailed responses and revisions arising from my feedback on the original manuscript. The revised manuscript adequately addresses all of my concerns.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript introduces a visual paradigm aimed at studying trans-saccadic memory.

      The authors observe how memory of object location is selectively impaired across eye movements, whereas object colour memory is relatively immune to intervening eye movements. Results are reported for young and elderly healthy controls, as well as PD and AD participants.

      A computational model is introduced to account for these results, indicating how early differences in memory encoding and decay (but not trans-saccadic updating per se) can account for the observed differences between healthy controls and clinical groups.

      In the revised manuscript, the authors have addressed most of my initial concerns. The dataset is generally compelling, as it includes healthy younger and older adults as well as clinical populations. In addition, the authors propose an interesting modelling approach designed to isolate and characterize the key components underlying the observed patterns of results.

      It is important to acknowledge potential limitations of the modelling approach, particularly the differences in the number of parameters across the tested models. As models with more parameters typically achieve better fit, this issue warrants careful consideration. The authors have substantially addressed this point in their rebuttal.
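      The concern about parameter counts can be made concrete with a penalised fit criterion. The sketch below uses the Bayesian information criterion, BIC = k ln(n) - 2 ln(L), as one common choice; whether the authors used BIC or another criterion is not stated here, and the log-likelihoods, parameter counts, and trial count are hypothetical values chosen only for illustration.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical numbers: the richer model fits slightly better (higher
# log-likelihood) but pays a larger penalty for its three extra parameters.
n_trials = 400
bic_simple = bic(log_likelihood=-520.0, n_params=4, n_obs=n_trials)
bic_rich = bic(log_likelihood=-515.0, n_params=7, n_obs=n_trials)

preferred = "simple" if bic_simple < bic_rich else "rich"
print(f"BIC simple = {bic_simple:.1f}, BIC rich = {bic_rich:.1f}, preferred: {preferred}")
```

      In this toy case the gain in log-likelihood (5 units) does not offset the penalty for three additional parameters, so the simpler model is preferred; a richer model wins only when its improvement in fit exceeds that penalty.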

      Concerns regarding the specificity of the findings were also raised and have been adequately discussed in the authors' response. Specifically, they clarified the selective impact of saccade-related costs on spatial working memory updating across eye movements, without affecting feature-based memory (e.g., color), as well as the specificity of the updating effects observed with the Rey-Osterrieth Complex Figure.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Thank you so much for your comprehensive and insightful assessment of our manuscript. We appreciate your recognition of the novelty of our experimental design and the utility of our computational framework for interpreting visual remapping across the lifespan and in clinical populations. We are very grateful for your suggestions regarding the narrative flow, which have helped us to improve the manuscript's focus and coherence. Our responses to your specific concerns are detailed below.

      (1) Relevance of the figure-copy results (pp. 13-15). Is it necessary to include the figure-copy task results within the main text? The manuscript already presents a clear and coherent narrative without this section. The figure-copy task represents a substantial shift from the LOCUS paradigm to an entirely different task that does not measure the same construct. Moreover, the ROCF findings are not fully consistent with the LOCUS results, which introduces confusion and weakens the manuscript's coherence. While I understand the authors' intention to assess the ecological validity of their model, this section does not effectively strengthen the manuscript and may be better removed or placed in the Supplementary Materials.

      We thank the reviewer for their perspective regarding the narrative flow and the transition between the LOCUS paradigm and the ROCF results. However, we remain keen to retain these findings in the main text, as they provide critical ecological and clinical validation for the computational mechanisms identified in our study.

      We think these results strengthen the manuscript for the following main reasons:

      (1) The ROCF we used is a standard neuropsychological tool for identifying constructional apraxia. Our results bridge the gap between basic cognitive neuroscience and clinical application by demonstrating that specific remapping parameters—rather than general memory precision—predict real-world deficits in patients.

      (2) The finding that our winning model explains approximately 62% of the variance in ROCF copy scores across all diagnostic groups further indicates that these parameters from the LOCUS task represent core computational phenotypes that underpin complex, real-life visuospatial construction (copying drawings).

      (3) Previous research has often observed only a weak or indirect link between drawing ability and traditional working memory measures, such as digit span (Senese et al., 2020). This was previously attributed to “deictic” strategies—like frequent eye and hand movements—that minimise the need to hold large amounts of information in memory (Ballard et al., 1995; Cohen, 2005; Draschkow et al., 2021). While our study was not exclusively designed to catalogue all cognitive contributions to drawing, the findings provide significant and novel evidence indicating that transsaccadic integration is a critical driver of constructional (copying drawing) ability. By demonstrating this link, the results provide evidence to stimulate a new direction for future research, shifting the focus from general memory capacity toward the precision of spatial updating across eye movements.

      In summary, by including the ROCF results in the main text, we provide evidence for a functional role for spatial remapping that extends beyond perceptual stability into the domain of complex visuomotor control. We have expanded on these points throughout the revised manuscript:

      In the Introduction: p.2:

      “The clinical relevance of these spatial mechanisms is underscored by significant disruptions to visuospatial processing and constructional apraxia—a deficit in copying and drawing figures—observed in neurodegenerative conditions such as Alzheimer's disease (AD) and Parkinson's disease (PD).[20,21] This raises a crucial question: do clinical impairments in complex visuomotor tasks stem from specific failures in transsaccadic remapping? If so, the computational parameters that define normal spatial updating should also provide a mechanistic account of these clinical deficits, differentiating them from general age-related decline.”

      p.3: “Finally, by linking these mechanistic parameters to a standard clinical measure of constructional ability (the Rey-Osterrieth Complex Figure task), we demonstrate that transsaccadic updating represents a core computational phenotype underpinning real-world visuospatial construction in both health and neurodegeneration.”

      In the Results:

      “To assess whether the mechanistic parameters derived from the LOCUS task represent core phenotypes of real-world visuospatial abilities, we also instructed all participants to complete the Rey-Osterrieth Complex Figure copy task (ROCF; Figure 7A) on an Android tablet using a digital pen (see examples in Figure 7B; all Copy data are available in the open dataset: https://osf.io/95ecp/). The ROCF is a gold-standard neuropsychological tool for identifying constructional apraxia.[29] Historically, drawing performance has shown only weak or indirect correlations with traditional working memory measures.[30] This disconnect has been attributed to active visual-sampling strategies—frequent eye movements that treat the environment as an external memory buffer, minimising the necessity of holding large volumes of information in internal working memory.[3–5]

      We hypothesised that drawing accuracy is primarily constrained by the precision of spatial updating across frequent saccades rather than raw memory capacity. To evaluate the ecological validity of the identified saccade-updating mechanism, we modelled individual ROCF copy scores across all four groups using the estimated (maximum a posteriori) parameters from the winning “Dual (Saccade) + Interference” model (Model 7; Figure 8) as regressors in a Bayesian linear model. Prior to inclusion, each regressor was normalised by dividing by the square root of its variance.

      This model successfully explained 61.99% of the variance in ROCF copy scores, indicating that these computational parameters are strong predictors of real-world constructional ability (Figure 8A). … This highlights the critical role of accurate remapping based on saccadic information; even if the core saccadic update mechanism is preserved across groups (as shown in previous analyses), the precision of this updating process is crucial for complex visuospatial tasks. Moreover, worse ROCF copy performance is associated particularly with higher initial angular encoding error. This indicates that imprecision in the initial registration of angular spatial information contributes to difficulties in accurately reproducing complex visual stimuli.”
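
      A minimal sketch of the normalisation and variance-explained steps described in the quoted Results, offered as an illustration rather than the authors' code. The function names, the choice of population variance, and the toy numbers are assumptions; the manuscript fits a Bayesian linear model, whereas this sketch only shows how a regressor is scaled by the square root of its variance and how a figure such as 61.99% could be computed from model predictions.

```python
import math
from statistics import fmean, pvariance


def scale_by_sd(values):
    """Divide a regressor by the square root of its variance (no mean-centring).

    Population variance is an assumption; the source does not say which
    estimator was used.
    """
    sd = math.sqrt(pvariance(values))
    if sd == 0:
        raise ValueError("regressor has zero variance and cannot be scaled")
    return [v / sd for v in values]


def variance_explained(observed, predicted):
    """Proportion of variance in `observed` accounted for by `predicted` (R^2)."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical parameter estimates for five participants (not real data).
angular_encoding_error = [0.8, 1.1, 1.6, 2.0, 2.7]
scaled = scale_by_sd(angular_encoding_error)
```

      After scaling, every regressor has unit variance, so the fitted coefficients are expressed per standard deviation of each parameter and can be compared directly.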

      In the Discussion:

      “Importantly, our computational framework establishes a direct mechanistic link between transsaccadic updating and real-world constructional ability. Specifically, higher saccade and angular encoding errors contribute to poorer ROCF copy scores. By mapping these mechanistic estimates onto clinical scores, we found that the parameters derived from our winning model explain approximately 62% of the variance in constructional performance across groups. These findings suggest that the computational parameters identified in the LOCUS task represent core phenotypes of visuospatial ability, providing a mechanistic bridge between basic cognitive theory and clinical presentation.

      This relationship provides novel insights into the cognitive processes underlying drawing, specifically highlighting the role of transsaccadic working memory. Previous research has primarily focused on the roles of fine motor control and eye-hand coordination in this skill.[4,50–55] This is partly because of a consistent failure to find a strong relation between traditional memory measures and copying ability.[4,31] For instance, common measures of working memory, such as digit span and Corsi block tasks, do not directly predict ROCF copying performance.[31,56] Furthermore, in patients with constructional apraxia, these memory performance measures often remain relatively preserved despite significant drawing impairments.[56–58] In the literature, this lack of association has often been attributed to “deictic” visual-sampling strategies, characterised by frequent eye movements that treat the environment as an external memory buffer, thereby minimising the need to maintain a detailed internal representation.[4,59] In a real-world copying task, the ROCF requires a high volume of saccades, making it uniquely sensitive to the precision of the dynamic remapping signals identified here. Recent eye-tracking evidence confirms that patients with AD exhibit significantly more saccades and longer fixations during figure copying compared to controls, potentially as a compensatory response to transsaccadic working memory constraints.[56] This high-frequency sampling—averaging between 150 and 260 saccades for AD patients compared to approximately 100 for healthy controls—renders the task highly dependent on the precision of dynamic remapping signals.[56] To ensure this relationship was not driven by a general "g-factor" or non-spatial memory impairment, we further investigated the role of broader cognitive performance using the ACE-III Memory subscale. We found that the relationship between transsaccadic working memory and ROCF performance remains highly significant, even after controlling for age, education, and ACE-III Memory subscore. This suggests that transsaccadic updating may represent a discrete computational phenotype required for visuomotor control, rather than a non-specific proxy for global cognitive decline.

      In other words, even when visual information is readily available in the world, the act of copying depends critically on working memory across saccades. This reveals a fundamental computational trade-off: while active sampling strategies (characterised by frequent eye-hand movements) effectively reduce the load on capacity-limited working memory, they simultaneously increase the demand for precise spatial updating across eye movements. By treating the external world as an "outside" memory buffer, the brain minimises the volume of information it must hold internally, but it becomes entirely dependent on the reliability with which that information is remapped after each eye movement. This perspective aligns with, rather than contradicts, the traditional view of active sampling, which posits that individuals adapt their gaze and memory strategies based on specific task demands.[3,60] Furthermore, this perspective provides a mechanistic framework for understanding constructional apraxia; in these clinical populations, the impairment may not lie in a reduced memory "span," but rather in the cumulative noise introduced by the constant spatial remapping required during the copying process.[58,61]

      Beyond constructional ability, these findings suggest that the primary evolutionary utility of high-resolution spatial remapping lies in the service of action rather than perception. While spatial remapping is often invoked to explain perceptual stability,[11–13,15] the necessity of high-resolution transsaccadic memory for basic visual perception is debated.[13,62–64] A prevailing view suggests that detailed internal models are unnecessary for perception, given the continuous availability of visual information in the external world.[13,44] Our findings support an alternative perspective, aligning with the proposal that high-resolution transsaccadic memory primarily serves action rather than perception.[13] This is consistent with the need for precise localisation in eye-hand coordination tasks such as pointing or grasping.[65] Even when unaware of intrasaccadic target displacements, individuals rapidly adjust their reaching movements, suggesting direct access of the motor system to remapping signals.[66] Further support comes from evidence that pointing to remembered locations is biased by changes in eye position,[67] and that remapping neurons reside within the dorsal “action” visual pathway, rather than the ventral “perception” visual pathway.[13,68,69] By demonstrating a strong link between transsaccadic working memory and drawing (a complex fine motor skill), our findings suggest that precise visual working memory across eye movements plays an important role in complex fine motor control.”

      (2) Model fitting across age groups (p. 9).

      It is unclear whether it is appropriate to fit healthy young and healthy elderly participants' data to the same model simultaneously. If the goal of the model fitting is to account for behavioral performance across all conditions, combining these groups may be problematic, as the groups differ significantly in overall performance despite showing similar remapping costs. This suggests that model performance might differ meaningfully between age groups. For example, in Figure 4A, participants 22-42 (presumably the elderly group) show the best fit for the Dual (Saccade) model, implying that the Interference component may contribute less to explaining elderly performance.

      Furthermore, although the most complex model emerges as the best-fitting model, the manuscript should explain how model complexity is penalized or balanced in the model comparison procedure. Additionally, are Fixation Decay and Saccade Update necessarily alternative mechanisms? Could both contribute simultaneously to spatial memory representation? A model that includes both mechanisms, e.g., Dual (Fixation) + Dual (Saccade) + Interference, could be tested to determine whether it outperforms Model 7 to rule out the sole contribution of complexity.

      We thank you for the opportunity to expand upon and clarify our modelling approach. Our decision to use a common generative model for both young and older adults was grounded in the empirical finding that there was no significant interaction between age group and saccade condition for either location or colour memory. While older adults demonstrated lower baseline precision, the specific "saccade cost" remained remarkably consistent across cohorts. This justified our use of a common model to assess quantitative differences in parameter estimates while maintaining a consistent mechanistic framework for comparison.

      Moreover, our winning model nests simpler models as special cases, providing the flexibility to naturally accommodate groups where certain components—such as interference—might play a reduced role. This ultimately confirms that the mechanisms for age-related memory deficits in this task reflect more general decline rather than a qualitative failure of the saccadic remapping process.

      This approach is further supported by the properties of the Bayesian model selection (BMS) procedure we used, which inherently penalises the inclusion of unnecessary parameters. Unlike maximum likelihood methods, BMS compares marginal likelihoods, representing the evidence for a model integrated over its entire parameter space. This follows the principle of Bayesian Occam’s Razor, where a model is only favoured if the improvement in fit justifies the additional parameter space; redundant parameters instead "dilute" the probability mass and lower the model evidence.
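
      For readers less familiar with this argument, the quantities involved can be written out as follows. This is a standard textbook formulation, not an equation taken from the manuscript; the symbols (data y, model m, parameters θ, approximate posterior q) are notation introduced here for illustration.

```latex
% Marginal likelihood (model evidence) of model m with parameters \theta
p(y \mid m) = \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta

% Variational free-energy approximation: accuracy minus complexity
\ln p(y \mid m) \approx F
  = \mathbb{E}_{q(\theta)}\!\left[\ln p(y \mid \theta, m)\right]
  - \mathrm{KL}\!\left[\, q(\theta) \,\|\, p(\theta \mid m) \,\right]
```

      The first term rewards fit; the second grows as the posterior moves away from the prior, so a parameter that does not improve the fit can only lower the evidence.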

      Consequently, we contend that a hybrid model combining fixation and saccade mechanisms is unnecessary, as we have already adjudicated between alternative mechanisms of equal complexity. Specifically, Model 6 (Dual Fixation + Interference) and Model 7 (Dual Saccade + Interference) possess an identical number of parameters. The fact that Model 7 emerged as the clear winner—providing substantial evidence against Model 6 with a Bayes Factor of 6.11—demonstrates that our model selection is driven by the specific mechanistic account of the data rather than a simple preference for complexity.
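
      As a rough guide to the size of this effect, the reported Bayes factor can be converted to a log-evidence difference and, under equal prior odds, to a posterior model probability. The sketch below assumes the Bayes factor is a simple ratio of the two model evidences; the function names are hypothetical and the calculation is illustrative rather than a reproduction of the authors' random-effects procedure.

```python
import math


def log_evidence_difference(bayes_factor):
    """Natural-log evidence difference implied by a Bayes factor."""
    if bayes_factor <= 0:
        raise ValueError("a Bayes factor must be positive")
    return math.log(bayes_factor)


def posterior_model_probability(bayes_factor, prior_odds=1.0):
    """Posterior probability of the favoured model given the prior odds.

    Equal prior odds (1.0) is an assumption made for illustration.
    """
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Bayes factor reported for Model 7 over Model 6.
delta_log_evidence = log_evidence_difference(6.11)
p_model_7 = posterior_model_probability(6.11)
```

      Because the two models have the same number of parameters, this difference reflects how well each mechanism accounts for the data rather than a difference in parameter count.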

      We have revised the Results and Discussion sections of the manuscript to state these points more explicitly for readers and have included references to established literature regarding the robustness of marginal likelihoods in guarding against overfitting.

      In the Results,

      “By fitting these models to the trial-by-trial response data from all healthy participants (N=42), we adjudicated between competing mechanisms to determine which best explained participant performance (Figure 4). We used random-effects Bayesian model selection to identify the most plausible generative model. This process relies on the marginal likelihood (model evidence), which inherently balances model fit against complexity—a principle often referred to as Occam’s razor.[25–27] The analysis yielded a strong result: the “Dual (Saccade) + Interference” model (Model 7 in Table 1) emerged as the winning model, providing substantial evidence against the next best alternative with a Bayes Factor of 6.11.”

      In the Discussion:

      “Our framework employs Variational Laplace, a method used to recover computational phenotypes in clinical populations like those with substance use disorders,[34,35] and the models we fit using this procedure feature time-dependent parameterisation of variance—conceptually similar to the widely-used Hierarchical Gaussian Filter.[36–39] Importantly, the risk of overfitting is mitigated by the Bayesian Model Selection framework; by utilising the marginal likelihood for model comparison, the procedure inherently penalises excessive model complexity and promotes generalisability.[25–27,40] This generalisability was further evidenced by the model's ability to predict performance on the independent ROCF task, confirming that these parameters represent robust mechanistic phenotypes rather than idiosyncratic fits to the initial dataset.”

      Minor point: On p. 9, line 336, Figure 4A does not appear to include the red dashed vertical line that is mentioned as separating the age groups.

      Thank you for pointing out this inconsistency. We apologise for the oversight; upon further review, we concluded that the red dashed vertical line was unnecessary for the clear presentation of the data. We have therefore removed the line from Figure 4A and deleted the corresponding sentence in the figure caption.

      (3) Clarification of conceptual terminology.

      Some conceptual distinctions are unclear. For example, the relationship between "retinal memory" and "transsaccadic memory," as well as between "allocentric map" and "retinotopic representation," is not fully explained. Are these constructs related or distinct? Additionally, the manuscript uses terms such as "allocentric map," "retinotopic representation," and "reference frame" interchangeably, which creates ambiguity. It would be helpful for the authors to clarify the relationships among these terms and apply them consistently.

      Thank you for pointing this out. We have revised the manuscript to ensure that these terms are applied with greater precision and consistency. Our revisions standardise the terminology based on the following distinctions:

      Reference frames: We distinguish between the eye-centred reference frame (coordinate systems that shift with gaze) and the world-centred reference frame (coordinate systems anchored to the environment).

      Retinotopic representation vs. allocentric map: We clarify that retinotopic representations are encoded within an eye-centred reference frame and are updated with every ocular movement. Conversely, the allocentric map is anchored to stable environmental features, remaining invariant to the observer’s gaze direction or position.

      Retinotopic memory vs. transsaccadic memory: We have removed the term "retinal memory" to avoid ambiguity. We now consistently use retinotopic memory to describe the persistence of visual information in eye-centred coordinates within a single fixation. In contrast, transsaccadic memory refers to the higher-level integration of visual information across saccades, which involves the active updating or remapping of representations to maintain stability.

      To incorporate these clarifications, we have implemented the following changes:

      In the Introduction, the second paragraph has been entirely rewritten to establish these definitions at the outset, providing a clearer theoretical framework for the study.

      “Central to this enquiry is the nature of the coordinate system used for the brain's internal spatial representation. Does the brain maintain a single, world-centred (allocentric) map, or does it rely on a dynamic, eye-centred (retinotopic) representation?[11,13,15,16] In the latter system, retinotopic memory preserves spatial information within a fixation, whereas transsaccadic memory describes the active process of updating these representations across eye movements to achieve spatiotopic stability—the perception of a stable world despite eye movements.[11,16–18] If spatial stability is indeed reconstructed through such remapping, the mechanism remains unresolved: do we retain memories of absolute fixation locations, or do we reconstruct these positions from noisy memories of the intervening saccade vectors? We can test these hypotheses by analysing when and where memory errors occur. Assuming that memory precision declines over time,[19] the resulting error distributions should reveal the specific variables that are represented and updated across each saccade.”

      In the Results, the opening section of the Results has been reorganised to align with this terminology. We have ensured that the hypotheses and behavioural data—specifically the definition of "saccade cost"—are introduced using this consistent conceptual vocabulary to improve the overall coherence of the narrative.

      (4) Rationale for the selective disruption hypothesis (p. 4, lines 153-154). The authors hypothesize that "saccades would selectively disrupt location memory while leaving colour memory intact." Providing theoretical or empirical justification for this prediction would strengthen the argument.

      We have revised the Results to state the hypothesis more explicitly and expanded the Discussion to provide a robust theoretical and empirical rationale:

      In the Results,

      “This design allowed us to isolate and quantify the unique impact of saccades on spatial memory, enabling us to test competing hypotheses regarding spatial representation. If spatial memory were solely underpinned by an allocentric mechanism, precision should remain comparable across all conditions as the representation would be world-centred and unaffected by eye movements. Thus, performance in the no-saccade condition should be comparable to the two-saccade condition. Conversely, if spatial memory relies on a retinotopic representation requiring active updating across eye movements, the two-saccade condition was anticipated to be the most challenging due to cumulative decay in the memory traces used for stimulus reconstruction after each saccade.[22] Critically, we hypothesised that this saccade cost would be specific to the spatial domain; while location requires active remapping via noisy oculomotor signals, non-spatial features like colour are not inherently tied to coordinate transformations and should therefore remain stable (see more in Discussion below).

      Meanwhile, the no-saccade condition was expected to yield the most accurate localisation, relying solely on retinotopic information (retinotopic working memory). These predictions were confirmed in young healthy adults (N = 21, mean age = 24.1 years, ranged between 19 and 34). A repeated measures ANOVA revealed a significant main effect of saccades on location memory (F(2.2,43.9)=33.2, p<0.001, partial η²=0.62), indicating substantial impairment after eye movements (Figure 2A). In contrast, colour memory remained remarkably stable across all saccade conditions (Figure 2B; F(2.2, 44.7) = 0.68, p=0.53, partial η² =0.03).

      This “saccade cost”—the loss of memory precision following an eye movement—indicates that spatial representations require active updating across saccades rather than being maintained in a static, world-centred reference frame.

      Critically, our comparison between spatial and colour memory does not rely on the absolute magnitude of errors, which are measured in different units (degrees of visual angle vs. radians). Instead, we assessed the relative impact of the same saccadic demand on each feature within the same trial. While location recall showed a robust saccade cost, colour recall remained statistically unchanged. To ensure this null effect was not due to a lack of measurement sensitivity, we examined the recency effect; recall performance for the second item was predicted to be better than for the first stimulus in each condition.[23,24] As expected, colour memory for Item 2 was significantly more accurate than for Item 1 (F(1,20) = 6.52, p = 0.02, partial η² = 0.25), demonstrating that the task was sufficiently sensitive to detect standard working memory fluctuations despite the absence of a saccade-induced deficit.”
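
      The effect sizes quoted above can be checked against the reported F statistics and degrees of freedom using the standard identity for partial eta squared. The sketch below is a consistency check only; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values as reported in the quoted Results.
location_effect = partial_eta_squared(33.2, 2.2, 43.9)   # saccades on location memory
colour_effect = partial_eta_squared(0.68, 2.2, 44.7)     # saccades on colour memory
recency_effect = partial_eta_squared(6.52, 1, 20)        # Item 2 vs Item 1, colour
```

      Rounded to two decimals, the three values agree with the reported effect sizes (0.62, 0.03 and 0.25).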

      In the Discussion, we now write on p.18:

      “A clear finding was the specificity of the saccade cost to spatial features; it was not observed for non-spatial features like colour, even in neurodegenerative conditions. This discrepancy challenges notions of fixed visual working memory capacity unaffected by saccades.[16,44–46] The differential impact on spatial versus non-spatial features in transsaccadic memory aligns with the established "what" and "where" pathways in visual processing.[32,33] For objects to remain unified, object features must be bound to stable representations of location across saccades.[19] One possibility is that remapping updates both features and location through a shared mechanism, predicting equal saccadic interference for both colour and location in the present study.

      However, our findings suggest otherwise. One potential concern is whether this dissociation simply reflects the inherent spatial noise introduced by fixational eye movements (FEMs), such as microsaccades and drifts.[47] Because locations are stored in a retinotopic frame, fixational instability necessarily shifts retinal coordinates over time. However, the "saccade cost" here was defined as the error increase relative to a no-saccade baseline of equal duration; because both conditions are subject to the same fixational drift, any FEM-induced noise is effectively subtracted out. Thus, despite the ballistic and non-Gaussian nature of FEMs,[48] they cannot account for the presence of a saccade cost in spatial memory alongside its total absence in the colour domain. Another possibility is that this dissociation reflects differences in baseline task difficulty or dynamic range. Yet, the presence of a robust recency effect in colour memory (Figure 2B) confirms that our paradigm was sensitive to memory-dependent variance and was not limited by floor or ceiling effects.

      The fact that identical eye movements—executed simultaneously and with identical vectors—systematically degraded spatial precision while sparing colour suggests a feature-specific susceptibility to transsaccadic remapping. This supports the view that the computational process of updating an object’s location involves a vector-subtraction mechanism—incorporating noisy oculomotor commands (efference copies)—that introduces specific spatial variance. Because this remapping is a coordinate transformation, the resulting sensorimotor noise does not functionally propagate to non-spatial feature representations. Consequently, features like colour may be preserved or automatically remapped without the precision loss associated with spatial updating.[11,49] Our paradigm thus provides a refined tool to investigate the architecture of transsaccadic working memory across distinct object features.”
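
      The vector-subtraction account in the quoted Discussion can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors' generative model: the noise levels, the uniform target and saccade ranges, and all names are hypothetical. A remembered eye-centred location is updated after each saccade by subtracting a noisy copy of the saccade vector, so spatial error accumulates with the number of saccades.

```python
import math
import random


def mean_location_error(n_saccades, n_trials=5000, encoding_sd=0.5,
                        efference_sd=0.4, seed=0):
    """Mean recall error (arbitrary degrees) after `n_saccades` remapping steps.

    encoding_sd and efference_sd are illustrative values, not fitted parameters.
    """
    rng = random.Random(seed)
    total_error = 0.0
    for _ in range(n_trials):
        # True target position in eye-centred coordinates.
        true_x, true_y = rng.uniform(-5, 5), rng.uniform(-5, 5)
        # The memory trace starts with encoding noise.
        mem_x = true_x + rng.gauss(0, encoding_sd)
        mem_y = true_y + rng.gauss(0, encoding_sd)
        for _ in range(n_saccades):
            sac_x, sac_y = rng.uniform(-8, 8), rng.uniform(-8, 8)
            # The eye moves, so the true eye-centred position shifts exactly.
            true_x -= sac_x
            true_y -= sac_y
            # Memory is remapped with a noisy efference copy of the saccade.
            mem_x -= sac_x + rng.gauss(0, efference_sd)
            mem_y -= sac_y + rng.gauss(0, efference_sd)
        total_error += math.hypot(mem_x - true_x, mem_y - true_y)
    return total_error / n_trials


errors = [mean_location_error(n) for n in (0, 1, 2)]
```

      A non-spatial feature such as colour never passes through the subtraction step in this sketch, so its error would stay at the encoding level regardless of the number of saccades, which is the dissociation the authors describe.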

      (5) Relationship between saccade cost and individual memory performance (p. 4, last paragraph).

      The authors report that larger saccades were associated with greater spatial memory disruption. It would be informative to examine whether individual differences in the magnitude of saccade cost correlate with participants' overall/baseline memory performance (e.g. their memory precision in the no-saccade condition). Such analyses might offer insights into how memory capacity/ability relates to resilience against saccade-induced updating.

      We have now conducted the correlation analysis to determine whether baseline memory capacity (no-saccade condition) predicts resilience to saccade-induced updating. The results indicate that these two factors are independent.

      To clarify the nature of the saccade-induced impairment, we have updated the text as follows:

      p.4: “This “saccade cost”—the loss of memory precision following an eye movement—indicates that spatial representations require active updating across saccades rather than being maintained in a static, world-centred reference frame.”

      p.5: “Further analysis examined whether individual differences in baseline memory precision (no-saccade condition) predicted resilience to saccadic disruption. Crucially, individual saccade costs (defined as the precision loss relative to baseline) did not correlate with baseline precision (rho = 0.20, p = 0.20). This suggests that the noise introduced by transsaccadic remapping acts as an independent, additive source of variance that is not modulated by an individual’s underlying memory capacity. These findings imply a functional dissociation between the mechanisms responsible for maintaining a representation and those involved in its coordinate transformation.”
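
      The analysis described on p.5 amounts to a rank correlation between each participant's baseline error and their saccade cost. The sketch below shows one way to compute it with the standard library; the function names and the toy data are hypothetical, and the manuscript does not state which software was used.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical per-participant errors (not real data).
baseline_error = [1.2, 0.9, 1.5, 1.1, 1.8]
two_saccade_error = [2.0, 1.9, 2.1, 2.4, 2.5]
saccade_cost = [s - b for s, b in zip(two_saccade_error, baseline_error)]
rho = spearman_rho(baseline_error, saccade_cost)
```

      Defining the cost as a difference from baseline follows the quoted text; a rho near zero is what would be expected if remapping noise adds to, rather than scales with, baseline imprecision.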

      (6) Model fitting for the healthy elderly group to reveal memory-deficit factors (pp. 11-12). The manuscript discusses model-based insights into components that contribute to spatial memory deficits in AD and PD, but does not discuss components that contribute to spatial memory deficits in the healthy elderly group. Given that the EC group also shows impairments in certain parameters, explaining and discussing these outcomes of the EC group could provide additional insights into age-related memory decline, which would strengthen the study's broader conclusions.

      This is a very good point. We rewrote the corresponding results section (p.12-13):

      “Modelling reveals the sources of spatial memory deficits in healthy aging and neurodegeneration - To understand the source of the observed deficits, we applied the winning ‘Dual (Saccade) + Interference’ model to the data from all participants (YC, EC, AD, and PD). By fitting the model to the entire dataset, we obtained estimates of the parameters for each individual, which then formed the basis for our group-level analysis. To formally test for group differences, we used Parametric Empirical Bayes (PEB), a hierarchical Bayesian approach that compares parameter estimates across groups while accounting for the uncertainty of each estimate [28]. This allowed us to identify which specific cognitive mechanisms, as formalised by the model parameters, were affected by age and disease.

      The Bayesian inversion used here allows us to quantify the posterior mode and variance of each parameter, together with the covariance between parameters. From these, we can compute the probabilities that pairs of parameters differ from one another, which we report as P(A>B)—meaning the posterior probability that the parameter for group A was greater than that for group B.

      We first examined the specific parameters differentiating healthy elderly (EC) from young controls (YC) to isolate the factors contributing to non-pathological, age-related decline. The analysis revealed that healthy ageing is primarily characterised by a significant increase in Radial Decay (P(EC > YC) = 0.995), a heightened susceptibility to Interference (P(EC > YC) = 1.000), and a reduction in initial Angular Encoding precision (P(YC < EC) = 0.002; Figure 6). These results suggest that normal ageing degrades the fidelity of the initial memory trace and its resilience over time, while the core computational process of updating information across saccades remains intact.

      Beyond these baseline ageing effects, our clinical cohorts exhibited more severe and condition-dependent impairments. Radial decay showed a clear, graded impairment: AD patients had a greater decay rate than PD patients (P(AD > PD) = 1.000), who in turn were more impaired than the EC group (P(PD > EC) = 0.996). A similar graded pattern was observed for Interference, where AD patients were most susceptible (P(AD > PD) = 0.999), while the PD and EC groups did not significantly differ (P(PD > EC) = 0.532).

      Patients with AD also showed a tendency towards greater angular decay than controls (P(AD > EC) = 0.772), although this fell below the 95% probability threshold. This effect was influenced by a lower decay rate in the PD group compared to the EC group (P(PD < EC) = 0.037). In contrast, group differences in encoding were less pronounced. While YC exhibited significantly higher precision than all other groups, AD patients showed significantly higher angular encoding error than PD patients (P(AD > PD) = 0.985), though neither group differed significantly from the EC group.

      Crucially, parameters related to the saccade itself—saccade encoding and saccade decay—did not differentiate the groups. This indicates that neither healthy ageing nor the early stages of AD and PD significantly impair the fundamental machinery for transsaccadic remapping. Instead, the visuospatial deficits in these conditions arise from specific mechanistic failures: a faster decay of radial position information and increased susceptibility to interference, both of which are present in healthy ageing but significantly amplified by neurodegeneration.”
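
      The P(A>B) values reported in the quoted Results can be understood through a short calculation. Assuming the posterior over the two group parameters is approximately Gaussian (as a Laplace-style inversion would give), the probability that one exceeds the other follows from the posterior means, variances and covariance. The function name and the example numbers are hypothetical; this is not the authors' PEB implementation.

```python
import math


def prob_greater(mean_a, var_a, mean_b, var_b, cov_ab=0.0):
    """P(A > B) for jointly Gaussian A and B.

    The difference A - B is Gaussian with mean (mean_a - mean_b) and
    variance (var_a + var_b - 2 * cov_ab).
    """
    diff_var = var_a + var_b - 2.0 * cov_ab
    if diff_var <= 0:
        raise ValueError("variance of the difference must be positive")
    z = (mean_a - mean_b) / math.sqrt(diff_var)
    # Standard normal cumulative distribution evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Illustrative posterior summaries for one parameter in two groups.
p_example = prob_greater(mean_a=1.4, var_a=0.04, mean_b=0.9, var_b=0.05)
```

      Under this reading, a value such as 0.995 means that nearly all posterior mass favours a larger parameter in group A, while a value near 0.5 indicates that the groups cannot be distinguished.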

      In the Discussion, we added:

      “Although saccade updating was an essential component of the winning model, its two key parameters—initial encoding error and decay rate during maintenance—did not significantly differ across groups. This indicates that the core computational process of updating spatial information based on eye movements is largely preserved in healthy aging and neurodegeneration.

      Instead, group differences were driven by deficits in angular encoding error (precision of initial angle from fixation), angular decay, radial decay (decay in memory of distance from fixation), and interference susceptibility. This implies a functional and neuroanatomical dissociation: while the ventral stream (the “what” pathway) shows an age-related decline in the quality and stability of stored representations, the dorsal-stream (the “where” pathway) parietal-frontal circuits responsible for coordinate transformations remain functionally robust.[31–34] These spatial updating mechanisms appear resilient to the normal ageing trajectory and only break down when challenged by the specific pathological processes seen in Alzheimer’s or Parkinson’s disease.”

      (7) Presentation of saccade conditions in Figure 5 (p. 11). In Figure 5, it may be clearer to group the four saccade conditions together within each patient group. Since the main point is that saccadic interference on spatial memory remains robust across patient groups, grouping conditions by patient type rather than intermixing conditions would emphasize this interpretation.

      There are several valid ways to present these plots, but we chose this format because it allows for a direct visual comparison of the post-hoc group differences within each specific task demand. This arrangement clearly illustrates the graded impairment from young controls through to patients with Alzheimer’s disease across every condition. This structure also directly mirrors our two-way ANOVA, which identified significant main effects for both Group and Condition, but crucially, no significant Group x Condition interaction. We felt that grouping the data by participant group would force readers to look across four separate clusters to compare the slopes, making the stability of the saccadic remapping mechanism much harder to grasp at a glance.

      Reviewer #1 (Recommendations for the authors):

      (1) Formatting of statistical parameters.

      The formatting of statistical symbols should be consistent throughout the manuscript. Some instances of F, p, and t are italicized, while others are not. All statistical symbols should be italicized.

      Thank you for pointing this out. We have audited the manuscript. While we have revised the text to address these instances throughout the Results and Methods sections, any remaining minor formatting inconsistencies will be corrected during the final typesetting stage.

      (2) Minor typographical issues.

      (a) Line 532: "are" should be "be."

      (b) Line 654: "cantered" should be "centered."

      (c) Line 213: In "(p(bonf) < 0.001, |t| ≥ 5.94)," the t value should be reported with its degrees of freedom, and t should be reported before p. The same applies to line 215.

      Thank you for your careful reading. All corrected.

      Reviewer #2 (Public review):

      We thank you for your positive feedback regarding our eye-tracking methodology and computational approach. We appreciate your critical insights into the feature-specific disruption hypothesis and the task structure. We have substantially revised the Results and Discussion concerning saccadic interference with colour memory. Below, we respond to your suggestions point by point:

      Reviewer #2 (Recommendations for the authors):

      (1) The study treats colour and location errors as comparable when arguing that saccades selectively disrupt spatial but not colour memory. However, these measures are defined in entirely different units (degrees of visual angle vs radians on a colour wheel) and are not psychophysically or statistically calibrated. Baseline task difficulty, noise level, or dynamic range do not appear to be calibrated or matched across features. As a result, the null effect of saccades on colour could reflect lower sensitivity or ceiling effects rather than implicit feature-specific robustness.

      We agree that direct comparisons of absolute error magnitudes across different dimensions are not appropriate. Our argument for feature-specific disruption relies not on the scale of errors, but on the presence or absence of a saccade cost within identical trials. In our within-subject design, the same saccade vectors produced a systematic increase in location error while leaving colour error statistically unchanged. To address sensitivity, we observed that colour memory was sufficiently precise to show a significant recency effect (p = 0.02). To further quantify the evidence for the null effect, we performed Bayesian repeated measures ANOVAs, which yielded a BF10 = 0.22. This provides substantial evidence that saccades do not disrupt colour precision, regardless of baseline sensitivity.

      We have substantially revised this in Results, Methods and Discussion:

      In the Results:

      “This design allowed us to isolate and quantify the unique impact of saccades on spatial memory, enabling us to test competing hypotheses regarding spatial representation. If spatial memory were solely underpinned by an allocentric mechanism, precision should remain comparable across all conditions as the representation would be world-centred and unaffected by eye movements. Thus, performance in the no-saccade condition should be comparable to the two-saccade condition. Conversely, if spatial memory relies on a retinotopic representation requiring active updating across eye movements, the two-saccade condition was anticipated to be the most challenging due to cumulative decay in the memory traces used for stimulus reconstruction after each saccade.[22] Critically, we hypothesised that this saccade cost would be specific to the spatial domain; while location requires active remapping via noisy oculomotor signals, non-spatial features like colour are not inherently tied to coordinate transformations and should therefore remain stable (see more in Discussion below).

      Meanwhile, the no-saccade condition was expected to yield the most accurate localisation, relying solely on retinotopic information (retinotopic working memory). These predictions were confirmed in young healthy adults (N = 21, mean age = 24.1 years, ranged between 19 and 34). A repeated measures ANOVA revealed a significant main effect of saccades on location memory (F(2.2,43.9)=33.2, p<0.001, partial η²=0.62), indicating substantial impairment after eye movements (Figure 2A). In contrast, colour memory remained remarkably stable across all saccade conditions (Figure 2B; F(2.2, 44.7) = 0.68, p=0.53, partial η² =0.03).

      This “saccade cost”—the loss of memory precision following an eye movement—indicates that spatial representations require active updating across saccades rather than being maintained in a static, world-centred reference frame.

      Critically, our comparison between spatial and colour memory does not rely on the absolute magnitude of errors, which are measured in different units (degrees of visual angle vs. radians). Instead, we assessed the relative impact of the same saccadic demand on each feature within the same trial. While location recall showed a robust saccade cost, colour recall remained statistically unchanged. To ensure this null effect was not due to a lack of measurement sensitivity, we examined the recency effect; recall performance for the second item was predicted to be better than for the first stimulus in each condition.[23,24] As expected, colour memory for Item 2 was significantly more accurate than for Item 1 (F(1,20) = 6.52, p = 0.02, partial η² = 0.25), demonstrating that the task was sufficiently sensitive to detect standard working memory fluctuations despite the absence of a saccade-induced deficit.”

      In the Methods, at the beginning of “Statistical Analysis”, we added

      “Because location and colour recall involve different scales and units, all analyses were performed independently for each feature to avoid cross-dimensional magnitude comparisons.” (p25)

      In the Discussion, we added:

      “A potential concern is whether the observed dissociation between colour and location reflects differences in baseline task difficulty or dynamic range. Yet, the presence of a robust recency effect in colour memory (Figure 2B) confirms that our paradigm was sensitive to memory-dependent variance and was not limited by floor or ceiling effects.”

      (2) Colour and then location are probed serially, without a counter-balanced order. This fixed response order could introduce a systematic bias because location recall is consistently subject to longer memory retention intervals and cognitive interference from the colour decision. The observed dissociation (saccades impair location but not colour) may therefore reflect task structure rather than implicit feature-specific differences in trans-saccadic memory.

      Thank you for the insightful observation regarding our fixed response order. We acknowledge that a counterbalanced design is typically preferred to mitigate potential order effects. However, we chose this consistent sequence to ensure the task remained accessible for cognitively impaired patients (i.e., the Alzheimer’s disease (AD) and Parkinson’s disease (PD) cohorts). Conducting an eye-tracking memory task with cognitively impaired patients is challenging, as they may struggle with task engagement or forget complex instructions. During the design phase, we prioritised a consistent structure to reduce the cognitive load and task-switching demands that typically challenge these cohorts.

      Critically, because the saccade cost is a relative measure calculated by comparing conditions with identical timings, any bias from the fixed order is present in both the baseline and saccade trials. The disruption we report is therefore a specific effect of eye movements that goes beyond the noise introduced by the retention interval or the preceding colour report.

      We added the following text in the Methods – experimental procedure (p.22):

      “Recall was performed in a fixed order, with colour reported before location. This sequence was primarily chosen to minimise cognitive load and task-switching demands for the two neurological patient cohorts, ensuring the paradigm remained accessible for individuals with AD and PD. While this order results in a slightly longer retention interval for location recall, the saccade cost was identified by comparing location error across experimental conditions with similar timings but varying saccadic demands.”

      (3) Relatedly, because spatial representations are retinotopic, fixational eye movements (FEMs - microsaccades and drift) displace the retinal coordinates of encoded positions, increasing apparent spatial noise with time delays. Colour memory, however, is feature-based and unaffected by small retinal translations. Thus, any between-condition or between-group differences in FEMs could selectively inflate location error and the associated model parameters (encoding noise, decay, interference), while leaving colour error unchanged. Note that FEMs tend to be slightly ballistic [1,2], hence not well modelled with a Gaussian blur.

      This is a very insightful point. We have now addressed this in detail within the discussion:

      “However, our findings suggest otherwise. One potential concern is whether this dissociation simply reflects the inherent spatial noise introduced by fixational eye movements (FEMs), such as microsaccades and drift.[46] Because locations are stored in a retinotopic frame, fixational instability necessarily shifts retinal coordinates over time. However, the "saccade cost" here was defined as the error increase relative to a no-saccade baseline of equal duration; because both conditions are subject to the same fixational drift, any FEM-induced noise is effectively subtracted out. Thus, despite the ballistic and non-Gaussian nature of FEMs,[47] they cannot account for the presence of a saccade cost in spatial memory alongside its total absence in the colour domain. Another possibility is that this dissociation reflects differences in baseline task difficulty or dynamic range. Yet, the presence of a robust recency effect in colour memory (Figure 2B) confirms that our paradigm was sensitive to memory-dependent variance and was not limited by floor or ceiling effects.”

      (4) There is no in silico demonstration that the modelling framework can recover the true generating model from synthetic data or recover accurate parameters under realistic noise levels, which can be challenging in generative models with a hierarchical structure (as per [3], for example). Figure 8b shows that the parameters possess substantial posterior covariance, which raises concerns as to whether they can be reliably disambiguated.

      Many thanks for this comment. We have added a simple recovery analysis as detailed below but are also keen to ensure we fully answer your question—which has more to do with empirical rather than simulated data—and make clear the rationale for this analysis in this instance.

      We added this in Supplementary Materials:

      “Model validation and recovery analysis

      The following section provides a detailed technical assessment of the model inversion scheme, focusing on the discriminability of the model space and the identifiability of individual parameters.

      Recovery analyses of this sort are typically used prior to collecting data to allow one to determine whether, in principle, the data are useful in disambiguating between hypotheses. In this sense, they have a role analogous to a classical power calculation. However, their utility is limited when used post-hoc when data have already been collected, as the question of whether the models can be disambiguated becomes one of whether non-trivial Bayes factors can be identified from those data.

      The reason for including a recovery analysis here is not to identify whether the model inversion scheme identifies a ‘true’ model. The concept of ‘true generative models’ commits to a strong philosophical position which is at odds with the ‘all models are wrong, but some are useful’ perspective held by many in statistics, e.g., (So, 2017). Of note, one can always confound a model recovery scheme by generating the same data in a simple way, and in (one of an infinite number of) more complex ways. A good model inversion scheme will always recover the simple model and therefore would appear to select the ‘wrong’ model in a recovery analysis. However, it is still the best explanation for the data. For these reasons, we do not necessarily expect ‘good’ recoverability in all parameter ranges. This is further confounded by the relationship between the models we have proposed—e.g., an interference model with very low interference will look almost identical to a model with no interference. The important question here is whether they can be disambiguated with real data.

      Instead, the value of a post-hoc recovery analysis here is to evaluate whether there was a sensible choice of model space—i.e., that it was not a priori guaranteed that a single model (and, specifically, the model we found to be the best explanation for the data) would explain the results of all others. To address this, for each model, we simulated 16 datasets, each of which relied upon parameters sampled from the model priors, which included examples of each of the experimental conditions. We then fit each of these datasets to each of the 7 models to construct the confusion matrix shown in the lower panel of Supplementary Figure 3, by accumulating evidence over each of the 16 participants generated according to each ‘true’ model (columns) for each of the possible explanatory models (rows). This shows that no one model, for the parameter ranges sampled here, explains all other datasets. Interestingly, our ‘winning’ model in the empirical analysis is not the best explanation for any of the datasets simulated (including its own). This is reassuring, in that it implies this model winning was not a foregone conclusion and is driven by the data—not just the choice of model space.”
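
      The pooling step described above, accumulating evidence over the simulated datasets for each generating model, can be sketched as follows. This is a minimal illustration only: the array layout, the function name `recovery_confusion`, and the choice to sum log evidences across datasets and normalise with a softmax (a fixed-effects pooling) are assumptions made here, not details confirmed by the manuscript. Simulating and fitting the models is not shown.

```python
import numpy as np

def recovery_confusion(log_evidence):
    """Build a model-recovery confusion matrix (hypothetical helper).

    log_evidence[t, d, f] is the log evidence of fitted model f for
    dataset d simulated from generating ('true') model t.
    Returns a matrix with explanatory models as rows and generating
    models as columns; each column sums to 1.
    """
    log_evidence = np.asarray(log_evidence, dtype=float)
    # Assumption: evidence is accumulated by summing log evidences over datasets.
    pooled = log_evidence.sum(axis=1)                      # shape (n_true, n_fit)
    # Subtract the per-row maximum before exponentiating, for numerical stability.
    pooled = pooled - pooled.max(axis=1, keepdims=True)
    posterior = np.exp(pooled)
    posterior /= posterior.sum(axis=1, keepdims=True)
    # Transpose so that columns index the generating model.
    return posterior.T
```

      A column whose mass sits on the diagonal indicates that the generating model was also its own best explanation; off-diagonal mass indicates that a different model explained those simulated data better.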

      Your point about the posterior covariance is well founded. As we describe in Supplementary Materials, this is an inherent feature of inverse problems (analogous to EEG source localisation). However, the fact that our posterior densities move significantly away from the prior expectations demonstrates that the data are indeed informative. By adopting a Bayesian framework, we are able to explicitly quantify this uncertainty rather than ignoring it, providing a more transparent account of parameter identifiability. We have added the following in the same section of Supplementary Materials:

      “This problem is an inverse problem—inferring parameters from a non-linear model. We therefore expect a degree of posterior covariance between parameters and, consequently, that they cannot be disambiguated with complete certainty. While some degree of posterior covariance is inherent to inverse models—including established methods like EEG source localisation—the fact that many of the parameters are estimated with posterior densities that do not include their prior expectations implies the data are informative about these.

      The advantage of the Bayesian approach we have adopted here is that we can explicitly quantify posterior covariance between these parameters, and therefore the degree to which they can be disambiguated. While the posterior covariance matrices from empirical data are the relevant measure here, we can better understand the behaviour of the model inversion scheme in relation to the specific models used using the model recovery analysis reported in Supplementary figure 3.

      The middle panel of the figure is key, along with the correlation coefficients reported in the figure caption. Here, we see at least a weak positive correlation (in some cases much stronger) for almost all parameters and limited movement from prior expectations for those parameters that are less convincingly recovered. This reinforces that the ability of the scheme to recover parameters is best assessed in terms of the degree of movement of posterior from prior values following fitting to empirical data.”

      (5) The authors employ Bayes factors (BFs) to disambiguate models, but BFs would also strengthen the claims that location, but not colour, is impacted by saccades. Despite colour being a circular variable, colour error is analysed using ANOVA on linearised differences (radians). The authors should also arguably use circular statistics, such as the von Mises distribution, for the analysis of colour.

      Regarding the use of circular statistics, you are correct that such error distributions are not suitable for ANOVA, and it is better to use circular statistics. However, for the present dataset, we used the mean absolute angular error per condition (ranging from 0 to π radians), which represents the shortest distance on the colour wheel between the target and the response.

      This approach effectively linearises the measure by removing the 2π wrap-around boundary. Because the observed errors were relatively small and did not cluster near the π boundary, even in the patient cohorts (Figure 5B), the "wrap-around" effect of circular space is negligible. Moreover, by analysing the mean error across trials for each condition, rather than trial-wise data, we invoke the Central Limit Theorem. This ensures that the distribution of these means is approximately normal, satisfying the fundamental assumptions of ANOVA. For these reasons, we adopted simpler linear models. We confirmed that the data did not violate the assumptions of linear statistics. In this low-noise regime, linear and circular models converge on the same conclusions. This has been revised in Methods:

      “For colour memory, we calculated the absolute angular error, defined as the shortest distance on the colour wheel between the target and the reported colour (range 0 to π radians). For the primary statistical analyses, we utilised the mean absolute error per condition for each participant. By analysing these condition-wise means rather than trial-wise raw data, we invoke the Central Limit Theorem, which ensures that the sampling distribution of these means approximates normality. Because the absolute errors in this paradigm were relatively small and did not approach the π boundary (Figure 5B) even in the clinical cohorts, the data were treated as a continuous measure in our linear ANOVAs and regression models. Moreover, because location and colour recall involve different scales and units, all analyses were performed independently for each feature to avoid cross-dimensional magnitude comparisons.”
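
      For concreteness, the absolute angular error and the per-condition means described above could be computed along these lines. This is a sketch under stated assumptions: the function names and the input layout (angles in radians, one condition label per trial) are hypothetical and are not taken from the authors' analysis code.

```python
import numpy as np

def absolute_angular_error(target, response):
    """Shortest distance on the colour wheel, in radians (0 to pi)."""
    diff = np.asarray(response, dtype=float) - np.asarray(target, dtype=float)
    # Wrap the signed difference into [-pi, pi), then take its magnitude.
    wrapped = (diff + np.pi) % (2 * np.pi) - np.pi
    return np.abs(wrapped)

def condition_means(errors, conditions):
    """Mean absolute error per condition label for one participant."""
    errors = np.asarray(errors, dtype=float)
    conditions = np.asarray(conditions)
    return {str(c): float(errors[conditions == c].mean())
            for c in np.unique(conditions)}
```

      The wrapping step is what removes the 2π boundary: a target at 0.1 rad and a response at 2π − 0.1 rad are 0.2 rad apart, not 2π − 0.2 rad.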

      We have also now integrated Bayesian repeated measures ANOVA throughout the manuscript. The Results section for the young healthy adults now reads (p. 4):

      “A repeated measures ANOVA revealed a significant main effect of saccades on location memory (F(3, 20) = 51.52, p < 0.001, partial η²=0.72), with Bayesian analysis providing decisive evidence for the inclusion of the saccade factor (BF<sub>incl</sub> = 3.52 x 10^13, P(incl|data) = 1.00). In contrast, colour memory remained remarkably stable across all saccade conditions (F(3, 20) = 0.57, p = 0.64, partial η² =0.03). This null effect was supported by Bayesian analysis, which provided moderate evidence in favour of the null hypothesis (BF<sub>01</sub> = 8.46, P(excl|data) = 0.89), indicating that the data were more than eight times more likely under the null model than a model including saccade-related impairment.”

      For elderly healthy adults:

      “In contrast, colour memory remained unaffected by saccade demands (F(3, 20) = 0.57, p = 0.65, partial η² =0.03), again supported by the Bayesian analysis: BF<sub>01</sub> = 8.68, P(excl|data) = 0.90.”

      For patient cohorts:

      “Bayesian repeated measures ANOVAs further supported this dissociation, providing moderate evidence for the null hypothesis in the AD group (BF<sub>01</sub> = 3.35, P(excl|data) = 0.77) and weak evidence in the PD group (BF<sub>01</sub> = 2.23, P(excl|data) = 0.69). This indicates that even in populations with established neurodegeneration, the detrimental impact of eye movements is specific to the spatial domain.”
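
      The reported exclusion probabilities can be related to the Bayes factors with a short calculation. The sketch below assumes a two-model comparison with equal prior probability for the null model and the saccade model; that assumption, and the function name, are introduced here for illustration rather than taken from the manuscript. Under it, P(excl|data) = BF01 / (1 + BF01).

```python
def posterior_prob_null(bf01, prior_prob_null=0.5):
    """Posterior probability of the null model given BF01 and a prior probability."""
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# BF01 values as quoted in the text above for each cohort.
for label, bf01 in [("young", 8.46), ("elderly", 8.68), ("AD", 3.35), ("PD", 2.23)]:
    print(f"{label}: BF01 = {bf01:.2f}, P(null | data) = {posterior_prob_null(bf01):.2f}")
```

      With equal prior odds this gives 0.89, 0.90, 0.77 and 0.69, which agrees with the P(excl|data) values quoted for the four cohorts.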

      Related description is also updated in Methods – Statistical Analysis.

      Minor:

      (1) The modelling is described as computational but is arguably better characterised as a heuristic generative model at Marr's algorithmic level. It does not derive from normative computational principles or describe an implementation in neural circuits.

      We appreciate your perspective on the classification of our model within Marr’s hierarchy. We agree that our framework is best characterised as an algorithmic-level generative model. Our objective was to identify the mechanistic principles governing transsaccadic updating rather than to provide a normative derivation or a specific circuit-level implementation.

      To ensure readers do not over-interpret the term ‘computational’, we have added a clarifying statement in the Discussion acknowledging the algorithmic nature of the model. Interestingly, we note that a model predicated on this form of spatial diffusion implies a neural field representation with a spatial connectivity kernel whose limit approximates the second derivative of a Dirac delta function. While a formal neural field implementation is beyond the scope of the present work, our algorithmic results provide the necessary constraints for such future biophysical models.

      p.20: “While we describe the present framework as 'computational', it is more precisely characterised as an algorithmic-level generative model within Marr’s hierarchy. Our focus was on defining the rules of spatial integration and the sources of eye-movement-induced noise, rather than deriving these processes from normative principles or defining their specific neural implementation.”

      (2) I did not find a description of the recruitment and characterization of the AD and PD patients.

      Apologies for this omission. We have now included a detailed description of participant recruitment and clinical characterisation in the Methods section and also updated Table 2:

      “A total of 87 participants completed the study: 21 young healthy adults (YC), 21 older healthy adults (EC), 23 patients with Parkinson’s disease (PD), and 22 patients with Alzheimer’s disease (AD). Their demographic and clinical details are summarised in Table 2. Initially, 90 participants were recruited (22 YC, 21 EC, 25 PD, 22 AD); however, three individuals (1 YC and 2 PD) were excluded from all analyses due to technical issues during data acquisition.

      All participants were recruited locally in Oxford, UK. None were professional artists, had a history of psychiatric illness, or were taking psychoactive medications (excluding standard dopamine replacement therapy for PD patients). Young participants were recruited via the University of Oxford Department of Experimental Psychology recruitment system. Older healthy volunteers (all >50 years of age) were recruited from the Oxford Dementia and Ageing Research (OxDARE) database.

      Patients with PD were recruited from specialist clinics in the Oxfordshire area. All had a clinical diagnosis of idiopathic Parkinson’s disease and no history of other major neurological or psychiatric conditions. While all patients were tested in their regular medication ‘ON’ state, the specific pharmacological profiles, including the types of medication (e.g., levodopa, dopamine agonists, or combinations) and dosages (e.g., levodopa equivalent doses), were not systematically recorded. Disease duration and PD severity were also not recorded for this study.

      Patients with AD were recruited from the Cognitive Disorders Clinic at the John Radcliffe Hospital, Oxford, UK. All AD participants presented with a progressive, multidomain, predominantly amnestic cognitive impairment. Clinical diagnoses were supported by structural MRI and FDG-PET imaging consistent with a clinical diagnosis of AD dementia (e.g., temporo-parietal atrophy and hypometabolism).[69] All neuroimaging was reviewed independently by two senior neurologists (S.T. and M.H.).

      Global cognitive function was assessed using the Addenbrooke’s Cognitive Examination-III (ACE-III).[70] All healthy participants scored above the standard cut-off of 88, with the exception of one elderly participant who scored 85. In the PD group, two participants scored below the cut-off (85 and 79). In the AD group, six participants scored above 88; these individuals were included based on robust clinical and radiological evidence of AD pathology rather than their ACE-III score alone.”

      (3) YA and OA patients appear to differ in gender distribution.

      We acknowledge the difference in gender distribution between the young (71.4% female) and older adult (57.1% female) cohorts. However, we do not anticipate that gender influences the fundamental computational mechanisms of retinotopic maintenance or transsaccadic remapping. These processes represent low-level visuospatial functions for which there is no established evidence of gender-specific differences in precision or coordinate transformation. We have ensured that the gender distribution for each cohort is clearly listed in the demographics table (Table 2) for full transparency.

      Thank you very much for your very insightful feedback!

      Reviewer #3 (Public review):

      Thank you for the positive feedback regarding our inclusion of clinical groups and the identification of computational phenotypes that differentiate these cohorts.

      To address your concerns about the model, we have clarified our use of Bayesian Model Selection, which inherently penalises model complexity to ensure that our results are not driven solely by the number of parameters. We will also provide further evidence regarding model generalisability to address the concern of overfitting.

      Regarding the link with the ROCF, we have revised the manuscript to better highlight the specific relationship between our transsaccadic parameters and the ROCF data and better motivate the inclusion of these results in the main text.

      Below we respond to your suggestions point by point:

      (1) The models tested differ in terms of the number of parameters. In general, a larger number of parameters leads to a better goodness of fit. It is not clear how the difference in the number of parameters between the models was taken into account. It is not clear whether the modelling results could be influenced by overfitting (it is not clear how well the model can generalize to new observations).

      To ensure our results were not driven by the number of parameters, we utilised random-effects Bayesian Model Selection (BMS) to adjudicate between our candidate models. Unlike maximum likelihood methods, BMS relies on the marginal likelihood (model evidence), which inherently balances model fit against parsimony, a principle known as Occam’s razor (Rasmussen and Ghahramani, 2000). In this framework, a model is only preferred if the improvement in fit justifies the additional parameter space; redundant parameters actually lower model evidence by diluting the probability mass. We would be happy to point toward literature that discusses how these marginal likelihood approximations provide a more robust guard against overfitting than standard metrics like BIC or AIC (MacKay, 2003; Murray and Ghahramani, 2005; Penny, 2012).
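
      The way marginal likelihood trades fit against flexibility can be seen in a toy example that is unrelated to the models in the manuscript. The sketch compares a fixed model (a coin with success probability 0.5) against a flexible model (unknown probability with a uniform prior), for which the marginal likelihood of k successes in n trials is exactly 1/(n + 1). The function names are hypothetical.

```python
from math import comb

def evidence_fixed(k, n, p=0.5):
    """Marginal likelihood of k successes in n trials when p is fixed (no free parameter)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def evidence_flexible(k, n):
    """Marginal likelihood with a uniform prior on p.

    Integrating C(n, k) * p^k * (1 - p)^(n - k) over p in [0, 1] gives 1 / (n + 1):
    the flexible model spreads its probability mass evenly over all outcomes.
    """
    return 1.0 / (n + 1)

def bayes_factor_flexible_vs_fixed(k, n):
    """Values above 1 favour the flexible model; values below 1 favour the fixed model."""
    return evidence_flexible(k, n) / evidence_fixed(k, n)
```

      For 5 successes in 10 trials the simpler fixed model has the higher evidence, even though the flexible model can fit those data equally well at its best parameter value; for 9 successes in 10 trials the extra parameter is justified and the flexible model is preferred.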

      The fact that the "Dual (Saccade) + Interference" model (Model 7) emerged as the winner—with a Bayes Factor of 6.11 against the next best alternative—demonstrates that its complexity was statistically justified by its superior account of the trial-by-trial data.

      Furthermore, to address the risk of overfitting, we established the generalisability of these parameters by using them to predict performance on an independent clinical task. These parameters successfully explained ~62% of the variance in ROCF copy scores (a very distinct, real-world task), confirming that they represent robust computational phenotypes rather than idiosyncratic fits to the initial dataset.

      In the Results (p10):

      “We used random-effects Bayesian model selection to identify the most plausible generative model. This process relies on the marginal likelihood (model evidence), which inherently balances model fit against complexity—a principle often referred to as Occam’s razor.[25–27]”

      In the Discussion (p17):

      “Importantly, the risk of overfitting is mitigated by the Bayesian Model Selection framework; by utilising the marginal likelihood for model comparison, the procedure inherently penalises excessive model complexity and promotes generalisability.[25–27,42] This generalisability was further evidenced by the model's ability to predict performance on the independent ROCF task, confirming that these parameters represent robust mechanistic phenotypes rather than idiosyncratic fits to the initial dataset.”

      (2) Results specificity: it is not clear how specific the modelling results are with respect to constructional ability (measured via the Rey-Osterrieth Complex Figure test). As with any cognitive test, performance can also be influenced by general, non-specific abilities that contribute broadly to test success.

      We agree that constructional performance is influenced by both specific mechanistic constraints and general cognitive abilities. To isolate the unique contribution of transsaccadic updating, we therefore performed a partial correlation analysis across the entire sample. We examined the relationship between location error in the two-saccades condition (our primary behavioural measure of transsaccadic memory) and ROCF copy scores. Even after partialling out the effects of global cognitive status (ACE-III total score), age, and years of education, the correlation remained highly significant (rho = -0.39, p < 0.001).

      This suggests that our model captures a specific computational phenotype—the precision of spatial updating during active visual sampling—rather than acting as a proxy for non-specific cognitive decline. This mechanistic link explains why traditional working memory measures (e.g., digit span or Corsi blocks) frequently fail to predict drawing performance; unlike those tasks, figure copying requires thousands of saccades, making it uniquely sensitive to the precision of the dynamic remapping signals identified by our modelling framework.

      We added the following text in the Discussion (p19):

      “We also found that the relationship between transsaccadic working memory and ROCF performance remains highly significant (rho = -0.39, p < 0.001), even after controlling for age, education, and global cognitive status (ACE-III total score). Consequently, transsaccadic updating may represent a discrete computational phenotype required for visuomotor control, rather than a non-specific proxy for global cognitive decline.[57]”
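
      One common way to compute a rank-based partial correlation of this kind is to rank-transform all variables, regress the ranked covariates out of both ranked variables of interest, and correlate the residuals. The sketch below follows that recipe; it is an assumed implementation for illustration, since the response does not specify the software or the exact procedure, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats

def partial_spearman(x, y, covariates):
    """Spearman partial correlation of x and y, controlling for covariates.

    covariates: array of shape (n,) or (n, k). Assumed approach: residualise
    the ranks of x and y on the ranks of the covariates, then correlate.
    """
    rx = stats.rankdata(x)
    ry = stats.rankdata(y)
    cov = np.asarray(covariates, dtype=float)
    if cov.ndim == 1:
        cov = cov[:, None]
    ranked_cov = np.column_stack([stats.rankdata(c) for c in cov.T])
    design = np.column_stack([np.ones(len(rx)), ranked_cov])
    # Residuals after least-squares regression on the ranked covariates.
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    rho = np.corrcoef(res_x, res_y)[0, 1]
    # Two-sided p value from a t distribution with n - 2 - k degrees of freedom.
    dof = len(rx) - 2 - ranked_cov.shape[1]
    t_stat = rho * np.sqrt(dof / (1.0 - rho**2))
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return rho, p_value
```

      In the analysis described above, x would be location error in the two-saccades condition, y the ROCF copy score, and the covariates ACE-III total score, age, and years of education.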

      Reviewer #3 (Recommendations for the authors):

      (1) The authors mention in the introduction the following: "One key hypothesis is that we use working memory across visual fixations to update perception dynamically", citing the following manuscript:

      Harrison, W. J., Stead, I., Wallis, T. S. A., Bex, P. J. & Mattingley, J. B. A computational account of transsaccadic attentional allocation based on visual gain fields. Proc. Natl. Acad. Sci. U.S.A. 121, e2316608121 (2024).

      However, the manuscript above does not refer explicitly to the involvement of working memory in transsaccadic integration of object location in space. Rather, it takes advantage of recent evidence showing how the true location of a visual object is represented in the activity of neurons in primary visual cortex (A. P. Morris, B. Krekelberg, A stable visual world in primate primary visual cortex. Curr. Biol. 29, 1471-1480.e6 (2019)). The model hypothesizes that true locations of objects are readily available, and then allocates attention in real-world coordinates, allowing efficient coordination of attention and saccadic eye movements.

      Thank you for the clarification. As suggested, we have now included the citation of Morris & Krekelberg (2019) to acknowledge the evidence for stable object locations within the primary visual cortex.

      (2) The authors in the introduction and the title use the terms 'transsaccadic memory' and 'spatial working memory'. However, it is not clear whether these can be used interchangeably or reflect different constructs.

      Classical measures of visuo-spatial working memory are derived from the Corsi task (or similar), where the location of multiple objects is displayed and subsequently remembered. In such tasks, eye movements and saccades are not generally considered, only memory performance, representing the visuo-spatial span.

      Transsaccadic memory tasks instead explicitly measure performance on remembered object locations or features across explicit eye movements, usually using a very limited number of objects (1 or 2, as is the case for the current manuscript).

      While the two constructs share some features, it is not clear whether they represent the same underlying ability or not, especially because in transsaccadic tasks, participants are required to perform one or more saccades, thus representing a dual-task case.

      I think the relationship between 'transsaccadic memory' and 'spatial working memory' should be clarified in the manuscript.

      Thank you. We have added a passage to the Methods (Measurement of saccade cost) to clarify that spatial working memory is the broad cognitive construct responsible for short-term maintenance, whereas transsaccadic memory is the specific, dynamic process of remapping representations to maintain stability across eye movements.

      In Methods (p.22):

      “Within this framework, it is important to distinguish between the broad construct of spatial working memory and the specific process of transsaccadic memory. While spatial working memory refers to the general ability to maintain spatial information over short intervals, transsaccadic memory describes the dynamic updating of these representations—termed remapping—to ensure stability across eye movements. Unlike classical 'static' measures of spatial working memory, such as the Corsi block task which focuses on memory span, transsaccadic memory tasks explicitly require the integration of stored visual information with motor signals from intervening saccades. Our paradigm treats transsaccadic updating as a core computational process within spatial working memory, where eye-centred representations are actively reconstructed based on noisy memories of the intervening saccade vectors.”

      (3) In Figure 1, the second row indicates the presentation of item 2. Indeed, in the condition 'saccade-after-item-1', the target in the second row of Figure 1 is displaced, as expected. This clarifies the direction and amplitude of the first saccade requested. However, from Figure 1, it is hard to understand the amplitude and direction of the second requested saccade. I think the figure should be updated, giving a full description of the direction and amplitude of the second saccade as well ('saccade-after-item-2' and 'two-saccades' conditions).

      We agree that making the figure legend more self-contained is beneficial for the reader. While the specific physical parameters and the trial sequence for each condition are detailed in the Results and Methods sections, we have now updated the legend for Figure 1 to explicitly define these details. Specifically, we have clarified that the colour wheel itself served as the target for the second instructed saccade (i.e., the movement from the second fixation cross to the colour wheel location). We have also included the quantitative constraint that all saccade vectors were at least 8.5 degrees of visual angle in amplitude. Given the limited space within a figure legend, we hope these concise additions provide the transparency requested without interrupting the conceptual flow of the diagram.

      Updated Figure 1 legend:

      “Participants were asked to fixate a white cross, wherever it appeared. They had to remember the colour and location of a sequence of two briefly presented coloured squares (Item 1 and 2), each appearing within a white square frame. They then fixated a colour wheel wherever it appeared on the screen, which served as the target for the second instructed saccade (i.e., a movement from the second fixation cross to the colour wheel location). This cued recall of a specific square (Item 1 or Item 2 labelled within the colour wheel). Participants selected the remembered colour on the colour wheel, which led to a square of that colour appearing on the screen. They then dragged this square to its remembered location on the screen. Saccadic demands were manipulated by varying the locations of the second frame and the colour wheel, resulting in four conditions that differ in their reliance on retinotopic versus transsaccadic memory: (1) No-Saccade, providing a baseline measure of within-fixation precision as no eye movements were required; (2) Saccade After Item 1; (3) Saccade After Item 2; (4) Saccades after both items (Two Saccades condition). In all conditions requiring eye movements, saccade vectors were constrained to a minimum amplitude of 8.5° (degrees of visual angle). While the No-Saccade condition isolates retinotopic working memory, conditions (2) to (4) collectively quantify the impact of varying saccadic demands and timings on the maintenance of spatial information, thereby assessing the efficacy of the transsaccadic updating process.”

      (4) The authors write: "Eye tracking analysis confirmed high compliance: participants correctly maintained fixation or executed saccades as instructed on the vast majority of trials (83% ± 14%). Non-compliant trials were excluded from further analysis." 14% of excluded trials are a substantial fraction of trials, given the task requirements. Is this proportion of excluded trials different between experimental groups, and are experimental groups contributing equally to this proportion?

      We thank the reviewer for pointing this out, and we apologise for the confusion. The 83% figure was the mean compliance across all four cohorts and all conditions; compliance was above 90% for the YC, EC and AD groups, but dropped to around 60% in the PD group.

      We have now conducted a full analysis of compliant trial counts using a mixed ANOVA (4 saccade conditions x 4 cohorts). This analysis revealed a main effect of group (F(3, 80) = 8.06, p < 0.001), which was driven by lower compliance in the PD cohort (mean approx. 25.4 trials per condition) compared to the AD, EC, and YC cohorts (means ranging from 35.8 to 38.9 trials per condition). Crucially, however, the interaction between group and condition was not statistically significant (p = 0.151). This indicates that the relative impact of saccade demands on trial retention was consistent across all four groups.

      Because our primary behavioural measure—the saccade cost—is a within-subject comparison of impairment across conditions, these differences in absolute trial numbers do not introduce a systematic bias into our findings. Furthermore, even with the higher attrition in the PD group, we retained a sufficient number of high-quality trials (minimum mean of ~23 trials in the most demanding condition) to support robust trial-by-trial parameter estimation and valid statistical inference. We have updated the Results and Methods to reflect these details.

      In Results (p4):

      “To mitigate potential confounds, we monitored eye position throughout the experiment. Eye-tracking analysis confirmed high compliance in healthy adults, who followed instructions on the vast majority of trials (Younger Adults: 97.2 ± 5.2 %; Older Adults: 91.3 ± 20.4 %). The mean difference between these groups was negligible, representing just 1.25 trials per condition, and was not statistically significant (t(80) = 0.16, p = 1.000; see more in Methods – Eyetracking data analysis). Non-compliant trials were excluded from all further analyses.”

      In Methods (p27):

      “Eye-tracking analysis confirmed high compliance overall, with participants correctly maintaining fixation or executing saccades on the vast majority of trials (83% across all participants). A mixed ANOVA revealed a main effect of group on trial retention (F(3, 80) = 8.06, p < 0.001, partial η² = 0.23), primarily due to lower compliance in the PD cohort (YC: 97±4%; EC: 91±10%; AD: 95±5%; PD: 63±38%). Importantly, there was no significant interaction between group and saccade condition (F(3.36, 80) = 1.78, p = 0.15, partial η² = 0.008), suggesting that trial attrition was not disproportionately affected by specific task demands in any group.

      We acknowledge that this reduced trial count in the PD group represents a limitation for across-cohort comparison. However, the absolute number of compliant trials in the PD group (mean approx. 25 per condition) remained sufficient for robust trial-by-trial parameter estimation. Furthermore, the lack of a significant group-by-condition interaction confirms that the results reported for this cohort remain valid and that our primary finding of a selective spatial memory deficit is robust to these differences in data retention.”

      (5) Modelling

      (a) Degrees of freedom, cross-validation, number of parameters.

      I appreciate the effort in introducing and testing different models. The models increase in complexity and are based on different assumptions about the main drivers and mechanisms underlying the dependent variable. The models differ in the number of parameters. How are the differences in the number of parameters between models taken into account in the modelling analysis? Is there a cost associated with the extra parameters included in the more complex models?

      (b) Cross-validation and overfitting.

      Overfitting can occur when a model learns the training data but cannot generalize to novel datasets. Cross-validation is one approach that can be used to avoid overfitting. Was cross-validation (or other approaches) implemented in the fitting procedure against overfitting? Otherwise, the inference that can be derived from the modelled parameters can be limited.

      To address your concerns regarding model complexity and overfitting, we would like to clarify our use of Bayesian Model Selection (BMS). Unlike frequentist methods that often rely on cross-validation to assess generalisability, we used random-effects BMS based on the marginal likelihood (model evidence). This approach inherently implements Bayesian Occam’s Razor by integrating out the parameters. Under this framework, the use of the marginal likelihood for model selection provides a mathematically equivalent safeguard to frequentist cross-validation, as it evaluates the model's ability to generalise across the entire parameter space rather than just finding a maximum likelihood fit for the training data. Thus, models are penalised not just for the absolute number of parameters, but for their overall functional flexibility. A more complex model is only preferred if the improvement in model fit is substantial enough to outweigh this inherent penalty. The emergence of Model 7 as the winner (Bayes Factor = 6.11 against the next best alternative) confirms that its additional complexity is statistically justified.
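      The complexity penalty described above can be illustrated with a toy example that is unrelated to the authors' actual models. The sketch below compares a zero-parameter coin model (p fixed at 0.5) with a one-parameter model (p drawn from a uniform prior) using their marginal likelihoods. The function names and the coin-flip setting are assumptions chosen purely for illustration.

```python
from math import factorial


def evidence_fixed(n_heads: int, n_flips: int) -> float:
    """Marginal likelihood of one observed flip sequence when p is fixed at 0.5."""
    return 0.5 ** n_flips


def evidence_flexible(n_heads: int, n_flips: int) -> float:
    """Marginal likelihood of the same sequence when p ~ Uniform(0, 1).

    Integrating p**k * (1 - p)**(n - k) over p gives k! (n - k)! / (n + 1)!.
    """
    k, n = n_heads, n_flips
    return factorial(k) * factorial(n - k) / factorial(n + 1)


def bayes_factor(n_heads: int, n_flips: int) -> float:
    """Evidence ratio of the flexible model over the fixed model."""
    return evidence_flexible(n_heads, n_flips) / evidence_fixed(n_heads, n_flips)


if __name__ == "__main__":
    # Balanced data: the extra parameter is not needed, so the ratio is below 1.
    print(bayes_factor(5, 10))
    # Strongly biased data: the extra parameter pays for itself, ratio above 1.
    print(bayes_factor(9, 10))
```

      With balanced data the flexible model spreads its prior mass over many values of p that fit poorly, so its evidence is lower even though its best-fitting p matches the fixed model exactly. This is the sense in which integrating out parameters penalises flexibility without an explicit parameter count.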

      Furthermore, in this study we provided an external validation of these recovered parameters by demonstrating that they explain 62% of the variance in an independent, real-world, clinical task (ROCF copy). This empirical evidence confirms that our model captures robust mechanistic phenotypes rather than idiosyncratic noise. We have updated the Results and Discussion to explicitly state these.

      In Results: (p10)

      “We used random-effects Bayesian model selection to identify the most plausible generative model. This process relies on the marginal likelihood (model evidence), which inherently balances model fit against complexity—a principle often referred to as Occam’s razor.[26–28]”

      In Discussion: (p17)

      “Importantly, the risk of overfitting is mitigated by the Bayesian Model Selection framework; by utilising the marginal likelihood for model comparison, the procedure inherently penalises excessive model complexity and promotes generalisability.[26–28,43] This generalisability was further evidenced by the model's ability to predict performance on the independent ROCF task, confirming that these parameters represent robust mechanistic phenotypes rather than idiosyncratic fits to the initial dataset.”

      (6) n. of participants.

      (a) The authors write the following: "A total of healthy volunteers (21 young adults, mean age = 24.1 years; 21 older adults, mean age = 72.4 years) participated in this study. Their demographics are shown in Table 1. All participants were recruited locally in Oxford." However, Table 1 reports the data from more than 80 participants, divided into 4 groups. Details about the PD and AD groups are missing. Please clarify.

      We apologize for this lack of clarity in the text. We have rewritten and expanded the “Participants” section and corrected Table 2 in the Methods section to reflect the correct number of participants.

      In Methods (p20):

      “A total of 87 participants completed the study: 21 young healthy adults (YC), 21 older healthy adults (EC), 23 patients with Parkinson’s disease (PD), and 22 patients with Alzheimer’s disease (AD). Their demographic and clinical details are summarised in Table 2. Initially, 90 participants were recruited (22 YC, 21 EC, 25 PD, 22 AD); however, three individuals (1 YC and 2 PD) were excluded from all analyses due to technical issues during data acquisition.

      All participants were recruited locally in Oxford, UK. None were professional artists, had a history of psychiatric illness, or were taking psychoactive medications (excluding standard dopamine replacement therapy for PD patients). Young participants were recruited via the University of Oxford Department of Experimental Psychology recruitment system. Older healthy volunteers (all >50 years of age) were recruited from the Oxford Dementia and Ageing Research (OxDARE) database.

      Patients with PD were recruited from specialist clinics in Oxfordshire. All had a clinical diagnosis of idiopathic Parkinson's disease and no history of other major neurological or psychiatric conditions. While specific dosages of dopamine replacement therapy (e.g., levodopa equivalent doses) were not systematically recorded, all patients were tested while on their regular medication regimen ('ON' state).

      Patients with PD were recruited from clinics in the Oxfordshire area. All had a clinical diagnosis of idiopathic Parkinson’s disease and no history of other major neurological or psychiatric illnesses. While all patients were tested in their regular medication ‘ON’ state, the specific pharmacological profiles, including the exact types of medication (e.g., levodopa, dopamine agonists, or combinations) and dosages, were not systematically recorded. Disease duration and PD severity were also not recorded for this study.

      Patients with AD were recruited from the Cognitive Disorders Clinic at the John Radcliffe Hospital, Oxford, UK. All AD participants presented with a progressive, multidomain, predominantly amnestic cognitive impairment. Clinical diagnoses were supported by structural MRI and FDG-PET imaging consistent with a clinical diagnosis of AD dementia (e.g., temporo-parietal atrophy and hypometabolism).[70] All neuroimaging was reviewed independently by two senior neurologists (S.T. and M.H.).

      Global cognitive function was assessed using the Addenbrooke’s Cognitive Examination-III (ACE-III).[71] All healthy participants scored above the standard cut-off of 88, with the exception of one elderly participant who scored 85. In the PD group, two participants scored below the cut-off (85 and 79). In the AD group, six participants scored above 88; these individuals were included based on robust clinical and radiological evidence of AD pathology rather than their ACE-III score alone.”

      (b) As modelling results rely heavily on the quality of eye movements and eye traces, I believe it is necessary to report details about eye movement calibration quality and eye traces quality for the 4 experimental groups, as noisier data could be expected from naïve and possibly older participants, especially in case of clinical conditions. Potential differences in quality between groups should be discussed in light of the results obtained and whether these could contribute to the observed patterns.

      Thank you for pointing this out. We have revised the Methods about how calibration was done:

      (p27) “Prior to the experiment, a standard nine-point calibration and validation procedure was performed. Participants were instructed to fixate a small black circle with a white centre (0.5 degrees) as it appeared sequentially at nine points forming a 3 x 3 grid across the screen. Calibration was accepted only if the mean validation error was below 0.5 degrees and the maximum error at any single point was below 1.0 degree. If these criteria were not met, or if the experimenter noticed significant gaze drift between blocks, the calibration procedure was repeated. This calibration ensured high spatial accuracy across the entire display area, facilitating the precise monitoring of fixations on item frames and saccadic movements to the response colour wheel.”
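      The acceptance rule quoted above (mean validation error below 0.5 degrees and no single point at or above 1.0 degree) can be written as a short check. This is a minimal sketch: the function name, argument names, and the idea of passing a list of per-point errors are assumptions, not the authors' or the eye tracker's actual interface.

```python
def calibration_accepted(validation_errors_deg, mean_limit=0.5, max_limit=1.0):
    """Return True if per-point validation errors (in degrees) meet both criteria.

    validation_errors_deg: one error value per calibration point (nine in a 3 x 3 grid).
    mean_limit: the mean error must be strictly below this value.
    max_limit: every single-point error must be strictly below this value.
    """
    if not validation_errors_deg:
        raise ValueError("at least one validation error is required")
    mean_error = sum(validation_errors_deg) / len(validation_errors_deg)
    return mean_error < mean_limit and max(validation_errors_deg) < max_limit


if __name__ == "__main__":
    # A good calibration, then one with a single outlying point.
    print(calibration_accepted([0.3] * 9))
    print(calibration_accepted([0.2] * 8 + [1.2]))
```

      A calibration that fails this check would be repeated, as the quoted Methods text describes.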

      Moreover, as detailed in our response to Point 4, while the PD group exhibited lower compliance, there was no interaction between group and saccade condition for compliance (p = 0.151). This confirms that any noise or trial attrition was distributed evenly across experimental conditions. Consequently, the observed "saccade cost" (the difference in error between conditions) is not an artefact of unequal noise but represents a genuine mechanistic impairment in spatial updating. We have updated the Methods to clarify this distinction.

      Furthermore, our Bayesian framework explicitly estimates precision (random noise) as a distinct parameter from updating cost (saccade cost). This allows the model to partition the variance: even if a clinical group is "noisier" overall, this is captured by the precision parameter, ensuring it does not inflate the specific estimate of saccade-driven memory impairment.
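      The descriptive form of the "saccade cost" referred to above, the difference in error between a saccade condition and the no-saccade baseline within one participant, can be sketched as follows. The condition labels and error values are hypothetical, and this simple difference of means is not the model-based parameter estimate that the authors report.

```python
from statistics import mean


def saccade_cost(errors_by_condition, baseline="no_saccade"):
    """Mean error in each saccade condition minus the mean baseline error.

    errors_by_condition: dict mapping a condition label to a list of per-trial
    localisation errors for a single participant (compliant trials only).
    """
    baseline_error = mean(errors_by_condition[baseline])
    return {
        condition: mean(errors) - baseline_error
        for condition, errors in errors_by_condition.items()
        if condition != baseline
    }


if __name__ == "__main__":
    participant = {
        "no_saccade": [2.0, 3.0],
        "two_saccades": [4.0, 5.0, 6.0],  # unequal trial counts are allowed
    }
    print(saccade_cost(participant))
```

      Because each cost is computed within a participant, a uniformly noisier participant raises both means and leaves the difference largely unchanged, which is the intuition behind the response above.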

      (7) Figure 5. I suggest reporting these results using boxplots instead of barplots, as the former gives a better overview of the distributions.

      We appreciate the suggestion to use boxplots to better illustrate data distributions. However, we have chosen to retain the current bar plot format due to the visual and statistical complexity of our 4 x 4 x 2 experimental design. Figure 5 represents 16 distinct distributions across four groups and four conditions for both location and colour measures; employing boxplots/violins for this density of data would significantly increase visual clutter and make the figure difficult to parse.

      Furthermore, the primary objective of this figure is to reflect the statistical analysis: it illustrates group differences in overall performance and highlights the specific finding that patients with AD were significantly more impaired across all conditions than the YC, EC, and PD groups. Our statistical focus remains on the mean effects, specifically the significant main effect of group (F(3, 318) = 59.71, p < 0.001) and the critical null interaction between group and condition (p = 0.90). The error measure most relevant to these comparisons is the standard error of the mean (SEM), rather than the interquartile range (IQR). We think that bar plots provide the most straightforward and scannable representation of these mean differences and the consistent pattern of decay across cohorts for the final manuscript layout.

      To address the reviewer’s request for distributional transparency, we have provided a version of Figure 5 using grouped boxplots in the supplementary material (Supplementary figure 2). We note, however, that the spread of raw data points in these plots does not directly reflect the variance associated with our within-subject statistical comparisons.

      (8) Results specificity, trans-saccadic integration and ROCF. The authors demonstrate that the derived model parameters account for a significant amount of variability in ROCF performance across the experimental groups tested (Figure 8A). However, it remains unclear how specific the modelling results are with respect to the ROCF.

      The ROCF is generally interpreted as a measure of constructional ability. Nevertheless, as with any cognitive test, performance can also be influenced by more general, non-specific abilities that contribute broadly to test success. To more clearly link the specificity between modelling results and constructional ability, it would be helpful to include a test measure for which the model parameters would not be expected to explain performance, for example, a verbal working memory task.

      I am not necessarily suggesting that new data should be collected. However, I believe that the issue of specificity should be acknowledged and discussed as a potential limitation in the current context.

      We appreciate this important point regarding the discriminant validity of our findings. We agree that cognitive performance in clinical populations is often influenced by a general "g-factor" or non-specific executive decline. However, we chose the ROCF Copy task specifically because it is a hallmark clinical measure of constructional ability that effectively serves as a real-world transsaccadic task, requiring participants to integrate spatial information across hundreds of saccades between the model figure and the drawing surface.

      To address the reviewer’s concern regarding specificity, we leveraged the fact that all participants completed the ACE-III, which includes a dedicated verbal memory component (the ACE Memory subscale). We conducted a partial correlation analysis and found that the relationship between transsaccadic working memory and ROCF copy performance remains highly significant (rho = -0.46, p < 0.001), even after controlling for age, education, and the ACE-III Memory subscale score. This suggests that the link between transsaccadic updating and constructional ability is mechanistically specific rather than a byproduct of global cognitive impairment. We have substantially revised the Discussion to highlight this link and the supporting statistical evidence.
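      A rank-based partial correlation of the kind described above can be sketched for the simplest case of a single covariate. The authors controlled for three covariates (age, education, and ACE-III Memory), which would require regression on ranks or a recursive version of this formula; the one-covariate form below, along with all function names, is an illustrative assumption.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_spearman(x, y, z):
    """Spearman correlation between x and y after removing one covariate z."""
    rx, ry, rz = average_ranks(x), average_ranks(y), average_ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

      If the covariate is unrelated to both variables, the partial correlation equals the ordinary Spearman correlation; if the covariate explains the shared variance, the partial correlation shrinks toward zero.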

      We first updated the last paragraph of Introduction:

      “Finally, by linking these mechanistic parameters to a standard clinical measure of constructional ability (the Rey-Osterrieth Complex Figure task), we demonstrate that transsaccadic updating represents a core computational phenotype underpinning real-world visuospatial construction in both health and neurodegeneration.”

      The new section in Discussion highlighting the ROCF copy link:

      “Importantly, our computational framework establishes a direct mechanistic link between transsaccadic updating and real-world constructional ability. Specifically, higher saccade and angular encoding errors contribute to poorer ROCF copy scores. By mapping these mechanistic estimates onto clinical scores, we found that the parameters derived from our winning model explain approximately 62% of the variance in constructional performance across groups. These findings suggest that the computational parameters identified in the LOCUS task represent core phenotypes of visuospatial ability, providing a mechanistic bridge between basic cognitive theory and clinical presentation.

      This relationship provides novel insights into the cognitive processes underlying drawing, specifically highlighting the role of transsaccadic working memory. Previous research has primarily focused on the roles of fine motor control and eye-hand coordination in this skill.[4,50–55] This is partly because of a consistent failure to find a strong relation between traditional memory measures and copying ability.[4,31] For instance, common measures of working memory, such as digit span and Corsi block tasks, do not directly predict ROCF copying performance.[31,56] Furthermore, in patients with constructional apraxia, performance on these memory measures often remains relatively preserved despite significant drawing impairments.[56–58] In the literature, this lack of association has often been attributed to “deictic” visual-sampling strategies, characterised by frequent eye movements that treat the environment as an external memory buffer, thereby minimising the need to maintain a detailed internal representation.[4,59] As a real-world copying task, the ROCF requires a high volume of saccades, making it uniquely sensitive to the precision of the dynamic remapping signals identified here. Recent eye-tracking evidence confirms that patients with AD exhibit significantly more saccades and longer fixations during figure copying compared to controls, potentially as a compensatory response to transsaccadic working memory constraints.[56] This high-frequency sampling (averaging between 150 and 260 saccades for AD patients compared to approximately 100 for healthy controls) renders the task highly dependent on the precision of dynamic remapping signals.[56] We also found that the relationship between transsaccadic working memory and ROCF performance remains highly significant (rho = -0.46, p < 0.001), even after controlling for age, education, and ACE-III Memory subscore. Consequently, transsaccadic updating may represent a discrete computational phenotype required for visuomotor control, rather than a non-specific proxy for global cognitive decline.[58]

      In other words, even when visual information is readily available in the world, drawing performance depends critically on working memory across saccades. This reveals a fundamental computational trade-off: while active sampling strategies (characterised by frequent eye-hand movements) effectively reduce the load on capacity-limited working memory, they simultaneously increase the demand for precise spatial updating across eye movements. By treating the external world as an "outside" memory buffer, the brain minimises the volume of information it must hold internally, but it becomes entirely dependent on the reliability with which that information is remapped after each eye movement. This perspective aligns with, rather than contradicts, the traditional view of active sampling, which posits that individuals adapt their gaze and memory strategies based on specific task demands.[3,60] Furthermore, this perspective provides a mechanistic framework for understanding constructional apraxia; in these clinical populations, the impairment may not lie in a reduced memory "span," but rather in the cumulative noise introduced by the constant spatial remapping required during the copying process.[58,61]

      Beyond constructional ability, these findings suggest that the primary evolutionary utility of high-resolution spatial remapping lies in the service of action rather than perception. While spatial remapping is often invoked to explain perceptual stability,[11–13,15] the necessity of high-resolution transsaccadic memory for basic visual perception is debated.[13,62–64] A prevailing view suggests that detailed internal models are unnecessary for perception, given the continuous availability of visual information in the external world.[13,44] Our findings support an alternative perspective, aligning with the proposal that high-resolution transsaccadic memory primarily serves action rather than perception.[13] This is consistent with the need for precise localisation in eye-hand coordination tasks such as pointing or grasping.[65] Even when unaware of intrasaccadic target displacements, individuals rapidly adjust their reaching movements, suggesting direct access of the motor system to remapping signals.[66] Further support comes from evidence that pointing to remembered locations is biased by changes in eye position,[67] and that remapping neurons reside within the dorsal “action” visual pathway, rather than the ventral “perception” visual pathway.[13,68,69] By demonstrating a strong link between transsaccadic working memory and drawing (a complex fine motor skill), our findings suggest that precise visual working memory across eye movements plays an important role in complex fine motor control.”

      We are deeply grateful to the reviewers for their meticulous reading of our manuscript and for the constructive feedback provided throughout this process. Your insights have significantly enhanced the clarity and rigour of our work.

      In addition to the changes requested by the reviewers, we wish to acknowledge a reporting error identified during the revision process. In the original Results section, the repeated measures ANOVA statistics for YC included Greenhouse-Geisser corrections, and the between-subjects degrees of freedom were incorrectly reported as within-subjects residuals. Upon re-evaluation of the data, we confirmed that the assumption of sphericity was not violated; therefore, we have removed the unnecessary Greenhouse-Geisser corrections and corrected the degrees of freedom throughout the Results and Methods sections. We have ensured that these statistical updates are reflected accurately in the revised manuscript and that they do not alter the significance or interpretation of any of our primary findings.

      We hope that these revisions address all the concerns raised and provide a more robust account of our findings. We look forward to your further assessment of our work.

    1. eLife Assessment

      Studying the biological roles of polyphosphates in metazoans has been a longstanding challenge to the field given that the polyP synthase has yet to be discovered in metazoans. This important study capitalizes on the sophisticated genetics available in the Drosophila system and uses a combination of methodologies to start to tease apart how polyphosphate participates in Drosophila development and in the clotting of Drosophila hemolymph. The data validating one of these tools (Cyto-FLYX) are solid and well-documented and they will open up a field of research into the functional roles of polyP in a metazoan model. The other tools for tissue-specific knockdown of polyP (Mito-FLYX, ER-FLYX, and Nuc-FLYX) have not yet been validated but will be invaluable to the field when they are.

    2. Reviewer #2 (Public review):

      Summary:

      The authors of this paper note that although polyphosphate (polyP) is found throughout biology, the biological roles of polyP have been under-explored, especially in multicellular organisms. The authors created transgenic Drosophila that expressed a yeast enzyme that degrades polyP, targeting the enzyme to different subcellular compartments (cytosol, mitochondria, ER, and nucleus, terming these altered flies Cyto-FLYX, Mito-FLYX, etc.). The authors show the localization of polyP in various wild-type fruit fly cell types and demonstrate that the targeting vectors did indeed result in expression of the polyP-degrading enzyme in the cells of the flies. They then go on to examine the effects of polyP depletion using just one of these targeting systems (the Cyto-FLYX). The primary findings are that depleting cytosolic polyP in these flies accelerates eclosion and that cytosolic polyP appears to participate in hemolymph clotting. Perhaps surprisingly, the flies seemed otherwise healthy and appeared to have few other noticeable defects. The authors use transcriptomics to try to identify pathways altered by the Cyto-FLYX construct degrading cytosolic polyP, and it seems likely that their findings in this regard will provide avenues for future investigation. Finally, although the authors found that eclosion is accelerated in pupae of Drosophila expressing the Cyto-FLYX construct, the reason why this happens remains unexplained.

      Strengths:

      The authors capitalize on the work of other investigators who had previously shown that expression of recombinant yeast exopolyphosphatase could be targeted to specific subcellular compartments to locally deplete polyP, and they also use a recombinant polyP binding protein (PPBD) developed by others to localize polyP. They combine this with the considerable power of Drosophila genetics to explore the roles of polyP by depleting it in specific compartments and cell types to tease out novel biological roles for polyP in a whole organism. This is a substantial advance.

      Weaknesses:

      Page 4 of Results (paragraph 1): I'm a bit concerned about the specificity of PPBD as a probe for polyP. The authors show that the fusion partner (GST) isn't responsible for the signal, but I don't think they directly demonstrate that PPBD is binding only to polyP. Could it also bind to other anionic substances? A useful control might be to digest the permeabilized cells and tissues with polyphosphatase prior to PPBD staining, and show that the staining is lost.

      In the hemolymph clotting experiments, the authors collected 2 ul of hemolymph and then added 1 ul of their test substance (water or a polyP solution). They state that they added either 0.8 or 1.6 nmol polyP in these experiments (the description in the Results differs from that of the Methods). I calculate this will give a polyP concentration of 0.3 or 0.6 mM. This is an extraordinarily high polyP concentration, and is much in excess of the polyP concentrations used in most of the experiments testing the effects of polyP on clotting of mammalian plasma. Why did the authors choose this high polyP concentration? Did they try lower concentrations? It seems possible that too high a polyP concentration would actually have less clotting activity than the optimal polyP concentration.
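      The concentration estimate above can be reproduced with a short calculation. The sketch below assumes that the final volume is simply 2 µl of hemolymph plus 1 µl of test solution (3 µl) and that the stated amounts are the total nmol of polyP added; the function name is illustrative and does not come from the manuscript.

```python
def concentration_mM(amount_nmol: float, total_volume_ul: float) -> float:
    """Return the concentration in mM.

    nmol per microlitre equals mmol per litre, so no unit
    conversion factor is needed.
    """
    return amount_nmol / total_volume_ul


HEMOLYMPH_UL = 2.0   # volume of hemolymph collected
ADDED_UL = 1.0       # volume of water or polyP solution added
TOTAL_UL = HEMOLYMPH_UL + ADDED_UL  # assumed final volume

for amount_nmol in (0.8, 1.6):
    conc = concentration_mM(amount_nmol, TOTAL_UL)
    print(f"{amount_nmol} nmol in {TOTAL_UL} ul = {conc:.2f} mM")
```

      Under these assumptions the two doses come to roughly 0.27 and 0.53 mM, close to the rounded figures of 0.3 and 0.6 mM quoted above.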

      In the revised version of the manuscript, the authors have productively responded to the previous criticisms. Their new data show stronger controls regarding the specificity of PPBD with regard to its interaction with polyP. The authors also have repeated their hemolymph clotting experiments with lower polyP concentrations, which are likely to be more physiological.

    3. Reviewer #3 (Public review):

      Summary:

      Sarkar, Bhandari, Jaiswal and colleagues establish a suite of quantitative and genetic tools to use Drosophila melanogaster as a model metazoan organism to study polyphosphate (polyP) biology. By adapting biochemical approaches for use in D. melanogaster, they identify a window of increased polyP levels during development. Using genetic tools, they find that depleting polyP from the cytoplasm alters the timing of metamorphosis, accelerating eclosion. By adapting subcellular imaging approaches for D. melanogaster, they observe polyP in the nucleolus of several cell types. They further demonstrate that polyP localizes to cytoplasmic puncta in hemocytes, and further that depleting polyP from the cytoplasm of hemocytes impairs hemolymph clotting. Together, these findings establish D. melanogaster as a tractable system for advancing our understanding of polyP in metazoans.

      Strengths:

      • The FLYX system, combining cell type and compartment-specific expression of ScPpx1, provides a powerful tool for the polyP community.

      • The finding that cytoplasmic polyP levels change during development and affect the timing of metamorphosis is an exciting first step in understanding the role of polyP in metazoan development, and possible polyP-related diseases.

      • Given the significant existing body of work implicating polyP in the human blood clotting cascade, this study provides compelling evidence that polyP has an ancient role in clotting in metazoans.

      Limitations:

      • While the authors demonstrate that HA-ScPpx1 protein localizes to the target organelles in the various FLYX constructs, the capacity of these constructs to deplete polyP from the different cellular compartments is not shown. This is an important control to both demonstrate that the GST-PPBD labeling protocol works, and also to establish the efficacy of compartment-specific depletion. While not necessary to do for all the constructs, it would be helpful to do this for the cyto-FLYX and nuc-FLYX.

      • The cell biological data in this study clearly indicates that polyP is enriched in the nucleolus in multiple cell types, consistent with recent findings from other labs, and also that polyP affects gene expression during development. Given that the authors also generate the Nuc-FLYX construct to deplete polyP from the nucleus, it is surprising that they test how depleting cytoplasmic but not nuclear polyP affects development. However, providing these tools is a service to the community, and testing the phenotypic consequences of all the FLYX constructs may arguably be beyond the scope of this first study.

      Editors' note: The authors have satisfactorily responded to our main concerns related to the specificity of PPBD and the physiological levels of polyP in the clotting experiments. We also recognise the limitations related to the depletion of polyP in other tissues and hope that these data will be made available soon.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Polymers of orthophosphate of varying lengths are abundant in prokaryotes and some eukaryotes, where they regulate many cellular functions. Though they exist in metazoans, few tools exist to study their function. This study documents the development of tools to extract, measure, and deplete inorganic polyphosphates in *Drosophila*. Using these tools, the authors show:

      (1) That polyP levels are negligible in embryos and larvae of all stages while they are feeding. They remain high in pupae but their levels drop in adults.

      (2) That many cells in tissues such as the salivary glands, oocytes, haemocytes, imaginal discs, optic lobe, muscle, and crop, have polyP that is either cytoplasmic or nuclear (within the nucleolus).

      (3) That polyP is necessary in plasmatocytes for blood clotting in Drosophila.

      (4) That polyP controls the timing of eclosion.

      The tools developed in the study are innovative, well-designed, tested, and well-documented. I enjoyed reading about them and I appreciate that the authors have gone looking for the functional role of polyP in flies, which hasn't been demonstrated before. The documentation of polyP in cells is convincing, as is its role in plasmatocytes in clotting.

      We sincerely thank the reviewer for their encouraging assessment and for recognizing both the innovation of the FLYX toolkit and the functional insights it enables. Their remarks underscore the importance of establishing Drosophila as a tractable model for polyP biology, and we are grateful for their constructive feedback, which further strengthened the manuscript.

      Its control of eclosion timing, however, could result from non-specific effects of expressing an exogenous protein in all cells of an animal.

      We now explicitly state this limitation in the revised manuscript (p.16, l.347–349). The issue is that no catalytic-dead ScPpX1 is available as a control in the field. We plan to generate such mutants through systematic structural and functional studies and will update the FLYX toolkit once they are developed and validated. Importantly, the accelerated eclosion phenotype is reproducible and correlates with endogenous polyP dynamics.

      The RNAseq experiments and their associated analyses on polyP-depleted animals and controls have not been discussed in sufficient detail. In its current form, the data look to be extremely variable between replicates and I'm therefore unsure of how the differentially regulated genes were identified.

      We thank the reviewer for pointing out the lack of clarity. We have expanded our RNAseq analysis in the revised manuscript (p.20, l.430–434). Because of inter-sample variation (PC2 = 19.10%, Fig. S7B), we employed Gene Set Enrichment Analysis (GSEA) rather than strict DEG cutoffs. This method is widely used when the goal is to capture pathway-level changes under variability (1). We now also highlight this limitation explicitly (p.20, l.430–432) and provide an additional table with gene-specific fold change (See Supplementary Table for RNA Sequencing Sheet 1). Please note that we have moved RNAseq data to Supplementary Fig. 7 and 8 as suggested in the review.

      It is interesting that no kinases and phosphatases have been identified in flies. Is it possible that flies are utilising the polyP from their gut microbiota? It would be interesting to see if these signatures go away in axenic animals.

      This is an interesting possibility. Several observations argue that polyP is synthesized by fly tissues: (i) polyP levels remain very low during feeding stages but build up in wandering third instar larvae after feeding ceases; (ii) PPBD staining is absent from the gut except the crop (Fig. S3O–P); (iii) in C. elegans, intestinal polyP was unaffected when worms were fed polyP-deficient bacteria (2); (iv) depletion of polyP from plasmatocytes alone impairs hemolymph clotting, which would not be expected if gut-derived polyP were the major source of polyP in hemolymph. Nevertheless, we agree that microbiota-derived polyP may contribute, and we plan systematic testing in axenic flies in future work.

      Reviewer #2 (Public review):

      Summary:

      The authors of this paper note that although polyphosphate (polyP) is found throughout biology, the biological roles of polyP have been under-explored, especially in multicellular organisms. The authors created transgenic Drosophila that expressed a yeast enzyme that degrades polyP, targeting the enzyme to different subcellular compartments (cytosol, mitochondria, ER, and nucleus, terming these altered flies Cyto-FLYX, Mito-FLYX, etc.). The authors show the localization of polyP in various wild-type fruit fly cell types and demonstrate that the targeting vectors did indeed result in the expression of the polyP degrading enzyme in the cells of the flies. They then go on to examine the effects of polyP depletion using just one of these targeting systems (the Cyto-FLYX). The primary findings from the depletion of cytosolic polyP levels in these flies are that it accelerates eclosion and also appears to participate in hemolymph clotting. Perhaps surprisingly, the flies seemed otherwise healthy and appeared to have few other noticeable defects. The authors use transcriptomics to try to identify pathways altered by the Cyto-FLYX construct degrading cytosolic polyP, and it seems likely that their findings in this regard will provide avenues for future investigation. Finally, although the authors found that eclosion is accelerated in the pupae of Drosophila expressing the Cyto-FLYX construct, the reason why this happens remains unexplained.

      Strengths:

      The authors capitalize on the work of other investigators who had previously shown that expression of recombinant yeast exopolyphosphatase could be targeted to specific subcellular compartments to locally deplete polyP, and they also use a recombinant polyP-binding protein (PPBD) developed by others to localize polyP. They combine this with the considerable power of Drosophila genetics to explore the roles of polyP by depleting it in specific compartments and cell types to tease out novel biological roles for polyP in a whole organism. This is a substantial advance.

      We are grateful to the reviewer for their thorough and thoughtful evaluation. Their balanced summary of our work, recognition of the strengths of our genetic tools, and constructive suggestions have been invaluable in clarifying our experiments and strengthening the conclusions.

      Weaknesses:

      Page 4 of the Results (paragraph 1): I'm a bit concerned about the specificity of PPBD as a probe for polyP. The authors show that the fusion partner (GST) isn't responsible for the signal, but I don't think they directly demonstrate that PPBD is binding only to polyP. Could it also bind to other anionic substances? A useful control might be to digest the permeabilized cells and tissues with polyphosphatase prior to PPBD staining and show that the staining is lost.

      To address this concern, we have done two sets of experiments:

      (1) We generated a PPBD mutant (GST-PPBD<sup>Mut</sup>). Using Microscale Thermophoresis, we establish that GST-PPBD binds to polyP<sub>100</sub>-2X FITC, whereas GST-PPBD<sup>Mut</sup> and GST do not. We also found that, unlike GST-PPBD (wild-type), which gives a punctate staining pattern, GST-PPBD<sup>Mut</sup> does not stain hemocytes. These data have been added to the revised manuscript (Fig. 2B-D, p.8, l.151–165).

      (2) A study in C. elegans by Quarles et al. performed an experiment similar to the one suggested by the reviewer. In that study, treating permeabilized tissues with polyphosphatase prior to PPBD staining resulted in a decrease of PPBD-GFP signal from the tissues (2). We performed the same experiment, incubating fixed and permeabilised hemocytes with ScPpX1 or heat-inactivated ScPpX1 protein before GST-PPBD staining. We find that both staining intensity and the number of punctae are higher in hemocytes left untreated and in those treated with heat-inactivated ScPpX1. The hemocytes pre-treated with ScPpX1 showed reduced staining intensity and number of punctae. These data have been added to the revised manuscript (Fig. 2E-G, p.8, l.166-172).

      Further, Saito et al. reported that PPBD binds to polyP in vitro, as well as in yeast and mammalian cells, with a high affinity of ~45µM for longer polyP chains (35 mer and above) (3). They also show that the affinity of PPBD with RNA and DNA is very low. Furthermore, PPBD could detect differences in polyP labeling in yeasts grown under different physiological conditions that alter polyP levels (3). Taken together, published work and our results suggest that PPBD specifically labels polyP.

      In the hemolymph clotting experiments, the authors collected 2 ul of hemolymph and then added 1 ul of their test substance (water or a polyP solution). They state that they added either 0.8 or 1.6 nmol polyP in these experiments (the description in the Results differs from that of the Methods). I calculate this will give a polyP concentration of 0.3 or 0.6 mM. This is an extraordinarily high polyP concentration and is much in excess of the polyP concentrations used in most of the experiments testing the effects of polyP on clotting of mammalian plasma. Why did the authors choose this high polyP concentration? Did they try lower concentrations? It seems possible that too high a polyP concentration would actually have less clotting activity than the optimal polyP concentration.

      We repeated the assays using 125 µM polyP, consistent with concentrations employed in mammalian plasma studies (4,5). Even at this lower, physiologically relevant concentration, polyP significantly enhanced clot fibre formation (Included as Fig. S5F–I, p.12, l.241–243). This reconfirms the conclusion that polyP promotes hemolymph clotting.

      Author response image 1.

      Reviewer #3 (Public review):

      Summary:

      Sarkar, Bhandari, Jaiswal, and colleagues establish a suite of quantitative and genetic tools to use Drosophila melanogaster as a model metazoan organism to study polyphosphate (polyP) biology. By adapting biochemical approaches for use in D. melanogaster, they identify a window of increased polyP levels during development. Using genetic tools, they find that depleting polyP from the cytoplasm alters the timing of metamorphosis, accelerating eclosion. By adapting subcellular imaging approaches for D. melanogaster, they observe polyP in the nucleolus of several cell types. They further demonstrate that polyP localizes to cytoplasmic puncta in hemocytes, and further that depleting polyP from the cytoplasm of hemocytes impairs hemolymph clotting. Together, these findings establish D. melanogaster as a tractable system for advancing our understanding of polyP in metazoans.

      Strengths:

      (1) The FLYX system, combining cell type and compartment-specific expression of ScPpx1, provides a powerful tool for the polyP community.

      (2) The finding that cytoplasmic polyP levels change during development and affect the timing of metamorphosis is an exciting first step in understanding the role of polyP in metazoan development, and possible polyP-related diseases.

      (3) Given the significant existing body of work implicating polyP in the human blood clotting cascade, this study provides compelling evidence that polyP has an ancient role in clotting in metazoans.

      We sincerely thank the reviewer for their generous and insightful comments. Their recognition of both the technical strengths of the FLYX system and the broader biological implications reinforces our confidence that this work will serve as a useful foundation for the community.

      Limitations:

      (1) While the authors demonstrate that HA-ScPpx1 protein localizes to the target organelles in the various FLYX constructs, the capacity of these constructs to deplete polyP from the different cellular compartments is not shown. This is an important control to both demonstrate that the GST-PPBD labeling protocol works, and also to establish the efficacy of compartment-specific depletion. While not necessary to do this for all the constructs, it would be helpful to do this for the cyto-FLYX and nuc-FLYX.

      We confirmed polyP depletion in Cyto-FLYX using the malachite green assay (Fig. 3D, p.10, l.212–214). The efficacy of ScPpX1 has also been demonstrated earlier in mammalian mitochondria (6). Our preliminary data from Mito-ScPpX1 expressed ubiquitously with Tubulin-Gal4 showed a reduction in polyP levels when estimated from whole flies (See Author response image 2 below, ongoing investigation). In an independent study focusing on mitochondrial polyP depletion, we are characterizing these lines in detail and plan to use subcellular fractionation to check how much polyP mitochondria contribute to the cellular pool. Direct phenotypic and polyP depletion analyses of Nuc-FLYX and ER-FLYX are also being carried out, but are at a preliminary stage. Two challenges have been that polyP levels differ between tissues and that subcellular fractionation yields very little material for polyP analysis. Because this work requires detailed, independent, and careful analysis, we refrain from adding these data to the current manuscript.

      Author response image 2.

      Regarding the specificity, Saito et al. reported that PPBD binds to polyP in vitro, as well as in yeast and mammalian cells, with a high affinity of ~45µM for longer polyP chains (35 mer and above) (3). They also show that the affinity of PPBD with RNA and DNA is very low. Further, PPBD could reveal differences in polyP labeling in yeasts grown under different physiological conditions that alter polyP levels. The revised manuscript now includes the following data to show the specificity of PPBD:

      To address this concern we have done two sets of experiments:

      We generated a PPBD mutant (GST-PPBD<sup>Mut</sup>). Using Microscale Thermophoresis, we establish that GST-PPBD binds to polyP<sub>100</sub>-2X-FITC, whereas GST-PPBD<sup>Mut</sup> and GST do not bind polyP<sub>100</sub>-2X-FITC at all. We found that, unlike GST-PPBD (wild-type), which gives a punctate staining pattern, GST-PPBD<sup>Mut</sup> does not stain hemocytes. These data have been added to the revised manuscript (Fig. 2B-D, p.8, l.151–165).

      A study in C. elegans by Quarles et al. performed an experiment similar to the one suggested by the reviewer. In that study, treating permeabilized tissues with polyphosphatase prior to PPBD staining resulted in a decrease of PPBD-GFP signal from the tissues (2). We performed the same experiment, incubating fixed and permeabilised hemocytes with ScPpX1 or heat-inactivated ScPpX1 protein before GST-PPBD staining. We find that both staining intensity and the number of punctae are higher in hemocytes that were left untreated and in those treated with heat-inactivated ScPpX1. The hemocytes pre-treated with ScPpX1 showed reduced staining intensity and number of punctae. These data have been added to the revised manuscript (Fig. 2E-G, p.8, l.166-172).

      (2) The cell biological data in this study clearly indicates that polyP is enriched in the nucleolus in multiple cell types, consistent with recent findings from other labs, and also that polyP affects gene expression during development. Given that the authors also generate the Nuc-FLYX construct to deplete polyP from the nucleus, it is surprising that they test how depleting cytoplasmic but not nuclear polyP affects development. However, providing these tools is a service to the community, and testing the phenotypic consequences of all the FLYX constructs may arguably be beyond the scope of this first study.

      We agree this is an important avenue. In this first study, we focused on establishing the toolkit and reporting phenotypes with Cyto-FLYX. We are systematically assaying phenotypes from all FLYX constructs, including Nuc-FLYX, in ongoing studies.

      Recommendations for the authors:

      Reviewing Editor Comment:

      The reviewers appreciated the general quality of the rigour and work presented in this manuscript. We also had a few recommendations for the authors. These are listed here and the details related to them can be found in the individual reviews below.

      (1) We suggest including an appropriate control to show that PPBD binds polyP specifically.

      We have updated the response section as follows:

      (a) Highlighted previous literature that showed the specificity of PPBD.

      (b) We show that the punctate staining observed by PPBD is not demonstrated by the mutant PPBD (PPBD<sup>Mut</sup>) in which amino acids that are responsible for polyP binding are mutated.

      (c) We show that PPBD<sup>Mut</sup> does not bind to polyP using Microscale Thermophoresis.

      (d) We show that treatment of fixed and permeabilised hemocytes with ScPpX1 reduces the PPBD staining intensity and number of punctae, as compared to tissues left untreated or treated with heat-inactivated ScPpX1.

      We have included these in our revised manuscript (Fig. 2B-G, p.8, l.151–157).

      (2) The high concentration of PolyP in the clotting assay might be impeding clotting. The authors may want to consider lowering this in their assays.

      We have addressed this concern in our revised manuscript. We have performed the clotting assays with lower polyP concentrations (concentrations previously used in clotting experiments with human blood and polyP). Data is included in Fig. S5F–I, p.12, l.241–243.

      (3) The RNAseq study: can the authors please describe this better and possibly mine it for the regulation of genes that affect eclosion?

      In our revised manuscript, we have included a broader discussion about the RNAseq analysis done in the article in both the ‘results’ and the ‘discussion’ sections, where we have rewritten the narrative from the perspective of accelerated eclosion. (p.15 l.310-335, p. 20, l.431-446).

      (4) Have the authors considered the possibility that the gut microbiota might be contributing to some of their measurements and assays? It would be good to address this upfront - either experimentally, in the discussion, or (ideally) both.

      This is an exciting possibility. Several observations argue that fly tissues synthesize polyP: (i) polyP levels remain very low during feeding stages but build up in wandering third instar larvae after feeding ceases; (ii) PPBD staining is absent from the gut except the crop (Fig. S3O–P); (iii) in C. elegans, intestinal polyP was unaffected when worms were fed polyP-deficient bacteria (2); (iv) depletion of polyP from plasmatocytes alone impairs hemolymph clotting, which would not be expected if gut-derived polyP were the major source of polyP in hemolymph. Nevertheless, microbiota-derived polyP may contribute, and we plan systematic testing in axenic flies in future work.

      Reviewer #1 (Recommendations for the authors):

      (1) While the authors have shown that the depletion tool results in a general reduction of polyP levels in Figure 3D, it would have been nice to show this via IHC. Particularly since the depletion depends on the strength of the Gal4, it is possible that the phenotypes are being under-estimated because the depletions are weak.

      We agree that different Gal4 lines have different strengths and will therefore affect polyP levels and the strength of the phenotype differently.

      We performed PPBD staining on hemocytes expressing ScPPX; however, we observed very intense, uniform staining throughout the cells, which was unexpected. It seems like PPBD is recognizing overexpressed ScPpX1. Indeed, in an unpublished study by Manisha Mallick (Bhandari lab), it was found that His-ScPpX1 specifically interacts with GST-PPBD in a protein interaction assay (See Author response image 3). Due to these issues, we refrained from IHC/PPBD-based validation.

      Author response image 3.

      (2) The subcellular tools for depletion are neat! I wonder why the authors didn't test them. For example in the salivary gland for nuclear depletion?

      We have addressed this question in the reviewer responses. We are systematically assaying phenotypes from all FLYX constructs, including Mito-FLYX and Nuc-FLYX, in ongoing independent investigations. As discussed in #1, a possible interaction between ScPpX and PPBD makes this test more challenging, and hence each construct requires a detailed investigation.

      (a) Does the absence of clotting defects using Lz-gal4 suggest that PolyP is more crucial in the plasmatocytes and for the initial clotting process? And that it is dispensable/less important in the crystal cells and for the later clotting process? Or is it that the crystal cells just don't have as much polyP? The image (2E-H) certainly looks like it.

      In hemolymph, primary clot formation results from clotting factors secreted by the fat bodies and the plasmatocytes. The crystal cells release factors that help harden the soft clot formed initially. Reports suggest that clotting and melanization of the clot are independent of each other (7). Since crystal cells do not contribute to clot fibre formation, the absence of clotting defects using LzGAL4-CytoFLYX is not surprising. Alternatively, polyP may be secreted from all hemocytes and contribute to clotting; however, the crystal cells make up only 5% of hemocytes, and hence polyP depletion in those cells may have a negligible effect on blood clotting.

      Crystal cells do show PPBD staining. Whether polyP levels are significantly lower in the crystal cells than in the plasmatocytes needs more systematic investigation. Image (2E-H) is a representative image showing the presence of polyP in crystal cells and cannot be used to compare polyP levels in the crystal cells and the plasmatocytes.

      (b) The RNAseq analyses and data could be better presented. If the data are indeed variable and the differentially expressed genes of low confidence, I might remove that data entirely. I don't think it'll take away from the rest of the work.

      We understand this concern and, therefore, in the revised manuscript, we have included a broader discussion about the RNAseq analysis done in the article in both the ‘results’ and the ‘discussion’ sections, where we have rewritten the narrative from the perspective of accelerated eclosion. (p.15 l.310-335, p. 20, l.431-446). We have also stated the limitations of such studies.

      (c) I would re-phrase the first sentence of the results section.

      We have re-phrased it in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors created several different versions of the FLYX system that would be targeted to different subcellular compartments. They mostly report on the effects of cytosolic targeting, but some of the constructs targeted the polyphosphatase to mitochondria or the nucleus.

      They report that the targeting worked, but I didn't see any results on the effects of those constructs on fly viability, development, etc.

      There is a growing literature of investigators targeting polyphosphatase to mitochondria and showing how depleting mitochondrial polyP alters mitochondrial function. What was the effect of the Nuc-FLYX and Mito-FLYX constructs on the flies?

      Also, the authors should probably cite the papers of others on the effects of depleting mitochondrial polyP in other eukaryotic cells in the context of discussing their findings in flies.

      We have addressed this question in the reviewer responses. We did not see any obvious developmental or viability defects with any of the FLYX lines, and only after careful investigation did we come across the clotting defects in the CytoFLYX. We are currently systematically assaying phenotypes from all FLYX constructs, including Mito-FLYX and Nuc-FLYX, in independent ongoing investigations.

      We have discussed the heterologous expression of mitochondrial polyphosphatase in mammalian cells to justify the need for developing Mito-FLYX (p. 10, l. 197-200). In the discussion section, we also discuss the presence and roles of polyP in the nucleus and how Nuc-FLYX can help study such phenomena (p. 19, l. 399-407).

      (2) The authors should number the pages of their manuscript to make it easier for reviewers to refer to specific pages.

      We have numbered our lines and pages in the revised manuscript.

      (3) Abstract: the abbreviation, "polyP", is not defined in the abstract. The first word in the abstract is "polyphosphate", so it should be defined there.

      We have corrected it in the revised version.

      (4) The authors repeatedly use the phrase, "orange hot", to describe one of the colors in their micrographs, but I don't know how this differs from "orange".

      ‘OrangeHot’ is the name of the LUT used in the ImageJ analysis, and the colour is therefore referred to by that name.

      (5) First page of the Introduction: the phrase, "feeding polyP to αβ expression Alzheimer's model of Caenorhabditis elegans" is awkward (it literally means feeding polyP to the model instead of the worms).

      We have revised it. (p.3, l.55-57).

      (6) Page 2 of the Introduction: The authors should cite this paper when they state that NUDT3 is a polyphosphatase: https://pubmed.ncbi.nlm.nih.gov/34788624/

      We have cited the paper in the revised version of the manuscript. (p.4, l. 68-70)

      (7) Page 2 of Results: The authors report the polyP content in the third instar larva (misspelled as "larval") to five significant digits ("419.30"). Their data do not support more than three significant digits, though.

      We have corrected it in the revised manuscript.

      (8) Page 3 of Results (paragraph 1): When discussing the polyP levels in various larval stages, the authors are extracting total polyP from the larvae. It seems that at least some of the polyP may come from gut microbes. This should probably be mentioned.

      This is an interesting possibility. Several observations argue that polyP is synthesized by fly tissues: (i) polyP levels remain very low during feeding stages but build up in wandering third instar larvae after feeding ceases; (ii) PPBD staining is absent from the gut except the crop (Fig. S3O–P); (iii) in C. elegans, intestinal polyP was unaffected when worms were fed polyP-deficient bacteria (2); (iv) depletion of polyP from plasmatocytes alone impairs hemolymph clotting, which would not be expected if gut-derived polyP were the major source of polyP in hemolymph. We mention this limitation in the revised manuscript (p.19-20, l. 425-433).

      (9) Page 3 of Results (paragraph 2): stating that the 4% paraformaldehyde works "best" is imprecise. What do the authors mean by "best"?

      We have addressed this comment in the revised manuscript, which now states that 4% paraformaldehyde performed better than the other two fixation methods we tested, methanol and Bouin’s fixative (p.8, l. 152-154).

      (10) Page 4 of Results (paragraph 2, last line of the page): The scientific literature is vast, so one can never be sure that one knows of all the papers out there, even on a topic as relatively limited as polyP. Therefore, I would recommend qualifying the statement "...this is the first comprehensive tissue staining report...". It would be more accurate (and safer) to say something like, "to our knowledge, this is the first..." There is a similar statement with the word "first" on the next page regarding the FLYX library.

      We have addressed this concern and corrected it accordingly in the revised version of the manuscript (p.9, l. 192-193).

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should include in their discussion a comparison of cell biological observations using the polyP binding domain of E. coli Ppx (GST-PPBD) to fluorescently label polyP in cells and tissues with recent work using a similar approach in C. elegans (Quarles et al., PMID:39413779).

      In the revised manuscript, we have cited the work of Quarles et al. and have added a comparison of observations (p.19, l. 408-410). In the discussion, we also consider several other studies on how the presence of polyP in different subcellular compartments, such as the nucleus, can be assayed and studied with the tools developed in this study.

      (2) The gene expression studies of time-matched Cyto-FLYX vs WT larvae is very intriguing. Given the authors' findings that non-feeding third instar Cyto-FLYX larvae are developmentally ahead of WT larvae, can the observed trends be explained by known changes in gene expression that occur during eclosion? This is mentioned in the results section in the context of genes linked to neurons, but a broader discussion of which pathway changes observed can be explained by the developmental stage difference between the WT and FLYX larvae would be helpful in the discussion.

      We have included a broader discussion of the RNAseq analysis in both the ‘results’ and the ‘discussion’ sections, where we have rewritten the narrative from the perspective of accelerated eclosion (p.15, l. 310-335; p.20, l. 431-446). We have also stated the limitations of such studies.

      (3) The sentence describing NUDT3 is not referenced.

      We have addressed this comment and have cited the NUDT3 paper in the revised version of the manuscript (p.4, l. 68-70).

      (4) In the first sentence of the results section, the meaning/validity of the statement "The polyP levels have decreased as evolution progressed" is not clear. It might be more straightforward to give an estimate of the total pmoles polyP/mg protein difference between bacteria/yeast and metazoans.

      In the revised manuscript, we have given an estimate of the polyP content across various species across evolution to uphold the statement that polyP levels have decreased as evolution progressed (p. 5, l. 87-91).

      (5) The description of the malachite green assay in the results section describes it as "calorimetric" but this should read "colorimetric?"

      We have corrected it in the revised manuscript.

      References

      (1) Chicco D, Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput Biol. 2022 Aug 11;18(8):e1010348.

      (2) Quarles E, Petreanu L, Narain A, Jain A, Rai A, Wang J, et al. Cryosectioning and immunofluorescence of C. elegans reveals endogenous polyphosphate in intestinal endo-lysosomal organelles. Cell Rep Methods. 2024 Oct 8;100879.

      (3) Saito K, Ohtomo R, Kuga-Uetake Y, Aono T, Saito M. Direct labeling of polyphosphate at the ultrastructural level in Saccharomyces cerevisiae by using the affinity of the polyphosphate binding domain of Escherichia coli exopolyphosphatase. Appl Environ Microbiol. 2005 Oct;71(10):5692–701.

      (4) Smith SA, Mutch NJ, Baskar D, Rohloff P, Docampo R, Morrissey JH. Polyphosphate modulates blood coagulation and fibrinolysis. Proc Natl Acad Sci USA. 2006 Jan 24;103(4):903–8.

      (5) Smith SA, Choi SH, Davis-Harrison R, Huyck J, Boettcher J, Rienstra CM, et al. Polyphosphate exerts differential effects on blood clotting, depending on polymer size. Blood. 2010 Nov 18;116(20):4353–9.

      (6) Abramov AY, Fraley C, Diao CT, Winkfein R, Colicos MA, Duchen MR, et al. Targeted polyphosphatase expression alters mitochondrial metabolism and inhibits calcium-dependent cell death. Proc Natl Acad Sci USA. 2007 Nov 13;104(46):18091–6.

      (7) Schmid MR, Dziedziech A, Arefin B, Kienzle T, Wang Z, Akhter M, et al. Insect hemolymph coagulation: Kinetics of classically and non-classically secreted clotting factors. Insect Biochem Mol Biol. 2019 Jun;109:63–71.

      (8) Guan J, Hurto RL, Rai A, Azaldegui CA, Ortiz-Rodríguez LA, Biteen JS, Freddolino L, Jakob U. HP-Bodies – Ancestral Condensates that Regulate RNA Turnover and Protein Translation in Bacteria. bioRxiv. 2025;2025.02.06.636932. doi: https://doi.org/10.1101/2025.02.06.636932.

      (9) Lonetti A, Szijgyarto Z, Bosch D, Loss O, Azevedo C, Saiardi A. Identification of an evolutionarily conserved family of inorganic polyphosphate endopolyphosphatases. J Biol Chem. 2011 Sep 16;286(37):31966–74.

    1. eLife Assessment

      This study presents a valuable advance in reconstructing naturalistic speech from intracranial ECoG data using a dual-pathway model. The evidence supporting the claims of the authors is solid. This work will be of interest to cognitive neuroscientists and computer scientists/engineers working on speech reconstruction from neural data.

    2. Reviewer #1 (Public review):

      Summary:

      This paper introduces a dual-pathway model for reconstructing naturalistic speech from intracranial ECoG data. It integrates an acoustic pathway (LSTM + HiFi-GAN for spectral detail) and a linguistic pathway (Transformer + Parler-TTS for linguistic content). Output from the two components is later merged via CosyVoice2.0 voice cloning. Using only 20 minutes of ECoG data per participant, the model achieves high acoustic fidelity and linguistic intelligibility.

      Strengths:

      (1) The proposed dual-pathway framework effectively integrates the strengths of neural-to-acoustic and neural-to-text decoding and aligns well with established neurobiological models of dual-stream processing in speech and language.

      (2) The integrated approach achieves robust speech reconstruction using only 20 minutes of ECoG data per subject, demonstrating the efficiency of the proposed method.

      (3) The use of multiple evaluation metrics (MOS, mel-spectrogram R², WER, PER) spanning acoustic, linguistic (phoneme and word), and perceptual dimensions, together with comparisons against noise-degraded baselines, adds strong quantitative rigor to the study.

      Comments on revisions:

      I thank the authors for their thorough efforts in addressing my previous concerns. I believe this revised version is significantly strengthened, and I have no further concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Li et al. proposes a dual-path framework that concurrently decodes acoustic and linguistic representations from ECoG recordings. By integrating advanced pre-trained AI models, the approach preserves both acoustic richness and linguistic intelligibility, and achieves a WER of 18.9% with a short (~20-minute) recording.

      Overall, the study offers an advanced and promising framework for speech decoding. The method appears sound, and the results are clear and convincing. My main concerns are the need for additional control analyses and for more comparisons with existing models.

      Strengths:

      • This speech-decoding framework employs several advanced pre-trained DNN models, reaching superior performance (WER of 18.9%) with relatively short (~20-minute) neural recording.

      • The dual-pathway design is elegant, and the study clearly demonstrates its necessity: The acoustic pathway enhances spectral fidelity while the linguistic pathway improves linguistic intelligibility.

      Comments on revisions:

      The authors have thoughtfully addressed my previous concerns about the weaknesses. I have no further concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      This paper introduces a dual-pathway model for reconstructing naturalistic speech from intracranial ECoG data. It integrates an acoustic pathway (LSTM + HiFi-GAN for spectral detail) and a linguistic pathway (Transformer + Parler-TTS for linguistic content). Output from the two components is later merged via CosyVoice2.0 voice cloning. Using only 20 minutes of ECoG data per participant, the model achieves high acoustic fidelity and linguistic intelligibility.

      Strengths

      (1) The proposed dual-pathway framework effectively integrates the strengths of neural-to-acoustic and neural-to-text decoding and aligns well with established neurobiological models of dual-stream processing in speech and language.

      (2) The integrated approach achieves robust speech reconstruction using only 20 minutes of ECoG data per subject, demonstrating the efficiency of the proposed method.

      (3) The use of multiple evaluation metrics (MOS, mel-spectrogram R², WER, PER) spanning acoustic, linguistic (phoneme and word), and perceptual dimensions, together with comparisons against noise-degraded baselines, adds strong quantitative rigor to the study.

      We thank Reviewer #1 for the supportive comments. In addition, we appreciate Reviewer #1’s thoughtful comments and feedback. By addressing these comments, we believe we have greatly improved the clarity of our claims and methodology. Below we list our point-by-point responses addressing concerns raised by Reviewer #1.

      Weaknesses:

      (1) It is unclear how much the acoustic pathway contributes to the final reconstruction results, based on Figures 3B-E and 4E. Including results from Baseline 2 + CosyVoice and Baseline 3 + CosyVoice could help clarify this contribution.

      We thank the reviewer for this suggestion. However, we believe that applying CosyVoice to the output of Baseline 2 or Baseline 3 in isolation is not methodologically feasible: it would not correctly isolate the contribution of the acoustic pathway and might lead to misinterpretation.

      The role of CosyVoice 2.0 in our framework is specifically voice cloning and fusion, not standalone enhancement. It is designed to integrate information from two pathways. Its operation requires two key inputs:

      (1) A voice reference speech that provides the target speaker's timbre and prosodic characteristics. In our final pipeline, this is provided by the denoised output of the acoustic pathway (Baseline 2).

      (2) A target word sequence that specifies the linguistic content to be spoken. This is obtained by transcribing the output of the linguistic pathway (Baseline 3) using Whisper ASR.

      Therefore, the standalone outputs of Baseline 2 and Baseline 3 are the purest demonstrations of what each pathway contributes before fusion. The significant improvement in WER/PER and MOS in the final output (compared to Baseline 2) and the significant improvement in mel-spectrogram R² (compared to Baseline 3) together demonstrate the complementary contributions of the two pathways. The fusion via CosyVoice is the mechanism that allows these contributions to be combined. We have added a clearer explanation of CosyVoice's role and the rationale for not testing it on individual baselines in the revised manuscript (Results section: "The fine-tuned voice cloner further enhances...").

      Edits:

      Page 11, Lines 277-282:

      “Voice cloning is used to bridge the gap between acoustic fidelity and linguistic intelligibility in speech reconstruction. This approach strategically combines the strengths of complementary pathways: the acoustic pathway preserves speaker-specific spectral characteristics while the linguistic pathway maintains lexical and phonetic precision. By integrating these components through neural voice cloning, we achieve balanced reconstruction that overcomes the limitations inherent in isolated systems. CosyVoice 2.0, the voice cloner module, serves specifically as a voice cloning and fusion engine, requiring two inputs: (1) a voice reference speech (provided by the denoised output of the acoustic pathway) to specify the target speaker's identity, and (2) a target word sequence (transcribed from the output of the linguistic pathway) to specify the linguistic content. The standalone baseline outputs of the two pathways can be integrated in this way.”

      (2) As noted in the limitations, the reconstruction results heavily rely on pre-trained generative models. However, no comparison is provided with state-of-the-art multimodal LLMs such as Qwen3-Omni, which can process auditory and textual information simultaneously. The rationale for using separate models (Wav2Vec for speech and TTS for text) instead of a single unified generative framework should be clearly justified. In addition, the adaptor employs an LSTM architecture for speech but a Transformer for text, which may introduce confounds in the performance comparison. Is there any theoretical or empirical motivation for adopting recurrent networks for auditory processing and Transformer-based models for textual processing?

      We thank the reviewer for the insightful suggestion regarding multimodal large language models (LLMs) such as Qwen3-Omni. It is important to clarify the distinction between general-purpose interactive multimodal models and models specifically designed for high-fidelity voice cloning and speech synthesis.

      As for the comparison with the state-of-the-art multimodal LLMs:

      Qwen3-Omni and GLM-4-Voice are powerful conversational agents capable of processing multiple modalities including text, speech, image, and video, as described in their documentation (see: https://help.aliyun.com/zh/model-studio/qwen-tts-realtime and https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-4-voice). However, they are primarily optimized for interactive dialogue and multimodal understanding rather than for precise, speaker-adaptive speech reconstruction from neural signals. In contrast, CosyVoice 2.0, developed by the same team at Alibaba, is specifically designed for voice cloning and text-to-speech synthesis (see: https://help.aliyun.com/zh/model-studio/text-to-speech). It incorporates advanced speaker adaptation and acoustic modeling capabilities that are essential for reconstructing naturalistic speech from limited neural data. Therefore, our choice of CosyVoice for the final synthesis stage aligns with the goal of integrating acoustic fidelity and linguistic intelligibility, which is central to our study.

      For the selection of LSTM and Transformer in the two pathways:

      The goal of the acoustic adaptor is to reconstruct fine-grained spectrotemporal details (formants, harmonic structures, prosodic contours) with millisecond-to-centisecond precision. These features rely heavily on local temporal dynamics and short-to-medium range dependencies (e.g., within and between phonemes/syllables). In our ablation studies (to be added in the supplementary), we found that Transformer-based adaptors, which inherently emphasize global sentence-level context through self-attention, tended to oversmooth the reconstructed acoustic features, losing critical fine-temporal details essential for naturalness. In contrast, the recurrent nature of LSTMs, with their inherent temporal state propagation, proved more effective at modeling these local sequential dependencies without excessive smoothing, leading to higher mel-spectrogram fidelity. This aligns with the neurobiological observation that early auditory cortex processes sound with precise temporal fidelity. Moreover, from an engineering perspective, LSTM-based decoders have been empirically shown to perform well in sequential prediction tasks with limited data, as evidenced in prior work on sequence modeling and neural decoding (1).

      The goal of the linguistic adaptor is to decode abstract, discrete word tokens. This task benefits from modeling long-range contextual dependencies across a sentence to resolve lexical ambiguity and syntactic structure (e.g., subject-verb agreement). The self-attention mechanism of Transformers is exceptionally well-suited for capturing these global relationships, as evidenced by their dominance in NLP. Our experiments confirmed that a Transformer adaptor outperformed an LSTM-based one in word token prediction accuracy.

      While a unified multimodal LLM could in principle handle both modalities, such models often face challenges in modality imbalance and task specialization. Audio and text modalities have distinct temporal scales, feature distributions, and learning dynamics. By decoupling them into separate pathways with specialized adaptors, we ensure that each modality is processed by an architecture optimized for its inherent structure. This divide-and-conquer strategy avoids the risk of one modality dominating or interfering with the learning of the other, leading to more stable training and better final performance, especially important when adapting to limited neural data.

      Edits:

      Page 9, Lines 214-223:

      “The acoustic pathway, implemented through a bi-directional LSTM neural adaptor architecture (Fig. 1B), specializes in reconstructing fundamental acoustic properties of speech. This module directly processes neural recordings to generate precise time-frequency representations, focusing on preserving speaker-specific spectral characteristics like formant structures, harmonic patterns, and spectral envelope details. Quantitative evaluation confirms its core competency: achieving a mel-spectrogram R² of 0.793 ± 0.016 (Fig. 3B) demonstrates remarkable fidelity in reconstructing acoustic microstructure. This performance level is statistically indistinguishable from original speech degraded by 0 dB additive noise (0.771 ± 0.014, p = 0.242, one-sided t-test). We chose a bidirectional LSTM architecture for this adaptor because its recurrent nature is particularly suited to modeling the fine-grained, short- to medium-range temporal dependencies (e.g., within and between phonemes and syllables) that are critical for acoustic fidelity. An ablation study comparing LSTM against Transformer-based adaptors for this task confirmed that LSTMs yielded superior mel-spectrogram reconstruction fidelity (higher R²), as detailed in Table S1, likely by avoiding the oversmoothing of spectrotemporal details sometimes induced by the strong global context modeling of Transformers”.

      “To confirm that the acoustic pathway’s output is causally dependent on the neural signal rather than the generative prior of the HiFi-GAN, we performed a control analysis in which portions of the input ECoG recording were replaced with Gaussian noise. When either the first half, second half, or the entirety of the neural input was replaced by noise, the mel-spectrogram R² of the reconstructed speech dropped markedly, corresponding to the corrupted segment (Fig. S5). This demonstrates that the reconstruction is temporally locked to the specific neural input and that the model does not ‘hallucinate’ spectrotemporal structure from noise. These results validate that the acoustic pathway performs genuine, input-sensitive neural decoding”.

      Edits:

      Page 10, Lines 272-277:

      “We employed a Transformer-based Seq2Seq architecture for this adaptor to effectively capture the long-range contextual dependencies across a sentence, which are essential for resolving lexical ambiguity and syntactic structure during word token decoding. This choice was validated by an ablation study (Table S2), indicating that the Transformer adaptor outperformed an LSTM-based counterpart in word prediction accuracy”

      (3) The model is trained on approximately 20 minutes of data per participant, which raises concerns about potential overfitting. It would be helpful if the authors could analyze whether test sentences with higher or lower reconstruction performance include words that were also present in the training set.

      Thank you for raising the important concern regarding potential overfitting given the limited size of our training dataset (~20 minutes per participant). To address this point directly, we performed a detailed lexical overlap analysis between the training and test sets.

      The test set contains 219 unique words. Among these:

      127 words (58.0%) appeared in the training set (primarily high-frequency, common words).

      92 words (42.0%) were entirely novel and did not appear in the training set.

      We further examined whether trials with the best reconstruction (WER = 0) relied more on training vocabulary. Among these top-performing trials, 55.0% of words appeared in the training set. In contrast, 51.9% of words in the worst-performing trials appeared in the training set. No significant difference was observed, suggesting that performance is not driven by simple lexical memorization.

      The presence of a substantial proportion of novel words (42%) in the test set, combined with the lack of performance advantage for overlapping content, provides strong evidence that our model is generalizing linguistic and acoustic patterns rather than merely memorizing the training vocabulary. High reconstruction performance on unseen words would be improbable under severe overfitting.

      Therefore, we conclude that while some lexical overlap exists (as expected in natural language), the model’s performance is driven by its ability to decode generalized neural representations, effectively mitigating the overfitting risk highlighted by the reviewer.
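
      The overlap comparison described above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the tokenization rule, and the toy sentences are assumptions made for illustration, and overlap is counted here over unique word types, which may differ from how the percentages above were computed.

```python
import re

def vocabulary(sentences):
    """Return the set of lower-cased word types found in a list of sentences."""
    words = set()
    for sentence in sentences:
        words.update(re.findall(r"[a-z']+", sentence.lower()))
    return words

def overlap_fraction(test_sentences, train_vocab):
    """Fraction of unique test words that also occur in the training vocabulary."""
    test_vocab = vocabulary(test_sentences)
    if not test_vocab:
        return 0.0
    return len(test_vocab & train_vocab) / len(test_vocab)

# Toy data (hypothetical; not the sentences used in the study).
train = ["she had your dark suit", "do not ask me to carry an oily rag"]
test_best = ["she had an oily suit"]    # stands in for trials with WER = 0
test_worst = ["the dark rag was wet"]   # stands in for the worst-performing trials

train_vocab = vocabulary(train)
best = overlap_fraction(test_best, train_vocab)
worst = overlap_fraction(test_worst, train_vocab)
print(f"best-trial overlap: {best:.2f}, worst-trial overlap: {worst:.2f}")
```

      If decoding relied on lexical memorization, the best-performing trials would be expected to show a clearly higher overlap than the worst-performing ones; similar values in the two groups argue against that explanation.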

      (4) The phoneme confusion matrix in Figure 4A does not appear to align with human phoneme confusion patterns. For instance, /s/ and /z/ differ only in voicing, yet the model does not seem to confuse these phonemes. Does this imply that the model and the human brain operate differently at the mechanistic level?

      We thank the reviewer for this detailed observation regarding the difference between our model's phoneme confusion patterns and typical human perceptual confusions (e.g., the lack of /s/-/z/ confusion).

      The reviewer is correct in inferring a mechanistic difference. This divergence is primarily attributable to the Parler-TTS model acting as a powerful linguistic prior. Our linguistic pathway decodes word tokens, which Parler-TTS then converts to speech. Trained on massive corpora to produce canonical pronunciations, Parler-TTS effectively performs an implicit "error correction." For instance, if the neural decoding is ambiguous between the words "sip" and "zip," the TTS model's strong prior for lexical and syntactic context will likely resolve it to the correct word, thereby suppressing purely acoustic confusions like voicing.

      This has important implications for interpreting our model's errors and its relationship to brain function. The phoneme errors in our final output reflect a combination of neural decoding errors and the generative biases of the TTS model, which is optimized for intelligibility rather than mimicking raw human misperception. This does imply our model operates differently from the human auditory periphery. The human brain may first generate a percept with acoustic confusions, which higher-level language regions then disambiguate. Our model effectively bypasses the "confused percept" stage by directly leveraging a pre-trained, high-level language model for disambiguation. This is a design feature contributing to its high intelligibility, not necessarily a flaw. This observation raises a fascinating question: Could a model that more faithfully simulates the hierarchical processing of the human brain (including early acoustic confusions) provide a better fit to neural data at different processing stages? Future work could further address this question.

      Edits:

      Added a paragraph to the Discussion (Page 14, Lines 397-398):

      “The phoneme confusion pattern observed in our model output (Fig. 4A) differs from classic human auditory confusion matrices. We attribute this divergence primarily to the influence of the Parler-TTS model, which serves as a strong linguistic prior in our pipeline. This component is trained to generate canonical speech from text tokens. When the upstream neural decoding produces an ambiguous or erroneous token sequence, the TTS model’s internal language model likely performs an implicit ‘error correction,’ favoring linguistically probable words and pronunciations. This underscores that our model’s errors arise from a complex interaction between neural decoding fidelity and the generative biases of the synthesis stage”

      (5) In general, is the motivation for adopting the dual-pathway model to better align with the organization of the human brain, or to achieve improved engineering performance? If the goal is primarily engineering-oriented, the authors should compare their approach with a pretrained multimodal LLM rather than relying on the dual-pathway architecture. Conversely, if the design aims to mirror human brain function, additional analysis, such as detailed comparisons of phoneme confusion matrices, should be included to demonstrate that the model exhibits brain-like performance patterns.

      Our primary motivation is engineering improvement: to overcome the fundamental trade-off between acoustic fidelity and linguistic intelligibility that has limited previous neural speech decoding work. The design is inspired by related work on the convergent representation of speech and language perception (2). However, we do not claim that our LSTM and Transformer adaptors precisely simulate the specific neural computations of the human ventral and dorsal streams. The goal was to build a high-performance, data-efficient decoder. We will clarify this point in the Introduction and Discussion, stating that while the architecture is loosely inspired by previous neuroscience results, its primary validation is its engineering performance in achieving state-of-the-art reconstruction quality with minimal data.

      Edits:

      Page 14, Line 358-373:

      “In this study, we present a dual-path framework that synergistically decodes both acoustic and linguistic speech representations from ECoG signals, followed by a fine-tuned zero-shot text-to-speech network to re-synthesize natural speech with unprecedented fidelity and intelligibility. Crucially, by integrating large pre-trained generative models into our acoustic reconstruction pipeline and applying voice cloning technology, our approach preserves acoustic richness while significantly enhancing linguistic intelligibility beyond conventional methods. Our dual-pathway architecture, while inspired by converging neuroscience insights on speech and language perception, was principally designed and validated as an engineering solution. The primary goal was to build a practical decoder that achieves state-of-the-art reconstruction quality with minimal data. The framework's success is therefore ultimately judged by its performance metrics: high intelligibility (WER, PER), acoustic fidelity (mel-spectrogram R²), and perceptual quality (MOS), which directly address the core engineering challenge we set out to solve. Using merely 20 minutes of ECoG recordings, our model achieved superior performance with a WER of 18.9% ± 3.3% and PER of 12.0% ± 2.5% (Fig. 2D, E). This integrated architecture, combining pre-trained acoustic (Wav2Vec2.0 and HiFi-GAN) and linguistic (Parler-TTS) models through lightweight neural adaptors, enables efficient mapping of ECoG signals to dual latent spaces. Such methodology substantially reduces the need for extensive neural training data while achieving breakthrough word clarity under severe data constraints. The results demonstrate the feasibility of transferring the knowledge embedded in speech-data pre-trained artificial intelligence (AI) models into neural signal decoding, paving the way for more advanced brain-computer interfaces and neuroprosthetics”.

      Reviewer #2 (Public review):

      Summary:

      The study by Li et al. proposes a dual-path framework that concurrently decodes acoustic and linguistic representations from ECoG recordings. By integrating advanced pre-trained AI models, the approach preserves both acoustic richness and linguistic intelligibility, and achieves a WER of 18.9% with a short (~20-minute) recording.

      Overall, the study offers an advanced and promising framework for speech decoding. The method appears sound, and the results are clear and convincing. My main concerns are the need for additional control analyses and for more comparisons with existing models.

      Strengths:

      (1) This speech-decoding framework employs several advanced pre-trained DNN models, reaching superior performance (WER of 18.9%) with relatively short (~20-minute) neural recording.

      (2) The dual-pathway design is elegant, and the study clearly demonstrates its necessity: The acoustic pathway enhances spectral fidelity while the linguistic pathway improves linguistic intelligibility.

      We thank Reviewer #2 for the supportive comments. In addition, we appreciate Reviewer #2’s thoughtful comments and feedback. By addressing these comments, we believe we have greatly improved the clarity of our claims and methodology. Below we list our point-by-point responses addressing concerns raised by Reviewer #2.

      Weaknesses:

      The DNNs used were pre-trained on large corpora, including TIMIT, which is also the source of the experimental stimuli. More generally, as DNNs are powerful at generating speech, additional evidence is needed to show that decoding performance is driven by neural signals rather than by the DNNs' generative capacity.

      Thank you for raising this crucial point regarding the potential for pre-trained DNNs to generate speech independently of the neural input. We fully agree that it is essential to disentangle the contribution of the neural signals from the generative priors of the models. To address this directly, we have conducted two targeted control analyses, as you suggested, and have integrated the results into the revised manuscript (see Fig. S5 and the corresponding description in the Results section):

      (1) Random noise input: We fed Gaussian noise (matched in dimensionality and temporal structure to real ECoG recordings) into the trained adaptors. The outputs were acoustically unstructured and linguistically incoherent, confirming that the generative models alone cannot produce meaningful speech without valid neural input.

      (2) Partial sentence input (real + noise): For the acoustic pathway, we systematically replaced portions of the ECoG input with noise. The reconstruction quality (mel-spectrogram R²) dropped significantly in the corrupted segments, demonstrating that the decoding is temporally locked to the neural signal and does not “hallucinate” speech from noise.

      These results provide strong evidence that our model’s performance is causally dependent on and sensitive to the specific neural input, validating that it performs genuine neural decoding rather than merely leveraging the generative capacity of the pre-trained DNNs.
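
      The logic of the partial-noise control can be sketched as follows. This is an illustrative toy, not the authors' pipeline: the single-channel sine "recording", the identity `decode` placeholder (standing in for the trained adaptor and vocoder), the choice to match the noise to the signal's mean and standard deviation, and the R² definition (coefficient of determination) are all assumptions.

```python
import math
import random
import statistics

def replace_with_noise(signal, start, stop, seed=0):
    """Copy a 1-D signal and replace samples [start, stop) with Gaussian noise
    whose mean and standard deviation match those of the whole signal."""
    rng = random.Random(seed)
    mu = statistics.fmean(signal)
    sigma = statistics.pstdev(signal)
    corrupted = list(signal)
    for i in range(start, stop):
        corrupted[i] = rng.gauss(mu, sigma)
    return corrupted

def r_squared(target, prediction):
    """Coefficient of determination between two equal-length sequences."""
    mean_t = statistics.fmean(target)
    ss_res = sum((t - p) ** 2 for t, p in zip(target, prediction))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot

def decode(x):
    """Hypothetical placeholder for the trained decoder; here simply the identity."""
    return list(x)

# Toy single-channel "recording" (hypothetical data).
clean = [math.sin(2 * math.pi * i / 50) for i in range(200)]
half = len(clean) // 2
corrupted = replace_with_noise(clean, 0, half)  # corrupt the first half only

reference = decode(clean)
output = decode(corrupted)

# Score each half separately against the reconstruction from the clean input.
r2_first = r_squared(reference[:half], output[:half])
r2_second = r_squared(reference[half:], output[half:])
print(f"R2 first half: {r2_first:.2f}, second half: {r2_second:.2f}")
```

      In this toy, the drop in R² is confined to the corrupted half, which is the signature of input-locked decoding that the control analysis is designed to detect. A decoder driven mainly by its generative prior would instead be expected to produce similar scores regardless of which segment was corrupted.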

      The detailed edits are in the “recommendations” below. (See recommendations (1) and (2))

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Clarify the results shown in Figure 4E. The integrated approach appears to perform comparably to Baseline 3 in phoneme class clarity. However, Baseline 3 represents the output of the linguistic pathway alone, which is expected to encode information primarily at the word level.

      We appreciate the reviewer's observation and agree that clarification is needed. The phoneme class clarity (PCC) metric shown in Figure 4E measures whether mis-decoded phonemes are more likely to be confused within their own class (vowel-vowel or consonant-consonant) rather than across classes (vowel-consonant). A higher PCC indicates that the model's errors tend to be phonologically similar sounds (e.g., one vowel mistaken for another), which is a reasonable property for intelligibility.
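
      One way to read the PCC description above is as the fraction of substitution errors that stay within the same broad class. The sketch below follows that reading; the exact formula used in the manuscript is not given here, so the definition, the simplified ARPAbet-style vowel inventory, and the toy phoneme pairs are assumptions.

```python
# Simplified ARPAbet-style vowel inventory (assumption; TIMIT uses a larger set).
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw"}

def phoneme_class(phoneme):
    """Map a phoneme label to 'vowel' or 'consonant'."""
    return "vowel" if phoneme in VOWELS else "consonant"

def phoneme_class_clarity(pairs):
    """Given aligned (target, decoded) phoneme pairs, return the fraction of
    mis-decoded phonemes whose decoded label is in the same class as the target."""
    errors = [(t, d) for t, d in pairs if t != d]
    if not errors:
        return float("nan")  # undefined when there are no errors
    within = sum(phoneme_class(t) == phoneme_class(d) for t, d in errors)
    return within / len(errors)

# Toy aligned pairs (hypothetical): two within-class errors, one cross-class error.
pairs = [("s", "s"), ("iy", "ih"), ("p", "b"), ("ae", "t"), ("z", "z")]
pcc = phoneme_class_clarity(pairs)
print(f"PCC = {pcc:.3f}")
```

      Correctly decoded phonemes are excluded from the denominator, so under this definition PCC describes the structure of the errors rather than the overall accuracy.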

      We would like to clarify the nature of Baseline 3. As stated in the manuscript (Results section: "The linguistic pathway reconstructs high-intelligibility, higher-level linguistic information"), Baseline 3 is the output of our linguistic pathway. This pathway operates as follows: the ECoG signals are mapped to word tokens via the Transformer adaptor, and these tokens are then synthesized into speech by the frozen Parler-TTS model. Crucially, the input to Parler-TTS is a sequence of word tokens.

      It is important to distinguish between the levels of performance measured: Word Error Rate (WER) reflects accuracy at the lexical level (whole words). The linguistic pathway achieves a low WER by design, as it directly decodes word sequences. Phoneme Error Rate (PER) reflects accuracy at the sublexical phonetic level (phonemes). A low WER generally implies a low PER, because robust word recognition requires reliable phoneme-level representations within the TTS model's prior. This explains why Baseline 3 also exhibits a low PER. However, acoustic fidelity (captured by metrics like mel-spectrogram R²) requires the preservation of fine-grained spectrotemporal details such as pitch, timbre, prosody, and formant structures, information that is not directly encoded at the lexical level and is therefore not a strength of the purely linguistic pathway.
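As a rough illustration of how the two intelligibility metrics relate, both WER and PER are conventionally computed as an edit distance normalized by the reference length, differing only in the token unit (words versus phonemes). The sketch below follows that standard definition; it is not taken from the authors' evaluation code, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance divided by reference length (WER for words, PER for phonemes)."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)
```

Passing word lists gives a WER-style score; passing phoneme lists gives a PER-style score. A single wrong word counts as one full word error but may contribute only one or two phoneme errors.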

      While Parler-TTS internally models sub-word/phonetic information to generate the acoustic waveform, the primary linguistic information driving the synthesis is at the lexical (word) level. The generated speech from Baseline 3 therefore contains reconstructed phonemic sequences derived from the decoded word tokens, not from direct phoneme-level decoding of ECoG.

      Therefore, the comparable PCC between our final integrated model and Baseline 3 (linguistic pathway) suggests that the phoneme-level error patterns (i.e., the tendency to confuse within-class phonemes) in our final output are largely inherited from the high-quality linguistic prior embedded in the pre-trained TTS model (Parler-TTS). The integrated framework successfully preserves this desirable property from the linguistic pathway while augmenting it with speaker-specific acoustic details from the acoustic pathway, thereby achieving both high intelligibility (low WER/PER) and high acoustic fidelity (high mel-spectrogram R²).

      We will revise the caption of Figure 4E and the corresponding text in the Results section to make this interpretation explicit.

      Edits:

      Page 12, Lines 317-322:

      “In addition to the confusion matrices, we categorized the phonemes into vowels and consonants to assess the phoneme class clarity. We defined "phoneme class clarity" (PCC) as the proportion of errors where a phoneme was misclassified within the same class versus being misclassified into a different class. The purpose of introducing PCC is to demonstrate that most of the misidentified phonemes belong to the same category (confusion between vowels or consonants), rather than directly comparing the absolute accuracy of phoneme recognition. For instance, a vowel being mistaken for another vowel would be considered a within-class error, whereas a vowel being mistaken for a consonant would be classified as a between-class error” 
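The PCC definition quoted above can be sketched as follows. This assumes the reference and decoded phonemes have already been aligned into (true, predicted) pairs; the vowel inventory is an illustrative ARPAbet-style set rather than the authors' actual list, and `phoneme_class_clarity` is a hypothetical name.

```python
# Illustrative vowel inventory (assumption, not the manuscript's list).
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw"}


def phoneme_class(phoneme):
    """Map a phoneme label to its broad class."""
    return "vowel" if phoneme in VOWELS else "consonant"


def phoneme_class_clarity(pairs):
    """Fraction of misclassified phonemes that stay within their own class.

    `pairs` is an iterable of aligned (true, predicted) phoneme labels.
    Returns None when there are no errors, since the ratio is undefined.
    """
    errors = [(t, p) for t, p in pairs if t != p]
    if not errors:
        return None
    within = sum(phoneme_class(t) == phoneme_class(p) for t, p in errors)
    return within / len(errors)
```

Because correctly decoded phonemes are excluded from the denominator, PCC describes the structure of the errors rather than overall phoneme accuracy.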

      (2) Add results from Baseline 2 + CosyVoice and Baseline 3 + CosyVoice to clarify the contribution of the auditory pathway.

      Thank you for the suggestion. We appreciate the opportunity to clarify the role of CosyVoice in our framework.

      As explained in our response to point (1), CosyVoice 2.0 is designed as a fusion module that requires two inputs: 1) a voice reference (from the acoustic pathway) to specify speaker identity, and 2) a word sequence (from the linguistic pathway) to specify linguistic content. Because it is not a standalone enhancer, applying CosyVoice to a single pathway output (e.g., Baseline 2 or 3 alone) is not feasible: it would not reflect the module's intended function and could lead to misinterpretation of each pathway’s contribution.

      Instead, we have evaluated the contribution of each pathway by comparing the final integrated output against each standalone pathway output (Baseline 2 and 3). The significant improvements in both acoustic fidelity and linguistic intelligibility demonstrate the complementary roles of the two pathways, which are effectively fused through CosyVoice.

      (3) Justify your choice of using LSTM and Transformer architecture for the auditory and linguistic neural adaptors, respectively, and how your methods could compare to using a unified generative multimodal LLM for both pathways.

      Thank you for revisiting this important point. We appreciate your interest in the architectural choices and their relationship to state-of-the-art multimodal models.

      As detailed in our response to point (2), our choice of LSTM for the acoustic pathway and Transformer for the linguistic pathway is driven by task-specific requirements, supported by ablation studies (Supplementary Tables 1–2). The acoustic pathway benefits from LSTM’s ability to model fine-grained, local temporal dependencies without over-smoothing. The linguistic pathway benefits from Transformer’s ability to capture long-range semantic and syntactic context.

      Regarding comparison with unified multimodal LLMs (e.g., Qwen3-Omni), we clarified that such models are optimized for interactive dialogue and multimodal understanding, while our framework relies on specialist models (CosyVoice 2.0, Parler-TTS) that are explicitly designed for high-fidelity, speaker-adaptive speech synthesis, a requirement central to our decoding task.

      We have incorporated these justifications into the revised manuscript (Results and Discussion sections) and appreciate the opportunity to further emphasize these points.

      Edits:

      Page 9, Lines 214-223:

      “The acoustic pathway, implemented through a bi-directional LSTM neural adaptor architecture (Fig. 1B), specializes in reconstructing fundamental acoustic properties of speech. This module directly processes neural recordings to generate precise time-frequency representations, focusing on preserving speaker-specific spectral characteristics like formant structures, harmonic patterns, and spectral envelope details. Quantitative evaluation confirms its core competency: achieving a mel-spectrogram R² of 0.793 ± 0.016 (Fig. 3B) demonstrates remarkable fidelity in reconstructing acoustic microstructure. This performance level is statistically indistinguishable from original speech degraded by 0dB additive noise (0.771 ± 0.014, p = 0.242, one-sided t-test). We chose a bidirectional LSTM architecture for this adaptor because its recurrent nature is particularly suited to modeling the fine-grained, short- to medium-range temporal dependencies (e.g., within and between phonemes and syllables) that are critical for acoustic fidelity. An ablation study comparing LSTM against Transformer-based adaptors for this task confirmed that LSTMs yielded superior mel-spectrogram reconstruction fidelity (higher R²), as detailed in Table S1, likely by avoiding the oversmoothing of spectrotemporal details sometimes induced by the strong global context modeling of Transformers”.

      “To confirm that the acoustic pathway’s output is causally dependent on the neural signal rather than the generative prior of the HiFi-GAN, we performed a control analysis in which portions of the input ECoG recording were replaced with Gaussian noise. When either the first half, second half, or the entirety of the neural input was replaced by noise, the mel-spectrogram R² of the reconstructed speech dropped markedly, corresponding to the corrupted segment (Fig. S5). This demonstrates that the reconstruction is temporally locked to the specific neural input and that the model does not ‘hallucinate’ spectrotemporal structure from noise. These results validate that the acoustic pathway performs genuine, input-sensitive neural decoding”.

      Page 10, Lines 272-277:

      “We employed a Transformer-based Seq2Seq architecture for this adaptor to effectively capture the long-range contextual dependencies across a sentence, which are essential for resolving lexical ambiguity and syntactic structure during word token decoding. This choice was validated by an ablation study (Table S2), indicating that the Transformer adaptor outperformed an LSTM-based counterpart in word prediction accuracy”.

      (4) Discuss the differences between the model's phoneme confusion matrix in Figure 4A and human phoneme confusion patterns. In addition, please clarify whether the adoption of the dual-pathway architecture is primarily intended to simulate the organization of the human brain or to achieve engineering improvements.

      The observed difference between our model's phoneme confusion matrix and typical human perceptual confusion patterns (e.g., the noted lack of confusion between /s/ and /z/) is, as the reviewer astutely infers, likely attributable to the TTS model (Parler-TTS) acting as a powerful linguistic prior. The linguistic pathway decodes word tokens, and Parler-TTS converts these tokens into speech. Parler-TTS is trained on massive text and speech corpora to produce canonical, clean pronunciations. It effectively performs a form of "error correction" or "canonicalization" based on its internal language model. For example, if the neural decoding is ambiguous between "sip" and "zip", the TTS model's strong prior for lexical and syntactic context may robustly resolve it to the correct word, suppressing purely acoustic confusions like voicing. Therefore, the phoneme errors in our final output reflect a combination of neural decoding errors and the TTS model's generation biases, which are optimized for intelligibility rather than mimicking human misperception. We will add this explanation to the paragraph discussing Figure 4A.

      Our primary motivation is engineering improvement, to overcome the fundamental tradeoff between acoustic fidelity and linguistic intelligibility that has limited previous neural speech decoding work. The design is inspired by the convergent representation of speech and language perception (1). However, we do not claim that our LSTM and Transformer adaptors precisely simulate the specific neural computations of the human ventral and dorsal streams. The goal was to build a high-performance, data-efficient decoder. We will clarify this point in the Introduction and Discussion, stating that while the architecture is loosely inspired by previous neuroscience results, its primary validation is its engineering performance in achieving state-of-the-art reconstruction quality with minimal data.

      Edits:

      Pages 2-3, Lines 74-85:

      “Here, we propose a unified and efficient dual-pathway decoding framework that integrates the complementary strengths of both paradigms to enhance the performance of re-synthesized natural speech from an engineering perspective. Our method maps intracranial electrocorticography (ECoG) signals into the latent spaces of pre-trained speech and language models via two lightweight neural adaptors: an acoustic pathway, which captures low-level spectral features for naturalistic speech synthesis, and a linguistic pathway, which extracts high-level linguistic tokens for semantic intelligibility. These pathways are fused using a fine-tuned text-to-speech (TTS) generator with voice cloning, producing re-synthesized speech that retains both the acoustic spectrotemporal details, such as the speaker’s timbre and prosody, and the linguistic content of the message. The adaptors rely on near-linear mappings and require only 20 minutes of neural data per participant for training, while the generative modules are pre-trained on large unlabeled corpora and require no neural supervision”.

      Page 14, Lines 358-373:

      “In this study, we present a dual-path framework that synergistically decodes both acoustic and linguistic speech representations from ECoG signals, followed by a fine-tuned zero-shot text-to-speech network to re-synthesize natural speech with unprecedented fidelity and intelligibility. Crucially, by integrating large pre-trained generative models into our acoustic reconstruction pipeline and applying voice cloning technology, our approach preserves acoustic richness while significantly enhancing linguistic intelligibility beyond conventional methods. Our dual-pathway architecture, while inspired by converging neuroscience insights on speech and language perception, was principally designed and validated as an engineering solution. The primary goal was to build a practical decoder that achieves state-of-the-art reconstruction quality with minimal data. The framework's success is therefore ultimately judged by its performance metrics, high intelligibility (WER, PER), acoustic fidelity (mel-spectrogram R²), and perceptual quality (MOS), which directly address the core engineering challenge we set out to solve. Using merely 20 minutes of ECoG recordings, our model achieved superior performance with a WER of 18.9% ± 3.3% and PER of 12.0% ± 2.5% (Fig. 2D, E). This integrated architecture, combining pre-trained acoustic (Wav2Vec2.0 and HiFi-GAN) and linguistic (Parler-TTS) models through lightweight neural adaptors, enables efficient mapping of ECoG signals to dual latent spaces. Such methodology substantially reduces the need for extensive neural training data while achieving breakthrough word clarity under severe data constraints. The results demonstrate the feasibility of transferring the knowledge embedded in speech-data pre-trained artificial intelligence (AI) models into neural signal decoding, paving the way for more advanced brain-computer interfaces and neuroprosthetics”.

      Reviewer #2 (Recommendations for the authors):

      (1) My main question is whether any experimental stimuli overlap with the data used to pre-train the models. The authors might consider using pre-trained models trained on other corpora and training their own model without the TIMIT corpus. Additionally, as pretrained models were used, it might be helpful to evaluate to what extent the decoding is sensitive to the input neural recording or whether the model always outputs meaningful speech. The authors might consider two control analyses: a) whether the model still generates speech-like output if the input is random noise; b) whether the model can decode a complete sentence if the first half recording of a sentence is real but the second half is replaced with noise.

      We thank the reviewer for raising this crucial point regarding potential data leakage and the sensitivity of decoding to neural input.

      We confirm that the pre-training phase of our core models (Wav2Vec2.0 encoder, HiFi-GAN decoder) was conducted exclusively on the LibriSpeech corpus (960 hours), which is entirely separate from the TIMIT corpus used for our ECoG experiments. The subsequent fine-tuning of the CosyVoice 2.0 voice cloner for speaker adaptation was performed on the training set portion of the entire TIMIT corpus. Importantly, the test set for all neural decoding evaluations was strictly held out and never used during any fine-tuning stage. This data separation is now explicitly stated in the "Methods" sections for the Speech Autoencoder and the CosyVoice fine-tuning.

      Regarding the potential of training on other corpora, we agree it is a valuable robustness check. Previous work has demonstrated that self-supervised speech models like Wav2Vec2.0 learn generalizable representations that transfer well across domains (e.g., Millet et al., NeurIPS 2022). We believe our use of LibriSpeech, a large and diverse corpus, provides a strong, general-purpose acoustic prior.

      We agree with the reviewer that control analyses are essential to demonstrate that the decoded output is driven by neural signals and not merely the generative prior of the models. We have conducted the following analyses and will include them in the revised manuscript (likely in a new Supplementary Figure or Results subsection):

      (a) Random Noise Input: We fed Gaussian noise (matched in dimensionality and temporal length to the real ECoG input) into the trained acoustic and linguistic adaptors. The outputs were evaluated. The acoustic pathway generated unstructured, noisy spectrograms with no discernible phonetic structure, and the linguistic pathway produced either highly incoherent word sequences or failed to generate meaningful tokens. The fusion via CosyVoice produced unintelligible babble. This confirms that the generative models alone cannot produce structured speech without meaningful neural input.

      (b) Partial Sentence Input (Real + Noise): In the acoustic pathway, we replaced the first half, the second half, or all of the ECoG recording for test sentences with Gaussian noise. The mel-spectrogram R² showed a clear degradation in the reconstructed speech corresponding to the noisy segment. We did not run similar experiments in the linguistic pathway because the TTS generator is pre-trained by HuggingFace. We did not train any parameters of Parler-TTS. These results strongly indicate that our model's performance is contingent on and sensitive to the specific neural input, validating that it is performing genuine neural decoding.
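The three corruption conditions in control (b) can be sketched as below. This is a minimal illustration, assuming the ECoG input is a list of time frames, each a list of per-electrode values, and that the noise is zero-mean with a fixed standard deviation; how the authors scaled the noise is not stated here, and the function names are hypothetical.

```python
import random


def replace_with_noise(signal, start, stop, rng, sigma=1.0):
    """Return a copy of `signal` with frames [start, stop) replaced by Gaussian noise."""
    out = [list(frame) for frame in signal]
    for t in range(start, stop):
        out[t] = [rng.gauss(0.0, sigma) for _ in out[t]]
    return out


def control_conditions(signal, seed=0):
    """Build the first-half, second-half, and full-noise versions of one recording."""
    rng = random.Random(seed)
    n = len(signal)
    half = n // 2
    return {
        "first_half_noise": replace_with_noise(signal, 0, half, rng),
        "second_half_noise": replace_with_noise(signal, half, n, rng),
        "all_noise": replace_with_noise(signal, 0, n, rng),
    }
```

Each corrupted recording keeps the original dimensionality and length, so it can be passed through the trained adaptor unchanged and scored against the same target speech.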

      Edits:

      Page 19, Lines 533-538:

      “The parameters in Wav2Vec2.0 were frozen within this training phase. The parameters in HiFi-GAN were optimized using the Adam optimizer with a fixed learning rate of 10⁻⁵, β₁ = 0.9, β₂ = 0.999. We trained this Autoencoder in LibriSpeech, a 960-hour English speech corpus with a sampling rate of 16kHz, which is entirely separate from the TIMIT corpus used for our ECoG experiments. We spent 12 days in parallel training on 6 Nvidia GeForce RTX3090 GPUs. The maximum training epoch was 2000. The optimization did not stop until the validation loss no longer decreased”.
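For readers unfamiliar with the optimizer settings quoted above, the following sketch shows one standard Adam update for a single scalar parameter using the stated learning rate and β values. The epsilon term (1e-8) is an assumption, since the quoted text does not give it, and `adam_step` is a hypothetical helper rather than the authors' training code.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

With a fixed learning rate of 10⁻⁵, the first update moves each parameter by roughly 10⁻⁵ regardless of gradient magnitude, which is consistent with the long training schedule described.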

      Edits:

      Page 9, Lines 214-223:

      “The acoustic pathway, implemented through a bi-directional LSTM neural adaptor architecture (Fig. 1B), specializes in reconstructing fundamental acoustic properties of speech. This module directly processes neural recordings to generate precise time-frequency representations, focusing on preserving speaker-specific spectral characteristics like formant structures, harmonic patterns, and spectral envelope details. Quantitative evaluation confirms its core competency: achieving a mel-spectrogram R² of 0.793 ± 0.016 (Fig. 3B) demonstrates remarkable fidelity in reconstructing acoustic microstructure. This performance level is statistically indistinguishable from original speech degraded by 0dB additive noise (0.771 ± 0.014, p = 0.242, one-sided t-test). We chose a bidirectional LSTM architecture for this adaptor because its recurrent nature is particularly suited to modeling the fine-grained, short- to medium-range temporal dependencies (e.g., within and between phonemes and syllables) that are critical for acoustic fidelity. An ablation study comparing LSTM against Transformer-based adaptors for this task confirmed that LSTMs yielded superior mel-spectrogram reconstruction fidelity (higher R²), as detailed in Table S1, likely by avoiding the oversmoothing of spectrotemporal details sometimes induced by the strong global context modeling of Transformers”.

      “To confirm that the acoustic pathway’s output is causally dependent on the neural signal rather than the generative prior of the HiFi-GAN, we performed a control analysis in which portions of the input ECoG recording were replaced with Gaussian noise. When either the first half, second half, or the entirety of the neural input was replaced by noise, the mel-spectrogram R² of the reconstructed speech dropped markedly, corresponding to the corrupted segment (Fig. S5). This demonstrates that the reconstruction is temporally locked to the specific neural input and that the model does not ‘hallucinate’ spectrotemporal structure from noise. These results validate that the acoustic pathway performs genuine, input-sensitive neural decoding”

      (2) For BCI applications, the decoding speed matters. Please report the model's inference speed. Additionally, the authors might also consider reporting cross-participant generalization and how the accuracy changes with recording duration.

      We thank the reviewer for these practical and important suggestions. 

      Inference Speed: You are absolutely right. On our hardware (single NVIDIA GeForce RTX 3090 GPU), the current pipeline has an inference time that is longer than the duration of the target speech segment. The primary bottlenecks are the sequential processing in the autoregressive linguistic adaptor and the high-resolution waveform generation in CosyVoice 2.0. This latency currently limits real-time application. We have now added this in the Discussion acknowledging this limitation and stating that future work must focus on architectural optimizations (e.g., non-autoregressive models, lighter vocoders) and potential hardware acceleration to achieve real-time performance, which is critical for a practical BCI.
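The latency described here is commonly summarized as a real-time factor (RTF), the ratio of processing time to audio duration, where values above 1 mean slower than real time. The sketch below shows one way to measure it; `decode_fn` stands in for the full decoding pipeline and is a hypothetical placeholder, not the authors' API.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by the duration of the synthesized speech."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(decode_fn, neural_input, audio_seconds):
    """Time one call to a decoding function and express it as an RTF."""
    start = time.perf_counter()
    decode_fn(neural_input)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

For example, a 3-second sentence that takes 7.5 seconds to synthesize has an RTF of 2.5. Real-time use requires an RTF below 1 in addition to low start-up latency.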

      Cross-Participant Generalization: We agree that this is a key question for scalability. Our framework already addresses part of the cross-participant generalization challenge through the use of pre-trained generative modules (HiFi-GAN, Parler-TTS, CosyVoice 2.0), which are pretrained on large corpora and shared across all participants. Only a small fraction of the model, the lightweight neural adaptors, is subject-specific and requires a small amount of supervised fine-tuning (~20 minutes per participant). This design significantly reduces the per-subject calibration burden. As the reviewer implies, the ultimate goal would be pure zero-shot generalization. A promising future direction is to further improve cross-participant alignment by learning a shared neural feature encoder (e.g., using contrastive or self-supervised learning on aggregated ECoG data) before the personalized adaptors. We have added a paragraph in the Discussion outlining this as a major next step to enhance the framework’s practicality and further reduce calibration time.

      Accuracy vs. Recording Duration: Thank you for this insightful suggestion. To systematically evaluate the impact of training data volume on performance, we have conducted additional experiments using progressively smaller subsets of the full training set (i.e., 25%, 50%, and 75%). When more than 50% of the training data was used, performance degraded gracefully rather than catastrophically, which is promising for potential clinical scenarios where data collection may be limited. We have added a figure (Fig. S4) to demonstrate this.

      Edits:

      Pages 15-16, Lines 427-452:

      “There are several limitations in our study. The quality of the re-synthesized speech heavily relies on the performance of the generative model, indicating that future work should focus on refining and enhancing these models. Currently, our study utilized English speech sentences as input stimuli, and the performance of the system in other languages remains to be evaluated. Regarding signal modality and experimental methods, the clinical setting restricts us to collecting data during brief periods of awake neurosurgeries, which limits the amount of usable neural activity recordings. Overcoming this time constraint could facilitate the acquisition of larger datasets, thereby contributing to the re-synthesis of higher-quality natural speech. Furthermore, the inference speed of the current pipeline presents a challenge for real-time applications. On our hardware (a single NVIDIA GeForce RTX 3090 GPU), synthesizing speech from neural data takes approximately two to three times longer than the duration of the target speech segment itself. This latency is primarily attributed to the sequential processing in the autoregressive linguistic adaptor and the computationally intensive high-fidelity waveform generation in the vocoder (CosyVoice 2.0). While the current study focuses on offline reconstruction accuracy, achieving real-time or faster-than-real-time inference is a critical engineering goal for viable speech BCI prosthetics. Future work must therefore prioritize architectural optimizations, such as exploring non-autoregressive decoding strategies and more efficient neural vocoders, alongside potential hardware acceleration. Additionally, exploring non-invasive methods represents another frontier; with the accumulation of more data and the development of more powerful generative models, it may become feasible to achieve effective non-invasive neural decoding for speech resynthesis. 
      Moreover, while our framework adopts specialized architectures (LSTM and Transformer) for distinct decoding tasks, an alternative approach is to employ a unified multimodal large language model (LLM) capable of joint acoustic-linguistic processing. Finally, the current framework requires training participant-specific adaptors, which limits its immediate applicability for new users. A critical next step is to develop methods that learn a shared, cross-participant neural feature encoder, for instance, by applying contrastive or self-supervised learning techniques to larger aggregated ECoG datasets. Such an encoder could extract subject-invariant neural representations of speech, serving as a robust initialization before lightweight, personalized fine-tuning. This approach would dramatically reduce the amount of per-subject calibration data and time required, enhancing the practicality and scalability of the decoding framework for real-world BCI applications”

      “In summary, our dual-path framework achieves high speech reconstruction quality by strategically integrating language models for lexical precision and voice cloning for vocal identity preservation, yielding a 37.4% improvement in MOS scores over conventional methods. This approach enables high-fidelity, sentence-level speech synthesis directly from cortical recordings while maintaining speaker-specific vocal characteristics. Despite current constraints in generative model dependency and intraoperative data collection, our work establishes a new foundation for neural decoding development. Future efforts should prioritize: (1) refining few-shot adaptation techniques, (2) developing non-invasive implementations, (3) expanding to dynamic dialogue contexts, and (4) cross-subject applications. The convergence of neurophysiological data with multimodal foundation models promises transformative advances, not only revolutionizing speech BCIs but potentially extending to cognitive prosthetics for memory augmentation and emotional communication. Ultimately, this paradigm will deepen our understanding of neural speech processing while creating clinically viable communication solutions for those with severe speech impairments”

      Edits: 

      Add another section in Methods: Page 22, Line 681:

      “Ablation study on training data volume”.

      “To assess the impact of training data quantity on decoding performance, we conducted an additional ablation experiment. For each participant, we created subsets of the full training set corresponding to 25%, 50%, and 75% of the original data by random sampling while preserving the temporal continuity of speech segments. Personalized acoustic and linguistic adaptors were then independently trained from scratch on each subset, following the identical architecture and optimization procedures described above. All other components of the pipeline, including the frozen pre-trained generators (HiFi-GAN, Parler-TTS) and the CosyVoice 2.0 voice cloner, remained unchanged. Performance metrics (mel-spectrogram R², WER, PER) were evaluated on the same held-out test set for all data conditions. The results (Fig. S4) demonstrate that when more than 50% of the training data is utilized, performance degrades gracefully rather than catastrophically, which is a promising indicator for clinical applications with limited data collection time”.
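The subset construction described above can be sketched as follows. This assumes that "preserving the temporal continuity of speech segments" means sampling whole segments (for example, sentences) rather than individual frames; that reading is an interpretation, and `sample_training_subset` is a hypothetical name.

```python
import random


def sample_training_subset(segments, fraction, seed=0):
    """Randomly keep a fraction of whole segments, preserving their original order."""
    if not segments:
        raise ValueError("segments must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = max(1, round(fraction * len(segments)))
    chosen = sorted(rng.sample(range(len(segments)), k))
    return [segments[i] for i in chosen]
```

Calling this with fractions 0.25, 0.5, and 0.75 yields the three reduced training sets, each of which would then be used to train the adaptors from scratch and evaluated on the same held-out test set.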

      (3) I appreciate that the author compared their model with the MLP, but more comparisons with previous models could be beneficial. Even simply summarizing some measures of earlier models, such as neural recording duration, WER, PER, etc., is ok.

      Thank you for this suggestion. We agree that a broader comparison contextualizes our contribution. We also acknowledge that given the differences in tasks, signal modality, and amount of data, it’s hard to draw a direct comparison. The main goal of this table is to summarize major studies, their methods and results for reference. We have now added a new Supplementary Table that summarizes key metrics from several recent and relevant studies in neural speech decoding. The table includes:

      - Neural modality (e.g., ECoG, sEEG, Utah array)

      - Approximate amount of neural data used per subject for decoder training

      - Primary task (perception vs. production)

      - Decoding framework

      - Reported Word Error Rate (WER) or similar intelligibility metrics (e.g., Character Error Rate)

      - Reported acoustic fidelity metrics (if available, e.g., spectral correlation)

      This table includes works such as Anumanchipalli et al., Nature 2019; Akbari et al., Sci Rep 2019; Willett et al., Nature 2023; and other contemporary studies. The table clearly shows that our dual-path framework achieves a highly competitive WER (~18.9%) using an exceptionally short neural recording duration (~20 minutes), highlighting its data efficiency. We will refer to this table in the revised manuscript.

      Edits:

      Page 14, Lines 374-376:

      “Our work establishes a framework for speech decoding by outperforming prior acoustic-only or linguistic-only approaches (Table S3) through integrated pretraining-powered acoustic and linguistic decoding”

      Minor:

      (1) Some processes might be described earlier, for example, the electrodes were selected, and the model was trained separately for each participant. That information was only described in the Method section now.

      Thank you for catching these. We have revised the manuscript accordingly.

      Edits:

      Page 4, Lines 89-95:

      “Our proposed framework for reconstructing speech from intracranial neural recordings is designed around two complementary decoding pathways: an acoustic pathway focused on preserving low-level spectral and prosodic detail, and a linguistic pathway focused on decoding high-level textual and semantic content. For every participant, our adaptor is independently trained, and we select speech-responsive electrodes (selection details are provided in the Methods section) to tailor the model to individual neural patterns. These two streams are ultimately fused to synthesize speech that is both natural-sounding and intelligible, capturing the full richness of spoken language. Fig. 1 provides a schematic overview of this dual-pathway architecture”

      (2) Line 224-228 Figure 2 should be Figure 3

      Thank you for catching these. We have revised the manuscript accordingly. The information about participant-specific training and electrode selection is now briefly mentioned in the "Results" overview (section: "The acoustic and linguistic performance..."), with details still in the Methods. The figure reference error has been corrected.

      Edits:

      Page 7, Lines 224-228:

      “However, exclusive reliance on acoustic reconstruction reveals fundamental limitations. Despite excellent spectral fidelity, the pathway produces critically impaired linguistic intelligibility. At the word level, intelligibility remains unacceptably low (WER = 74.6 ± 5.5%, Fig. 3D), while MOS and phoneme-level precision fare only marginally better (MOS = 2.878 ± 0.205, Fig. 3C; PER = 28.1 ± 2.2%, Fig. 3E)”.

      (3) For Figure 3C, why does the MOS seem to be higher for baseline 3 than for ground truth? Is this significant?

      This is a detailed observation. Baseline 3 achieves a mean opinion score of 4.822 ± 0.086 (Fig. 3C), significantly surpassing even the original human speech (4.234 ± 0.097, p = 6.674×10⁻³³). We believe this trend arises because the TIMIT corpus, recorded decades ago, contains inherent acoustic noise and relatively lower fidelity compared to modern speech corpora. In contrast, the Parler-TTS model used in Baseline 3 is trained on massive, high-quality, clean speech datasets. Therefore, it synthesizes speech that listeners may subjectively perceive as "cleaner" or more pleasant, even if it lacks the original speaker's voice. Crucially, as the reviewer implies, our final integrated output does not aim to maximize MOS at the cost of speaker identity; it successfully balances this subjective quality with high intelligibility and restored acoustic fidelity. We will add a brief note explaining this possible reason in the caption of Figure 3C.

      Edits:

      Page 9, Lines 235-245:

      “The linguistic pathway reconstructs high-intelligibility, higher-level linguistic information”

      “The linguistic pathway, instantiated through a pre-trained TTS generator (Fig. 1B), excels in reconstructing abstract linguistic representations. This module operates at the phonological and lexical levels, converting discrete word tokens into continuous speech signals while preserving prosodic contours, syllable boundaries, and phonetic sequences. It achieves a mean opinion score of 4.822 ± 0.086 (Fig. 3C) - significantly surpassing even the original human speech (4.234 ± 0.097, p = 6.674×10⁻³³) in that the TIMIT corpus, recorded decades ago, contains inherent acoustic noise and relatively lower fidelity compared to modern speech corpora. Complementing this perceptual quality, objective intelligibility metrics confirm outstanding performance: WER reaches 17.7 ± 3.2%, with PER at 11.0 ± 2.3%”.

      Reference

      (1) Chen M X, Firat O, Bapna A, et al. The best of both worlds: Combining recent advances in neural machine translation[C]//Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long papers). 2018: 76-86

      (2) P. Chen et al. Do Self-Supervised Speech and Language Models Extract Similar Representations as Human Brain? 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024). 2225–2229 (2024).

      (3) H. Akbari, B. Khalighinejad, J. L. Herrero, A. D. Mehta, N. Mesgarani, Towards reconstructing intelligible speech from the human auditory cortex. Scientific reports 9, 874 (2019).

      (4) S. Komeiji et al., Transformer-Based Estimation of Spoken Sentences Using Electrocorticography. Int Conf Acoust Spee, 1311-1315 (2022).

      (5) L. Bellier et al., Music can be reconstructed from human auditory cortex activity using nonlinear decoding models. Plos Biology 21,  (2023).

      (6) F. R. Willett et al., A high-performance speech neuroprosthesis. Nature 620,  (2023).

      (7) S. L. Metzger et al., A high-performance neuroprosthesis for speech decoding and avatar control. Nature 620, 1037-1046 (2023).

      (8) J. W. Li et al., Neural2speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction. Int Conf Acoust Spee, 2200-2204 (2024).

      (9) X. P. Chen et al., A neural speech decoding framework leveraging deep learning and speech synthesis. Nat Mach Intell 6,  (2024).

      (10) M. Wairagkar et al., An instantaneous voice-synthesis neuroprosthesis. Nature,  (2025).

    1. eLife Assessment

      In this important study, the authors engineered and characterised novel genetically encoded calcium indicators (GECIs) and an analytical tool (CaFire) capable of reporting and quantifying various sub-synaptic events, including miniature synaptic events, with a speed and sensitivity approaching that of intracellular electrophysiological recordings. They present compelling data validating this toolset, which will be of interest to neurobiologists studying synaptic calcium dynamics in various model systems.

    2. Reviewer #1 (Public review):

      Summary:

      Chen et al. engineered and characterized a suite of next-generation GECIs for the Drosophila NMJ that allow for the visualization of calcium dynamics within the presynaptic compartment, at presynaptic active zones, and in the postsynaptic compartment. These GECIs include ratiometric presynaptic Scar8m (targeted to synaptic vesicles), ratiometric active zone localized Bar8f (targeted to the scaffold molecule BRP), and postsynaptic SynapGCaMP8m. The authors demonstrate that these new indicators are a large improvement on the widely used GCaMP6 and GCaMP7 series GECIs, with increased speed and sensitivity. They show that presynaptic Scar8m accurately captures presynaptic calcium dynamics with superior sensitivity to the GCaMP6 and GCaMP7 series and with similar kinetics to chemical dyes. The active-zone targeted Bar8f sensor was assessed for the ability to detect release-site specific nanodomain changes, but the authors concluded that this sensor is still too slow to accurately do so. Lastly, the use of postsynaptic SynapGCaMP8m was shown to enable the detection of quantal events with similar resolution to electrophysiological recordings. Finally, the authors developed a Python-based analysis software, CaFire, that enables automated quantification of evoked and spontaneous calcium signals. These tools will greatly expand our ability to detect activity at individual synapses without the need for chemical dyes or electrophysiology.

      Strengths:

      In this study, the authors rigorously compare their newly engineered GECIs to those previously used at the Drosophila NMJ, highlighting improvements in localization, speed, and sensitivity. These comparisons appropriately substantiate the authors' claim that their GECIs are superior to the ones currently in use.

      The authors demonstrate the ability of Scar8m to capture subtle changes in presynaptic calcium resulting from differences between MN-Ib and MN-Is terminals and from the induction of presynaptic homeostatic potentiation (PHP), rivaling the sensitivity of chemical dyes.

      The improved postsynaptic SynapGCaMP8m is shown to approach the resolution of electrophysiology in resolving quantal events.

      The authors created a publicly available pipeline that streamlines and standardizes analysis of calcium imaging data.

    3. Reviewer #2 (Public review):

      Summary:

      Calcium ions play a key role in synaptic transmission and plasticity. To improve calcium measurements at synaptic terminals, previous studies have targeted genetically encoded calcium indicators (GECIs) to pre- and postsynaptic locations. Here, Chen et al. improve these constructs by incorporating the latest GCaMP8 sensors and a stable red fluorescent protein to enable ratiometric measurements. Extensive characterization in the Drosophila neuromuscular junction demonstrates favorable performance of these new constructs relative to previous genetically encoded and synthetic calcium indicators in reporting synaptic calcium events. In addition, they develop a new analysis platform, 'CaFire', to facilitate automated quantification. Impressively, by positioning postsynaptic GCaMP8m near glutamate receptors, the authors show that their sensors can report miniature synaptic events with speed and sensitivity approaching that of intracellular electrophysiological recordings. These new sensors and the analysis platform provide a valuable tool for resolving synaptic events using all-optical methods.

      Strength:

      The authors present rigorous characterization of their sensors using well-established assays. They employ immunostaining and super-resolution STED microscopy to confirm correct subcellular targeting. Additionally, they quantify response amplitude, rise and decay kinetics, and provide side-by-side comparisons with earlier-generation GECIs and synthetic dyes. Importantly, they show that the new sensors can reproduce known differences in evoked Ca²⁺ responses between distinct nerve terminals. Finally, they present what appears to be the first simultaneous calcium imaging and intracellular mEPSP recording to directly assess the sensitivity of different sensors in detecting individual miniature synaptic events.

      The revised version contains extensive new data and clarification that fully addressed my previous concerns. In particular, I appreciate the side-by-side comparison with synthetic calcium indicator OGB-1 and the cytosolic version of GCaMP8m (now presented in Figure 3), which compellingly supports the favorable performance of their new sensors.

      Weakness:

      I have only one remaining suggestion about the precision of terminology, which I do think is important. The authors clarified in the revision that they "define SNR operationally as the fractional fluorescence change (ΔF/F)" and essentially present ΔF/F values whenever they mention SNR. However, if the intention is to present ΔF/F comparisons, I would strongly suggest replacing all mentions of "SNR" in the manuscript with "ΔF/F" or "fractional/relative fluorescence change".

      SNR and ΔF/F are fundamentally different quantities, each with a well-defined and distinct meaning: SNR measures fluorescence change relative to baseline fluctuations (noise), whereas ΔF/F measures fluorescence change relative to baseline fluorescence (F₀). While larger ΔF/F values often correlate with improved detectability, SNR also depends on additional factors such as indicator brightness, light collection efficiency, camera noise, and the stability of the experimental preparation. Referring to ΔF/F as SNR can therefore be misleading and may cause confusion for readers, particularly those from quantitative imaging backgrounds. Clarifying the terminology by consistently using ΔF/F would improve conceptual accuracy without requiring any reanalysis of the data.
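
The distinction can be illustrated with a minimal sketch. The function names, the baseline length, and the two example traces below are hypothetical and are not taken from the manuscript or from CaFire; SNR is taken here as the peak change divided by the standard deviation of the baseline, which is one common definition among several.

```python
import statistics


def delta_f_over_f(trace, n_baseline):
    """Peak fluorescence change relative to baseline fluorescence F0."""
    f0 = statistics.fmean(trace[:n_baseline])
    return (max(trace) - f0) / f0


def snr(trace, n_baseline):
    """Peak fluorescence change relative to baseline fluctuations (noise)."""
    baseline = trace[:n_baseline]
    f0 = statistics.fmean(baseline)
    noise = statistics.stdev(baseline)  # sample standard deviation of the baseline
    return (max(trace) - f0) / noise


# Two hypothetical traces: same mean baseline (100) and same peak (150),
# but the second has ten times larger baseline fluctuations.
quiet = [99, 101, 99, 101, 150, 120, 105]
noisy = [90, 110, 90, 110, 150, 120, 105]

print(delta_f_over_f(quiet, 4), delta_f_over_f(noisy, 4))  # identical dF/F
print(snr(quiet, 4), snr(noisy, 4))                        # very different SNR
```

Both traces give the same ΔF/F, yet the event is far easier to detect in the first one, which is why the two quantities should not be used interchangeably.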

    4. Reviewer #3 (Public review):

      Genetically encoded calcium indicators (GECIs) are essential tools in neurobiology and physiology. Technological constraints in targeting and kinetics of previous versions of GECIs have limited their application at the subcellular level. Chen et al. present a set of novel tools that overcome many of these limitations. Through systematic testing in the Drosophila NMJ, they demonstrate improved targeting of GCaMP variants to synaptic compartments and report enhanced brightness and temporal fidelity using members of the GCaMP8 series. These advancements are likely to facilitate more precise investigation of synaptic physiology. This manuscript could be improved by further testing the GECIs across physiologically relevant ranges of activity, including at high frequency and over long imaging sessions. Moreover, while the authors provide a custom software package (CaFire) for Ca²⁺ imaging analysis, comparisons to existing tools and more guidance for broader usability are needed.

      In this revised version, Chen et al. answered most of our concerns. The tools developed here will be useful for the community.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Chen et al. engineered and characterized a suite of next-generation GECIs for the Drosophila NMJ that allow for the visualization of calcium dynamics within the presynaptic compartment, at presynaptic active zones, and in the postsynaptic compartment. These GECIs include ratiometric presynaptic Scar8m (targeted to synaptic vesicles), ratiometric active zone localized Bar8f (targeted to the scaffold molecule BRP), and postsynaptic SynapGCaMP8m. The authors demonstrate that these new indicators are a large improvement on the widely used GCaMP6 and GCaMP7 series GECIs, with increased speed and sensitivity. They show that presynaptic Scar8m accurately captures presynaptic calcium dynamics with superior sensitivity to the GCaMP6 and GCaMP7 series and with similar kinetics to chemical dyes. The active-zone targeted Bar8f sensor was assessed for the ability to detect release-site-specific nanodomain changes, but the authors concluded that this sensor is still too slow to accurately do so. Lastly, the use of postsynaptic SynapGCaMP8m was shown to enable the detection of quantal events with similar resolution to electrophysiological recordings. Finally, the authors developed a Python-based analysis software, CaFire, that enables automated quantification of evoked and spontaneous calcium signals. These tools will greatly expand our ability to detect activity at individual synapses without the need for chemical dyes or electrophysiology.

      We thank this Reviewer for the overall positive assessment of our manuscript and for the incisive comments.

      (1) The role of Excel in the pipeline could be more clearly explained. Lines 182-187 could be better worded to indicate that CaFire provides analysis downstream of intensity detection in ImageJ. Moreover, the data type of the exported data, such as .csv or .xlsx, should be indicated instead of 'export to graphical program such as Microsoft Excel'.

      We thank the Reviewer for these comments, many of which were shared by the other reviewers. In response, we have now 1) more clearly explained the role of Excel in the CaFire pipeline (lines 677-681), 2) revised the wording in lines 676-679 to indicate that CaFire provides analysis downstream of intensity detection in ImageJ, and 3) clarified the data type exported to Excel (lines 677-681). These efforts have improved the clarity and readability of the CaFire analysis pipeline.

      (2) In Figure 2A, the 'Excel' step should either be deleted or included as 'data validation' as ImageJ exports don't require MS Excel or any specific software to be analysed. (Also, the graphic used to depict Excel software in Figure 2A is confusing.)

      We thank the reviewer for this helpful suggestion. In Fig. 2A, we have changed the Excel portion and clarified the processing steps in the revised methods. Specifically, we now indicate that ROIs are first selected in Fiji/ImageJ and analyzed to obtain time-series data containing both the time information and the corresponding imaging mean intensity values. These data are then exported to a spreadsheet file (e.g., Excel), which is used to organize the output before being imported into CaFire for subsequent analysis. These changes can be found in Fig. 2A and the methods (lines 676-681).
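
As a rough illustration of the hand-off described here, the sketch below parses a plain-text (CSV) export containing a time column and a mean-intensity column. The column names `Time` and `Mean`, the function name, and the example values are assumptions for illustration only; they are not CaFire's documented input format, which is described on the project's GitHub page.

```python
import csv
import io


def read_roi_export(text, time_col="Time", intensity_col="Mean"):
    """Parse a CSV export into parallel lists of time points and mean intensities."""
    times, intensities = [], []
    for row in csv.DictReader(io.StringIO(text)):
        times.append(float(row[time_col]))
        intensities.append(float(row[intensity_col]))
    return times, intensities


# Hypothetical three-frame export for a single ROI.
example = "Time,Mean\n0.000,100.0\n0.009,101.5\n0.017,140.2\n"
times, intensities = read_roi_export(example)
print(times, intensities)
```

Because the parser depends only on the header names, it works the same whether the file was saved from a spreadsheet program or written directly by ImageJ, which is the point the reviewer raises about not requiring any specific software.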

      (3) Figure 2B should include the 'Partition Specification' window (as shown on the GitHub) as well as the threshold selection to give the readers a better understanding of how the tool works.

      We absolutely agree with this comment, and have made the suggested changes to Fig. 2B. In particular, we have replaced the software interface panels and now include windows illustrating the Load File, Peak Detection, and Partition functions. These updated screenshots provide a clearer view of how CaFire is used to load the data, detect events, and perform partition specification for subsequent analysis. We agree these changes will give the readers a better understanding of how the tool works, and we thank the reviewer for this comment.

      (4) The presentation of data is well organized throughout the paper. However, in Figure 6C, it is unclear how the heatmaps represent the spatiotemporal fluorescence dynamics of each indicator. Does the signal correspond to a line drawn across the ROI shown in Figure 6B? If so, this should be indicated.

      We apologize that the heatmaps were unclear in Fig. 6C (Fig. 7C in the current revision). Each heatmap is derived from a one-pixel-wide vertical line within a miniature-event ROI. These heatmaps correspond to the fluorescence change in the indicated SynapGCaMP variant of individual quantal events and their traces shown in Fig. 7C, with a representative image of the baseline and peak fluorescence shown in Fig. 7B. Specifically, we have added the following to the revised Fig. 7C legend:

      The corresponding heatmaps below were generated from a single vertical line extracted from a representative miniature-event ROI, and visualize the spatiotemporal fluorescence dynamics (ΔF/F) along that line over time.
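
A line-based heatmap of this kind can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the `movie[t][y][x]` layout, the function name, and the use of the first few frames as the per-pixel baseline are all assumptions.

```python
def line_kymograph(movie, x, n_baseline):
    """Return dF/F along the vertical line at column x for every frame.

    movie[t][y][x] holds pixel intensities. The result is indexed [y][t]:
    each row is one pixel on the line, each column is one frame.
    """
    n_frames = len(movie)
    n_rows = len(movie[0])
    kymo = []
    for y in range(n_rows):
        pixel_trace = [movie[t][y][x] for t in range(n_frames)]
        # Assumed baseline: mean of the first n_baseline frames for this pixel.
        f0 = sum(pixel_trace[:n_baseline]) / n_baseline
        kymo.append([(f - f0) / f0 for f in pixel_trace])
    return kymo


# Hypothetical 3-frame, 2x2-pixel movie; only pixel (y=0, x=1) responds in frame 2.
movie = [
    [[10, 100], [10, 200]],
    [[10, 100], [10, 200]],
    [[10, 150], [10, 200]],
]
kymo = line_kymograph(movie, x=1, n_baseline=2)
print(kymo)
```

Plotting `kymo` as an image, with position along the line on one axis and time on the other, gives the spatiotemporal view described in the legend.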

      (5) In Figure 6D, the addition of non-matched electrophysiology recordings is confusing. Maybe add "at different time points" to the end of the 6D legend, or consider removing the electrophysiology trace from Figure 6D and referring the reader to the traces in Figure 7A for comparison (considering the same point is made more rigorously in Figure 7).

      This is a good point, one shared with another reviewer. We apologize this was not clear, and have now revised this part of the figure to remove the electrophysiological traces in what is now Fig. 7 while keeping the paired ones still in what is now Fig. 8A as suggested by the reviewer. We agree this helps to clarify the quantal calcium transients.

      (6) In GitHub, an example ImageJ Script for analyzing the images and creating the inputs for CaFire would be helpful to ensure formatting compatibility, especially given potential variability when exporting intensity information for two channels. In the Usage Guide, more information would be helpful, such as how to select ∆R/R, ideally with screenshots of the application being used to analyze example data for both single-channel and two-channel images.

      We agree that additional details added to the GitHub would be helpful for users of CaFire. In response, we have now added the following improvements to the GitHub site: 

      - ImageJ operation screenshots

      Step-by-step illustrations of ROI drawing and Multi Measure extraction.

      - Example Excel file with time and intensity values

      Demonstrates the required data format for CaFire import, including proper headers.

      - CaFire loading screenshots for single-channel and dual-channel imaging

      Shows how to import GCaMP into Channel 1 and mScarlet into Channel 2.

      - Peak Detection and Partition setting screenshots

      Visual examples of automatic peak detection, manual correction, and trace partitioning.

      - Instructions for ROI Extraction and CaFire Analysis

      A written guide describing the full workflow from ROI selection to CaFire data export.

      These changes have improved the usability and accessibility of CaFire, and we thank the reviewer for these points.

      Reviewer #2

      Calcium ions play a key role in synaptic transmission and plasticity. To improve calcium measurements at synaptic terminals, previous studies have targeted genetically encoded calcium indicators (GECIs) to pre- and postsynaptic locations. Here, Chen et al. improve these constructs by incorporating the latest GCaMP8 sensors and a stable red fluorescent protein to enable ratiometric measurements. In addition, they develop a new analysis platform, 'CaFire', to facilitate automated quantification. Using these tools, the authors demonstrate favorable properties of their sensors relative to earlier constructs. Impressively, by positioning postsynaptic GCaMP8m near glutamate receptors, they show that their sensors can report miniature synaptic events with speed and sensitivity approaching that of intracellular electrophysiological recordings. These new sensors and the analysis platform provide a valuable tool for resolving synaptic events using all-optical methods.

      We thank the Reviewer for their overall positive evaluation and comments.

      Major comments:

      (1) While the authors rigorously compared the response amplitude, rise, and decay kinetics of several sensors, key parameters like brightness and photobleaching rates are not reported. I feel that including this information is important as synaptically tethered sensors, compared to freely diffusible cytosolic indicators, can be especially prone to photobleaching, particularly under the high-intensity illumination and high-magnification conditions required for synaptic imaging. Quantifying baseline brightness and photobleaching rates would add valuable information for researchers intending to adopt these tools, especially in the context of prolonged or high-speed imaging experiments.

      This is a good point made by the reviewer, and one we agree will be useful for researchers to be aware of. First, it is important to note that the photobleaching and brightness of the sensors will vary depending on the nature of the user’s imaging equipment, which can vary significantly between widefield microscopes (with various LED or halogen light sources for illumination), laser scanning systems (e.g., line scans with confocal systems), or area scanning systems using resonant scanners (as we use in our current study). Under the same imaging settings, GCaMP8f and 8m exhibit comparable baseline fluorescence, whereas GCaMP6f and 6s are noticeably dimmer; because our aim is to assess each reagent’s potential under optimal conditions, we routinely adjust excitation/camera parameters before acquisition to place baseline fluorescence in an appropriate dynamic range. As an important addition to this study, motivated by the reviewer’s comments above, we now directly compare neuronal cytosolic GCaMP8m expression with our Scar8m sensor, showing higher sensitivity with Scar8m (now shown in the new Fig. 3F-H).

      Regarding photobleaching, GCaMP signals are generally stable, while mScarlet is more prone to bleaching: in presynaptic area-scanned confocal recordings, the mScarlet channel drops by ~15% over 15 secs, whereas GCaMP6s/8f/8m show no obvious bleaching over the same window (lines 549-553). In contrast, in presynaptic widefield imaging using an LED system (CCD), GCaMP8f shows ~8% loss over 15 secs (lines 610-611). Similarly, for postsynaptic SynapGCaMP6f/8f/8m, confocal resonant area scans show no obvious bleaching over 60 secs, while widefield shows ~2–5% bleaching over 60 secs (lines 634-638). Finally, in active-zone/BRP calcium imaging (confocal), mScarlet again bleaches by ~15% over 15 secs, while GCaMP8f/8m show no obvious bleaching. The mScarlet-channel bleaching can be corrected in Huygens SVI (Bleaching correction or via the Deconvolution Wizard), whereas we avoid applying bleaching correction to the green GCaMP channel when no clear decay is present to prevent introducing artifacts. This information is now added to the methods (lines 548-553).

      (2) In several places, the authors compare the performance of their sensors with synthetic calcium dyes, but these comparisons are based on literature values rather than on side-by-side measurements in the same preparation. Given differences in imaging conditions across studies (e.g., illumination, camera sensitivity, and noise), parameters like indicator brightness, SNR, and photobleaching are difficult to compare meaningfully. Additionally, the limited frame rate used in the present study may preclude accurate assessment of rise times relative to fast chemical dyes. These issues weaken the claim made in the abstract that "...a ratiometric presynaptic GCaMP8m sensor accurately captures .. Ca²⁺ changes with superior sensitivity and similar kinetics compared to chemical dyes." The authors should clearly acknowledge these limitations and soften their conclusions. A direct comparison in the same system, if feasible, would greatly strengthen the manuscript.

      We absolutely agree with these points made by the reviewer, and have made a concerted effort to address them through the following:

      We have now directly compared presynaptic calcium responses on the same imaging system using the chemical dye Oregon Green BAPTA-1 (OGB-1), one of the primary synthetic calcium indicators used in our field. These experiments reveal that Scar8f exhibits markedly faster kinetics and an improved signal-to-noise ratio compared to OGB-1, with higher peak fluorescence responses (Scar8f: 0.32, OGB-1: 0.23). The rise time constants of the two indicators are comparable (both ~3 msecs), whereas the decay of Scar8f is faster than that of OGB-1 (Scar8f: ~40, OGB-1: ~60), indicating more rapid signal recovery. These results now directly demonstrate the superiority of the new GCaMP8 sensors we have engineered over conventional synthetic dyes, and are now presented in the new Fig. 3A-E of the manuscript.

      We agree with the reviewer that, in the original submission, the relatively slow resonant area scans (~115 fps) limited the temporal resolution of our rise time measurements. To address this, we have re-measured the rise time using higher frame-rate line scans (kHz). For Scar8f, the rise time constant was 6.736 msec with resonant area scans at ~115 fps, but shortened to 2.893 msec when imaged at ~303 fps, indicating that the original protocol underestimated the speed of the true kinetics. In addition, for Bar8m, area scans at ~118 fps yielded a rise time constant of 9.019 msec, whereas line scans at ~1085 fps reduced the rise time constant to 3.230 msec. These new measurements are now incorporated into the manuscript (Figs. 3, 4, and 6) to more accurately reflect the fast kinetics of these indicators.

      (3) The authors state that their indicators can now achieve measurements previously attainable with chemical dyes and electrophysiology. I encourage the authors to also consider how their tools might enable new measurements beyond what these traditional techniques allow. For example, while electrophysiology can detect summed mEPSPs across synapses, imaging could go a step further by spatially resolving the synaptic origin of individual mEPSP events. One could, for instance, image MN-Ib and MN-Is simultaneously without silencing either input, and detect mEPSP events specific to each synapse. This would enable synapse-specific mapping of quantal events - something electrophysiology alone cannot provide. Demonstrating even a proof-of-principle along these lines could highlight the unique advantages of the new tools by showing that they not only match previous methods but also enable new types of measurements.

      These are excellent points raised by the reviewer. In response, we have done the following: 

      We have now included a supplemental video as “proof-of-principle” data showing simultaneous imaging of SynapGCaMP8m quantal events at both MN-Is and -Ib, demonstrating that synapse-specific spatial mapping of quantal events can be obtained with this tool (see new Supplemental Video 1). 

      We have also included an additional discussion of the potential and limitations of these tools for new measurements beyond conventional approaches. This discussion is now presented in lines 419-421 in the manuscript.

      (4) For ratiometric measurements, it is important to estimate and subtract background signals in each channel. Without this correction, the computed ratio may be skewed, as background adds an offset to both channels and can distort the ratio. However, it is not clear from the Methods section whether, or how, background fluorescence was measured and subtracted.

      This is a good point, and we agree more clarification about how ratiometric measurements were made is needed. In response, we have now added the following to the Methods section (lines 548-568):

      Time-lapse videos were stabilized and bleach-corrected prior to analysis, which visibly reduced frame-to-frame motion and intensity drift. In the presynaptic and active-zone mScarlet channel, a bleaching factor of ~1.15 was observed during the 15 sec recording. This bleaching can be corrected using the “Bleaching correction” tool in Huygens SVI. For presynaptic and active-zone GCaMP signals, there was minimal bleaching over these short imaging periods. Therefore, the bleaching correction step for GCaMP was skipped. Both GCaMP and mScarlet channels were processed using the default settings in the Huygens SVI “Deconvolution Wizard” (with the exception of the bleaching correction option). Deconvolution was performed using the CMLE algorithm with the Huygens default stopping criterion and a maximum of 30 iterations, such that the algorithm either converged earlier or, if convergence was not reached, was terminated at this 30-iteration limit; no other iteration settings were used across the GCaMP series. ROIs were drawn on the processed images using Fiji ImageJ software, and mean fluorescence time courses were extracted for the GCaMP and mScarlet channels, yielding F<sub>GCaMP</sub>(t) and F<sub>mScarlet</sub>(t). These time courses were imported into CaFire with GCaMP assigned to Channel #1 (signal; required) and mScarlet to Channel #2 (baseline/reference; optional). If desired, the mScarlet signal could be smoothed in CaFire using a user-specified moving-average window to reduce high-frequency noise. In CaFire’s ΔR/R mode, the per-frame ratio was computed as R(t)=F<sub>GCaMP</sub>(t)/F<sub>mScarlet</sub>(t); a baseline ratio R0 was estimated from the pre-stimulus period, and the final response was reported as ΔR/R(t)=[R(t)−R0]/R0, which normalizes GCaMP signals to the co-expressed mScarlet reference and thereby reduces variability arising from differences in sensor expression level or illumination across AZs.
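
The ΔR/R calculation described above can be sketched as follows, taking R(t) as the per-frame GCaMP/mScarlet ratio. This is a minimal illustration under stated assumptions rather than CaFire's actual implementation: the function names, the centered moving average, and the use of a simple mean over the pre-stimulus frames for R0 are choices made here for clarity.

```python
def moving_average(values, window):
    """Centered moving average with window // 2 samples on each side.

    Near the edges, only the samples that exist are averaged.
    A window of 1 returns the input unchanged.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def delta_r_over_r(f_gcamp, f_mscarlet, n_prestim, smooth_window=1):
    """Compute dR/R(t) = [R(t) - R0] / R0 with R(t) = F_GCaMP(t) / F_mScarlet(t)."""
    reference = moving_average(f_mscarlet, smooth_window)  # optional smoothing
    ratio = [g / r for g, r in zip(f_gcamp, reference)]
    r0 = sum(ratio[:n_prestim]) / n_prestim  # assumed: mean of pre-stimulus frames
    return [(r - r0) / r0 for r in ratio]


# Hypothetical traces: two pre-stimulus frames, then a transient.
f_gcamp = [100.0, 100.0, 300.0, 200.0]
f_mscarlet = [50.0, 50.0, 50.0, 50.0]
response = delta_r_over_r(f_gcamp, f_mscarlet, n_prestim=2)
print(response)
```

Smoothing only the reference channel keeps the fast GCaMP transient intact while limiting the noise that a division by a noisy denominator would otherwise add.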

      (5) At line 212, the authors claim "... GCaMP8m showing 345.7% higher SNR over GCaMP6s....(Fig. 3D and E) ", yet the cited figure panels do not present any SNR quantification. Figures 3D and E only show response amplitudes and kinetics, which are distinct from SNR. The methods section also does not describe details for how SNR was defined or computed.

      This is another good point. We define SNR operationally as the fractional fluorescence change (ΔF/F). Traces were processed with CaFire, which estimates a per-frame baseline F<sub>0</sub>(t) with a user-configurable sliding window and percentile. In the Load File panel, users can specify both the length of the moving baseline window and the desired percentile; the default settings are a 50-point window and the 30th percentile, which corresponds to a 101-point window centered on each time point (previous 50 to next 50 samples) and uses the lower 30% of values within that window to estimate F<sub>0</sub>(t). The signal was then computed as ΔF/F=[F(t)−F<sub>0</sub>(t)]/F<sub>0</sub>(t). This ΔF/F value is what we report as SNR throughout the manuscript and is now discussed explicitly in the revised methods (lines 686-693).
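
A sliding-window percentile baseline of this kind can be sketched as below. This is an assumption-laden illustration, not CaFire's code: the function names are hypothetical, and "the lower 30% of values" is interpreted here as the 30th percentile (nearest-rank) of the window, whereas an implementation could instead average the lowest 30% of samples.

```python
import math


def percentile_baseline(trace, half_window=50, percentile=30):
    """Estimate F0(t) as a percentile of the samples in a centered window.

    The window spans half_window samples before and after each point
    (101 points for half_window=50) and is truncated at the trace edges.
    """
    baseline = []
    for i in range(len(trace)):
        lo = max(0, i - half_window)
        hi = min(len(trace), i + half_window + 1)
        window = sorted(trace[lo:hi])
        # Nearest-rank percentile: the smallest value with at least
        # `percentile` percent of the window at or below it.
        rank = max(1, math.ceil(percentile * len(window) / 100))
        baseline.append(window[rank - 1])
    return baseline


def delta_f_over_f(trace, half_window=50, percentile=30):
    """Compute dF/F(t) = [F(t) - F0(t)] / F0(t) with a sliding percentile baseline."""
    f0 = percentile_baseline(trace, half_window, percentile)
    return [(f - b) / b for f, b in zip(trace, f0)]


# Hypothetical trace: flat baseline of 100 with a single-frame event of 200.
trace = [100.0] * 10 + [200.0] + [100.0] * 10
dff = delta_f_over_f(trace, half_window=5, percentile=30)
print(dff[10])
```

Using a low percentile rather than the window mean keeps brief events from pulling the baseline upward, so the event itself is not partly subtracted away.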

      (6) Lines 285-287 "As expected, summed ΔF values scaled strongly and positively with AZ size (Fig. 5F), reflecting a greater number of Cav2 channels at larger AZs". I am not sure about this conclusion. A positive correlation between summed ΔF values and AZ size could simply reflect more GCaMP molecules in larger AZs, which would give rise to larger total fluorescence change even at a given level of calcium increase.

      The reviewer makes a good point, one that we agree should be clarified. The reviewer is indeed correct that larger active zones should have more abundant BRP protein, which in turn will lead to a higher abundance of the Bar8f sensor, which should lead to a higher GCaMP response simply by having more of this sensor. However, the inclusion of the ratiometric mScarlet protein should normalize the response accurately, correcting for this confound, in which the higher abundance of GCaMP should be offset (normalized) by the equally (stoichiometric) higher abundance of mScarlet. Therefore, when the ΔR/R is calculated, the differences in GCaMP abundance at each AZ should be corrected by the ratiometric analysis. We now use an improved BRP::mScarlet3::GCaMP8m (Bar8m) and compute ΔR/R with R(t)=F<sub>GCaMP8m</sub>/F<sub>mScarlet3</sub>. ROIs were drawn over individual AZs (Fig. 6B). CaFire estimated R0 with a sliding 101-point window using the lowest 10% of values, and responses were reported as ΔR/R=[R−R0]/R0. Area-scan examples (118 fps) show robust ΔR/R transients (peaks ≈1.90 and 3.28; tau rise ≈9.0–9.3 ms; Fig. 6C, middle).

      We have now made these points more clearly in the manuscript (lines 700-704) and moved the Bar8f intensity vs active zone size data to Table S1. Together, these revisions address the indicator-abundance confound (via mScarlet normalization).

      (6) Lines 313-314: "SynapGCaMP quantal signals appeared to qualitatively reflect the same events measured with electrophysiological recordings (Fig. 6D)." This statement is quite confusing. In Figure 6D, the corresponding calcium and ephys traces look completely different and appear to reflect distinct sets of events. It was only after reading Figure 7 that I realized the traces shown in Figure 6D might not have been recorded simultaneously. The authors should clarify this point.

      Yes, we absolutely agree with this point, one shared by Reviewer 1. In response, we have removed the electrophysiological traces in Fig. 6 to clarify that just the calcium responses are shown, and save the direct comparison for the Fig. 7 data (now revised Fig. 8).

      (8) Lines 310-313: "SynapGCaMP8m .... striking an optimal balance between speed and sensitivity", and Lines 314-316: "We conclude that SynapGCaMP8m is an optimal indicator to measure quantal transmission events at the synapse." Statements like these are subjective. In the authors' own comparison, GCaMP8m is significantly slower than GCaMP8f (at least in terms of decay time), despite having a moderately higher response amplitude. It is therefore unclear why GCaMP8m is considered 'optimal'. The authors should clarify this point or explain their rationale for prioritizing response amplitude over speed in the context of their application.

      This is another good point that we agree with, as the “optimal” sensor will of course depend on the user’s objectives. Hence, we used the term “an optimal sensor” to indicate it is what we believed to be the best one for our own uses. However, this point should be clarified and better discussed. In response, we have revised the relevant sections of the manuscript to better define why we chose the 8m sensors to strike an optimal balance of speed and sensitivity for our uses, and go on to discuss situations in which other sensor variants might be better suited. These are now presented in lines 223-236 in the revised manuscript, and we thank the reviewer for making these comments, which have improved our study.

      Minor comments

      (1)  Please include the following information in the Methods section:

      (a) For Figures 3 and 4, specify how action potentials were evoked. What type of electrodes were used, where were they placed, and what amount of current or voltage was applied?

      We apologize for neglecting to include this information in the original submission. We have now added this information to the revised Methods section (lines 537-543).

      (b) For imaging experiments, provide information on the filter sets used for each imaging channel, and describe how acquisition was alternated or synchronized between the green and red channels in ratiometric measurements. Additionally, please report the typical illumination intensity (in mW/mm²) for each experimental condition.

      We thank the reviewer for this helpful comment. We have now added detailed information about the imaging configuration to the Methods (lines 512-528) with the following:

      Ca2+ imaging was conducted using a Nikon A1R resonant scanning confocal microscope equipped with a 60x/1.0 NA water-immersion objective (refractive index 1.33). GCaMP signals were acquired using the FITC/GFP channel (488-nm laser excitation; emission collected with a 525/50-nm band-pass filter), and mScarlet/mCherry signals were acquired using the TRITC/mCherry channel (561-nm laser excitation; emission collected with a 595/50-nm band-pass filter). ROIs focused on terminal boutons of MN-Ib or -Is motor neurons. For both channels, the confocal pinhole was set to a fixed diameter of 117.5 µm (approximately three Airy units under these conditions), which increases signal collection while maintaining adequate optical sectioning. Images were acquired as 256 × 64 pixel frames (two 12-bit channels) using bidirectional resonant scanning at a frame rate of ~118 frames/s; the scan zoom in NIS-Elements was adjusted so that this field of view encompassed the entire neuromuscular junction and was kept constant across experiments. In ratiometric recordings, the 488-nm (GCaMP) and 561-nm (mScarlet) channels were acquired in a sequential dual-channel mode using the same bidirectional resonant scan settings: for each time point, a frame was first collected in the green channel and then immediately in the red channel, introducing a small, fixed frame-to-frame temporal offset while preserving matched spatial sampling of the two channels.

      Directly measuring the absolute laser power at the specimen plane (and thus reporting illumination intensity in mW/mm²) is technically challenging on this resonant-scanning system, because it would require inserting a power sensor into the beam path and perturbing the optical alignment; consequently, we are unable to provide reliable absolute mW/mm² values. Instead, we now report all relevant acquisition parameters (objective, numerical aperture, refractive index, pinhole size, scan format, frame rate, and fixed laser/detector settings) and note that laser powers were kept constant within each experimental series and chosen to minimize bleaching and phototoxicity while maintaining an adequate signal-to-noise ratio. We have now added the details requested in the revised Methods section (lines 512-535), including information about the filter sets, acquisition settings, and typical illumination intensity.

      (2) Please clarify what the thin versus thick traces represent in Figures 3D, 3F, 4C, and 4E. Are the thin traces individual trials from the same experiment, or from different experiments/animals? Does the thick trace represent the mean/median across those trials, a fitted curve, or a representative example?

      We apologize that this was not clearer in the original submission. Thin traces are individual stimulus-evoked trials (“sweeps”) acquired sequentially from the same muscle/NMJ in a single preparation; the panel is shown as a representative example of recordings collected across animals. The thick colored trace is the trial-averaged waveform (arithmetic mean) of those thin traces after alignment to stimulus onset and baseline subtraction (no additional smoothing beyond what is stated in Methods). The thick black curve over the decay phase is a single-exponential fit used to estimate τ. Specifically, we fit the decay segment by linear regression on the natural-log–transformed baseline-subtracted signal, which is equivalent to fitting y = y<sub>peak</sub>·e<sup>−t/τdecay</sup> over the decay window (revised Fig.4D and Fig.5C legends).
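
      The log-linear fit described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the figures: the function name is hypothetical, and non-positive samples are simply skipped because their logarithm is undefined. Taking logs of y = y<sub>peak</sub>·e<sup>−t/τ</sup> gives ln y = ln y<sub>peak</sub> − t/τ, so the slope of an ordinary least-squares line through (t, ln y) is −1/τ.

```python
import math


def fit_decay_tau(times, signal):
    """Estimate (tau, y_peak) by linear regression of ln(signal) on time.

    `signal` is the baseline-subtracted decay segment; samples <= 0 are
    skipped (an assumption made here for illustration).
    """
    points = [(t, math.log(y)) for t, y in zip(times, signal) if y > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two positive samples to fit a line")
    mean_t = sum(t for t, _ in points) / n
    mean_log = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (v - mean_log) for t, v in points)
    slope = sxy / sxx
    intercept = mean_log - slope * mean_t
    return -1.0 / slope, math.exp(intercept)


# Synthetic decay sampled every 8.5 ms (roughly one frame at ~118 fps),
# with a true tau of 50 ms and a peak of 2.0.
times_ms = [i * 8.5 for i in range(20)]
decay = [2.0 * math.exp(-t / 50.0) for t in times_ms]
tau_est, peak_est = fit_decay_tau(times_ms, decay)
print(tau_est, peak_est)
```

      One consequence of fitting in log space is that low-amplitude samples late in the decay carry as much weight as samples near the peak, so the choice of decay window matters for noisy traces.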

      (3) Please clarify what the reported sample size (n) represents. Does it indicate the number of experimental repeats, the number of boutons or PSDs, or the number of animals?

      Again, we apologize this was not clear. (n) refers to the number of animals (biological replicates), which is reported in Supplementary Table 1. All imaging was performed at muscle 6, abdominal segment A3. Per preparation, we imaged 1–2 NMJs in total, targeting 2–3 terminal boutons at each NMJ, and acquired 2–3 imaging stacks per NMJ, choosing different terminal boutons for each. For the standard stimulation protocol, we delivered 1 ms stimuli at 1 Hz and captured 14 stimuli in a 15 s time series (lines 730-736).

      Reviewer #3

      Genetically encoded calcium indicators (GECIs) are essential tools in neurobiology and physiology. Technological constraints in targeting and kinetics of previous versions of GECIs have limited their application at the subcellular level. Chen et al. present a set of novel tools that overcome many of these limitations. Through systematic testing in the Drosophila NMJ, they demonstrate improved targeting of GCaMP variants to synaptic compartments and report enhanced brightness and temporal fidelity using members of the GCaMP8 series. These advancements are likely to facilitate more precise investigation of synaptic physiology.

      This is a comprehensive and detailed manuscript that introduces and validates new GECI tools optimized for the study of neurotransmission and neuronal excitability. These tools are likely to be highly impactful across neuroscience subfields. The authors are commended for publicly sharing their imaging software.

      This manuscript could be improved by further testing the GECIs across physiologically relevant ranges of activity, including at high frequency and over long imaging sessions. The authors provide a custom software package (CaFire) for Ca2+ imaging analysis; however, to improve clarity and utility for future users, we recommend providing references to existing Ca2+ imaging tools for context and elaborating on some conceptual and methodological aspects, with more guidance for broader usability. These enhancements would strengthen this already strong manuscript.

      We thank the Reviewer for their overall positive evaluation and comments. 

      Major comments:

      (1) Evaluation of the performance of new GECI variants using physiologically relevant stimuli and frequency. The authors took initial steps towards this goal, but it would be helpful to determine the performance of the different GECIs at higher electrical stimulation frequencies (at least as high as 20 Hz) and for longer (10 seconds) (Newman et al, 2017). This will help scientists choose the right GECI for studies testing the reliability of synaptic transmission, which generally requires prolonged high-frequency stimulation.

      We appreciate this point by the reviewer and agree it would be of interest to evaluate sensor performance with higher frequency stimulation and for a longer duration. In response, we tested a variety of stimulation protocols at high intensities and long durations, but found it difficult to separate individual responses in the resulting data given the decay kinetics of all calcium sensors. Hence, we elected not to include these in the revised manuscript. However, we have now included an evaluation of the sensors with 20 Hz electrical stimulation for ~1 sec using a direct comparison of Scar8f with OGB-1. These data are now presented in a new Fig. 3D,E and discussed in the manuscript (lines 396-403).

      (2) CaFire.

      The authors mention, in line 182: 'Current approaches to analyze synaptic Ca2+ imaging data either repurpose software designed to analyze electrophysiological data or use custom software developed by groups for their own specific needs.' References should be provided. CaImAn comes to mind (Giovannucci et al., 2019, eLife), but we think there are other software programs aimed at analyzing Ca2+ imaging data that would permit such analysis.

      Thank you for the thoughtful question. At this stage, we’re unable to provide a direct comparison with existing analysis workflows. In surveying prior studies that analyze Drosophila NMJ Ca²⁺ imaging traces, we found that most groups preprocess images in Fiji/ImageJ and then rely on their own custom-made MATLAB or Python scripts for downstream analysis (see Blum et al. 2021; Xing and Wu 2018). Because these pipelines vary widely across labs, a standardized head-to-head evaluation isn’t currently feasible. With CaFire, our goal is to offer a simple, accessible tool that does not require coding experience and minimizes variability introduced by custom scripts. We designed CaFire to lower the barrier to entry, promote reproducibility, and make quantal event analysis more consistent across users. We have added references to the sentence mentioned above.

      Regarding existing software that the reviewer mentioned – CaImAn (Giovannucci et al. 2019): We evaluated CaImAn, which is a powerful framework designed for large-scale, multicellular calcium imaging (e.g., motion correction, denoising, and automated cell/ROI extraction). However, it is not optimized for the per-event kinetics central to our project - such as extracting rise and decay times for individual quantal events at single synapses. Achieving this level of granularity would typically require additional custom Python scripting and parameter tuning within CaImAn’s code-centric interface. This runs counter to CaFire’s design goals of a no-code, task-focused workflow that enables users to analyze miniature events quickly and consistently without specialized programming expertise.

      Regarding Igor Pro (WaveMetrics), (Müller et al. 2012): Igor Pro is another platform that can be used to analyze calcium imaging signals. However, it is commercial (paid) software and generally requires substantial custom scripting to fit the specific analyses we need. In practice, it does not offer a simple, open-source, point-and-click path to per-event kinetic quantification, which is what CaFire is designed to provide.

      The authors should be commended for making their software publicly available, but there are some questions:

      How does CaFire compare to existing tools?

      As mentioned above, we have not been able to adapt the custom scripts used by various labs for our purposes, including software developed in MATLAB (Blum et al. 2021), Python (Xing and Wu 2018), and Igor (Müller et al. 2012). Some in the field do use semi-publicly available software, including Nikon Elements (Chen and Huang 2017) and CaImAn (Giovannucci et al. 2019). However, these platforms are not optimized for the per-event kinetics central to our project - such as extracting rise and decay times for individual quantal events at single synapses. We have added more details about CaFire, focusing on the workflow and measurements and highlighting that it provides a no-code, standardized pipeline with automated miniature-event detection and per-event metrics (e.g., amplitude, rise time τ, decay time τ), optional ΔR/R support, and an auto-partition feature. Collectively, these features make CaFire simpler to operate without programming expertise, more transparent and reproducible across users, and better aligned with the event-level kinetics required for this project.

      Very few details about the Huygens deconvolution algorithms and input settings were provided in the methods or text (outside of MLE algorithm used in STED images, which was not Ca2+ imaging). Was it blind deconvolution? Did the team distill the point-spread function for the fluorophores? Were both channels processed for ratiometric imaging? Were the same settings used for each channel? Importantly, please include SVI Huygens in the 'Software and Algorithms' Section of the methods.

      We thank the reviewer for raising this important point. We have now expanded the Methods to describe our use of Huygens in more detail and have added SVI Huygens Professional (Scientific Volume Imaging, Hilversum, The Netherlands) to the “Software and Algorithms” section. For Ca²⁺ imaging data, time-lapse stacks were processed in the Huygens Deconvolution Wizard using the standard estimation algorithm (CMLE). This is not a blind deconvolution procedure. Instead, Huygens computes a theoretical point-spread function (PSF) from the full acquisition metadata (objective NA, refractive index, voxel size/sampling, pinhole, excitation/emission wavelengths, etc.); if refractive index values are provided and there is a mismatch, the PSF is adjusted to account for spherical aberration. We did not experimentally distill PSFs from bead measurements, as Huygens’ theoretical PSFs are sufficient for our data.

      Both green (GCaMP) and red (mScarlet) channels were processed for ratiometric imaging using the same workflow (stabilization, optional bleaching correction, and deconvolution within Huygens). For each channel, the PSF, background, and SNR were estimated automatically by the same built-in algorithms, so the underlying procedures were identical even though the numerical values differ between channels because of their distinct wavelengths and noise characteristics. Importantly, Huygens normalizes each PSF to unit total intensity, such that the deconvolution itself does not add or remove signal and therefore preserves intensity ratios between channels; only background subtraction and bleaching correction can change absolute fluorescence values. For the mScarlet channel, where we observed modest bleaching (~1.10 over 15 sec), we applied Huygens’ bleaching correction and visually verified that similar structures maintained comparable intensities after correction. For presynaptic GCaMP signals, bleaching over these short recordings was negligible, so we omitted the bleaching-correction step to avoid introducing multiplicative artifacts. This workflow ensures that ratiometric ΔR/R measurements are based on consistently processed, intensity-conserving deconvolved images in both channels.

      The number of deconvolution iterations could have had an effect when comparing GCAMP series; please provide an average number of iterations used for at least one experiment. For example, Figure 3, Syt::GCAMP6s, Scar8f & Scar8m, and, if applicable, the maximum number of permissible iterations.

      We thank the reviewer for this comment. For all Ca²⁺ imaging datasets, deconvolution in Huygens was performed using the recommended default settings of the CMLE algorithm with a maximum of 30 iterations. The stopping criterion was left at the Huygens default, so the algorithm either converged earlier or, if convergence was not reached, terminated at this 30-iteration limit. No other iteration settings were used across the GCaMP series (lines 555-559).

      Please clarify if the 'Express' settings in Huygens changed algorithms or shifted input parameters.

      We appreciate the reviewer’s question regarding the Huygens “Express” settings. For clarity, we note that all Ca²⁺ imaging data reported in this manuscript were deconvolved using the “Deconvolution Wizard”, not the “Deconvolution Express” mode. In the Wizard, we explicitly selected the CMLE algorithm (or GMLE in a few STED-related cases as recommended by SVI) with the recommended maximum of 30 iterations and other recommended settings, while allowing Huygens to auto-estimate background and SNR for each channel. Bleaching correction was toggled manually per channel (applied to mScarlet when bleaching was evident, omitted for GCaMP when bleaching was negligible), as described in the revised Methods (lines 553-559).

      By contrast, the Deconvolution Express tool in Huygens is a fully automated front-end that can internally adjust both the choice of deconvolution algorithm (e.g., CMLE vs. GMLE/QMLE) and key input parameters such as SNR, number of iterations, and quality threshold based on the selected “smart profile” and the image metadata. In preliminary tests on our datasets, Express sometimes produced results that were either overly smoothed or showed subtle artifacts, so we did not use it for any data included in this study. Instead, we relied exclusively on the Wizard with explicitly controlled settings to ensure consistency and transparency across all GCaMP series and ratiometric analyses.

      We suggest including a sample data set, perhaps in Excel, so that future users can beta test on and organize their data in a similar fashion.

      We agree that this would be useful, a point shared by R1 above. In response, we have added a sample data set to the GitHub site and included sample ImageJ data along with screenshots to explain the analysis in more detail. These improvements are discussed in the manuscript (lines 705-708).

      (3) While the challenges of AZ imaging are mentioned, it is not discussed how the authors tackled each one. What is defined as an active zone? Active zones are usually identified under electron microscopy. Arguably, the limitation of GCaMP-based sensors targeted to individual AZs, being unable to resolve local Ca2+ changes at individual boutons reliably, might be incorrect. This could be a limitation of the optical setup being used here. Please discuss further. What sensor performance do we need to achieve this performance level, and/or what optical setup would we need to resolve such signals?

      We appreciate the reviewer’s thoughtful comments and agree that the technical challenges of active zone (AZ) Ca²⁺ imaging merit further clarification. We defined AZs, as is the convention in our field, as individual BRP puncta at NMJs. These BRP puncta colocalize with individual puncta of other AZ components, including CAC, RBP, Unc13, etc. ROIs were drawn tightly over individual BRP puncta and only clearly separable spots were included.

      To tackle the specific obstacles of AZ imaging (small signal volume, high AZ density, and limited photon budget at high frame rates), we implemented both improved sensors and optimized analysis (Fig. 6). First, we introduced a ratiometric AZ-targeted indicator, BRP::mScarlet3::GCaMP8m (Bar8m), and computed ΔR/R with R(t)=F<sub>GCaMP8m</sub>/F<sub>mScarlet3</sub>. ROIs were drawn over individual AZs (Fig. 6B). Under our standard resonant area-scan conditions (~118 fps), Bar8m produces robust ΔR/R transients at individual AZs (example peaks ≈ 3.28; τ<sub>rise</sub>≈9.0 ms; Fig. 6C, middle), indicating that single-AZ signals can be detected reproducibly when AZs are optically resolvable.

      Second, we increased temporal resolution using high-speed Galvano line-scan imaging (~1058 fps), which markedly sharpened the apparent kinetics (τ<sub>rise</sub>≈3.23 ms) and revealed greater between-AZ variability (Fig. 6C, right; 6D–E). Population analyses show that line scans yield much faster rise times than area scans (Fig. 6D) and a dramatically higher fraction of significantly different AZ pairs (8.28% and 4.14% in 8f and 8m area-scan vs 78.62% in 8m line-scan, lines 721-725), uncovering pronounced AZ-to-AZ heterogeneity in Ca²⁺ signals. Together, these revisions demonstrate that under our current confocal configuration, AZ-targeted GCaMP8m can indeed resolve local Ca²⁺ changes at individual, optically isolated boutons.

      We have revised the Discussion to clarify that our original statement about the limitations of AZ-targeted GCaMPs refers specifically to this combination of sensor and optical setup, rather than an absolute limitation of AZ-level Ca²⁺ imaging. In our view, further improvements in baseline brightness and dynamic range (ΔF/F or ΔR/R per action potential), combined with sub-millisecond kinetics and minimal buffering, together with optical configurations that provide smaller effective PSFs and higher photon collection (e.g., higher-NA objectives, optimized 2-photon or fast line-scan modalities, and potentially super-resolution approaches applied to AZ-localized indicators), are likely to be required to achieve routine, high-fidelity Ca²⁺ measurements at every individual AZ within a neuromuscular junction.

      (4) In Figure 5: Only GCAMP8f (Bar8f fusion protein) is tested here. Consider including testing with GCAMP8m. This is particularly relevant given that GCAMP8m was a more successful GECI for subcellular post-synaptic imaging in Figure 6.

      We appreciate this point and request by Reviewer 3. The main limitation for detecting local calcium changes at AZs is the speed of the calcium sensor, and hence we used the fastest available (GCaMP8f) to test the Bar8f sensor. While replacing GCaMP8f with GCaMP8m would indeed be predicted to enhance sensitivity (SNR), since GCaMP8m does not have faster kinetics relative to GCaMP8f, it is unlikely to be a more successful GECI for visualizing local calcium differences at AZs. 

      That being said, we agree that the Bar8m tool, including the improved mScarlet3 indicator, would likely be of interest and use to the field. Fortunately, we had engineered the Bar8m sensor while this manuscript was in review, and just recently received transgenic flies. We have evaluated this sensor, as requested by the reviewer, and included our findings in Fig. 1 and 6. In short, while the sensitivity is indeed enhanced in Bar8m compared to Bar8f, the kinetics remain insufficient to capture local AZ signals. These findings are discussed in the revised manuscript (lines 424-442, 719-730), and we appreciate the reviewer for raising these important points.

      In earlier experiments, Bar8f yielded relatively weak fluorescence, so we traded frame rate for image quality during resonant area scans (~60 fps). After switching to Bar8m, the signal was bright enough to restore our standard 118 fps area-scan setting. Nevertheless, even with dual-channel resonant area scans and ratiometric (GCaMP/mScarlet) analysis, AZ-to-AZ heterogeneity remained difficult to resolve. Because Ca²⁺ influx at individual active zones evolves on sub-millisecond timescales, we adopted a high-speed single-channel Galvano line-scan (~1 kHz) to capture these rapid transients. We first acquired a brief area image to localize AZ puncta, then positioned the line-scan ROI through the center of the selected AZ. This configuration provided the temporal resolution needed to uncover heterogeneity that was under-sampled in area-scan data. Consistent with this, Bar8m line-scan data showed markedly higher AZ heterogeneity (significant AZ-pair rate ~79%, vs. ~8% for Bar8f area scans and ~4% for Bar8m area scans), highlighting Bar8m’s suitability for quantifying AZ diversity. We have updated the text, Methods, and figure legend accordingly.

      (5) Figure 5D and associated datasets: Why was Interquartile Range (IQR) testing used instead of ZScoring? Generally, IQR is used when the data is heavily skewed or is not normally distributed. Normality was tested using the D'Agostino & Pearson omnibus normality test and found that normality was not violated. Please explain your reasoning for the approach in statistical testing. Correlation coefficients in Figures 5 E & F should also be reported on the graph, not just the table. In Supplementary Table 1. The sub-table between 4D-F and 5E-F, which describes the IQR, should be labeled as such and contain identifiers in the rows describing which quartile is described. The table description should be below. We would recommend a brief table description for each sub-table.

      Thank you for this helpful suggestion. We have updated the analysis in two complementary ways. First, we now perform paired two-tailed t-tests between every two AZs within the same preparation (pairwise AZ–AZ comparisons of peak responses). At α = 0.05, the fraction of significant AZ pairs is ~79% for Bar8m line-scan data versus ~8% for Bar8f area-scan data, indicating markedly greater AZ-to-AZ diversity when measured at high temporal resolution. Second, to visually mark the outlying AZs, we re-computed the IQR (Q1–Q3) from the individual values collected from each AZ (15 data points per AZ, 30 AZs for each genotype), and marked AZs whose mean response falls above Q3 or below Q1; IQR is used here solely as a robust dispersion reference rather than for hypothesis testing. Both analyses support the same observation: Bar8m line-scan data reveal substantially higher AZ heterogeneity than Bar8f and Bar8m area-scan data. We have revised the Methods, figure panels, and legends accordingly (t-test details; explicit “IQR (Q1–Q3)” labeling; significant AZ-pair rates reported on the plots) (lines 719-730).
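
      The two analyses above can be sketched on toy data as follows. This is an illustration under stated assumptions rather than the code used for Fig. 6: responses are assumed to be paired by stimulus index across AZs, the function names are hypothetical, and instead of computing p-values the sketch compares |t| with the two-tailed critical value for α = 0.05 and 14 degrees of freedom (≈2.145), which applies only when each AZ contributes 15 paired data points.

```python
import math
from itertools import combinations
from statistics import mean, quantiles, stdev

T_CRIT_DF14 = 2.145  # two-tailed critical t, alpha = 0.05, df = 14 (n = 15)


def paired_t(a, b):
    """Paired t statistic for two equal-length response lists."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        return math.inf if mean(diffs) != 0 else 0.0
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def significant_pair_fraction(az_responses, t_crit=T_CRIT_DF14):
    """Fraction of AZ pairs whose paired t statistic exceeds t_crit."""
    pairs = list(combinations(az_responses, 2))
    hits = sum(1 for a, b in pairs if abs(paired_t(a, b)) > t_crit)
    return hits / len(pairs)


def flag_outside_iqr(az_responses):
    """Mark AZs whose mean lies below Q1 or above Q3 of the pooled values."""
    pooled = [v for az in az_responses for v in az]
    q1, _, q3 = quantiles(pooled, n=4)
    return [not (q1 <= mean(az) <= q3) for az in az_responses]


def toy_az(base, shift, n=15):
    """Deterministic toy responses: a base level plus a small repeating offset."""
    return [base + 0.01 * ((i + shift) % 3) for i in range(n)]


# Four toy AZs: one weak, two similar, one strong.
toy_azs = [toy_az(1.0, 0), toy_az(1.5, 1), toy_az(1.5, 2), toy_az(2.0, 0)]
pair_fraction = significant_pair_fraction(toy_azs)
outlier_flags = flag_outside_iqr(toy_azs)
print(pair_fraction, outlier_flags)
```

      In this toy set only the two AZs with the same base level fail to differ, and only the weakest and strongest AZs fall outside the pooled interquartile range.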

      (6) Figure 6 and associated data. The authors mention: ' SynapGCaMP quantal signals appeared to qualitatively reflect the same events measured with electrophysiological recordings (Fig. 6D).' If that was the case, shouldn't the ephys and optical signal show some sort of correlation? The data presented in Figure 6D show no such correlation. Where do these signals come from? It is important to show the ROIs on a reference image.

      We apologize this was not clear, as similar points were raised by R1 and R2. We were just showing separate (uncorrelated) sample traces of electrophysiological and calcium imaging data. Given how confusing this presentation turned out to be, and the fact that we show the correlated ephys and calcium imaging events in Fig. 7, we have elected to remove the uncorrelated electrophysiological events in Fig. 6 to just focus on the calcium imaging events (now Figures 7 and 8).

      Figure 7B: Were Ca2+ transients not associated with mEPSPs ever detected? What is the rate of such events?

      This is an astute question. Yes indeed, during simultaneous calcium imaging and current clamp electrophysiology recordings, we occasionally observed GCaMP transients without a detectable mEPSP in the electrophysiological trace. This may reflect the detection limit of electrophysiology for very small minis; with our noise level and the technical limitation of the recording rig, events < ~0.2 mV cannot be reliably detected, whereas the optical signal from the same quantal event might still be detected. The fraction of calcium-only events was ~1–10% of all optical miniature events, depending on genotype (higher in lines with smaller average minis). These calcium-only detections were low-amplitude and clustered near the optical threshold (lines 361-365).

      Minor comments

      (1) It should be mentioned in the text or figure legend whether images in Figure 1 were deconvolved, particularly since image pre-processing is only discussed in Figure 2 and after.

      We thank the reviewer for pointing this out. Yes, the confocal images shown in Figure 1 were also deconvolved in Huygens using the CMLE-based workflow described in the revised Methods. We applied deconvolution to improve contrast, reduce out-of-focus blur, and better resolve the morphology of presynaptic boutons, active zones, and postsynaptic structures, so that the localization of each sensor is more clearly visualized. We have now explicitly stated in the Fig. 1 legend and Methods (lines 575-577) that these images were deconvolved prior to display. 

      (2) The abbreviation, SNR, signal-to-noise ratio, is not defined in the text.

      We have corrected this error and thank the reviewer for pointing this out.

      (3) Please comment on the availability of fly stocks and molecular constructs.

      We have clarified that all fly stocks and molecular constructs will be shared upon request (lines 747-750). We are also in the process of depositing the new Scar8f/m, Bar8f/m, and SynapGCaMP sensors to the Bloomington Drosophila Stock Center for public dissemination.

      (4) Please add detection wavelengths and filter cube information for live imaging experiments for both confocal and widefield.

      We thank the reviewer for this helpful suggestion. We have now added the detection wavelengths and filter cube configurations for both confocal and widefield live imaging to the Methods.

      For confocal imaging, GCaMP signals were acquired on a Nikon A1R system using the FITC/GFP channel (488-nm laser excitation; emission collected with a 525/50-nm band-pass filter), and mScarlet signals were acquired using the TRITC/mCherry channel (561-nm laser excitation; emission collected with a 595/50-nm band-pass filter). Both channels were detected with GaAsP detectors under the same pinhole and scan settings described above (lines 512-517).

      For widefield imaging, GCaMP was recorded using a GFP filter cube (LED excitation ~470/40 nm; emission ~525/50 nm), which is now explicitly described in the revised Methods section (lines 632-633).

      (5) Please include a mini frequency analysis in Supplemental Figure S1.

      We apologize for not including this information in the original submission. This is now included in the Supplemental Figure S1.

      (6) In Figure S1B, consider flipping the order of EPSP (currently middle) and mEPSP (currently left), to easily guide the reader through the quantification of Figure S1A (EPSPs, top traces & mEPSPs, bottom traces).

      We agree these modifications would improve readability and clarity. We have now re-ordered the electrophysiological quantifications in Fig. S1B as requested by the reviewer.

      (7) Figure 6C: Consider labeling with sensor name instead of GFP.

      We agree here as well, and have removed “GFP” and instead added the GCaMP variant to the heatmap in Fig. 7C.

      (8) Figure 6E, 7B, 7E: Main statistical differences highlighting sensor performance should be represented on the figures for clarity.

      We did not show these differences in the original submission in an effort to keep the figures “clean” and for clarity, putting the detailed statistical significance in Table S1. However, we agree with the reviewer that it would be easier to see these in the Fig. 6E and 7B,E graphs. This information has now been added to Figs. 7 and 8.

      (9) Please report if the significance tested between the ephys mini (WT vs IIB-/-, WT vs IIA-/-, IIB-/- vs IIA-/-) is the same as for Ca2+ mini (WT vs IIB-/-, WT vs IIA-/-, IIB-/- vs IIA-/-). These should also exhibit a very high correlation (mEPSP (mV) vs Ca2+ mini deltaF/F). These tests would significantly strengthen the final statement of "SynapGCaMP8m can capture physiologically relevant differences in quantal events with similar sensitivity as electrophysiology."

      We agree that adding the more detailed statistical analysis requested by the reviewer would strengthen the evidence for the resolution of quantal calcium imaging using SynapGCaMP8m. We have included the statistical significance between the ephys and calcium minis in Fig. 8 and included the following in the revised methods (lines 358-361), the Fig. 8 legend and Table S1:

      Using two-sample Kolmogorov–Smirnov (K–S) tests, we found that SynapGCaMP8m Ca²⁺ minis (ΔF/F, Fig. 8E) differ significantly across all genotype pairs (WT vs IIB<sup>-/-</sup>, WT vs IIA<sup>-/-</sup>, IIB<sup>-/-</sup> vs IIA<sup>-/-</sup>; all p < 0.0001). The genotype rank order of the group means (±SEM) is IIB<sup>-/-</sup> > WT > IIA<sup>-/-</sup> (0.967 ± 0.036; 0.713 ± 0.021; 0.427 ± 0.017; n=69, 65, 59). For electrophysiological minis (mEPSP amplitude, Fig. 8F), K–S tests likewise show significant differences for the same comparisons (all p < 0.0001) with D statistics of 0.1854, 0.3647, and 0.4043 (WT vs IIB<sup>-/-</sup>, WT vs IIA<sup>-/-</sup>, IIB<sup>-/-</sup> vs IIA<sup>-/-</sup>, respectively). Group means (±SEM) again follow IIB<sup>-/-</sup> > WT > IIA<sup>-/-</sup> (0.824 ± 0.017 mV; 0.636 ± 0.015 mV; 0.383 ± 0.007 mV; n=41 each). These K–S results demonstrate identical significance and rank order across modalities, supporting our conclusion that SynapGCaMP8m resolves physiologically relevant quantal differences with sensitivity comparable to electrophysiology.
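
      As a rough illustration of the two-sample K–S comparison described above, the sketch below computes the D statistic (the largest vertical gap between two empirical cumulative distributions) in plain Python. The function name `ks_statistic` and the sample amplitudes are hypothetical; this is not the authors' analysis code or data, and a statistics package would normally be used to obtain the accompanying p-value.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: max |ECDF_a(x) - ECDF_b(x)|."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The ECDFs only change at observed values, so checking those is enough.
    for x in set(a) | set(b):
        ecdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        ecdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical quantal amplitudes (arbitrary units), for illustration only.
genotype_1 = [0.61, 0.70, 0.72, 0.75, 0.80]
genotype_2 = [0.35, 0.40, 0.44, 0.47, 0.62]
print(ks_statistic(genotype_1, genotype_2))
```

      D ranges from 0 (identical empirical distributions) to 1 (no overlap), which is why it can be compared directly across the imaging and electrophysiology datasets even though their units differ.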

      References

      Blum, Ian D., Mehmet F. Keleş, El-Sayed Baz, Emily Han, Kristen Park, Skylar Luu, Habon Issa, Matt Brown, Margaret C. W. Ho, Masashi Tabuchi, Sha Liu, and Mark N. Wu. 2021. 'Astroglial Calcium Signaling Encodes Sleep Need in Drosophila', Current Biology, 31: 150-62.e7.

      Chen, Y., and L. M. Huang. 2017. 'A simple and fast method to image calcium activity of neurons from intact dorsal root ganglia using fluorescent chemical Ca(2+) indicators', Mol Pain, 13: 1744806917748051.

      Giovannucci, Andrea, Johannes Friedrich, Pat Gunn, Jérémie Kalfon, Brandon L. Brown, Sue Ann Koay, Jiannis Taxidis, Farzaneh Najafi, Jeffrey L. Gauthier, Pengcheng Zhou, Baljit S. Khakh, David W. Tank, Dmitri B. Chklovskii, and Eftychios A. Pnevmatikakis. 2019. 'CaImAn an open source tool for scalable calcium imaging data analysis', eLife, 8: e38173.

      Müller, M., K. S. Liu, S. J. Sigrist, and G. W. Davis. 2012. 'RIM controls homeostatic plasticity through modulation of the readily-releasable vesicle pool', J Neurosci, 32: 16574-85.

      Wu, Yifan, Keimpe Wierda, Katlijn Vints, Yu-Chun Huang, Valerie Uytterhoeven, Sahil Loomba, Fran Laenen, Marieke Hoekstra, Miranda C. Dyson, Sheng Huang, Chengji Piao, Jiawen Chen, Sambashiva Banala, Chien-Chun Chen, El-Sayed Baz, Luke Lavis, Dion Dickman, Natalia V. Gounko, Stephan Sigrist, Patrik Verstreken, and Sha Liu. 2025. 'Presynaptic Release Probability Determines the Need for Sleep', bioRxiv: 2025.10.16.682770.

      Xing, Xiaomin, and Chun-Fang Wu. 2018. 'Unraveling Synaptic GCaMP Signals: Differential Excitability and Clearance Mechanisms Underlying Distinct Ca<sup>2+</sup> Dynamics in Tonic and Phasic Excitatory, and Aminergic Modulatory Motor Terminals in Drosophila', eNeuro, 5: ENEURO.0362-17.2018.

    1. eLife Assessment

      This important study combines real-time key point tracking with transdermal activation of sensory neurons as a general technique to explore how somatosensory stimulation impacts behavior in freely moving mice. After addressing concerns about classification of the behavioral responses to nociceptor stimulation, the authors now convincingly demonstrate a state-dependence in the behavioral response following nociceptor activation, highlighting how their real-time optogenetic stimulation capabilities can yield new insights into complex sensory processing. This work is a technological advancement that will be of interest to a broad readership, in particular labs studying somatosensation, enabling rigorous investigation of behaviors that were previously difficult or impossible to study.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a system for delivering precisely controlled cutaneous stimuli to freely moving mice by coupling markerless real-time tracking to transdermal optogenetic stimulation, using the tracking signal to direct a laser via galvanometer mirrors. The principal claims are that the system achieves sub-mm targeting accuracy with a latency of <100 ms. Due to the nature of mouse gait, this enables accurate targeting of forepaws even when mice are moving.

      Strengths:

      The study is of high quality and the evidence for the claims is convincing. There is increasing focus in neurobiology on studying neural function in freely moving animals, engaged in natural behaviour. However, a substantial challenge is how to deliver controlled stimuli to sense organs under such conditions. The system presented here constitutes notable progress towards such experiments in the somatosensory system and is, in my view, a highly significant development that will be of interest to a broad readership.

      My comments on the original submission have been fully addressed.

    3. Reviewer #2 (Public review):

      Parkes et al. combined real-time keypoint tracking with transdermal activation of sensory neurons to examine the effects of recruitment of sensory neurons in freely moving mice. This builds on the authors' previous investigations involving transdermal stimulation of sensory neurons in stationary mice. They illustrate multiple scenarios in which their engineering improvements enable more sophisticated behavioral assessments, including 1) stimulation of animals in multiple states in large arenas, 2) multi-animal nociceptive behavior screening through thermal and optogenetic activation, and 3) stimulation of animals running through maze corridors. Overall, the experiments and the methodology, in particular, are written clearly. The revised manuscript nicely demonstrates a state-dependence in the behavioral response to activation of TrpV1 sensory neurons, which is a nice demonstration of how their real-time optogenetic stimulation capabilities can yield new insights into complex sensory processing.

      Comments on revisions:

      I agree that your revisions have substantially improved the clarity and quality of the work.

    4. Reviewer #3 (Public review):

      Summary:

      To explore the diverse nature of somatosensation, Parkes et al. established and characterized a system for precise cutaneous stimulation of mice as they walk and run in naturalistic settings. This paper provides a framework for real-time body part tracking and targeted optical stimuli with high precision, ensuring reliable and consistent cutaneous stimulation. It can be adapted in somatosensation labs as a general technique to explore somatosensory stimulation and its impact on behavior, enabling rigorous investigation of behaviors that were previously difficult or impossible to study.

      Strengths:

      The authors characterized the closed-loop system to ensure that it is optically precise and can precisely target moving mice. The integration of accurate and consistent optogenetic stimulation of the cutaneous afferents allows systematic investigation of somatosensory subtypes during a variety of naturalistic behaviors. Although this study focused on nociceptors innervating the skin (Trpv1::ChR2 animals), this setup can be extended to other cutaneous sensory neuron subtypes, such as low-threshold mechanoreceptors and pruriceptors. This system can also be adapted for studying more complex behaviors, such as the maze assay and goal-directed movements.

      Weaknesses:

      Although the paper has strengths, its weakness is that some behavioral outputs could be analyzed in more detail to reveal different types of responses to painful cutaneous stimuli. For example, paw withdrawals were detected after optogenetically stimulating the paw (Figures 3E and 3F). Animals exhibit different types of responses to painful stimuli on the hindpaw in standard pain assays, such as paw lifting, biting, and flicking, each indicating a different level of pain. The output of this system is body part keypoints, which are the standard input to many existing tools. Analyzing these detailed keypoints would greatly strengthen this system by providing deeper biological insights into the role of somatosensation in naturalistic behaviors. Additionally, if the laser spot size could be reduced to a diameter of 2 mm², it would allow the activation of a smaller number of cutaneous afferents, or even a single one, across different skin types in the paw, such as glabrous or hairy skin.

      Comments on revisions:

      The authors successfully addressed all of my questions and concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a system for delivering precisely controlled cutaneous stimuli to freely moving mice by coupling markerless real-time tracking to transdermal optogenetic stimulation, using the tracking signal to direct a laser via galvanometer mirrors. The principal claims are that the system achieves sub-mm targeting accuracy with a latency of <100 ms. The nature of mouse gait enables accurate targeting of forepaws even when mice are moving.

      Strengths:

      The study is of high quality and the evidence for the claims is convincing. There is increasing focus in neurobiology on studying neural function in freely moving animals, engaged in natural behaviour. However, a substantial challenge is how to deliver controlled stimuli to sense organs under such conditions. The system presented here constitutes notable progress towards such experiments in the somatosensory system and is, in my view, a highly significant development that will be of interest to a broad readership.

      Weaknesses:

      (1) "laser spot size was set to 2.00 ± 0.08 mm2 diameter (coefficient of variation = 3.85)" is unclear. Is the 0.08 SD or SEM? (not stated). Also, is this systematic variation across the arena (or something else)? Readers will want to know how much the spot size varies across the arena - ie SD. CV=4 implies that SD~7 mm. ie non-trivial variation in spot size, implying substantial differences in power delivery (and hence stimulus intensity) when the mouse is in different locations. If I misunderstood, perhaps this helps the authors to clarify. Similarly, it would be informative to have mean & SD (or mean & CV) for power and power density. In future refinements of the system, would it be possible/useful to vary laser power according to arena location?

      We thank the reviewer for their comments and for identifying areas needing more clarity. The previous version was ambiguous: 0.08 refers to the standard deviation (SD). We have removed the ambiguity by stating mean ± SD and reporting a unitless coefficient of variation (CV).

      The revised text reads “laser spot size was set to 2.00 ± 0.08 mm<sup>2</sup> (mean ± SD; coefficient of variation = 0.039).” This makes clear that the variability in spot size is minimal: it is 0.08 mm<sup>2</sup> SD (≈0.03 mm SD in diameter). This should help clarify that spot size variability across the arena is minute and unlikely to contribute meaningfully to differences in stimulus intensity across locations. The power was modulated depending on the experiment, so we provide the unitless CV here in “The absolute optical power and power density were uniform across the glass platform (coefficient of variation 0.035 and 0.029, respectively; Figure 2—figure supplement)”. We are grateful to the reviewer for spotting these omissions.
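
      To make the arithmetic behind these figures explicit, a minimal sketch follows. It assumes a circular spot, so that area A = πd²/4, and propagates the area SD to the diameter to first order (SD of d ≈ SD of A × d / 2A). The function names are illustrative only. With the rounded inputs quoted here the CV comes out as 0.040; the reported 0.039 presumably reflects unrounded measurements, which is an assumption on our part.

```python
import math


def coefficient_of_variation(mean_value, sd):
    """Unitless CV = SD / mean."""
    return sd / mean_value


def diameter_from_area(area_mm2):
    """Diameter (mm) of a circle with the given area (mm^2)."""
    return 2.0 * math.sqrt(area_mm2 / math.pi)


def diameter_sd_from_area_sd(area_mm2, area_sd_mm2):
    """First-order propagation: sd_d = sd_A * (dd/dA) = sd_A * d / (2 * A)."""
    d = diameter_from_area(area_mm2)
    return area_sd_mm2 * d / (2.0 * area_mm2)


area, area_sd = 2.00, 0.08  # mm^2, values quoted in the response above
print(coefficient_of_variation(area, area_sd))
print(diameter_from_area(area))
print(diameter_sd_from_area_sd(area, area_sd))
```

      Under these assumptions the diameter is about 1.60 mm with an SD of about 0.03 mm, consistent with the value quoted in the response.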

      The reviewer also asks whether, in the future, it is “possible/useful to vary laser power according to arena location”. This is already possible in our system for infrared cutaneous stimulation using analog modulation (Figure 4). We have added the following sentence to make this clearer: “Laser power could be modulated using the analog control.”

      (2) "The video resolution (1920 x 1200) required a processing time higher than the frame interval (33.33 ms), resulting in real-time pose estimation on a sub-sample of all frames recorded". Given this, how was it possible to achieve 84 ms latency? An important issue for closed-loop research will relate to such delays. Therefore please explain in more depth and (in Discussion) comment on how the latency of the current system might be improved/generalised. For example, although the current system works well for paws it would seem to be less suited to body parts such as the snout that do not naturally have a stationary period during the gait cycle.

      We captured and stored video with a frame-to-frame interval of 33.33 ms (30 fps). DeepLabCut-live! was run in a latency-optimization mode: new frames are not processed while the network is busy, and only the most recent frame is processed once it is free. Intermediate frames are therefore skipped, and the processing latency is measured per processed frame. Although the wide field of view and high resolution required to capture the large environment increase the per-frame compute time, the processing latency remained small enough to track and stimulate moving mice. The processing latency of 84 ± 12 ms (mean ± SD) was calculated from the timestamps stored in the DeepLabCut-live! output files by subtracting the frame acquisition timestamp from the frame processing timestamp across 16,000 processed frames from four mice (4,000 each). In addition, there is a small delay to move the galvanometers and trigger the laser, measured as 3.3 ± 0.5 ms (mean ± SD; 245 trials). This delay was already described in the manuscript; combining it with the processing latency gives a total closed-loop delay of ≈87 ms. We have therefore expanded the ‘Optical system characterization’ subsection of the Methods, adding “We estimated a processing latency of 84 ± 12 ms (mean ± SD) by subtracting…” and “In the current configuration the end-to-end closed-loop delay is ≈87 ms from the combination of the processing latency and other delays”. In the Discussion, we now comment on how this latency can be reduced and how this would allow generalization to more rapidly moving body parts.
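
      The latency calculation described above can be sketched as follows. This is an illustrative outline rather than the authors' code: the function names and the example timestamps are hypothetical, and the sketch assumes that each processed frame carries one acquisition and one processing timestamp in seconds.

```python
from statistics import mean, stdev


def latency_stats_ms(acquired_s, processed_s):
    """Mean and sample SD (ms) of per-frame latency: processed - acquired."""
    if len(acquired_s) != len(processed_s):
        raise ValueError("timestamp lists must have the same length")
    latencies = [(p - a) * 1000.0 for a, p in zip(acquired_s, processed_s)]
    return mean(latencies), stdev(latencies)


def closed_loop_delay_ms(processing_ms, actuation_ms):
    """Total delay = pose-estimation latency + galvanometer/laser trigger delay."""
    return processing_ms + actuation_ms


# Hypothetical timestamps (seconds) for four processed frames.
acquired = [0.000, 0.100, 0.200, 0.300]
processed = [0.080, 0.190, 0.285, 0.381]
print(latency_stats_ms(acquired, processed))

# Using the means quoted in the response: 84 ms processing + 3.3 ms actuation.
print(closed_loop_delay_ms(84.0, 3.3))
```

      Summing the two quoted means gives 87.3 ms, which matches the ≈87 ms end-to-end delay stated in the response.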

      Reviewer #2 (Public review):

      Parkes et al. combined real-time keypoint tracking with transdermal activation of sensory neurons to examine the effects of recruitment of sensory neurons in freely moving mice. This builds on the authors' previous investigations involving transdermal stimulation of sensory neurons in stationary mice. They illustrate multiple scenarios in which their engineering improvements enable more sophisticated behavioral assessments, including (1) stimulation of animals in multiple states in large arenas, (2) multi-animal nociceptive behavior screening through thermal and optogenetic activation, and (3) stimulation of animals running through maze corridors. Overall, the experiments and the methodology, in particular, are written clearly. However, there are multiple concerns and opportunities to fully describe their newfound capabilities that, if addressed, would make it more likely for the community to adopt this methodology:

      The characterization of laser spot size and power density is reported as a coefficient of variation, in which a value of ~3 is interpreted as uniform. My interpretation would differ - data spread so that the standard deviation is three times larger than the mean indicates there is substantial variability in the data. The 2D polynomial fit is shown in Figure 2 - Figure Supplement 1A and, if the fit is good, this does support the uniformity claim (range of spot size is 1.97 to 2.08 mm2 and range of power densities is 66.60 to 73.80 mW). The inclusion of the raw data for these measurements and an estimate of the goodness of fit to the polynomials would better help the reader evaluate whether these parameters are uniform across space and how stable the power density is across repeated stimulations of the same location. Even more helpful would be an estimate of whether the variation in the power density is expected to meaningfully affect the responses of ChR2-expressing sensory neurons.

      We thank the reviewer for their comments. As also noted in response to Reviewer 1, the coefficient of variation (CV) is now reported in unitless form (rather than a percentage) to ensure clarity. For avoidance of doubt, the CV is 0.039 (3.9%), so the variation in laser spot size is minimal – there is negligible spot size variability across the system. The ranges are indeed consistent with uniformity. We have included the goodness-of-fit estimates in the appropriate figure legend “fit with a two-dimensional polynomial (area R<sup>2</sup> = 0.91; power R<sup>2</sup> = 0.75)”. This indicates that the polynomials fit well overall.

      The system already allows for control of spot size. We addressed whether variation in the power density affects the responses of ChR2-expressing sensory neurons in our previous work, which focused more on input-output relationships. That work demonstrated a steep relationship between spot size (range of 0.02 mm<sup>2</sup> to 2.30 mm<sup>2</sup>) and the probability of paw response, that is, a meaningful change in response probability (Schorscher-Petcu et al. eLife, 2021). In future studies, we aim to use this approach to “titrate” cutaneous inputs as mice move through their environments.

      While the error between the keypoint and laser spot was reported as ~0.7 to 0.8 mm MAE in Figure 2L, in the methods, the authors report that there is an additional error between predicted keypoints and ground-truth labeling of 1.36 mm MAE during real-time tracking. This suggests that the overall error is not submillimeter, as claimed by the authors, but rather on the order of 1.5 - 2.5 mm, which is considerable given the width of a hind paw is ~5-6 mm and fore paws are even smaller. In my opinion, the claim for submillimeter precision should be softened and the authors should consider that the area of the paw stimulated may differ from trial to trial if, for example, the error is substantial enough that the spot overlaps with the edge of the paw.

      We thank the reviewer for identifying a discrepancy in these reported errors. We clarify this below and in the manuscript.

      The real-time tracking error is the mean absolute Euclidean distance (MAE) between ground truth and DLC on the left hind paw where likelihood was relatively high. More specifically, ground truth was obtained by manual annotation of the left hind paw center. The corresponding DLC keypoint was evaluated in frames with likelihood >0.8 (the stimulation threshold). Across 1,281 frames from five videos of freely exploring mice (30 fps), the MAE was 1.36 mm.

      The targeting error is the MAE between ground truth and the laser spot location, so should reflect the real-time tracking error plus errors from targeting the laser. More specifically, this metric was determined by comparing the manually determined ground truth keypoint of the left hind paw and the actual center of the laser spot. Importantly, this metric was calculated using four five-minute high-speed videos recorded at 270 fps of mice freely exploring the open arena (463 frames) and frames were selected with a likelihood threshold >0.8. This allowed us to resolve the brief laser pulses but inadvertently introduced a difference in spatial scaling. After rescaling, the values give a targeting error MAE now in line with the real-time tracking error (see corrected Figure 2L). This is approximately 1.3 mm across all locomotion speed categories. These errors are small and are limited by the spatial resolution of the cameras. We thank the reviewer for noting this discrepancy and prompting us to get to its root cause.
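
      Both error metrics described above are mean absolute Euclidean distances over frames that pass the likelihood threshold. The sketch below shows one way such a metric could be computed; it is not the authors' code, and the function name, the `mm_per_px` scale factor and the example coordinates are hypothetical. The explicit scale factor illustrates why comparing videos with different spatial calibration requires rescaling.

```python
import math


def mean_euclidean_error_mm(truth_px, measured_px, likelihood,
                            threshold=0.8, mm_per_px=1.0):
    """Mean Euclidean distance (mm) over frames with likelihood > threshold."""
    distances = [
        math.dist(t, m) * mm_per_px
        for t, m, p in zip(truth_px, measured_px, likelihood)
        if p > threshold
    ]
    if not distances:
        raise ValueError("no frames exceed the likelihood threshold")
    return sum(distances) / len(distances)


# Hypothetical pixel coordinates: ground truth vs. keypoint (or laser spot).
truth = [(100.0, 100.0), (120.0, 80.0), (90.0, 60.0)]
measured = [(103.0, 104.0), (126.0, 88.0), (150.0, 150.0)]
likelihood = [0.95, 0.90, 0.40]  # the third frame is excluded

print(mean_euclidean_error_mm(truth, measured, likelihood, mm_per_px=0.5))
```

      Because the same function applies whether `measured_px` holds predicted keypoints or laser spot centres, the tracking and targeting errors are directly comparable only when each video is given its own correct `mm_per_px` value.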

      We have amended the subtitle on Figure 2L to “Ground truth keypoint to laser spot error” and have avoided the use of submillimeter throughout. We have added the following sentence to clarify this point: “As laser targeting relies on real-time tracking to direct the laser to the specified body part, this metric includes any errors introduced by tracking and targeting”.

      As the major advance of this paper is the ability to stimulate animals during ongoing movement, it seems that the Figure 3 experiment misses an opportunity to evaluate state-dependent whole-body reactions to nociceptor activation. How does the behavioral response relate to the animal's activity just prior to stimulation?

      The reviewer suggests analysis of state-dependent responses. In the Figure 3 experiment, mice were stimulated up to five times when stationary. Analysis of whole-body reactions in stationary mice has been described in Schorscher-Petcu et al. (eLife, 2021) and repeating it here would be redundant, so instead we now analyse the responses of moving mice in Figure 5. This new analysis shows robust state-dependent responses during movement, as suggested by the reviewer. We find two behavioral clusters: one for faster, direct (coherent) movement and the other for slower assessment (incoherent) movement. Stimulation during the former results in robust and consistent slowing and a shift towards assessment, whereas stimulation during the latter results in a reduction in assessment. We describe and interpret these new data in the Results and Discussion sections and add information in the Methods and Figure legend, as given below. We believe that demonstrating movement state-dependence is a valuable addition to the paper and thank the reviewer for suggesting this.

      Given the characterization of full-body responses to activation of TrpV1 sensory neurons in Figure 4 and in the authors' previous work, stimulation of TrpV1 sensory neurons has surprisingly subtle effects as the mice run through the alternating T maze. The authors indicate that the mice are moving quickly and thus that precise targeting is required, but no evidence is shared about the precision of targeting in this context beyond images of four trials. From the characterization in Figure 2, at max speed (reported at 241 +/- 53 mm/s, which is faster than the high speeds in Figure 2), successful targeting occurs less than 50% of the time. Is the initial characterization consistent with the accuracy in this context? To what extent does inaccuracy in targeting contribute to the subtlety of affecting trajectory coherence and speed? Is there a relationship between animal speed and disruption of the trajectory?

      We thank the reviewer for pointing out the discrepancy in the reported maximum speed. We have corrected the error in the main text: the average maximum speed is 142 ± 26 mm/s (four mice).

      The self-paced T-maze alternation task in Figure 5 demonstrates that mice running in a maze can be stimulated using this method. We did not optimize the particular experimental design to assess the hit accuracy, as this was determined in Figure 2. Instead, we optimized for the pulse frequencies, meaning the galvanometers tracked with processed frames but the laser was triggered whether or not the paw was actually targeted. However, even in this case with the system pulsing in the free-run mode, the laser hit rate was 54 ± 6% (mean ± sem, n = 7 mice). We have weakened references to submillimeter as it was only inferred from other experiments and was not directly measured here. We find in this experiment that stimulation in freely moving mice can cause them to briefly halt and evaluate. In the future, we will use experimental designs to more optimally examine learning.
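
      A hit rate reported as mean ± SEM across mice can be obtained by computing the percentage of pulses that hit the paw for each mouse and then summarising across animals. The sketch below illustrates that calculation with hypothetical per-mouse outcomes; the function names are illustrative and this is not the authors' analysis code.

```python
import math
from statistics import mean, stdev


def hit_accuracy_percent(hits):
    """Percentage of trials in which the laser hit the intended paw."""
    if not hits:
        raise ValueError("no trials provided")
    return 100.0 * sum(1 for h in hits if h) / len(hits)


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical outcomes for three mice (True = hit, False = miss).
per_mouse_trials = [
    [True, False, True, True],
    [True, False, False, True],
    [False, True, True, False, True],
]
per_mouse_accuracy = [hit_accuracy_percent(t) for t in per_mouse_trials]
print(per_mouse_accuracy)
print(mean_and_sem(per_mouse_accuracy))
```

      Summarising per animal first means that each mouse contributes equally to the reported mean, regardless of how many pulses it received.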

      The reviewer also asks if there is a relationship between speed and disruption of the trajectory. We find that this is the case as described above with our additional analysis.

      Reviewer #3 (Public review):

      Summary:

      To explore the diverse nature of somatosensation, Parkes et al. established and characterized a system for precise cutaneous stimulation of mice as they walk and run in naturalistic settings. This paper provides a framework for real-time body part tracking and targeted optical stimuli with high precision, ensuring reliable and consistent cutaneous stimulation. It can be adapted in somatosensation labs as a general technique to explore somatosensory stimulation and its impact on behavior, enabling rigorous investigation of behaviors that were previously difficult or impossible to study.

      Strengths:

      The authors characterized the closed-loop system to ensure that it is optically precise and can precisely target moving mice. The integration of accurate and consistent optogenetic stimulation of the cutaneous afferents allows systematic investigation of somatosensory subtypes during a variety of naturalistic behaviors. Although this study focused on nociceptors innervating the skin (Trpv1::ChR2 animals), this setup can be extended to other cutaneous sensory neuron subtypes, such as low-threshold mechanoreceptors and pruriceptors. This system can also be adapted for studying more complex behaviors, such as the maze assay and goal-directed movements.

      Weaknesses:

      Although the paper has strengths, its weakness is that some behavioral outputs could be analyzed in more detail to reveal different types of responses to painful cutaneous stimuli. For example, paw withdrawals were detected after optogenetically stimulating the paw (Figures 3E and 3F). Animals exhibit different types of responses to painful stimuli on the hind paw in standard pain assays, such as paw lifting, biting, and flicking, each indicating a different level of pain. Improving the behavioral readouts from body part tracking would greatly strengthen this system by providing deeper insights into the role of somatosensation in naturalistic behaviors. Additionally, if the laser spot size could be reduced to a diameter of 2 mm², it would allow the activation of a smaller number of cutaneous afferents, or even a single one, across different skin types in the paw, such as glabrous or hairy skin.

      We thank the reviewer for highlighting how our system can be combined with improved readouts of coping behavior to provide deeper insights. Optogenetic and infrared cutaneous stimulation are well-established generators of coping behaviors (lifting, flicking, licking, biting, guarding). Detection of these behaviors is an active and evolving field with progress being made regularly (e.g. Jones et al., eLife 2020 [PAWS]; Wotton et al., Mol Pain 2020; Zhang et al., Pain 2022; Oswell et al., bioRxiv 2024 [LUPE]; Barkai et al., Cell Reports Methods 2025 [BAREfoot], along with more general tools like Hsu et al., Nature Communications 2021 [B-SOiD]; Luxem et al., Communications Biology 2022 [VAME]; Weinreb et al., Nature Methods 2024 [Keypoint-MoSeq]). One output of our system is bodypart keypoints, which are the typical input to many of these tools. We will leave the readers and users of the system to decide which tools are appropriate for their experimental designs; the focus of the current manuscript is describing the novel stimulation approach in moving animals.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It is hard to see how the rig is arranged from the render of Figure 2AB due to the components being black on black. A particularly useful part of Fig2AB is the aerial view in panel B that shows the light paths. I suggest adding the labelling of Figure 2A also to that. The side/rear views could perhaps be deleted, allowing the aerial view to be larger.

      We appreciate this suggestion and have revised Figure 2B to improve the visibility of the optomechanical components. We have enlarged the side and aerial views, removed the rear view, and added further labels to the aerial view.

      (2) MAE - to interpret the 0.54 result, it would be useful to state the arena size in this paragraph.

      Thank you. We have added the arena size in this paragraph and also added scales in the relevant figure (Figure 2).

      (3) "pairwise correlations of R = 0.999 along both x- and y-axes". Is this correlation between hindpaw keypoint and galvo coordinates?

      Yes, we have added the following to clarify: “...between galvanometer coordinates and hind paw keypoints”

      (4) Latency was 84 ms. Is this mainly/entirely the delay between DLC receiving the camera image and outputting key point coordinates?

      Yes, we hope that the additional detail in the Methods and Discussion described above will now clarify the current closed-loop latencies.

      (5) "Mice move at variable speeds": in this sentence, spell out when "speed" refers to mouse and when it refers to hindpaw. Similarly, Fig 2i. The sentence is potentially confusing to general readers (paws stationary although the mouse is moving). Presumably, it's due to gait. I suggest explaining this here.

      The speed values that relate to the mouse body and paws are now clearer in the main text and in the legend for Figure 2I.

      (6) Figure 2k and associated main text. It is not clear what "success/hit rate" means here.

      We have added the following sentence in the main text: “Hit accuracy refers to the percentage of trials in which the laser successfully targeted (‘hit’) the intended hind paw.” and use hit accuracy throughout instead of success rate.

      (7) Figure 2L. All these points are greater than the "average" 0.54 reported in the text. How is this possible?

      The MAE of 0.54 mm refers to the “predicted and actual laser spot locations” (that is, the difference between where the calibration map should place the laser spot and where it actually fell), while the Figure 2L MAE values refer to the error between the ground truth keypoint and the laser spot (that is, the error between the human-observed paw target and where the laser spot fell). The latter error includes the former, so it is expected to be larger. We have clarified this point throughout the text, for example, stating “As laser targeting relies on real-time tracking to direct the laser to the specified body part, this metric inherently accounts for any errors introduced by the tracking and targeting.” This is also discussed above in response to Reviewer 2.

      (8) "large circular arena". State the size here

      We have added this to the Figure 2 legend.

      (9) Figure 3c-left. Can the contrast between the mouse and floor be increased here?

      We have improved the contrast in this image.

      (10) Figure 5c. It is unclear what C1, C2, etc refers to. Mice?

      Yes, these refer to mice. We have removed reference to these now as they are not needed.

      (11) Discussion. A comment. There is scope for elaborating on the potential for new research by combining it with new methods for measurements of neural activity in freely moving animals in the somatosensory system.

      Thank you. We agree and have added more detail on this in the discussion stating “The system may be combined with existing tools to record neural activity in freely-moving mice, such as fiber photometry, miniscopes, or large-scale electrophysiology, and manipulations of this neural activity, such as optogenetics and chemogenetics. This can allow mechanistic dissection of cell and circuit biology in the context of naturalistic behaviors.”

      Reviewer #3 (Recommendations for the authors):

      (1) Include the number of animals for behavior assays for the panels (e.g., Figure 4G).

      Where missing, we now state the number of animals in panels.

      (2) If representative responses are shown, such as in Figures 3E and 4F, include the average response with standard deviation so readers can appreciate the variation in the responses.

      We appreciate the suggestion to show variability in the responses. We have made several changes to Figures 3 and 4. Specifically, to illustrate the variability across multiple trials more clearly, Figure 3E now shows representative keypoint traces for each body part from two mice during their 5 trials. For Figure 4, we have re-analyzed the thermal stimulation trials and shown a raster plot of keypoint-based local motion energy (Figure 4E) sorted by response latency for hundreds of trials. Figure 4G now presents the cumulative distribution for all trials and animals for thermal (18 wild-type mice, 315 trials) and optogenetic stimulation trials (9 Trpv1::ChR2 mice, 181 trials). We also now provide means ± SD for the key metrics for optogenetic and thermal stimulation trials in Figure 4 in the Results section. This keeps the manuscript focused on the methodological advances while showing the trial variability.

      (3) "optical targeting of freely-moving mice in a large environments" should be "optical targeting of freely-moving mice in a large environment".

      Corrected

      (4) Define fps when you first mention this in the manuscript.

      Added

      (5) Data needs to be shown for the claim "Mice concurrently turned their heads toward the stimulus location while repositioning their bodies away from it".

      We state this observation to qualify that the stimulation of stationary mice resulted in behavioral responses “consistent with previous studies”. It would be redundant to repeat our full analysis and might distract from the novelty of the current manuscript. We have restricted this sentence to make it clearer: “Consistent with previous studies, we observed the whole-body behaviors like head orienting concurrent with local withdrawal (Browne et al., Cell Reports 2017; Blivis et al., eLife, 2017.)”

    1. eLife Assessment

      This compelling work describes how the cell cycle-regulating phosphatase subunit, RepoMan, is regulated by the oxygen-dependent, metabolite-sensing hydroxylase PHD1. The characterisation of how proline hydroxylation alters signalling at the molecular and cellular level provides important evidence to enhance our understanding of how 2-oxoglutarate-dependent dioxygenases influence the cell cycle and mitosis.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Druker et al. shows that siRNA depletion of PHD1, but not PHD2, increases H3T3 phosphorylation in cells arrested in prometaphase. Additionally, the expression of wild-type RepoMan, but not the RepoMan P604A mutant, restored normal H3T3 phosphorylation localization in cells arrested in prometaphase. Furthermore, the study demonstrates that expression of the RepoMan P604A mutant leads to defects in chromosome alignment and segregation, resulting in increased cell death. These data support a role for PHD1-mediated prolyl hydroxylation in controlling progression through mitosis. This occurs, at least in part, by hydroxylating RepoMan at P604, which regulates its interaction with PP2A during chromosome alignment.

      Strengths:

      The data support most of the conclusions made.

      Comments on revisions:

      Actually, I am still concerned that PHD1 binds to RepoMan endogenously and directly. Furthermore, the authors have not yet provided genetic evidence demonstrating that PHD1 controls progression through mitosis by catalyzing the hydroxylation of RepoMan.

    3. Reviewer #2 (Public review):

      Summary:

      This is a concise and interesting article on the role of PHD1-mediated proline hydroxylation of proline residue 604 on RepoMan and its impact on RepoMan-PP1 interactions with phosphatase PP2A-B56 complex leading to dephosphorylation of H3T3 on chromosomes during mitosis. Through biochemical and imaging tools, the authors delineate a key mechanism in regulation of progression of the cell cycle. The experiments performed are conclusive with well-designed controls.

      Strengths:

      The authors have utilized cutting edge imaging and colocalization detection technologies to infer the conclusions in the manuscript.

      Weaknesses:

      Lack of in vitro reconstitution and binding data.

      Comments on revisions:

      Thank you, authors, for providing the statistics and siRNA validations. While I maintain that the manuscript's claims can benefit a lot from the in vitro experiments and that a Pro-Ala mutation may not be a good mimic for Pro-hydroxylation, I understand the authors' reservations and restrictions regarding the new experiments. Despite the lacunae, the manuscript is a good advance for the field.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript is a comprehensive molecular and cell biological characterisation of the effects of P604 hydroxylation by PHD1 on RepoMan, a regulatory subunit of the PP1gamma complex. Conclusions are generally supported by results. Overall, a timely study that demonstrates the interplay between hydroxylase signalling and the cell cycle. The study extends the scope of prolyl hydroxylase signalling beyond canonical hypoxia pathways, providing a concrete example of hydroxylation regulating PP1 holoenzyme composition and function during mitosis.

      The work would benefit from additional biochemical validation of direct targeting to characterise the specificity and mode of recognition, but this is beyond the scope of the study.

      Strengths:

      Compelling data, characterisation of how P604 hydroxylation induces the interaction between RepoMan and a phosphatase complex, resulting in loading of RepoMan on Chromatin. Knockdown of PHD1 mimics the disruption of the complex and loss of the regulation of the hydroxylation site by PHD1, resulting in mitotic defects.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Druker et al. shows that siRNA depletion of PHD1, but not PHD2, increases H3T3 phosphorylation in cells arrested in prometaphase. Additionally, the expression of wild-type RepoMan, but not the RepoMan P604A mutant, restored normal H3T3 phosphorylation localization in cells arrested in prometaphase. Furthermore, the study demonstrates that expression of the RepoMan P604A mutant leads to defects in chromosome alignment and segregation, resulting in increased cell death. These data support a role for PHD1-mediated prolyl hydroxylation in controlling progression through mitosis. This occurs, at least in part, by hydroxylating RepoMan at P604, which regulates its interaction with PP2A during chromosome alignment.

      Strengths:

      The data support most of the conclusions made. However, some issues need to be addressed.

      Weaknesses:

      (1) Although ectopically expressed PHD1 interacts with ectopically expressed RepoMan, there is no evidence that endogenous PHD1 binds to endogenous RepoMan or that PHD1 directly binds to RepoMan.

      We do not fully agree that this comment is accurate - the implication is that we only show interaction between two exogenously expressed proteins, i.e. both exogenous PHD1 and RepoMan, when in fact we show that tagged PHD1 interacts with endogenous RepoMan. The major technical challenge here is the well-known difficulty of detecting endogenous PHD1 in such cell lines. We agree that co-IP studies do not prove that this interaction is direct and never claim to have shown this, though we do feel that a direct interaction is most likely, albeit not proven.

      (2) There is no genetic evidence indicating that PHD1 controls progression through mitosis by catalyzing the hydroxylation of RepoMan.

      We agree that our current study is primarily a biochemical and cell biological study, rather than a genetic study. Nonetheless, similar biochemical and cellular approaches have been widely used and validated in previous studies in mechanisms regulating cell cycle progression and we are confident in the conclusions drawn based on the data obtained so far.

      (3) Data demonstrating the correlation between dynamic changes in RepoMan hydroxylation and H3T3 phosphorylation throughout the cell cycle are needed.

      We agree that it will be very interesting to analyse in more detail the cell cycle dynamics of RepoMan hydroxylation and H3T3 phosphorylation - along with other cell cycle parameters. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      (4) The authors should provide biochemical evidence of the difference in binding ability between RepoMan WT/PP2A and RepoMan P604A/PP2A.

      Here again we agree that it will be very interesting to analyse in future the detailed binding interactions between wt and mutant RepoMan and other interacting proteins, including PP2A. We show reduced interaction in cells by PLA (Figure 5A) and in biochemical analysis (Figure 5C). More in vitro analysis is, in our view, outside the scope of our present study and we are actively engaged in raising the additional funding needed to pursue such future experiments.

      (5) PHD2 is the primary proline hydroxylase in cells. Why does PHD1, but not PHD2, affect RepoMan hydroxylation and subsequent control of mitotic progression? The authors should discuss this issue further.

      We agree with the main point underpinning this comment, i.e., that there are still many things to be learned concerning the specific roles and mechanisms of the different PHD enzymes in vivo. We address this in the Discussion section and look forward to addressing these questions experimentally in future studies.

      Reviewer #2 (Public review):

      Summary:

      This is a concise and interesting article on the role of PHD1-mediated proline hydroxylation of proline residue 604 on RepoMan and its impact on RepoMan-PP1 interactions with phosphatase PP2A-B56 complex leading to dephosphorylation of H3T3 on chromosomes during mitosis. Through biochemical and imaging tools, the authors delineate a key mechanism in the regulation of the progression of the cell cycle. The experiments performed are conclusive with well-designed controls.

      Strengths:

      The authors have utilized cutting-edge imaging and colocalization detection technologies to infer the conclusions in the manuscript.

      Weaknesses:

      Lack of in vitro reconstitution and binding data.

      We agree that it will be very interesting to pursue in vitro reconstitution studies and detailed binding data. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments. We do provide in vitro hydroxylation data in our accompanying manuscript by Jiang et al., 2025, eLife.

      Reviewer #3 (Public review):

      Summary:

      The manuscript is a comprehensive molecular and cell biological characterisation of the effects of P604 hydroxylation by PHD1 on RepoMan, a regulatory subunit of the PP1gamma complex. The identification and molecular characterisation of the hydroxylation site have been written up and deposited in bioRxiv in a separate manuscript. I reviewed the data and came to the conclusion that the hydroxylation site has been identified and characterised to a very high standard by LC-MS, in cells and in vitro reactions. I conclude that we should have no question about the validity of the PHD1-mediated hydroxylation.

      In the context of the presented manuscript, the authors postulate that hydroxylation on P604 by PHD1 leads to the inactivation of the complex, resulting in the retention of pThr3 in H3. 

      Strengths:

      Compelling data, characterisation of how P604 hydroxylation is likely to induce the interaction between RepoMan and a phosphatase complex, resulting in loading of RepoMan on Chromatin. Loss of the regulation of the hydroxylation site by PHD1 results in mitotic defects.

      Weaknesses:

      Reliance on a Proline-Alanine mutation in RepoMan to mimic an unhydroxylatable protein. The mutation will introduce structural alterations, and inhibition or knockdown of PHD1 would be necessary to strengthen the data on how hydroxylates regulate chromatin loading and interactions with B56/PP2A.

      We do not agree that we rely solely on analysis of the single site pro-ala mutant in RepoMan for our conclusions, since we also present a raft of additional experimental evidence, including knock-down data and experiments using both fumarate and FG. We would also reference the data we present on RepoMan in the parallel study by Jiang et al., which has also been published in eLife (https://doi.org/10.7554/eLife.108128.1). Of course, we agree with the reviewer that even though the mutant RepoMan features only a single amino acid change, this could still result in undetermined structural effects on the RepoMan protein that could conceivably contribute, at least in part, to some of the phenotypic effects observed. We now provide evidence in the current revision (new Figure 5D) that reduced interaction between RepoMan and B56gamma/PP2A is also evident when PHD1 is depleted from cells.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript can benefit from improved quality of writing and avoidance of grammatical errors.

      We have checked through the manuscript again and corrected any mistakes we have encountered in the Current revision.

      (2) Although the data in the manuscript is compelling, it is difficult to rule out indirect effects in the interactions. Hence, in vitro binding assays with purified proteins are important to validate the findings, along with in vitro reconstitution of phosphatase activity.

      It is possible that cofactors and / or additional PTMs are required to promote these interactions in vivo. We have provided in vitro hydroxylation analysis and the additional experiments suggested will be the subject of follow-on future studies.

      (3) Proline to alanine is a drastic mutation in the amino acid backbone. The authors could purify PHD1 and reconstitute P604 hydroxylation to show if it performs as expected.

      This is likely to be a challenging experiment technically, given that RepoMan is a component of multiple distinct complexes, some of which are dynamic. We did not feel able to address this within the scope of the current study.

      (4) The confocal images showing the overlap of two fluorescent signals need to show some sort of quantification and statistics to prove that the overlap is significant.

      We now provide Pearson correlation measurements for Figure 2A in new Figure 2B in the Current revision.

      (5) Kindly provide a clearer panel for the Western blot of H3T3ph in Figure 3c.

      We have now included a new panel for this Figure in the Current revision.

      (6) Kindly also include the figures for validation of siRNAs used in the study

      We have added this throughout in supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors have shown that PHD1 and RepoMan interact; can the interaction be "trapped" by the addition of DMOG? Generally, hydroxylase substrates can be trapped, which would add an additional layer of confidence that PHD1 and RepoMan form an enzyme-substrate complex. 

      This is something we are planning to do for follow-up studies using the established methods from the von Kriegsheim laboratory.

      (2) How does P604A mutation affect the interaction with PHD1? One would expect a reduction in interaction. 

      Another interesting point we are planning to investigate in the future.

      (3) The effects of expression of the wt and P604A mutant repoman are well-characterised. Could the authors check the effects of overexpressing PHD1 and deadPHD1, inhibition on the mitosis/H3 phosphorylation? My concerns are that a P-A mutation will disrupt the secondary structure, and although it is a good tool, data should be backed up by increasing/decreasing the hydroxylation of RepoMan over the mutation. Repeat some of the most salient experiments where the P604A mutation has been used and modulate the hydP604 by modulating PHD1 activity/expression (such as Chromatin interaction, PLA assay, B56gamma interaction, H3 phosphorylation localisation, Monastrol release, etc.)

      We agree, the PA mutant can potentially affect the protein structure. In our manuscript we have provided pH3 analysis for PHD inhibition using siRNA, FG4592 and fumarate. In the Current revision we also include data showing that depletion of PHD1 results in a reduction in interaction between RepoMan and B56gamma/PP2A. This is now presented in new Figure 5D.

      (4) I also have a general question, as a point of interest: as the interaction between PHD1 and RepoMan appears to be cell cycle dependent, is it possible that the hydroxylation status cycles as well? Could this explain how some sub-stoichiometric hydroxylation events observed may be masked by assessing unsynchronised cells in bulk?

      Indeed, a very good question. We believe this is an interesting question for follow-up studies. Given our previous publication showing that phosphorylation of PHD1 by CDKs alters substrate binding (Ortmann et al, 2016 JCS), this is our current hypothesis.

    1. eLife Assessment

      This important study explores how the phase of neural oscillations in the alpha band affects visual perception, indicating that perceptual performance varies due to changes in sensory precision rather than decision bias. The evidence is solid in its experimental design and analytical approach, although the limited sample size restricts the generalizability of the findings. This work should interest cognitive neuroscientists who study perception and decision-making.

    2. Reviewer #1 (Public review):

      Summary:

      In their paper entitled "Alpha-Band Phase Modulates Perceptual Sensitivity by Changing Internal Noise and Sensory Tuning," Pilipenko et al. investigate how pre-stimulus alpha phase influences near-threshold visual perception. The authors aim to clarify whether alpha phase primarily shifts the criterion, multiplicatively amplifies signals, or changes the effective variance and tuning of sensory evidence. Six observers completed many thousands of trials in a double-pass Gabor-in-noise detection task while an EEG was recorded. The authors combine signal detection theory, phase-resolved analyses, and reverse correlation to test mechanistic predictions. The experimental design and analysis pipeline provide a clear conceptual scaffold, with SDT-based schematic models that make the empirical results accessible even for readers who are not specialists in classification-image methods.
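      The distinction the summary draws between a criterion shift and a change in sensitivity can be made concrete with the standard equal-variance signal detection formulas, d′ = z(H) − z(F) and c = −(z(H) + z(F))/2, where H and F are the hit and false-alarm rates. The sketch below is only an illustration of those textbook formulas, not the authors' analysis code; the function name and the example rates are hypothetical.

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Return (d_prime, criterion) under an equal-variance Gaussian SDT model.

    Rates must lie strictly between 0 and 1, otherwise the z-transform
    (inverse normal CDF) is undefined.
    """
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    d_prime = z_hit - z_fa               # sensitivity
    criterion = -0.5 * (z_hit + z_fa)    # response bias
    return d_prime, criterion


# Hypothetical rates: hits rise while false alarms fall between two phases.
d_subopt, c_subopt = sdt_measures(0.75, 0.25)
d_opt, c_opt = sdt_measures(0.80, 0.20)
print(f"suboptimal phase: d'={d_subopt:.2f}, c={c_subopt:.2f}")
print(f"optimal phase:    d'={d_opt:.2f}, c={c_opt:.2f}")
```

      With these made-up rates, hits and false alarms move in opposite directions, so d′ increases while the criterion stays at zero; this is the pattern the review describes as consistent with a variance-reduction account rather than a criterion shift.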

      Strengths:

      The study presents a coherent and well-executed investigation with several notable strengths. First, the main behavioral and EEG results in Figure 2 demonstrate robust pre-stimulus coupling between alpha phase and d′ across a substantial portion of the pre-stimulus interval, with little evidence that the criterion is modulated to a comparable extent. The inverse phasic relationship between hit and false-alarm rates maps clearly onto the variance-reduction account, and the response-consistency analysis offers an intuitive behavioral complement: when two identical stimuli are both presented at the participant's optimal phase, responses are more consistent than when one or both occur at suboptimal phases. The frontal-occipital phase-difference result suggests a coordinated rather than purely local phase mechanism, supporting the central claim that alpha phase is linked to changes in sensitivity that behave like changes in internal variability rather than simple gain or criterion shifts. Supplementary analyses showing that alpha power has only a limited relationship with d′ and confidence reassure readers that the main effects are genuinely phase-linked rather than a recasting of amplitude differences.

      Second, the reverse-correlation results in Figure 3 extend this story in a satisfying way. The classification images and their Gaussian fits show that at the optimal phase, the weighting of stimulus energy is more sharply concentrated around target-relevant spatial frequencies and orientations, and the bootstrapped parameter distributions indicate that the suboptimal phase is best described by broader tuning and a modest change in gain rather than a pure criterion account. The authors' interpretation that optimal-phase perception reflects both reduced effective internal noise and sharpened sensory tuning is reasonable and well-supported. Overall, the data and figures largely achieve the stated aims, and the work is likely to have an impact both by clarifying the interpretation of alpha-phase effects and by illustrating a useful analytic framework that other groups can adopt.

      Weaknesses:

      The weaknesses are limited and relate primarily to framing and presentation rather than to the substance of the work. First, because contrast was titrated to maintain moderate performance (d′ between 1.2 and 1.8), the phase-linked changes in sensitivity appear modest in absolute terms, which could benefit from explicit contextualization. Second, a coding error resulted in unequal numbers of double-pass stimulus pairs across participants, which affects the interpretability of the response-consistency results. Third, several methodological details could be stated more explicitly to enhance transparency, including stimulus timing specifications, electrode selection criteria, and the purpose of phase alignment in group averaging. Finally, some mechanistic interpretations in the Discussion could be phrased more conservatively to clearly distinguish between measurement and inference, particularly regarding the relationship between reduced internal noise and sharpened tuning, and the physiological implementation of the frontal-occipital phase relationship.

    3. Reviewer #2 (Public review):

      Summary:

      The study of Pilipenko et al evaluated the role of alpha phase in a visual perception paradigm using the framework of signal detection theory and reverse correlation. Their findings suggest that phase-related modulations in perception are mediated by a reduction in internal noise and a moderate increase in tuning to relevant features of the stimuli in specific phases of the alpha cycle. Interestingly, the alpha phase did not affect the criterion. Criterion was related to modulations in alpha power, in agreement with previous research.

      Strengths:

      The experiment was carefully designed, and the analytical pipeline is original and suited to answer the research question. The authors frame the research question very well and propose several models that account for the possible mechanisms by which the alpha phase can modulate perception. This study can be very valuable for the ongoing discussion about the role of alpha activity in perception.

      Weaknesses:

      The sample size collected (N = 6) is, in my opinion, too small for the statistical approach adopted (group level). It is well known that small sample sizes result in an increased likelihood of false positives; even in the case of true positives, effect sizes are inflated (Button et al., 2013; Makin and Orban de Xivry, 2019), negatively affecting the replicability of the effect.

      Although the experimental design allows for an accurate characterization of the effects at the single-subject level, conclusions are drawn from group-level aggregated measures. With only six subjects, the estimation of between-subject variability is not reliable. The authors need to acknowledge that the sample size is too small; therefore, results should be interpreted with caution.

      Conclusion:

      This study addresses an important and timely question and proposes an original and well-thought-out analytical framework to investigate the role of alpha phase in visual perception. While the experimental design and theoretical motivation are strong, the very limited sample size substantially constrains the strength of the conclusions that can be drawn at the group level.

      Bibliography:

      Button, K., Ioannidis, J., Mokrysz, C. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14, 365-376 (2013). https://doi.org/10.1038/nrn3475

      Tamar R Makin, Jean-Jacques Orban de Xivry (2019) Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript eLife 8:e48175 https://doi.org/10.7554/eLife.48175

    4. Author response:

      We would like to thank the reviewers for their helpful feedback. We appreciate their recognition of many positive features from our study and plan to address the weaknesses with the following set of changes:

      Reviewer #1 rightly points out that the titration of performance throughout the experiment could reduce the overall size of the phasic effect we observed by compressing the overall range of d’. In our revision, we plan to acknowledge the potential consequence of stimulus titration as well as emphasize that the resultant vector length approach we took to quantify phase-behavior coupling is a better reflection of the effect size than the plot of phase-binned d’. Next, we will include language cautioning against overconfidence in our double-pass statistics, since half of our participants had far fewer double-pass trials due to a coding error. Finally, we will gladly clarify the methodological details requested and revise the Discussion by phrasing several of our interpretations more conservatively: specifically, discussing the possibility that the frontal-occipital phase difference could also arise from two counter-phase sources, and including the possibility that sensory noise reduction and sharpened tuning may be two separate mechanisms.
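      For readers unfamiliar with the term, a resultant vector length summarises how strongly a quantity depends on phase: each phase bin contributes a vector pointing at its phase angle, scaled by the measure in that bin, and the length of the normalised vector sum is near 0 when the measure is flat across phase and grows as it concentrates at one phase. The sketch below shows one generic weighted form of this statistic; it is an assumption for illustration and may differ from the exact computation used in the study. The function name, bin count, and d′ values are hypothetical.

```python
import cmath
import math


def resultant_vector_length(phases, weights):
    """Weighted resultant vector length, in [0, 1].

    phases  : phase-bin centres in radians
    weights : non-negative measure per bin (e.g. a per-bin d')
    """
    if len(phases) != len(weights):
        raise ValueError("phases and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    # Sum unit vectors at each phase, scaled by the per-bin measure.
    vector_sum = sum(w * cmath.exp(1j * p) for p, w in zip(phases, weights))
    return abs(vector_sum) / total


# Hypothetical example: 8 phase bins, d' flat vs. modulated by phase.
n_bins = 8
bin_centres = [2 * math.pi * k / n_bins for k in range(n_bins)]
flat = [1.5] * n_bins
modulated = [1.5 + 0.2 * math.cos(p) for p in bin_centres]

print(resultant_vector_length(bin_centres, flat))       # close to 0
print(resultant_vector_length(bin_centres, modulated))  # clearly above 0
```

      Because the statistic is normalised by the summed measure, a small absolute modulation of d′ around a titrated mean yields a small value in absolute terms, which is one reason such values are usually interpreted against a permutation or surrogate distribution rather than on their own.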

      Reviewer #2 raises concerns about performing group-level statistical analyses on a small sample size. We acknowledge this as a reasonable concern and will include the single-subject effects of our main analysis in the Supplementary Materials as well as discuss that although the sample size is a limitation of our study, there are several justifications for taking a small-n, large-trial approach given our research question. We would also like to highlight that we feel more confident in the reproducibility of our results given the convergence of evidence across multiple measures (phase-d’ coupling, counter-phasic hit and false alarm rates, response consistency, and classification images) which are all pointing towards a consistent interpretation of a phase effect on internal variability.

    1. eLife Assessment

      IL21R, being a key cytokine receptor for shaping the T follicular helper and B cell functions, utilizes two STAT family members, STAT1 and STAT3. The authors utilize the IL21R ENU-induced mutant, together with relevant in vitro and in vivo experiments, to dissect the function of STAT1 and STAT3. The approach by itself sounds reasonable, but the main conclusions are incompletely supported by the data presented in this manuscript.

    2. Reviewer #1 (Public review):

      Summary:

      King and colleagues generated a mouse with a point mutation in IL21R and investigated the influence on IL-21-mediated T and B cell activation and differentiation. They found that mutant mice show a reduced T and B cell response, with CD4 T cell differentiation into T follicular helper cells being primarily affected.

      Strengths:

      The authors combined in vitro and in vivo analysis, including bone-marrow chimeric mice.

      Weaknesses:

      The effect of the IL21R EINS mutant does not specifically affect STAT1, as clearly shown in Figure 1H, I. Particularly at lower doses of IL21, which may be more relevant in vivo, the effects are very similar. A second key weakness is the very small Tfh response, the unclear PD-1 and CXCR5 staining used to identify Tfh, and the lack of a steady-state (prior to immunisation) comparison of Tfh numbers in the different mouse strains. The latter makes it impossible to know what fraction of the response is antigen-specific.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript, "An IL-21R hypomorph circumvents functional redundancy to define STAT1 signaling in germinal center responses," Cecile King and colleagues identify a cytoplasmic site of the IL-21 receptor that differentially regulates STAT1 and STAT3 activation upon IL-21 stimulation. They further examine the immunological consequences of this site-specific alteration on Tfh differentiation and Tfh-dependent humoral immunity, raising important questions about how gene-knockout models may obscure nuanced functional roles of signaling molecules.

      Strengths:

      The study convincingly highlights a non-redundant role for STAT1 downstream of IL-21-IL-21R signaling in the Tfh differentiation pathway. This conclusion is supported by in vitro analyses of STAT1 and STAT3 activation in CD4 T cells stimulated with IL-21 or IL-6; by in vivo assessments of Tfh and germinal center B cell responses in WT and IL21R-EINS mutant mice, including bone-marrow chimera systems; and by investigating the expression of Tfh-related molecules in WT versus IL21R-EINS CD4 T cells.

      Weaknesses:

      Although the experiments were carefully executed with appropriate controls, a key question remains unresolved: whether the Tfh differentiation defect in IL21R-EINS mice is directly attributable to reduced STAT1 activation. Rescue experiments that restore STAT1 signaling in IL21R-EINS TCR-transgenic CD4 T cells would provide strong evidence linking the mutation to impaired STAT1 activation and, consequently, defective Tfh differentiation. Without such evidence, it remains formally possible that additional, uncharacterized mutations introduced during ENU mutagenesis contribute to the phenotypes observed, particularly given the discrepancies between IL21R knockout and IL21R-EINS mutant mice.

    1. eLife Assessment

      This is a useful study presenting solid data indicating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions. The study elegantly bridges the gap between the non-physiological aspects of the previous two-step reconstitution method and the extract-dependent iSAT system to enable ribosome assembly under translation-compatible conditions; however, it is limited by reliance on rRNA and proteins extracted from native ribosomes and does not achieve a true bottom-up reconstruction from all synthetic components. The evidence is incomplete in not characterizing the spectrum of reporter polypeptides produced and not comparing their rate and yield of synthesis from reconstituted ribosomes to that obtained with pure native ribosomes; and the impact of the study is limited by not including reporters to examine the fidelity of initiation, elongation or termination achieved with the reconstituted ribosomes.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents evidence that the addition of the two GTPases EngA and ObgE to reactions comprised of rRNAs and total ribosomal proteins purified from native bacterial ribosomes can bypass the requirements for non-physiological temperature shifts and Mg²⁺ ion concentrations for in vitro reconstitution of functional E. coli ribosomes.

      Strengths:

      This advance allows ribosome reconstitution in a fully reconstituted protein synthesis system containing individually purified recombinant translation factors, with the reconstituted ribosomes substituting for native purified ribosomes to support protein synthesis. This work potentially represents an important development in the long-term effort to produce synthetic cells.

      Weaknesses:

      While much of the evidence is solid, the analysis is incomplete in certain respects that detract from the scientific quality and significance of the findings:

      (1) The authors do not describe how the native ribosomal proteins (RPs) were purified, and it is unclear whether all subassemblies of RPs have been disrupted in the purification procedure. If not, additional chaperones might be required beyond the two GTPases described here for functional ribosome assembly from individual RPs.

      (2) Reconstitution studies in the past have succeeded by using all recombinant, individually purified RPs, which would clearly address the issue in the preceding comment and also eliminate the possibility that an unknown ribosome assembly factor that co-purifies with native ribosomes has been added to the reconstitution reactions along with the RPs.

      (3) They never compared the efficiency of the reconstituted ribosomes to native ribosomes added to the "PURE" in vitro protein synthesis system, making it unclear what proportion of the reconstituted ribosomes are functional, and how protein yield per mRNA molecule compares to that given by the PURE system programmed with purified native ribosomes.

      (4) They also have not examined the synthesized GFP protein by SDS-PAGE to determine what proportion is full-length.

      (5) The previous development of the PURE system included examinations of the synthesis of multiple proteins, one of which was an enzyme whose specific activity could be compared to that of the native enzyme. This would be a significant improvement to the current study. They could also have programmed the translation reactions containing reconstituted ribosomes with (i) total native mRNA and compared the products in SDS-PAGE to those obtained with the control PURE system containing native ribosomes; (ii) specific reporter mRNAs designed to examine dependence on a Shine-Dalgarno sequence and the impact of an in-frame stop codon in prematurely terminating translation to assess the fidelity of initiation and termination events; and (iii) an mRNA with a programmed frameshift site to assess elongation fidelity displayed by their reconstituted ribosomes.

    3. Reviewer #2 (Public review):

      This study presents a significant advance in the field of in vitro ribosome assembly by demonstrating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions, specifically at 37 °C and with total Mg²⁺ concentrations below 10 mM.

      This achievement directly addresses a long-standing limitation of the traditional two-step in vitro assembly protocol (Nierhaus & Dohme, PNAS 1974), which requires non-physiological temperatures (44-50 °C) and high Mg²⁺ concentrations (~20 mM). Inspired by the integrated Synthesis, Assembly, and Translation (iSAT) platform (Jewett et al., Mol Syst Biol 2013), which leverages E. coli S150 crude extract to supply essential assembly factors, the authors hypothesize that specific ribosome biogenesis factors, particularly GTPases present in such extracts, may be responsible for enabling assembly under mild conditions. Through systematic screening, they identify EngA and ObgE as the minimal pair sufficient to replace the need for temperature and Mg²⁺ shifts when using phenol-extracted (i.e., mature, modified) rRNA and purified TP70 proteins.

      However, several important concerns remain:

      (1) Dependence on Native rRNA Limits Generalizability

      The current system relies on rRNA extracted from native ribosomes via phenol, which retains natural post-transcriptional modifications. As the authors note (lines 302-304), attempts to assemble active 50S subunits using in vitro transcribed rRNA, even in the presence of EngA and ObgE, failed. This contrasts with iSAT, where in vitro transcribed rRNA can yield functional (though reduced-activity, ~20% of native) ribosomes, presumably due to the presence of rRNA modification enzymes and additional chaperones in the S150 extract. Thus, while this study successfully isolates two key GTPase factors that mimic part of iSAT's functionality, it does not fully recapitulate iSAT's capacity for de novo assembly from unmodified RNA. The manuscript should clarify that the in vitro assembly demonstrated here is contingent on using native rRNA and does not yet achieve true bottom-up reconstruction from synthetic parts. Moreover, given iSAT's success with transcribed rRNA, could a similar systematic omission approach (e.g., adding individual factors) help identify the additional components required to support unmodified rRNA folding?

      (2) Imprecise Use of "Physiological Mg²⁺ Concentration"

      The abstract states that assembly occurs at "physiological Mg²⁺ concentration" (<10 mM). However, while this total Mg²⁺ level aligns with optimized in vitro translation buffers (e.g., in PURE or iSAT systems), it exceeds estimates of free cytosolic [Mg²⁺] in E. coli (~1-2 mM). The authors should clarify that they refer to total Mg²⁺ concentrations compatible with cell-free protein synthesis, not necessarily intracellular free ion levels, to avoid misleading readers about true physiological relevance.

      In summary, this work elegantly bridges the gap between the two-step method and the extract-dependent iSAT system by identifying two defined GTPases that capture a core functionality of cellular extracts: enabling ribosome assembly under translation-compatible conditions. However, the reliance on native rRNA underscores that additional factors - likely present in iSAT's S150 extract - are still needed for full de novo reconstitution from unmodified transcripts. Future work combining the precision of this defined system with the completeness of iSAT may ultimately realize truly autonomous synthetic ribosome biogenesis.

    4. Author response

      Public Reviews:

      Reviewer #1 (Public review):

      This study presents evidence that the addition of the two GTPases EngA and ObgE to reactions comprised of rRNAs and total ribosomal proteins purified from native bacterial ribosomes can bypass the requirements for non-physiological temperature shifts and Mg²⁺ ion concentrations for in vitro reconstitution of functional E. coli ribosomes.

      Strengths:

      This advance allows ribosome reconstitution in a fully reconstituted protein synthesis system containing individually purified recombinant translation factors, with the reconstituted ribosomes substituting for native purified ribosomes to support protein synthesis. This work potentially represents an important development in the long-term effort to produce synthetic cells.

      Weaknesses:

      While much of the evidence is solid, the analysis is incomplete in certain respects that detract from the scientific quality and significance of the findings:

      (1) The authors do not describe how the native ribosomal proteins (RPs) were purified, and it is unclear whether all subassemblies of RPs have been disrupted in the purification procedure. If not, additional chaperones might be required beyond the two GTPases described here for functional ribosome assembly from individual RPs.

      Native ribosomal proteins (RPs) were prepared from native ribosomes, according to the well-established protocol described by Dr. Knud H. Nierhaus [Nierhaus, K. H. Reconstitution of ribosomes in Ribosomes and protein synthesis: A Practical Approach (Spedding G. eds.) 161-189, IRL Press at Oxford University Press, New York (1990)]. In this method, ribosome proteins are subjected to dialysis in 6 M urea buffer, a strong denaturing condition that may completely disrupt ribosomal structure and dissociate all ribosomal protein subassemblies. To make this point clear, we will describe the ribosomal protein (RP) preparation procedure in the manuscript, rather than merely referring to the book.

      In addition, we would like to clarify one point related to this comment. The focus of the present study is to show that the presence of two factors is required for single-step ribosome reconstitution under translation-compatible, cell-free conditions. We do not intend to claim that these two factors are absolutely sufficient for ribosome reconstitution. Hence, we will revise the manuscript to more explicitly state what this work does and does not conclude.

      (2) Reconstitution studies in the past have succeeded by using all recombinant, individually purified RPs, which would clearly address the issue in the preceding comment and also eliminate the possibility that an unknown ribosome assembly factor that co-purifies with native ribosomes has been added to the reconstitution reactions along with the RPs.

      As noted in the response to the Comment (1), the focus of the present study is the requirement of the two factors for functional ribosome assembly. Therefore, we consider that it is not necessary to completely exclude the possibility that unknown ribosome assembly factors are present in the RP preparation. Nevertheless, we agree that it is important to clarify what factors, if any, are co-present in the RP fraction. To address this, we plan to add proteomic analysis results of the TP70 preparation.

      We also agree that additional, as-yet-unidentified components, including factors involved in rRNA modification, could plausibly further improve assembly efficiency. We will explicitly note this possibility in the Discussion.

      Finally, extending the system to in vitro-transcribed rRNA and fully recombinant ribosomal proteins is a natural next step, and we are currently exploring these directions in our laboratory. However, we consider them beyond the scope of the present study and will present them as future perspectives in the Discussion.

      (3) They never compared the efficiency of the reconstituted ribosomes to native ribosomes added to the "PURE" in vitro protein synthesis system, making it unclear what proportion of the reconstituted ribosomes are functional, and how protein yield per mRNA molecule compares to that given by the PURE system programmed with purified native ribosomes.

      We consider that it is feasible to estimate the GFP synthesis rate from the increase in fluorescence over time under conditions where the template mRNA is in excess, and to compare this rate directly between reconstituted and native ribosomes. We will therefore consider performing this experiment. This comparison should provide insight into what fraction of ribosomes reconstituted in our system are functionally active.

      By contrast, quantifying protein yield per mRNA molecule is substantially more challenging. The translation system is complex, and the apparent yield per mRNA can vary depending on factors such as differences in polysome formation efficiency. In addition, the PURE system is a coupled transcription–translation setup that starts from DNA templates, which further complicates rigorous normalization on a per-mRNA basis. Because the main focus of this study is to determine how many functionally active ribosomes can be reconstituted under translation-compatible conditions, we plan to address this comment by carrying out the former experiment.
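
      The rate comparison proposed in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names and the fluorescence values are hypothetical, and the slope ratio only approximates the active fraction if both reactions contain the same total ribosome concentration, the time points lie in the linear phase of GFP accumulation, and active reconstituted ribosomes translate at the same per-ribosome rate as native ones.

```python
def linear_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def relative_activity(times, reconstituted, native):
    """Ratio of fluorescence slopes (reconstituted / native)."""
    return linear_slope(times, reconstituted) / linear_slope(times, native)


# Hypothetical fluorescence readings (arbitrary units), mRNA in excess.
times_min = [0, 10, 20, 30, 40]
native_rfu = [100, 300, 500, 700, 900]
reconstituted_rfu = [100, 150, 200, 250, 300]

ratio = relative_activity(times_min, reconstituted_rfu, native_rfu)
print(f"relative synthesis rate: {ratio:.2f}")
```

      Fitting a slope over several time points, rather than comparing single endpoint readings, makes the estimate less sensitive to differences in lag time or background fluorescence between the two reactions.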

      (4) They also have not examined the synthesized GFP protein by SDS-PAGE to determine what proportion is full-length.

      Because we can add an affinity tag to the GFP reporter, it should be feasible to selectively purify the synthesized protein from the reaction mixture and analyze it by SDS–PAGE. We therefore plan to perform this experiment.

      (5) The previous development of the PURE system included examinations of the synthesis of multiple proteins, one of which was an enzyme whose specific activity could be compared to that of the native enzyme. This would be a significant improvement to the current study. They could also have programmed the translation reactions containing reconstituted ribosomes with (i) total native mRNA and compared the products in SDS-PAGE to those obtained with the control PURE system containing native ribosomes; (ii) with specifc reporter mRNAs designed to examine dependence on a Shine-Dalgarno sequence and the impact of an in-frame stop codon in prematurely terminating translation to assess the fidelity of initiation and termination events; and (iii) an mRNA with a programmed frameshift site to assess elongation fidelity displayed by their reconstituted ribosomes.

      Following the recommendation, we plan to test the synthesis of at least one additional protein with enzymatic activity, in addition to GFP, so that the activity of the translated product can be assessed.

      We agree that comparing translation products using total mRNA, testing dependence on the Shine–Dalgarno sequence, and performing dedicated assays to evaluate initiation/elongation/termination fidelity are all attractive and valuable studies. However, we consider these to be beyond the scope of the present manuscript. We will therefore describe them explicitly as future directions in the Discussion.

      At the same time, we anticipate that mass spectrometric (MS) analysis of GFP and the enzyme product(s) that we attempt to synthesize could partially address concerns related to product integrity (e.g., truncations) and, to some extent, translational fidelity. We therefore plan to carry out MS analysis of these translated products.

      Reviewer #2 (Public review):

      This study presents a significant advance in the field of in vitro ribosome assembly by demonstrating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions, specifically at 37 °C and with total Mg²⁺ concentrations below 10 mM.

      This achievement directly addresses a long-standing limitation of the traditional two-step in vitro assembly protocol (Nierhaus & Dohme, PNAS 1974), which requires non-physiological temperatures (44-50 °C) and high Mg²⁺ concentrations (~20 mM). Inspired by the integrated Synthesis, Assembly, and Translation (iSAT) platform (Jewett et al., Mol Syst Biol 2013), which leverages E. coli S150 crude extract to supply essential assembly factors, the authors hypothesize that specific ribosome biogenesis factors, particularly GTPases present in such extracts, may be responsible for enabling assembly under mild conditions. Through systematic screening, they identify EngA and ObgE as the minimal pair sufficient to replace the need for temperature and Mg²⁺ shifts when using phenol-extracted (i.e., mature, modified) rRNA and purified TP70 proteins.

      However, several important concerns remain:

      (1) Dependence on Native rRNA Limits Generalizability

      The current system relies on rRNA extracted from native ribosomes via phenol, which retains natural post-transcriptional modifications. As the authors note (lines 302-304), attempts to assemble active 50S subunits using in vitro transcribed rRNA, even in the presence of EngA and ObgE, failed. This contrasts with iSAT, where in vitro transcribed rRNA can yield functional (though reduced-activity, ~20% of native) ribosomes, presumably due to the presence of rRNA modification enzymes and additional chaperones in the S150 extract. Thus, while this study successfully isolates two key GTPase factors that mimic part of iSAT's functionality, it does not fully recapitulate iSAT's capacity for de novo assembly from unmodified RNA. The manuscript should clarify that the in vitro assembly demonstrated here is contingent on using native rRNA and does not yet achieve true bottom-up reconstruction from synthetic parts. Moreover, given iSAT's success with transcribed rRNA, could a similar systematic omission approach (e.g., adding individual factors) help identify the additional components required to support unmodified rRNA folding?

      We fully recognize the reviewer’s point that our current system has not yet achieved a true bottom-up reconstruction. Although we intended to state this clearly in the manuscript, the fact that this concern remains indicates that our description was not sufficiently explicit. We will therefore revisit the organization and wording of the manuscript and revise it to ensure that this limitation is clearly communicated to readers.

      (2) Imprecise Use of "Physiological Mg²⁺ Concentration"

      The abstract states that assembly occurs at "physiological Mg²⁺ concentration" (<10 mM). However, while this total Mg²⁺ level aligns with optimized in vitro translation buffers (e.g., in PURE or iSAT systems), it exceeds estimates of free cytosolic [Mg²⁺] in E. coli (~1-2 mM). The authors should clarify that they refer to total Mg²⁺ concentrations compatible with cell-free protein synthesis, not necessarily intracellular free ion levels, to avoid misleading readers about true physiological relevance.

      We agree that this is a very reasonable point. We will therefore revise the manuscript to clarify that we are referring to the total Mg²⁺ concentration compatible with cell-free protein synthesis, rather than the intracellular free Mg²⁺ level under physiological conditions.

      In summary, this work elegantly bridges the gap between the two-step method and the extract-dependent iSAT system by identifying two defined GTPases that capture a core functionality of cellular extracts: enabling ribosome assembly under translation-compatible conditions. However, the reliance on native rRNA underscores that additional factors - likely present in iSAT's S150 extract - are still needed for full de novo reconstitution from unmodified transcripts. Future work combining the precision of this defined system with the completeness of iSAT may ultimately realize truly autonomous synthetic ribosome biogenesis.

    1. eLife Assessment

      This important study reports characterisation of hepatocyte molecular pathways affected by a glycyrrhizin derivative in both in vivo and in vitro mouse models of alcohol-associated liver disease. The authors show convincing evidence indicating that IPP delta isomerase 1 (Idi1) is an intermediate in these pharmacological effects, via the binding of the glycyrrhizin derivative to an upstream regulator of Idi1, HSD11B1, although some more quantitative analyses and better organisation of data would strengthen the study. The findings would be of interest to immunologists and pharmacologists interested in liver inflammation and its amelioration.

    2. Reviewer #1 (Public review):

      Summary:

      In this article by Xiao et al., the authors aimed to identify the precise targets by which magnesium isoglycyrrhizinate (MgIG) functions to improve liver injury in response to ethanol treatment. The authors found through a series of in vivo and molecular approaches that MgIG treatment attenuates alcohol-induced liver injury through a potential SREBP2-IDI1 axis. This manuscript adds to a previous set of literature showing MgIG improves liver function across a variety of etiologies, and also provides insight into its mechanism of action.

      Strengths:

      (1) The authors use a combination of approaches from both in-vivo mouse models to in-vitro approaches with AML12 hepatocytes to support the notion that MgIG does improve liver function in response to ethanol treatment.

      (2) The authors use both knockdown and overexpression approaches, in vivo and in vitro, to support most of the claims provided.

      (3) Identification of HSD11B1 as the protein target of MgIG, as well as confirmation of direct protein-protein interactions between HSD11B1/SREBP2/IDI1, is novel.

      Weaknesses:

      Major weaknesses can be classified into 3 groups:

      (1) The results do not support some claims made.

      (2) Qualitative analyses of some of the lipid measures, as opposed to more quantitative analyses.

      (3) There are no appropriate readouts of Srebp2 translocation and/or activity.

      More specific comments:

      (1) A few of the claims made are not supported by the references provided. For instance, line 76 states MgIG has hepatoprotective properties and improved liver function, but the reference provided is in the context of myocardial fibrosis.

      (2) MgIG is clinically used for the treatment of liver inflammatory disease in China and Japan. In the first line of the abstract, the authors noted that MgIG is clinically approved for ALD. In which countries is MgIG approved for clinical utility in this space?

      (3) Serum TGs are not an indicator of liver function. Alterations in serum TGs can occur despite changes in liver function.

      (4) There are discrepancies between the results section and the figure legends. For example, line 302 states Idi1 is upregulated in alcohol-fed mice relative to the control group. The figure legend states that the comparison for Figure 2A is between ALD+MgIG and ALD only.

      (5) Oil Red O staining provided does not appear to be consistent with the quantification in Figure 1D. ORO is nonspecific and can be highly subjective. The representative image in Figure 1C appears to have a much greater than 30% ORO (+) area.

      (6) The connection between Idi1 expression in response to EtOH/PA treatment in AML12 cells and viability and apoptosis isn't entirely clear. MgIG treatment completely reduces Idi1 expression in response to EtOH/PA, but only moderate changes, at best, are observed in viability and apoptosis. This suggests the primary mechanism related to MgIG treatment may not be via Idi1.

      (7) The Nile red stained images also do not appear consistent with their quantification. Several claims about more or less lipid accumulation across these studies are not supported by clear differences in Nile red.

      (8) The authors make a comment that Hsd11b1 expression is quite low in AML12 cells. So why did the authors choose to knockdown Hsd11b1 in this model?

      (9) Line 380 - the claim that MgIG weakens the interaction between HSD11B1 and SREBP2 cannot be made solely based on one Western blot.

      (10) It's not clear what the numbers represent on top of the Western blots. Are these averages over the course of three independent experiments?

      (11) The claim in line 382 that knockdown of Hsd11b1 resulted in accumulation of pSREBP2 is not supported by the data provided in Figure 6D.

      (12) None of the images provided in Figure 6E support the claims stated in the results. Activation of SREBP2 leads to nuclear translocation and subsequent induction of genes involved in cholesterol biosynthesis and uptake. Manipulation of Hsd11b1 via OE or KD does not show any nuclear localization with DAPI.

      (13) The entire manuscript is focused on this axis of MgIG-Hsd11b1-Srebp2, but no Srebp2 transcriptional targets are ever measured.

      (14) Acc1 and Scd1 are Srebp1 targets, not Srebp2.

      (15) A major weakness of this manuscript is the lack of studies providing quantitative assessments of Srebp2 activation and true liver lipid measurements.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated magnesium isoglycyrrhizinate (MgIG)'s hepatoprotective actions in chronic-binge alcohol-associated liver disease (ALD) mouse models and ethanol/palmitic acid-challenged AML-12 hepatocytes. They found that MgIG markedly attenuated alcohol-induced liver injury, evidenced by ameliorated histological damage, reduced hepatic steatosis, and normalized liver-to-body weight ratios. RNA sequencing identified isopentenyl diphosphate delta isomerase 1 (IDI1) as a key downstream effector. Hepatocyte-specific genetic manipulations confirmed that MgIG modulates the SREBP2-IDI1 axis. The mechanistic studies suggested that MgIG could directly target HSD11B1 and modulate the HSD11B1-SREBP2-IDI1 axis to attenuate ALD. This manuscript is of interest to the research field of ALD.

      Strengths:

      The authors have performed both in vivo and in vitro studies to demonstrate the action of magnesium isoglycyrrhizinate on hepatocytes and an animal model of alcohol-associated liver disease.

      Weaknesses:

      The data were not well-organised, and the paper needs proofreading again, with a focus on the use of scientific language throughout.

      Here are several comments:

      (1) In Supplemental Figure 1A, all the treatment arms (A-control, MgIG-25 mg/kg, MgIG-50 mg/kg) showed body weight loss compared to the untreated controls. However, Figure 1E showed body weight gain in the treatment arms (A-control and MgIG-25 mg/kg). Why? In Supplemental Figure 1A, the mice with MgIG (25 mg/kg) showed the lowest body weight, compared to either A-control or MgIG (50 mg/kg) treatment. Can the authors explain why MgIG (25 mg/kg) causes more body weight loss than MgIG (50 mg/kg)? What about the other parameters (ALT, ALS, NAS, etc.) for the mice with MgIG (50 mg/kg)?

      (2) IL-6 is a key pro-inflammatory cytokine significantly involved in ALD, acting as a marker of ALD severity. Can the authors explain why MgIG 1.0 mg/ml shows higher IL-6 gene expression than MgIG (0.1-0.5 mg/ml)? Same question for the mRNA levels of lipid metabolic enzymes Acc1 and Scd1.

      (3) For the qPCR results of Hsd11b1 knockdown (siRNA) and Hsd11b1 overexpression (plasmid) in AML-12 cells (Figure 5B), what is the description for the gene expression level (Y axis)? Fold changes versus GAPDH? Hsd11b1 overexpression showed non-efficiency (20-23, units on Y axis), even lower than the Hsd11b1 knockdown (above 50, units on Y axis). The authors need to explain this. For the plasmid-based Hsd11b1 overexpression, why does the scramble control inhibit Hsd11b1 gene expression (less than 2, units on the Y axis)? Again, this needs to be explained.

    4. Author response:

      Thank you for your letter and for the constructive feedback from the reviewers on our manuscript (eLife-RP-RA-2025-109174). We appreciate the time and expertise you and the reviewers have dedicated to improving our work.

      We have carefully considered all comments and have developed a comprehensive revision plan. To address the primary concerns, we will conduct several new experiments designed to provide robust support for our key conclusions. Other points will be addressed through textual revisions, including the addition of existing ADMET data and an expanded discussion section.

      We are confident that these revisions will fully satisfy the reviewers' concerns and significantly strengthen the manuscript. Our detailed experimental plan and point-by-point responses are provided below.

      (1) Addressing "Qualitative analyses of some of the lipid measures, as opposed to more quantitative analyses"

      Supplementary experiments and analyses

      We will add the assessment of hepatic triglyceride and total cholesterol levels in liver tissues from control, experimental, and drug-treated mice, thereby providing further quantitative validation.

      (2) Addressing "SREBP2"

      Supplementary experiments and analyses

      We will include a luciferase assay to determine whether alcohol plus PA induces SREBP2 activation in AML-12 cells.

      As suggested, we will assess the expression levels of SREBP2 downstream target genes (Hmgcr, Hmgcs, Ldlr, and Lcn2) in both in vitro and in vivo models.

      (3) Timeline and process arrangement of supplementary experiments

      To comprehensively address these issues, we plan to purchase the following required reagents and have formulated the following experimental plan:

      Author response table 1.

      Given the time required for reagent acquisition and the execution of these in vitro and in vivo experiments, we kindly request an extension of the revision deadline by 8 weeks. This will ensure the comprehensive and high-quality completion of all necessary studies.

      We will fully commit to delivering a thoroughly revised manuscript that robustly addresses all reviewer comments and aligns with the high standards of eLife. We greatly appreciate your guidance and flexibility.

    1. eLife Assessment

      This important study examines how mismatched light and temperature cycles shape Drosophila locomotor timing and temperature-dependent timeless splicing, and leverages long-term early/late selection lines to probe evolutionary plasticity. The strength of evidence is incomplete at present, mainly because startle/masking under step cues could confound the behavioural readouts, and tim's involvement remains correlative. The authors should address masking in the behaviour analyses and provide causal support for tim's role.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question: how do circadian clocks adjust to a complex rhythmic environment with multiple daily rhythms? The focus is on the temperature and light cycles (TC and LD) and their phase relationship. In nature, TC usually lags the LD cycle, but the phase delay can vary depending on seasonal and daily weather conditions. The authors present evidence that circadian behavior adjusts to different TC/LD phase relationships, that temperature-sensitive tim splicing patterns might underlie some of these responses, and that artificial selection for preferential evening or morning eclosion behavior impacts how flies respond to different LD/TC phase relationships.

      Strength:

      Experiments are conducted on control strains and strains that have been selected in the laboratory for preferential morning or evening eclosion phenotypes. This study is thus quite unique as it allows us to probe whether this artificial selection impacted how animals respond to different environmental conditions, and thus gives hints on how evolution might shape circadian oscillators and their entrainment. The authors focused on circadian locomotor behavior and timeless (tim) splicing because warm- and cold-specific transcripts have been described as playing an important role in determining temperature-dependent circadian behavior. Not surprisingly, the results are complex, but there are interesting observations. In particular, the "late" strain appears to be able to adjust its evening peak more efficiently in response to changes in the phase relationship between temperature and light cycles, but the morning peak seems less responsive in this strain. Differences in the circadian pattern of expression of different tim mRNA isoforms are found under specific LD/TC conditions.

      Weaknesses:

      These observations are interesting, but in the absence of specific genetic manipulations, it is difficult to establish a causative link between tim molecular phenotypes and behavior. The study is thus quite descriptive. It would be worth testing available tim splicing mutants, or mutants for regulators of tim splicing, to understand in more detail and more directly how tim splicing determines behavioral adaptation to different phase relationships between temperature and light cycles. Also, I wonder whether polymorphisms in or around tim splicing sites, or in tim splicing regulators, were selected in the early or late strains.

      I also have a major methodological concern. The authors studied how the evening and morning phases are adjusted under different conditions and different strains. They divided the daily cycle into 12h morning and 12h evening periods, and calculated the phase of morning and evening activity using circular statistics. However, the non-circadian "startle" responses to light or temperature transitions should have a very important impact on phase calculation, and thus at least partially obscure actual circadian morning and evening peak phase changes. Moreover, the timing of the temperature-up startle drifts with the temperature cycles, and will even shift from the morning to the evening portion of the divided daily cycle. Its amplitude also varies as a function of the LD/TC phase relationship. Note that the startle responses and their changes under different conditions will also affect SSD quantifications.

      For the circadian phase, these issues seem, for example, quite obvious for the morning peak in Figure 1. According to the phase quantification on panel D, there is essentially no change in the morning phase when the temperature cycle is shifted by 6 hours compared to the LD cycle, but the behavior trace on panel B clearly shows a phase advance of morning anticipation. Comparison between the graphs on panels C and D also indicates that there are methodological caveats, as they do not correlate well.

      Because of the various masking effects, phase quantification under entrainment is a thorny problem in Drosophila. I would suggest testing other measurements of anticipatory behavior to complement or perhaps supersede the current behavior analysis. For example, the authors could employ the anticipatory index used in many previous studies, measure the onset of morning or evening activity, or, if more reliable, the time at which 50% of anticipatory activity is reached. Termination of activity could also be considered. Interestingly, it seems there are clear effects on evening activity termination in Figure 3. All these methods will be impacted by startle responses under specific LD/TC phase relationships, but their combination might prove informative.
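      The alternative measures suggested above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular study: activity is assumed to be an averaged 24 h profile in equal-width bins, the anticipation index is taken as the share of the activity in the 6 h before a transition that falls in the final 3 h (one commonly used variant), and the function names and the `bins_per_hour` parameter are hypothetical.

```python
import math


def anticipation_index(activity, transition_bin, bins_per_hour=2):
    """Share of the activity in the 6 h before a transition that falls in the last 3 h.

    `activity` is one averaged 24 h profile; indexing wraps around midnight.
    """
    n = len(activity)
    # Bins before the transition, ordered from closest to farthest.
    six_h = [activity[(transition_bin - k) % n]
             for k in range(1, 6 * bins_per_hour + 1)]
    three_h = six_h[: 3 * bins_per_hour]
    total = sum(six_h)
    return sum(three_h) / total if total > 0 else math.nan


def t50_before_transition(activity, transition_bin, window_hours=6, bins_per_hour=2):
    """Hours before the transition at which half of the window's activity has accumulated."""
    n = len(activity)
    w = window_hours * bins_per_hour
    # Bins of the anticipatory window in chronological order.
    window = [activity[(transition_bin - w + k) % n] for k in range(w)]
    total = sum(window)
    if total <= 0:
        return math.nan
    running = 0.0
    for k, a in enumerate(window):
        running += a
        if running >= total / 2:
            # Time measured at the end of bin k.
            return (w - k - 1) / bins_per_hour
    return 0.0
```

      With a flat profile the index is 0.5 and the T50 sits at the middle of the window; both values move toward the transition as anticipatory activity builds up. Neither function removes startle bins, so the caveat about masking raised above still applies.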

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to dissect the plasticity of circadian outputs by combining evolutionary biology with chronobiology. By utilizing Drosophila strains selected for "Late" and "Early" adult emergence, they sought to investigate whether selection for developmental timing co-evolves with plasticity in daily locomotor activity. Specifically, they examined how these diverse lines respond to complex, desynchronized environmental cues (temperature and light cycles) and investigated the molecular role of the splicing factor Psi and timeless isoforms in mediating this plasticity.

      Major strengths and weaknesses:

      The primary strength of this work is the novel utilization of long-term selection lines to address fundamental questions about how organisms cope with complex environmental cues. The behavioral data are compelling, clearly demonstrating that "Late" and "Early" flies possess distinct capabilities to track temperature cycles when they are desynchronized from light cycles.

      However, a significant weakness lies in the causal links proposed between the molecular findings and these behavioral phenotypes. The molecular insights (Figures 2, 4, 5, and 6) rely on mRNA extracted from whole heads. As head tissue is dominated by photoreceptor cells and glia rather than the specific pacemaker neurons (LNv, LNd) driving these behaviors, this approach introduces a confound. Differential splicing observed here may reflect the state of the compound eye rather than the central clock circuit, a distinction highlighted by recent studies (e.g., Ma et al., PNAS 2023).

      Furthermore, while the authors report that Psi mRNA loses rhythmicity under out-of-sync conditions, this correlation does not definitively prove that Psi oscillation is required for the observed splicing patterns or behavioral plasticity. The amplitude of the reported Psi rhythm is also low (~1.5 fold) and variable, raising questions about its functional significance in the absence of manipulation experiments (such as constitutive expression) to test causality.

      Appraisal of aims and conclusions:

      The authors successfully demonstrate the co-evolution of emergence timing and activity plasticity, achieving their aim on the behavioral level. However, the conclusion that the specific molecular mechanism involves the loss of Psi rhythmicity driving timeless splicing changes is not yet fully supported by the data. The current evidence is correlative, and without spatial resolution (specific clock neurons) or causal manipulation, the mechanistic model remains speculative.

      This study is likely to be of significant interest to the chronobiology and evolutionary biology communities as it highlights the "enhanced plasticity" of circadian clocks as an adaptive trait. The findings suggest that plasticity to phase lags - common in nature where temperature often lags light - may be a key evolutionary adaptation. Addressing the mechanistic gaps would significantly increase the utility of these findings for understanding the molecular basis of circadian plasticity.

    4. Reviewer #3 (Public review):

      Summary:

      This study attempts to mimic in the laboratory changing seasonal phase relationships between light and temperature and determine their effects on Drosophila circadian locomotor behavior and on the underlying splicing patterns of a canonical clock gene, timeless. The results are then extended to strains that have been selected over many years for early or late circadian phase phenotypes.

      Strengths:

      A lot of work, and some results showing that the phasing of behavioral and molecular phenotypes is slightly altered in the predicted directions in the selected strains.

      Weaknesses:

      The experimental conditions are extremely artificial, with immediate light and temperature transitions compared to the gradual changes observed in nature. Studies in the wild have shown how the laboratory reveals artifacts that are not observed in nature. The behavioral and molecular effects are very small, and some of the graphs and second-order analyses of the main effects appear contradictory. Consequently, the Discussion is very speculative as it is based on such small laboratory effects.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question: how do circadian clocks adjust to a complex rhythmic environment with multiple daily rhythms? The focus is on the temperature and light cycles (TC and LD) and their phase relationship. In nature, TC usually lags the LD cycle, but the phase delay can vary depending on seasonal and daily weather conditions. The authors present evidence that circadian behavior adjusts to different TC/LD phase relationships, that temperature-sensitive tim splicing patterns might underlie some of these responses, and that artificial selection for preferential evening or morning eclosion behavior impacts how flies respond to different LD/TC phase relationships.

      Strength:

      Experiments are conducted on control strains and strains that have been selected in the laboratory for preferential morning or evening eclosion phenotypes. This study is thus quite unique as it allows us to probe whether this artificial selection impacted how animals respond to different environmental conditions, and thus gives hints on how evolution might shape circadian oscillators and their entrainment. The authors focused on circadian locomotor behavior and timeless (tim) splicing because warm and cold-specific transcripts have been described as playing an important role in determining temperature-dependent circadian behavior. Not surprisingly, the results are complex, but there are interesting observations. In particular, the "late" strain appears to be able to adjust its evening peak more efficiently in response to changes in the phase relationship between temperature and light cycles, but the morning peak seems less responsive in this strain. Differences in the circadian pattern of expression of different tim mRNA isoforms are found under specific LD/TC conditions.

      We sincerely thank the reviewer for this generous assessment and for recognizing several key strengths of our study. We are particularly gratified that the reviewer values our use of long-term laboratory-selected chronotype lines (350+ generations), which provide a unique evolutionary perspective on how artificial selection reshapes circadian responses to complex LD/TC phase relationships—precisely our core research question.

      Weaknesses:

      These observations are interesting, but in the absence of specific genetic manipulations, it is difficult to establish a causative link between tim molecular phenotypes and behavior. The study is thus quite descriptive. It would be worth testing available tim splicing mutants, or mutants for regulators of tim splicing, to understand in more detail and more directly how tim splicing determines behavioral adaptation to different phase relationships between temperature and light cycles. Also, I wonder whether polymorphisms in or around tim splicing sites, or in tim splicing regulators, were selected in the early or late strains.

      We thank the reviewer for this insightful comment. We agree that our current data do not establish a direct causal link between tim splicing (or Psi) and behaviour, and we appreciate that some of our wording (e.g. “linking circadian gene splicing to behavioural plasticity” or describing tim splicing as a “pivotal node”) may have suggested unintended causal links. In the revision, we will (i) explicitly state in the Abstract, Introduction, and early Discussion that the main aim was to test whether selection for timing of eclosion is accompanied by correlated evolution of temperature‑dependent tim splicing patterns and evening activity plasticity under complex LD/TC regimes, and (ii) consistently describe the molecular findings as correlational and hypothesis‑generating rather than causal. We will also add phrases throughout the text to point the reader more clearly to existing passages where we already emphasize “correlated evolution” and explicitly label our mechanistic ideas as “we speculate” / “we hypothesize” and as future experiments.

      We fully agree that studies using tim splicing mutants or manipulations of splicing regulators under in‑sync and out‑of‑sync LD/TC regimes will be essential to ascertain what role tim variants play under such environmental conditions, and we will highlight this as a key future direction. At the same time, we emphasize that the long‑term selection lines provide a complementary perspective to classical mutant analyses by revealing how behavioural and molecular phenotypes can exhibit correlated evolution under a specific, chronobiologically relevant selection pressure (timing of emergence).

      Finally, we appreciate the suggestion regarding polymorphisms. Whole‑genome analyses of these lines in a PhD thesis from our group (Ghosh, 2022, unpublished, doctoral dissertation) reveal significant SNPs in intronic regions of timeless in both Early and Late populations, as well as SNPs in CG7879, a gene implicated in alternative mRNA splicing, in the Late line. Because these analyses are ongoing and not yet peer‑reviewed, we do not present them as main results.

      I also have a major methodological concern. The authors studied how the evening and morning phases are adjusted under different conditions and different strains. They divided the daily cycle into 12h morning and 12h evening periods, and calculated the phase of morning and evening activity using circular statistics. However, the non-circadian "startle" responses to light or temperature transitions should have a very important impact on phase calculation, and thus at least partially obscure actual circadian morning and evening peak phase changes. Moreover, the timing of the temperature-up startle drifts with the temperature cycles, and will even shift from the morning to the evening portion of the divided daily cycle. Its amplitude also varies as a function of the LD/TC phase relationship. Note that the startle responses and their changes under different conditions will also affect SSD quantifications.

      We thank the reviewer for this perceptive methodological concern, which we had anticipated and systematically quantified but had not included in the original submission. The reviewer is absolutely correct that non-circadian startle responses to zeitgeber transitions could confound both circular phase (CoM) calculations and SSD quantifications, particularly as TC drift creates shifting startle locations across morning/evening windows.

      We will be including startle response quantification (previously conducted but unpublished) as a new Supplementary figure, systematically measuring SSD in 1-hour windows immediately following each of the four environmental transitions (lights-ON, lights-OFF, temperature rise and temperature fall) across all six LDTC regimes (2-12hr TC-LD lags) for all 12 selection lines (early<sub>1-4</sub>, control<sub>1-4</sub>, late<sub>1-4</sub>).

      Author response image 1.

      Startle responses in selection lines under LDTC regimes: SSD calculated to assess startle response to each of the transitions (1-hour window after the transition used for calculations). Error bars are 95% Tukey’s confidence intervals for the main effect of selection in a two-factor ANOVA design with block as a random factor. Non-overlapping error bars indicate significant differences among the values. SSD values between in-sync and out-of-sync regimes for a range of phase relationships between LD and TC cycles (A) LDTC 2-hr, (B) LDTC 4-hr, (C) LDTC 6-hr, (D) LDTC 8-hr, (E) LDTC 10-hr, (F) LDTC 12-hr.

      Key findings directly addressing the reviewer's concerns:

      (1) Morning phase advances in LDTC 8-12hr regimes are explained by quantified nocturnal startle activity around temperature rise transitions occurring within morning windows. Critically, these startles show no selection line differences, confirming they represent equivalent non-circadian confounds across lines.

      (2) Early selection lines exhibit significantly heightened startle responses specifically to temperature rise in LDTC 4hr and 6hr regimes (early > control ≥ late), demonstrating that startle responses themselves exhibit correlated evolution with emergence timing—an important novel finding that strengthens our evolutionary story.

      (3) Startle responses differed among selection lines only for the temperature rise transition, and only under two of the regimes used (LDTC 4 hr and 6 hr). Under LDTC 4 hr, the temperature rise transition falls in the morning window and, despite early having significantly greater startle than late, the overall morning SSD (over the 12-hour morning window) did not differ significantly among the selection lines for this regime. Thus, eliminating the startle window would make the selection lines more similar to one another. On the other hand, under the LDTC 6 hr regime, the startle response to temperature rise falls in the 12-hour evening window. In this case too, early showed higher startle than control and late. A higher startle in early would thus contribute to the observed differences among selection lines. We agree with the reviewer that eliminating this startle peak would lead to a clearer interpretation of the change in circadian evening activity.

      We deliberately preserved all behavioural data without filtering out startle windows, since filtering would require arbitrary cutoffs such as 1, 2 or 3 hours post transition, or until the startle peak declines in different selection lines under different regimes. In the revised version, we will add complementary analyses excluding the startle windows to obtain mean phase and SSD values which are unaffected by the startle responses.
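      A complementary analysis of this kind could be sketched as follows. The sketch assumes that SSD denotes the bin-by-bin sum of squared differences between two activity profiles (the response does not spell out the formula) and that startle is removed by dropping a fixed window after each transition; `startle_mask`, `ssd`, `exclude_hours` and `bins_per_hour` are hypothetical names chosen for illustration.

```python
def startle_mask(n_bins, transition_hours, exclude_hours=1.0, bins_per_hour=2):
    """Return a keep/drop flag per bin; bins within `exclude_hours` after a transition are dropped."""
    keep = [True] * n_bins
    width = int(round(exclude_hours * bins_per_hour))
    for t in transition_hours:
        start = int(round(t * bins_per_hour))
        for k in range(width):
            keep[(start + k) % n_bins] = False  # wrap around midnight
    return keep


def ssd(profile_a, profile_b, keep=None):
    """Sum of squared differences between two profiles over the bins that are kept."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of bins")
    if keep is None:
        keep = [True] * len(profile_a)
    return sum((a - b) ** 2
               for a, b, k in zip(profile_a, profile_b, keep) if k)


# Toy profiles: the second one has a large startle bin at lights-ON (hour 0)
# and a small difference later in the day.
in_sync = [0.0] * 48
out_of_sync = [0.0] * 48
out_of_sync[0] = 3.0
out_of_sync[10] = 1.0

full = ssd(in_sync, out_of_sync)
masked = ssd(in_sync, out_of_sync, startle_mask(48, transition_hours=[0]))
```

      Running the same comparison for several values of `exclude_hours` would show how strongly a conclusion depends on the cutoff, which is the arbitrariness noted above.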

      For the circadian phase, these issues seem, for example, quite obvious for the morning peak in Figure 1. According to the phase quantification on panel D, there is essentially no change in the morning phase when the temperature cycle is shifted by 6 hours compared to the LD cycle, but the behavior trace on panel B clearly shows a phase advance of morning anticipation. Comparison between the graphs on panels C and D also indicates that there are methodological caveats, as they do not correlate well.

      Because of the various masking effects, phase quantification under entrainment is a thorny problem in Drosophila. I would suggest testing other measurements of anticipatory behavior to complement or perhaps supersede the current behavior analysis. For example, the authors could employ the anticipatory index used in many previous studies, measure the onset of morning or evening activity, or, if more reliable, the time at which 50% of anticipatory activity is reached. Termination of activity could also be considered. Interestingly, it seems there are clear effects on evening activity termination in Figure 3. All these methods will be impacted by startle responses under specific LD/TC phase relationships, but their combination might prove informative.

      We agree that phase quantification under entrained conditions in Drosophila is challenging and that anticipatory indices, onset/offset measures, and T50 metrics each have particular strengths and weaknesses. In designing our analysis, we chose to avoid metrics that require arbitrary or subjective criteria (e.g. defining activity thresholds or durations for anticipation, or visually marking onset/offset), because these can substantially affect the estimated phase and reduce comparability across regimes and genotypes. Instead, we used two fully quantitative, parameter-free measures applied to the entire waveform within defined windows: (i) SSD to capture waveform change in shape/amplitude and (ii) circular mean phase of activity (CoM) restricted to the 12 h morning and 12 h evening windows. By integrating over the entire window, these measures are less sensitive to the exact choice of threshold and to short-lived, high-amplitude startles at transitions, and they treat all bins within the window in a consistent, reproducible way across all LDTC regimes and lines. Panels C (SSD) and D (CoM) are intentionally complementary, not redundant: SSD reflects how much the waveform changes in shape and amplitude, whereas CoM reflects the timing of the center of mass of activity. Under conditions where masking alters amplitude and introduces short-lived bouts without a major shift of the main peak, it is expected that SSD and CoM will not correlate linearly across regimes.

      We will be including a detailed calculation of how CoM is obtained in our methods for the revised version.  
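      For orientation, an activity-weighted circular mean restricted to a time window can be computed as below. This is a minimal sketch under assumptions, not the authors' calculation: the 24 h day is mapped onto a full circle, each bin is represented by its midpoint, and bins outside the window are ignored; the function name and parameters are hypothetical.

```python
import math


def circular_mean_phase(activity, window=None, period_hours=24.0):
    """Activity-weighted circular mean phase in hours.

    `window` is an optional (start_hour, end_hour) pair; a window with
    start > end is taken to wrap around midnight.
    """
    n = len(activity)
    bin_hours = period_hours / n
    x = y = 0.0
    for i, a in enumerate(activity):
        t = (i + 0.5) * bin_hours  # bin midpoint in hours
        if window is not None:
            start, end = window
            if start <= end:
                inside = start <= t < end
            else:
                inside = t >= start or t < end
            if not inside:
                continue
        angle = 2.0 * math.pi * t / period_hours
        x += a * math.cos(angle)
        y += a * math.sin(angle)
    if math.hypot(x, y) < 1e-12:
        return math.nan  # no activity, or activity spread with no preferred phase
    phase = math.atan2(y, x) * period_hours / (2.0 * math.pi)
    return phase % period_hours
```

      Because every bin in the window contributes in proportion to its activity, a tall, brief startle bout pulls the estimate toward the transition, whereas a broad peak of the same total activity centred elsewhere pulls it away.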

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to dissect the plasticity of circadian outputs by combining evolutionary biology with chronobiology. By utilizing Drosophila strains selected for "Late" and "Early" adult emergence, they sought to investigate whether selection for developmental timing co-evolves with plasticity in daily locomotor activity. Specifically, they examined how these diverse lines respond to complex, desynchronized environmental cues (temperature and light cycles) and investigated the molecular role of the splicing factor Psi and timeless isoforms in mediating this plasticity.

      Major strengths and weaknesses:

      The primary strength of this work is the novel utilization of long-term selection lines to address fundamental questions about how organisms cope with complex environmental cues. The behavioral data are compelling, clearly demonstrating that "Late" and "Early" flies possess distinct capabilities to track temperature cycles when they are desynchronized from light cycles.

      We sincerely thank the reviewer for this enthusiastic recognition of our study's core strengths. We are particularly gratified that the reviewer highlights our novel use of long-term selection lines (350+ generations) as the primary strength, enabling us to address fundamental evolutionary questions about circadian plasticity under complex environmental cues. We thank them for identifying our behavioral data as compelling (Figs 1, 3), which robustly demonstrate selection-driven divergence in temperature cycle tracking.

      However, a significant weakness lies in the causal links proposed between the molecular findings and these behavioral phenotypes. The molecular insights (Figures 2, 4, 5, and 6) rely on mRNA extracted from whole heads. As head tissue is dominated by photoreceptor cells and glia rather than the specific pacemaker neurons (LNv, LNd) driving these behaviors, this approach introduces a confound. Differential splicing observed here may reflect the state of the compound eye rather than the central clock circuit, a distinction highlighted by recent studies (e.g., Ma et al., PNAS 2023).

      We thank the reviewer for highlighting this important methodological consideration. We fully agree that whole-head extracts do not provide spatial resolution to distinguish central pacemaker neurons (~100-200 total) from compound eyes and glia, and that cell-type-specific profiling represents the critical next experimental step. As mentioned in our response to Reviewer 1, we appreciate the issue with our phrasing and will be revising it accordingly to more clearly describe that we do not claim any causal connections between expression of the tim splice variants in particular circadian neurons and their contribution to the phenotype observed.

      We chose whole-head extracts for practical reasons aligned with our study's specific goals:

      (1) Fly numbers: Our artificially selected populations are maintained at large numbers (~1000s per line). Whole-head extracts enabled sampling ~150 flies per time point = ~600 flies per genotype per environmental regime, providing a means to faithfully sample the variation that may exist in such randomly mating populations.

      (2) Established method for characterizing splicing patterns: The majority of temperature-dependent period/timeless splicing studies have successfully used whole-head extracts (Majercak et al., 1999; Shakhmantsir et al., 2018; Martin Anduaga et al., 2019) to characterize splicing dynamics under novel conditions.

      (3) Novel environmental regimes: Our primary molecular contribution was documenting timeless splicing patterns under previously untested LDTC phase relationships (TC 2-12hr lags relative to LD) and testing whether these exhibit selection-dependent differences consistent with behavioral divergence.

      Furthermore, while the authors report that Psi mRNA loses rhythmicity under out-of-sync conditions, this correlation does not definitively prove that Psi oscillation is required for the observed splicing patterns or behavioral plasticity. The amplitude of the reported Psi rhythm is also low (~1.5 fold) and variable, raising questions about its functional significance in the absence of manipulation experiments (such as constitutive expression) to test causality.

      We thank the reviewer for this insightful comment and appreciate that our phrasing has been misleading. We will especially pay attention to this issue, raised by two reviewers, and clearly highlight our results as correlated evolution and hypothesis-generating.

      We appreciate the reviewer highlighting these points and would like to draw attention to the following points in our Discussion section:

      “Psi and levels of tim-cold and tim-sc (Foley et al., 2019). We observe that this correlation is most clearly upheld under temperature cycles wherein tim-medium and Psi peak in-phase while the cold-induced transcripts start rising when Psi falls (Figure 8A1&2). Under LDTC in-sync conditions this relationship is weaker, even though Psi is rhythmic, potentially due to light-modulated factors influencing timeless splicing (Figure 8B1&2). This is in line with Psi’s established role in regulating activity phasing under TC 12:12 but not LD 12:12 (Foley et al., 2019). This is also supported by the fact that while tim-medium and tim-cold are rhythmic under LD 12:12 (Shakhmantsir et al., 2018), Psi is not (datasets from Kuintzle et al., 2017; Rodriguez et al., 2013). Assuming this to be true across genetic backgrounds and sexes and combined with our similar findings for these three transcripts under LDTC out-of-sync (Figure 2B3, D3&E3), we speculate that Psi rhythmicity may not be essential for tim-medium or tim-cold rhythmicity especially under conditions wherein light cycles are present along with temperature cycles (Figure 8C1&2). Our study opens avenues for future experiments manipulating PSI expression under varying light-temperature regimes to dissect its precise regulatory interactions. We hypothesize that flies with Psi knocked down in the clock neurons should exhibit a less pronounced shift of the evening activity under the range of LDTC out-of-sync conditions for which activity is assayed in our study. On the other hand, its overexpression should cause larger delays in response to delayed temperature cycles due to the increased levels of tim-medium translating into delay in TIM protein accumulation.”

      Appraisal of aims and conclusions:

      The authors successfully demonstrate the co-evolution of emergence timing and activity plasticity, achieving their aim on the behavioral level. However, the conclusion that the specific molecular mechanism involves the loss of Psi rhythmicity driving timeless splicing changes is not yet fully supported by the data. The current evidence is correlative, and without spatial resolution (specific clock neurons) or causal manipulation, the mechanistic model remains speculative.

      This study is likely to be of significant interest to the chronobiology and evolutionary biology communities as it highlights the "enhanced plasticity" of circadian clocks as an adaptive trait. The findings suggest that plasticity to phase lags - common in nature where temperature often lags light - may be a key evolutionary adaptation. Addressing the mechanistic gaps would significantly increase the utility of these findings for understanding the molecular basis of circadian plasticity.

      Thank you for this thoughtful appraisal affirming our successful demonstration of co-evolution between emergence timing and circadian activity plasticity.

      Reviewer #3 (Public review):

      Summary:

      This study attempts to mimic in the laboratory changing seasonal phase relationships between light and temperature and determine their effects on Drosophila circadian locomotor behavior and on the underlying splicing patterns of a canonical clock gene, timeless. The results are then extended to strains that have been selected over many years for early or late circadian phase phenotypes.

      Strengths:

      A lot of work, and some results showing that the phasing of behavioural and molecular phenotypes is slightly altered in the predicted directions in the selected strains.

      We thank the reviewer for acknowledging the substantial experimental effort across 7 environmental regimes (6 LDTC phase relationships + LDTC in-phase), 12 replicate populations (early<sub>1-4</sub>, control<sub>1-4</sub>, late<sub>1-4</sub>), and comprehensive behavioural + molecular phenotyping.

      Weaknesses:

      The experimental conditions are extremely artificial, with immediate light and temperature transitions compared to the gradual changes observed in nature. Studies in the wild have shown how the laboratory reveals artifacts that are not observed in nature. The behavioural and molecular effects are very small, and some of the graphs and second-order analyses of the main effects appear contradictory. Consequently, the Discussion is very speculative as it is based on such small laboratory effects.

      We thank the reviewer for these important points regarding ecological validity, effect sizes, and interpretation scope.

      (1) Behavioural effects are robust across population replicates in selection lines (not small/weak)

      Our study assayed 12 populations total (4 replicate populations each of early, control, and late selection lines) under 7 LDTC regimes. Critically, selection effects were consistent across all 4 replicate populations within each selection line for every condition tested. In these randomly mating large populations, the mixed model ANOVA reveals highly significant selection×regime interactions [F(5,45)=4.1, p=0.003; Fig 3E, Table S2], demonstrating strong, replicated evolutionary divergence in evening temperature sensitivity.

      (2) Molecular effects test critical evolutionary hypothesis

      As stated in our Introduction, "selection can shape circadian gene splicing and temperature responsiveness" (Low et al., 2008, 2012). Our laboratory-selected chronotype populations—known to exhibit evolved temperature responsiveness (Abhilash et al., 2019, 2020; Nikhil et al., 2014; Vaze et al., 2012)—provide an apt system to test whether selection for temporal niche leads to divergence in timeless splicing. With ~600 heads per environmental regime per selection line, we detect statistically robust, selection line-specific temporal profiles [early4 advanced timeless phase (Fig 4A4); late4 prolonged tim-cold (Fig 5A4); significant regime×selection×time interactions (Tables S3-S5)], providing initial robust evidence of correlated molecular evolution under novel LDTC regimes.

      (3) Systematic design fills critical field gap

      Artificial conditions like LD/DD have been useful in revealing fundamental zeitgeber principles. Our systematic 2-12hr TC-LD lags directly implement the design validated by Pittendrigh & Bruce (1959) and Oda & Friesen (2011), who discuss how such experimental designs can provide a more comprehensive understanding of zeitgeber integration than studies with only one phase jump between two zeitgebers.

      (4) Ramping regimes as essential next step

      Gradual ramping regimes better mimic nature and represent critical future experiments. New Discussion addition in the revised version: "Ramping LDTC regimes can test whether selection-specific zeitgeber hierarchy persists under naturalistic gradients." While ramping experiments are essential, we would like to emphasize that we aimed to use this experimental design as a tool to test if evening activity exhibits greater temperature sensitivity and if this property of the circadian system can undergo correlated evolution upon selection for timing of eclosion/emergence.

      (5) New startle quantification addresses masking

      Our startle quantification (which will be added as a new supplementary figure) confirms circadian evening tracking persists despite quantified, selection-independent masking in most of the regimes.

    1. eLife Assessment

      The study presents a valuable resource of proline hydroxylation proteins for molecular biology studies in oxygen-sensing and cell signaling with the characterization of Repo-man proline hydroxylation site. The evidence supporting the claim of the authors is solid, although further clarification of the overall efficiency of the HILIC analysis, the specificity/sensitivity of immonium ion analysis, as well as quantification of proline hydroxylation identifications will be helpful. The work will be of interest to researchers studying post-translational modification, oxygen sensing, and cell signaling.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Hao Jiang et al describes a systematic approach to identify proline hydroxylation proteins. The authors implemented a proteomic strategy with HILIC-chromatographic separation and reported the high-confidence identification of 4993 sites from HEK293 cells (4 replicates) and 3247 sites from RCC4 cells, with 1412 sites overlapping between the two cell lines. A small fraction of about 200 sites from each cell line were identified with the HyPro immonium ion. The authors investigated the conditions and challenges of using HyPro immonium ions as a diagnostic tool. The study focused its validation analysis on Repo-man (CDCA2) proline hydroxylation, comparing MS/MS spectra, retention time and diagnostic ions of purified proteins with corresponding synthetic peptides. Using SILAC analysis and a recombinant enzyme assay, the study evaluated Repo-man HyPro604 as a target of the PHD1 enzyme.

      Strengths:

      The study involved extensive LCMS runs for in-depth characterization of proline hydroxylation proteins, including four replicate analyses of 293 cells and three replicate analyses of RCC4 cells, with 32 HILIC fractions in each analysis. The identification of over 4000 confident proline hydroxylation sites from the two cell lines would be a valuable resource for the community. The characterization of Repo-man proline hydroxylation is a novel finding.

      Weaknesses:

      As a study mainly focused on methodology, there are some potential technical weaknesses discussed below.

      (1) The study applied HILIC-based chromatographic separation with the goal of enriching and separating hydroxyproline-containing peptides. The separation effects for peptides from 293 cells and RCC4 cells seem somewhat different (Figure 2A and Figure S1A), which may indicate that the efficiency of the strategy is cell line dependent.

      (2) The study evaluated the HyPro immonium ion as a diagnostic ion for HyPro identification, showcasing multiple influential factors and potential challenges. It is important to note that, with only around 5% of the identifications having the HyPro immonium ion, it would be very challenging to implement this strategy in a global LCMS analysis to either validate or invalidate HyPro identifications. In comparison, the acetyllysine immonium ion was previously reported to be a useful marker for acetyllysine peptides (PMID: 18338905), and that strategy offered a sensitivity of 70% with a specificity of 98%.

      (3) The authors aimed to identify potential PHD targets by comparing the HyPro proteins identified with or without PHD inhibitor FG-4592 treatment. The workflow followed a classification strategy, rather than a typical quantitative proteomics approach for comprehensive analysis.

      (4) The authors performed inhibitor treatment and in vitro PHD1 enzyme assay to validate that Repo-man can be hydroxylated by PHD1. It remains unknown if PHD1 expression in cells is sufficient to stimulate Repo-man hydroxylation.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Jiang et al. developed a robust workflow for identifying proline hydroxylation sites in proteins. They identified proline hydroxylation sites in HEK293 and RCC4 cells. The authors found that the more hydrophilic HILIC fractions were enriched in peptides containing hydroxylated proline residues. These peptides showed differences in charge and mass distribution compared to unmodified or oxidized peptides. The intensity of the diagnostic hydroxyproline immonium ion depended on parameters including MS collision energy, parent peptide concentration, and the sequence of amino acids adjacent to the modified proline residue. Additionally, they demonstrate that a combination of retention time in LC and optimized MS parameter settings reliably identifies proline hydroxylation sites in peptides, even when multiple proline residues are present.

      Strengths:

      Overall, the manuscript presents an advanced, standardized protocol for identifying proline hydroxylation. The experiments were well designed, and the developed protocol is straightforward, which may help resolve confusion in the field.

      Comments on revisions:

      All of my concerns have been resolved by the authors. It is ready for publication.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a new method for detecting and identifying proline hydroxylation sites within the proteome. This tool utilizes traditional LC-MS technology with optimized parameters, combined with HILIC-based separation techniques. The authors show that they pick up known hydroxy-proline sites and also validate a new site discovered through their pipeline.

      Strengths:

      The manuscript utilizes state-of-the-art mass spectrometric techniques with optimized collision parameters to ensure proper detection of the immonium ions, which is an advance over earlier, similar approaches. The use of synthetic control peptides in the HILIC separation step clearly demonstrates the ability of the method to reliably distinguish hydroxyproline-containing peptides from oxidized-methionine-containing peptides. Using this method, they identify a site on CDCA2, which they go on to validate in vitro; they also study its role in the regulation of mitotic progression in an associated manuscript.

      Weaknesses:

      Despite the authors' claim about the specificity of this method in picking up the intended peptides, a good number of potential false positives are also picked up (owing to the limitations of the MS-based readout), and the authors' criteria for downstream filtering of such peptides require further clarification. In the same vein, a broader and more diverse cell-based validation approach would help to substantiate the claims regarding enrichment of peptides in the described pathway analyses. Experiments must show reproducibility and contain appropriate controls wherever necessary.

      Comments on revisions:

      I thank the authors for their clarifications and opinions on my questions and suggestions. Based on the response, the following points are important while considering the significance of this manuscript:

      - The manuscript provides a novel method to detect and identify proline hydroxylation residues in the proteome. While this provides several advances over previous methods, the probability of false positives, the loss of true positives and the incomplete removal of interference from methionine oxidation in this strategy need to be addressed clearly in the discussion section of the manuscript, so that the reader is made aware of the strengths and limitations of this method.

      - Going by the standards of publication in eLife, reproducibility is very important in the experiments done. Hence, I strongly recommend that the authors perform the experiments in triplicate with error bars to confirm reproducibility. Graphs with single data points do not convey that, and this is very important for eLife.

      - As for Figure 9C, the authors have rejected the request for a control lane in the figure. It may sound trivial to the authors, but for completeness of the experiment, all applicable controls must be performed and shown alongside the main data. It is essential to show the PHD1 only control to rule out the possibility of the contribution of any non-specific signal in the dot blot by PHD1.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Hao Jiang et al. describes a systematic approach to identify proline hydroxylation proteins. The authors implemented a proteomic strategy with HILIC-chromatographic separation and reported the identification of 4993 sites from HEK293 cells (4 replicates) and 3247 sites from RCC4 cells (3 replicates), with 1412 sites overlapping between the two cell lines. From the analysis, the authors identified 225 sites and 184 sites from 293 and RCC4 cells, respectively, with the HyPro diagnostic ion. The identifications were validated by analyzing a few synthetic peptides, with a specific focus on Repo-man (CDCA2), through comparing MS/MS spectra, retention time, and diagnostic ions. With SILAC analysis and a recombinant enzyme assay, the study showed that Repo-man HyPro604 is a target of the PHD1 enzyme.

      Strengths:

      The study involved extensive LC-MS analysis and was carefully implemented. The identification of over 4000 confident proline hydroxylation sites would be a valuable resource for the community. The characterization of Repo-man proline hydroxylation is a novel finding.

      Weaknesses:

      However, as a study mainly focused on methodology, the findings from the experimental data did not convincingly demonstrate the sensitivity and specificity of the workflow for site-specific identification of proline hydroxylation in global studies.

      Proline hydroxylation is an enzymatic post-translational protein modification, catalysed by prolyl hydroxylases (PHDs), which can have profound biological significance, e.g. altering protein half-life and/or the stability of protein-protein interactions. Furthermore, there has been controversy in the field as to the true number of protein targets for PHDs in cells. Thus, there is a clear need for methods to enable the robust identification of genuine PHD targets and to reliably map sites of PHD-catalysed proline hydroxylation in proteins. We believe, therefore, that our methodology, as reported here in Jiang et al., is an important contribution towards this goal. We note that our methodology has already been used successfully by others (https://doi.org/10.1016/j.mcpro.2025.100969). While further improvements in this methodology may of course be developed in future, we are not currently aware of any superior methods that have been reported previously in the literature. The criticism made by the reviewer notably does not include reference to any such alternative published methodology that interested researchers can use which would offer superior results to the approach we document in this study.

      Major concerns:

      (1) The study applied HILIC-based chromatographic separation with a goal of enriching and separating hydroxyproline-containing peptides. However, as the authors mentioned, such an approach is not specific to proline hydroxylation. In addition, many other chromatography techniques can achieve deep proteome fractionation such as high pH reverse phase fractionation, strong-cation exchange etc. There was no data in this study to demonstrate that the strategy offered improved coverage of proline hydroxylation proteins, as the identifications of the HyPro sites could be achieved through deep fractionation and a highly sensitive LCMS setup. The data of Figure 2A and S1A were somewhat confusing without a clear explanation of the heat map representations. 

      The data we present in this study demonstrate clearly that peptides with hydroxylated prolines are enriched in specific HILIC fractions (F10-F18), in comparison with total unfractionated peptides derived from cell extracts. We also refer the reviewer to our previously published study by Bensaddek et al. (International Journal of Mass Spectrometry: doi:10.1016/j.ijms.2015.07.029), which was reference 41 in this study, in which we compared directly the performance of both HILIC and strong anionic exchange chromatography (hSAX). This showed that HILIC was superior to hSAX for enrichment of peptides containing hydroxylated proline residues. To clarify this point for readers, we have now included a specific reference to our previous study at the start of the Results section in our current revision. Currently, we use HILIC to provide a degree of enrichment for proline hydroxylated peptides because we are not aware of alternative chromatographic methods that in our hands provide better results.

      We have included descriptions of the information shown in the heatmaps in the associated figure legends and captions.

      (2) The study reported that the HyPro immonium ion is a diagnostic ion for HyPro identification. However, the data showed that only around 5% of the identifications had such a diagnostic ion. In comparison, acetyl-lysine immonium ion was previously reported to be a useful marker for acetyllysine peptides (PMID: 18338905), and the strategy offered a sensitivity of 70% with a specificity of 98%. In this study, the sensitivity of HyPro immonium ion was quite low. The authors also clearly demonstrated that the presence of immonium ion varied significantly due to MS settings, peptide sequence, and abundance. With further complications from L/I immonium ions, it became very challenging to implement this strategy in a global LC-MS analysis to either validate or invalidate HyPro identifications.

      The reviewer appears to have misunderstood the point we make with regard to the identification of the immonium ion and its use as a diagnostic marker for proline hydroxylation in MS analyses. We do not claim that this immonium ion is an essential diagnostic marker for proline hydroxylation. As the reviewer notes, with respect to the acetyl-lysine modification, the corresponding immonium ion is often used in MS studies as a diagnostic for identification of specific post-translational modifications. Previous studies have reported that the immonium ion for hydroxylated proline is detected when the transcription factor HIF is analysed, but is often absent with other putative PHD targets, which has been used as an argument that these targets are not genuine proline hydroxylation sites. We are not, therefore, introducing the idea in this study that the hydroxy-proline immonium ion is a required diagnostic marker for proline hydroxylation, but instead demonstrating that detection of this ion, at least in some peptide sequences, may require the use of higher MS collision energies than are typically required for routine peptide identification. We believe that this is an interesting observation that can help to clear up discussions in the literature regarding the true prevalence of PHD-catalysed proline hydroxylation in different target proteins. Our data suggest that, in future MS studies analysing suspected PHD target proteins, two different collision energies might need to be used, i.e., normal collision energy for the routine identification of a peptide, combined with use of a higher collision energy if the hydroxy-proline immonium ion was not already detected.
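
      The complication from L/I immonium ions mentioned in point (2) can be illustrated with a short calculation. The sketch below is not taken from the manuscript: it assumes the standard elemental compositions of the two immonium ions (C4H8NO+ for hydroxyproline, C5H12N+ for leucine/isoleucine) and standard monoisotopic element masses, and simply shows how close the two nominal-mass-86 ions are.

```python
# Illustrative sketch only; compositions and masses are standard values,
# not parameters reported in the manuscript.

# Monoisotopic masses (Da) of the relevant elements and of the electron.
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
ELECTRON = 0.00054857990946


def cation_mz(formula):
    """Return the m/z of a singly charged cation from its elemental composition."""
    neutral = sum(MASS[element] * count for element, count in formula.items())
    return neutral - ELECTRON  # one electron removed, charge +1


# Hydroxyproline immonium ion, C4H8NO+
hypro_immonium = cation_mz({"C": 4, "H": 8, "N": 1, "O": 1})
# Leucine/isoleucine immonium ion, C5H12N+
leu_ile_immonium = cation_mz({"C": 5, "H": 12, "N": 1})

delta_da = leu_ile_immonium - hypro_immonium
delta_ppm = delta_da / hypro_immonium * 1e6

print(f"HyPro immonium m/z:   {hypro_immonium:.4f}")
print(f"Leu/Ile immonium m/z: {leu_ile_immonium:.4f}")
print(f"Difference: {delta_da:.4f} Da ({delta_ppm:.0f} ppm)")
```

      Under these assumptions the two ions differ by only a few hundredths of a dalton, so separating them depends on the fragment mass accuracy and resolution of the instrument used.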

      (3) The study aimed to apply the HILIC-based proteomics workflow to identify HyPro proteins regulated by the PHD enzyme. However, the quantification strategy was not rigorous. The study just considered the HyPro proteins not identified by FG-4592 treatment as potential PHD targeted proteins. There are a few issues. First, such an analysis was not quantitative without reproducibility or statistical analysis. Second, it did not take into consideration that data-dependent LC-MS analysis was not comprehensive and some peptide ions may not be identified due to background interferences. Lastly, FG-4592 treatment for 24 hrs could lead to wide changes in gene expressions and protein abundances. Therefore, it is not informative to draw conclusions based on the data for bioinformatic analysis.

      We refer the reviewer to the data we present in this study using SILAC analysis, combined with our MS workflow, to achieve a more accurate quantitative picture of proline hydroxylation levels. While we agree that the point the reviewer makes is valid, regarding our data-dependent LC-MS/MS analysis potentially not being comprehensive, this means, however, that we are potentially underestimating the true prevalence of proline hydroxylated peptides, not overestimating the level of these modified peptides. We also refer the reviewer to the accompanying study by Druker et al. (eLife 2025; doi.org/10.7554/eLife.108131.1), in which we present a detailed follow-on study demonstrating the functional significance of the novel proline hydroxylation site we detected in the protein RepoMan (CDCA2). Therefore, even if we have not achieved a fully comprehensive analysis of all proline hydroxylated peptides catalysed by PHD enzymes, we believe that we have advanced the field by documenting a workflow that is able to identify and validate novel PHD targets.

      (4) The authors performed an in vitro PHD1 enzyme assay to validate that Repo-man can be hydroxylated by PHD1. However, Figure 9 did not show quantitatively PHD1-induced increase in Repo-man HyPro abundance and it is difficult to assess its reaction efficiency to compare with HIF1a HyPro.

      The analysis shown in Figure 9 was not intended to quantify the efficiency of in vitro hydroxylation of RepoMan by PHD1, but rather to answer the question, ‘Can recombinant PHD1 alone hydroxylate P604 on RepoMan in vitro, yes or no?’. The data show that the answer here is ‘yes’. Clearly, the HIF peptide is a more efficient substrate in vitro for recombinant PHD1 than the RepoMan peptide and we have now included a statement in the Discussion that addresses the significance of this observation more directly.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Jiang et al. developed a robust workflow for identifying proline hydroxylation sites in proteins. They identified proline hydroxylation sites in HEK293 and RCC4 cells. The authors found that the more hydrophilic HILIC fractions were enriched in peptides containing hydroxylated proline residues. These peptides showed differences in charge and mass distribution compared to unmodified or oxidized peptides. The intensity of the diagnostic hydroxyproline immonium ion depended on parameters including MS collision energy, parent peptide concentration, and the sequence of amino acids adjacent to the modified proline residue. Additionally, they demonstrate that a combination of retention time in LC and optimized MS parameter settings reliably identifies proline hydroxylation sites in peptides, even when multiple proline residues are present.

      Strengths:

      Overall, the manuscript presents an advanced, standardized protocol for identifying proline hydroxylation. The experiments were well designed, and the developed protocol is straightforward, which may help resolve confusion in the field.

      Weaknesses:

      (1) The authors should provide a summary of the standard protocol for identifying proline hydroxylation sites in proteins that can easily be followed by others.

      This is a good suggestion and we have now included a figure (Figure 10) with a summary of our workflow in the current revision.

      (2) Cockman et al. proposed that HIF-α is the only physiologically relevant target for PHDs. Their approach is considered the gold standard for identifying PHD targets. Therefore, the authors should discuss the major progress they made in this manuscript that challenges Cockman's conclusion.

      While we had mentioned the Cockman et al., paper in the Introduction, we had not focussed on this somewhat controversial issue. However, in response to the Reviewer’s request, we have now added a comment in the Discussion section in the current revision of how our new data address the proposal discussed previously by Cockman et al. In brief, we believe that our findings are not consistent with a model in which PHDs have no protein targets other than HIFs.

      Reviewer #3 (Public review): 

      Summary:

      The authors present a new method for detecting and identifying proline hydroxylation sites within the proteome. This tool utilizes traditional LC-MS technology with optimized parameters, combined with HILIC-based separation techniques. The authors show that they pick up known hydroxy-proline sites and also validate a new site discovered through their pipeline.

      Strengths:

      The manuscript utilizes state-of-the-art mass spectrometric techniques with optimized collision parameters to ensure proper detection of the immonium ions, which is an advance over earlier, similar approaches. The use of synthetic control peptides in the HILIC separation step clearly demonstrates the ability of the method to reliably distinguish hydroxyproline-containing peptides from oxidized-methionine-containing peptides. Using this method, they identify a site on CDCA2, which they go on to validate in vitro; they also study its role in the regulation of mitotic progression in an associated manuscript.

      Weaknesses:

      Despite the authors' claim about the specificity of this method in picking up the intended peptides, a good number of potential false positives are also picked up (owing to the limitations of the MS-based readout), and the authors' criteria for downstream filtering of such peptides require further clarification. In the same vein, a broader and more diverse cell-based validation approach would help to substantiate the claims regarding enrichment of peptides in the described pathway analyses.

      We of course agree that false positives may arise, as is true for essentially all PTM studies. There are two issues here: first, are identified sites technically correct (i.e. not misidentifications from the MS data), and second, are the identified modifications of biological significance? We have addressed the first issue using the popular MaxQuant software suite to evaluate the modifications identified and to control the false discovery rate (FDR) at both the precursor and protein level, as described in the manuscript. We are aware that false positives could arise from confusing oxidation of methionine with hydroxylation of proline. Therefore, to address the issue as to whether we could identify bona fide PHD protein targets outside of the HIF family, we adopted a conservative approach by simply filtering out peptides where there was a methionine residue within three amino acids of the predicted proline hydroxylation site. This was a pragmatic decision made to reduce the likelihood of false positives in our dataset and we recognise that this likely results in us overlooking some genuine proline hydroxylation sites that occur near methionine residues. To address the potential biological relevance of the proline hydroxylation sites identified, we analysed extracts from cells treated with FG inhibitors. Of course, a detailed understanding of biological significance relies upon follow-on experimental analyses for each site, which we have performed for P604 on RepoMan in the accompanying study by Druker et al. (eLife 2025; doi.org/10.7554/eLife.108131.1).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The finding that the immonium ion intensities of L/I did not increase with increasing collision energy was surprising. Was this specific to this synthetic peptide?

      We agree this is an interesting and unexpected finding. We have no reason to believe that it is specific to synthetic peptides per se, but rather think this reflects an effect of amino acid composition in the peptides analysed. It will be interesting to explore this phenomenon in more detail in future.

      (2) The sequence logos in Figure 4 seemed to lack any amino acid enrichment in most positions except for collagen peptides. Have these findings been tested with statistical analysis?

      The results we show for sequence logo analysis were generated using WebLogo (10.1101/gr.849004) and correspond to an analysis of all proline hydroxylated peptides we detected across all cell lines and replicates analysed. The fact that collagens are highly abundant proteins with very high levels of proline hydroxylation likely explains why collagen peptides dominated the outcome of the sequence logo analysis. There is clearly scope for more detailed follow-up analysis in future of the sequence specificity of proline hydroxylation sites in non-collagen proteins that are validated PHD targets.

      (3) Overall figure quality was not ideal. The resolution and font sizes of figures should be carefully evaluated and adjusted. The figure legend should contain a title for the figure. Annotations of the figures were somewhat confusing. 

      We agree with the criticism of the figure resolution in the review copies - the lower resolution appears to have been generated after we had uploaded higher resolution original images. We are providing again higher resolution versions of all figures for the current revision.

      Reviewer #3 (Recommendations for the authors):

      Certain concerns regarding portions of the manuscript that need addressing:

      (1) "These data show that two different cell lines show unique profiles of proteins with hydroxylated peptides." - It is difficult to make this statement conclusively after profiling the prolyl hydroxy proteome of just two cell lines, especially since the amino acids with the highest frequency in the most enriched peptides are similar in both cell lines.

      We agree with this point and have changed the current revision to state instead, “This shows that each of the two cell lines analysed have distinct profiles.”

      (2) "We noted that there was a high frequency of a methionine residues appearing either at the first, second, or even third positions after the HyPro site.." - according to the authors' claim, the advantage of their method was that they were able to overcome the limitation of older methods that couldn't separate methionine oxidation from proline hydroxylation. However, in this statement, they say that the high frequency of methionine residues may be because of the similar mass shift. These statements are contradictory. The authors should either tone down the claim or prove that those are indeed hydroxyproline sites. Is it possible that, in the filtering step that excludes these high-frequency methionine-containing peptides, we are losing potential positive hits for hydroxy-proline sites? What is the authors' take on this?

      We respectfully do not agree that our, “statements are contradictory”, with respect to the potential confusion between identification of methionine oxidation and proline hydroxylation, but acknowledge that we have not explained this issue clearly enough. It is a fact that the similar mass shift resulting from proline hydroxylation and methionine oxidation is a technical challenge that can potentially lead to misidentifications in MS studies and that is what we state clearly in the manuscript. We have addressed this issue head on experimentally in this study and show using synthetic peptides how detailed analysis of specific proline hydroxylation sites in target proteins can be distinguished from methionine oxidation, based upon differential chromatographic behaviour of peptides with either hydroxylated proline or oxidised methionine, as well as by detailed analysis of fragmentation spectra. However, in the case of our global analysis, as we were not able to perform synthetic peptide comparisons for every putative site identified, we took the pragmatic approach of filtering out examples of peptides where a methionine residue was present within three residues of a potential proline hydroxylation site. This was done simply to reduce the possibility of misidentification in the set of novel proline hydroxylated peptides identified and we accept that as a consequence we are likely filtering out peptides that include bona fide proline hydroxylation sites. We have clarified this point in the current revision and hope to be able to address this issue more comprehensively in future studies.

      (3) "Accordingly, a score cut-off of 40 for hydroxylated peptides and a localisation probability cut-off of more than 0.5 for hydroxylated peptides was performed." Could the authors shed more light and clarify what was the basis for this value of cut-off to be used in this filtering step? Is this sample dependent? What should be the criteria to determine this value?

      We used MaxQuant software (10.1016/j.cell.2006.09.026), for PTM analysis, in which a localization probability score of 0.75 and score cut-off of 40 is a commonly used threshold to define high confidence. The reason that we used 0.5 at the first step was to investigate how likely it might be that the misassignment of delta m/z +16 Da (oxidation) on Methionine would affect the identification of hydroxylation on Proline. However, we note that in the final results set used for analysis, all putative proline hydroxylated peptides that had a Methionine residue near to the hydroxylated proline were disregarded as a pragmatic step to reduce the probability of false identifications.
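
      The filtering described in this response can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the `Site` record and its field names are hypothetical (they are not MaxQuant column names), the score cut-off is treated as inclusive, the localisation cut-off as strictly greater than 0.5, and "near" is read as a methionine within three residues on either side of the putative hydroxyproline.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical record for one putative hydroxyproline identification."""
    sequence: str    # peptide sequence in one-letter code
    position: int    # 0-based index of the putative hydroxyproline
    score: float     # search-engine score of the modified peptide
    loc_prob: float  # localisation probability of the hydroxylation


def met_nearby(sequence, position, window=3):
    """True if a methionine lies within `window` residues of `position`."""
    start = max(0, position - window)
    end = position + window + 1
    return "M" in sequence[start:end]


def keep_site(site, min_score=40.0, min_loc_prob=0.5, window=3):
    """Apply the score, localisation and methionine-proximity filters."""
    if site.sequence[site.position] != "P":
        return False  # the modified residue must be a proline
    if site.score < min_score:
        return False  # assumed inclusive score cut-off of 40
    if site.loc_prob <= min_loc_prob:
        return False  # "more than 0.5" read as strictly greater
    return not met_nearby(site.sequence, site.position, window)


candidates = [
    Site("AAPGMK", 2, 80.0, 0.90),    # Met two residues after the Pro
    Site("AAPGKRLM", 2, 80.0, 0.90),  # Met five residues away
    Site("AAPGKRLK", 2, 39.0, 0.90),  # score below cut-off
    Site("AAPGKRLK", 2, 80.0, 0.50),  # localisation not above 0.5
]
kept = [site for site in candidates if keep_site(site)]
```

      A stricter localisation threshold, such as the 0.75 value mentioned above, could be applied by passing `min_loc_prob=0.75`. As the authors acknowledge, the methionine-proximity filter trades sensitivity for specificity: genuine sites close to a methionine are discarded along with the misassignments.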

      (4) The authors are requested to kindly make the HPLC and MS traces more legible and use high-resolution images, with clearly labeled values on the peaks. Kindly extract coordinates from the underlying data files to plot the curves if needed to make them clearer.

      We have reviewed the clarity of all images and figures in the current revision.

      (5) There seems to be no error bars in Figure 3, Figure 7E, and panels of Figure 8 with bar graphs. Are those single replicate data?

      These specific figures are from single replicate data.

      (6) For Figure 9C, the control with only PHD1 (no peptide) is missing. 

      The ‘no peptide control’ was not included in the figure because it is simply a blank lane and there is nothing to see.

    1. eLife Assessment

      This valuable study presents findings on how prokaryotic antibiotics affect translation in mitochondrial ribosomes. Using mitoribosome profiling, the authors provide solid evidence that most tested antibiotics act similarly on bacterial and mitochondrial translation. Additionally, this work shows that alternative translation initiation events might exist in two specific mt-mRNAs (MT-ND1 and MT-ND5). However, additional biochemical and structural experiments are needed to support these findings.

    2. Reviewer #1 (Public review):

      Summary:

      This study aimed to determine whether bacterial translation inhibitors affect mitochondria through the same mechanisms. Using mitoribosome profiling, the authors found that most antibiotics, except telithromycin, act similarly in both systems. These insights could help in the development of antibiotics with reduced mitochondrial toxicity.

      They also identified potential novel mitochondrial translation events, proposing new initiation sites for MT-ND1 and MT-ND5. These insights not only challenge existing annotations but also open new avenues for research on mitochondrial function.

      Strengths:

      Ribosome profiling is a state-of-the-art method for monitoring the translatome at very high resolution. Using mitoribosome profiling, the authors convincingly demonstrate that most of the analyzed antibiotics act in the same way on both bacterial and mitochondrial ribosomes, except for telithromycin. Additionally, the authors report possible alternative translation events, raising new questions about the mechanisms behind mitochondrial initiation and start codon recognition in mammals.

      Weaknesses:

      All the weaknesses I previously highlighted were adequately addressed.

    3. Reviewer #3 (Public review):

      Summary:

      Recently, the off-target activity of antibiotics on the human mitoribosome has received more attention in the mitochondrial field. Hafner et al. applied mitoribosome profiling to study the effect of antibiotics on protein translation in mitochondria, as there are similarities between the bacterial ribosome and the mitoribosome. The authors conclude that some antibiotics act on mitochondrial translation initiation by the same mechanism as in bacteria. On the other hand, the authors showed that chloramphenicol, linezolid and telithromycin trap mitochondrial translation in a context-dependent manner. More interestingly, through in-depth analysis of the 5' ends of ORFs, the authors reported alternative start codons for the ND1 and ND5 proteins in place of the previously known ones. This is a novel finding in the field, and it also provides another application of the technique for further study of mitochondrial translation.

      Strengths:

      This is the first study to apply the mitoribosome profiling method to cells treated with multiple antibiotics. The mitoribosome profiling method has been optimized carefully and is proposed as a novel method to study translation events in mitochondria. The manuscript is constructive and well written.

      Weaknesses:

      This is a novel and interesting study; however, most of the conclusions come from mitoribosome profiling analysis. As a result, the manuscript lacks cellular biochemical data that would provide more evidence to support the findings.

      Comments on revisions:

      The authors addressed most of my concerns and comments, although there is still no biochemical assay, which should be performed to support the mitoribosome profiling data.

      The authors also carefully investigated the structure of complex I; however, I am surprised that they chose to analyse a low-resolution structure (3.7 Å). More high-resolution structures of mammalian complex I have recently been published (7R41, 7V2C, 7QSM, 9I4I). Furthermore, the authors should not only respond to the reviewers but also (somehow) discuss these points in the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aimed to determine whether bacterial translation inhibitors affect mitochondria through the same mechanisms. Using mitoribosome profiling, the authors found that most antibiotics, except telithromycin, act similarly in both systems. These insights could help in the development of antibiotics with reduced mitochondrial toxicity.

      They also identified potential novel mitochondrial translation events, proposing new initiation sites for MT-ND1 and MT-ND5. These insights not only challenge existing annotations but also open new avenues for research on mitochondrial function.

      Strengths:

      Ribosome profiling is a state-of-the-art method for monitoring the translatome at very high resolution. Using mitoribosome profiling, the authors convincingly demonstrate that most of the analyzed antibiotics act in the same way on both bacterial and mitochondrial ribosomes, except for telithromycin. Additionally, the authors report possible alternative translation events, raising new questions about the mechanisms behind mitochondrial initiation and start codon recognition in mammals.

      Weaknesses:

      The main weaknesses of this study are:

      While the authors highlight an interesting difference in the inhibitory mechanism of telithromycin on bacterial and mitochondrial ribosomes, mechanistic explanations or hypotheses are lacking.

      We acknowledge that we were not able to present a clear explanation for potential mechanistic differences of telithromycin inhibition between mitochondrial and bacterial ribosomes. In future work, structural analyses in collaboration with experts will provide these insights.

      The assignment of alternative start codons in MT-ND1 and MT-ND5 is very interesting but does not seem to fully align with structural data.

      We appreciate the reviewer’s comment and consulted a cryo-EM expert to review our findings in the context of the available structural data. We downloaded the density map and reviewed the N-termini of MT-ND1 and MT-ND5. We only observed the density of the N-terminus of MT-ND1 at low confidence. At an RMSD of 2, we could not observe density for the side chains of Met and Pro, and there are gaps in the density for what is modeled as the main chain. The assignment of these residues may have been overlooked due to the expectation that they should be present in the peptide.

      For MT-ND5, we did observe some density that could be part of the main chain; however, it did not fill out until we reduced the stringency, and we did not observe density mapping to side chain residues. To summarize, we do not confidently see density for either the side chain or the main chain for either peptide.

      The newly proposed translation events in the ncRNAs are preliminary and should be further substantiated with additional evidence or interpreted with more caution.

      We agree with the reviewer that we did not provide conclusive evidence that our phased ribosome footprinting data on mitochondrial non-coding RNAs are proof of novel translation events. We do acknowledge this in the main text: “Due to both the short ORFs, minimal read coverage, and lack of a detectable peptide we could not determine if translation elongation occurred on the mitochondrial tRNAs. These sites may be unproductive mitoribosome binding events or simply from tRNAs partially digesting during MNase treatment.”

      Reviewer #2 (Public review):

      In this study, the authors set out to explore how antibiotics known to inhibit bacterial protein synthesis also affect mitoribosomes in HEK cells. They achieved this through mitoribosome profiling, where RNase I and MNase were used to generate mitoribosome-protected fragments, followed by sequencing to map the regions where translation arrest occurs. This profiling identified the codon-specific impact of antibiotics on mitochondrial translation.

      The study finds that most antibiotics tested inhibit mitochondrial translation similarly to their bacterial counterparts, except telithromycin, which exhibited distinct stalling patterns. Specifically, chloramphenicol and linezolid selectively inhibited translation when certain amino acids were in the penultimate position of the nascent peptide, which aligns with their known bacterial mechanism. Telithromycin stalls translation at an R/K-X-R/K motif in bacteria, and the study demonstrated a preference for arresting at an R/K/A-X-K motif in mitochondria. Additionally, alternative translation initiation sites were identified in MT-ND1 and MT-ND5, with non-canonical start codons. Overall, the paper presents a comprehensive analysis of antibiotics in the context of mitochondrial translation toxicity, and the identification of alternative translation initiation sites will provide valuable insights for researchers in the mitochondrial translation field.

      From my perspective as a structural biologist working on the human mitoribosome, I appreciate the use of mitoribosome profiling to explore off-target antibiotic effects and the discovery of alternative mitochondrial translation initiation sites. However, the description is somewhat limited by a focus on this single methodology. The authors could strengthen their discussion by incorporating structural approaches, which have contributed significantly to the field. For example, antibiotics such as paromomycin and linezolid have been modeled in the human mitoribosome (PMID: 25838379), while streptomycin has been resolved (10.7554/eLife.77460), and erythromycin was previously discussed (PMID: 24675956). The reason we can now describe off-target effects more meaningfully is due to the availability of fully modified human mitoribosome structures, including mitochondria-specific modifications and their roles in stabilizing the decoding center and binding ligands, mRNA, and tRNAs (10.1038/s41467-024-48163-x).

      These and other relevant studies should be acknowledged throughout the paper to provide additional context.

      We appreciate the work that has previously revealed how different antibiotics bind the mitochondrial ribosome. We have included these references in the manuscript to provide background and context for this work in relationship to the field.

      Reviewer #3 (Public review):

      Summary:

      Recently, the off-target activity of antibiotics on the human mitoribosome has received increasing attention in the mitochondrial field. Hafner et al. applied mitoribosome profiling to study the effect of antibiotics on protein translation in mitochondria, as there are similarities between the bacterial ribosome and the mitoribosome. The authors conclude that some antibiotics act on mitochondrial translation initiation by the same mechanism as in bacteria. On the other hand, the authors showed that chloramphenicol, linezolid and telithromycin trap mitochondrial translation in a context-dependent manner. More interestingly, during deep analysis of the 5' end of ORFs, the authors reported alternative start codons for the ND1 and ND5 proteins instead of the previously known ones. This is a novel finding in the field and it also provides another application of the technique for further study of mitochondrial translation.

      Strengths:

      This is the first study to apply the mitoribosome profiling method to cells treated with multiple antibiotics.

      The mitoribosome profiling method has been optimized carefully and has been suggested to be a novel method to study translation events in mitochondria. The manuscript is constructive and well written.

      Weaknesses:

      This is a novel and interesting study; however, most of the conclusions come from mitoribosome profiling analysis. As a result, the manuscript lacks the cellular biochemical data to provide more evidence and support the findings.

      We thank the reviewer for the positive assessment of our work. We agree that future biochemical and structural experiments will strengthen the conclusions we derive from the ribosome profiling.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In Fig. 1A, the quality of the Western blot for the sucrose gradient is suboptimal. I recommend enhancing the quality of the Western blot image and providing the sucrose gradient sedimentation patterns for both the mtSSU and mtLSU to confirm the accurate selection of the monosome fraction. Additionally, to correctly assign the A260 peaks to mitochondrial and cytosolic ribosomes, it would be helpful to include markers for both the cytoribosomal LSU and SSU, too. Furthermore, do the authors observe mitochondrial polysomes in their sucrose gradient? If so, were those fractions fully excluded from the analysis?

      We repeated our sucrose gradient and Western blotting with antibodies for the large and small subunits of the mitoribosome. We did not repeat western blotting for the cytoribosomes as the 40S, 60S, and 80S peaks are present in their canonical heights and locations on a sucrose gradient. Western blotting indicates that the large and small subunits of the mitoribosome are located in the fraction taken for mitoribo-seq. We do see trace amounts of mitoribosome in fractions past the 55S site. Those fractions were excluded from library preparation.

      The MNase footprints exhibited a bimodal distribution, which the authors suggest may indicate that "MNase-treatment may have captured two distinct conformations of the ribosome." It would be relevant to clarify whether an enzyme titration was performed, as excessive MNase could lead to ribosomal RNA degradation, potentially influencing the footprints.

      We did not perform a titration and instead based our concentration on the protocol from Rooijers et al, 2013. We included a statement of this and a reference to the concentration in the methods.

      Is there an explanation for why RNase I footprinting reveals a very high peak at the 5'-end of the MT-CYB transcript, whereas this is not observed with MNase footprinting?

      It is not clear. The intensity of peaks at the 5’ end of the transcripts varies. We do observe that the relative intensity of the 5’ peak is greater for RNase I footprinted samples than MNase-treated samples.

      I understand that throughout the manuscript, the authors use MT-CYB as an example to illustrate the effects of the antibiotics on mitochondrial translation. However, to strengthen the generality of the conclusions, it would be beneficial to provide the read distribution across the entire mitochondrial transcriptome, possibly in the supplementary material. Additionally, I suggest including the read distribution for MT-CYB in untreated cells to improve data interpretation and enhance the clarity of figures (e.g., Figs. 1B, 2B, 3B).

      As these experiments were generated across multiple mitoribo-seq experiments, each was done with its own control experiment. It would be inaccurate to show a single trace as representative of all experiments. Instead, we include Supplementary Figure 1, which shows the untreated MT-CYB trace for all control samples and indicates which treatment they pair with.

      It would be very valuable to label each individual data point in the read phasing shown in Fig. 1D with the corresponding transcripts. For improved data visualization, I suggest assigning distinct colors to each transcript.

      We are concerned that including the name of each gene in the main figure would be too difficult for the reader to accurately interpret. Instead, we have added a Supplementary Table with those values.

      How do the authors explain the significant peak (approx. 10,000 reads) at the 5' end of the transcript in the presence of tiamulin (Fig. 2B)? Does this peak correspond to the start codon, and how does it relate to the quantification reported in Fig. 2C?

      Yes, this represents the start codon. These reads are likely derived from the start codon as they are mapping to the 5’ end of the transcript. There are differences in sequencing depth depending on the experiment, so what is critical is the relative distribution of reads on the transcript rather than comparing absolute reads between experiments. MT-CYB has 54% of the reads at the start site, which is representative of what we see across all genes.
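
      The point about relative rather than absolute read counts can be illustrated with a minimal sketch. The function name, the window size, and the per-codon counts below are hypothetical toy values, not data from the study; the sketch only shows that two libraries of different sequencing depth can give the same fraction of reads at the 5' end.

```python
def fraction_at_5prime(counts, window=1):
    """Fraction of a transcript's reads that fall in the first `window` codons.

    `counts` is a list of per-codon read counts (toy values here).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[:window]) / total


# Hypothetical per-codon counts for one transcript in two experiments
# that differ 100-fold in sequencing depth.
deep = [5000, 1000, 1500, 2500]
shallow = [50, 10, 15, 25]

print(fraction_at_5prime(deep))     # same fraction despite more reads
print(fraction_at_5prime(shallow))
```

      Normalizing to the transcript total is one simple choice; other normalizations would also remove the dependence on depth.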

      Throughout the manuscript, I found the usage of the terms "5' end" and "start codon" somewhat confusing, as they appear to be used synonymously in some instances. For example, in Fig. 2C, the y-axis label states "ribosomes at start codon," while the figure caption mentions "...percentage of reads that map to the 5' end of mitochondrial transcripts." Given the size of the graphs, it is also challenging for the reader to determine whether the peaks correspond specifically to the start codon or if multiple peaks accumulate at the initial codons.

      We selected this language because two different types of analysis are being carried out. The ribosome profiling in Figures 2 and 3 was carried out with RNase I, which poorly maps the ribosomes at the start codon in the read length analysis in Figure 4. Ribosome footprints at the 5’ end may include ribosomes that are on the 2-4 codons following the start codon, so it would not be accurate to label those as “ribosomes at a start codon.” We have renamed the axis to “Ribosomes at 5’ end”. Wig files are available online for all mitoribosome profiling experiments. In this case, the assigned “P-site” is several codons after the start codon due to the offset applied and the minimal 5’ UTR. Thus, the exact codon to which density is assigned matters less than the general distribution of the reads.

      The authors state, "Cells treated with telithromycin did show a slight increase in MRPF abundance at the 5' end of MT-CYB" and "the cumulative distribution of MRPFs suggested that ribosome density was biased towards the 5' end of the gene for chloramphenicol and telithromycin, but not significantly for linezolid." Could this observation be linked to the presence of specific stalling motifs in that region? If so, it would be beneficial to display such motifs on the graphs of the read distribution across the transcriptome to substantiate the context-dependent inhibition.

      Thank you for this suggestion. For chloramphenicol and linezolid, alanine, serine, and threonine make up nearly 25% of the mitochondrial proteome. As such, there are numerous stall sites across the transcript. Given their identical stalling motifs, the difference between chloramphenicol and linezolid is due to sequence-specific differences. Potentially, this could be due to conditions such as the final concentration of antibiotic inside the mitochondria and the on/off rate of an inhibitor with the translating mitoribosome. Both may affect the kinetics of stalling and allow mitoribosomes to evade early stall sites.

      We have also included the sites of all A/K/R-X-K motifs located in the genome and the calculated fold change for each position. As a note, this includes sites that do not pass the minimum filter set by our analysis and we note this in the text.
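
      As a rough illustration of how sites matching the R/K/A-X-K motif could be located in a protein sequence, the sketch below uses a regular expression with a lookahead so that overlapping occurrences are all reported. The function name and the toy sequence are hypothetical, and this is not the authors' analysis pipeline, which also applies a minimum read filter not modeled here.

```python
import re

# R, K or A, then any residue, then K. The zero-width lookahead lets
# overlapping occurrences (e.g. "KTK" inside "RGKTK") be found as well.
MOTIF = re.compile(r"(?=([RKA].K))")


def find_motif_sites(protein):
    """Return (0-based start index, matched tripeptide) for each motif hit."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein)]


# Hypothetical toy peptide, not a real mitochondrial protein.
toy = "MAAKLRGKTK"
for start, tripeptide in find_motif_sites(toy):
    print(start, tripeptide)
```

      Each hit could then be paired with the fold change in footprint density at that position, as described in the response.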

      The comment raises an additional question: Does the increased density at the 5’ end derive from stalled mitoribosomes or queued mitoribosomes behind a stalling event? Recent work by Iwasaki’s group shows that mitoribosomes can form disomes and queue behind each other. However, we could not observe 30 aa periodicities behind stalling events that would be indicative of collided mitoribosomes.

      In Fig. 3E, the authors report an additional and very interesting observation that is not discussed. Linezolid treatment causes reduced ribosome occupancy when proline or glycine codons occupy the P-site, or when the amino acids have been incorporated into the polypeptide chain and occupy the -1 position. It is known that the translation of proline and glycine frequently leads to ribosome stalling due to the physicochemical properties of these amino acids. Has this effect of linezolid been reported in the bacterial translation system? Additionally, can the authors propose hypotheses for the mechanism behind this observation? A similar observation is noted for telithromycin when glycine occupies the same positions, as well as when aspartate occupies the P- and A-sites.

      In bacteria, Linezolid does have an “anti-stalling” motif when glycine is present in the A-site. However, this is due to the size of the residue being compatible with antibiotic binding.

      The most likely cause of this effect is a redistribution of ribosome footprints. As the antibiotics introduce new arrest sites, ribosome density at other sites relatively decreases. This is likely an artifact from mitoribosomes redistributing from endogenously slow codons to new arrest sites. When carrying out disome profiling in the presence of anisomycin, we see a similar effect: cytoribosomes are redistributed from endogenous stalling sites, such as proline, throughout the gene. As a result, translation at proline appears “more efficient” upon treatment with an inhibitor, but this is instead an artifact of analysis.
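
      The redistribution argument can be made concrete with a small worked example. The counts below are invented toy numbers, and the helper name is hypothetical; the sketch assumes occupancy is expressed as a fraction of all footprints, so a new arrest site lowers the relative occupancy of an unchanged site.

```python
def relative_occupancy(counts):
    """Convert raw footprint counts per site into fractions of the total."""
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


# Toy counts: the footprint count at Pro is unchanged by treatment,
# but new arrest sites (Ala, Lys here) gain footprints.
untreated = {"Pro": 40, "Gly": 20, "Ala": 20, "Lys": 20}
treated = {"Pro": 40, "Gly": 20, "Ala": 100, "Lys": 40}

before = relative_occupancy(untreated)
after = relative_occupancy(treated)
for site in untreated:
    print(site, round(after[site] / before[site], 2))
```

      In this toy case the relative occupancy at Pro halves even though its raw count is identical, which is the kind of apparent "anti-stalling" the response attributes to the analysis rather than to the drug.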

      Figure 3F could benefit from indicating which mtDNA-encoded protein corresponds to each of the strongest stalling motifs.

      We have included a supplementary figure to highlight which mitochondrially-encoded genes contain the R/K/A-X-K motif and noted in the text that mitochondrial translation may be unevenly inhibited.

      The legend "increasing mRPF abundance" in Fig. 4C may be missing the corresponding colors.

      The legend applies to all sections of the figure. We double-checked the range of the colors in the tables, and they do match the legend.

      The observation that the start codons in MT-ND1 and MT-ND5 might differ from the annotated canonical ones is intriguing. While the ribosome profiling data appear clear, mass spectrometry (MS) analysis may be misleading. The absence of evidence does not necessarily imply evidence of absence. How does this proposed conclusion correlate with the structural data obtained from HEK cells? For instance, the cryo-EM structural model of a complex I-containing human supercomplex (PDB: 5XTD, PMID: 28844695) shows the presence of Pro2 in MT-ND1 and the full-length MT-ND5 protein. The authors should carefully examine structural data to ascertain whether alternative forms of MT-ND1 and MT-ND5 are actually observed in the assembled complex I.

      We really appreciate this comment. We sat down with an expert in cryo-EM and reviewed the figure. We downloaded the density map and reviewed the N-termini of MT-ND1 and MT-ND5. We only observed the density of the N-terminus of MT-ND1 at low confidence. At an RMSD of 2, we could not observe density for the side chains of Met and Pro, and there are gaps in the density for what is modeled as the main chain. The assignment of these residues may have been overlooked due to the expectation that they should be present in the peptide.

      For MT-ND5, we did observe some density that could be part of the main chain; however, it did not fill out until we reduced the stringency, and we did not observe density mapping to side chain residues. To summarize, we do not confidently see density for either the side chain or the main chain for either peptide.

      Given that ribosome profiling is based on the assumption that ribosomes protect mRNA fragments from RNase digestion, interpreting the data related to Fig. 5 and the proposed existence of translation events involving ncRNAs is challenging. Most importantly, tRNAs and rRNAs are highly folded RNA molecules and, by definition, are protected by ribosomal proteins. Simultaneously, as the authors point out, "These reads could either be products of random digestion of the abundant background of ncRNAs or be genuine MRPFs." RNase I preferentially digests single-stranded RNA (ssRNA), but excess enzyme can still lead to degradation. Consequently, many random tRNA/rRNA fragments may be generated by RNase digestion, potentially resulting in artifacts. I suggest that the authors examine what happens to these reads when mitochondrial translation is inhibited.

      We have low-quality mitoribo-seq with initiation inhibitors and MNase showing footprints of the same size. We do not have a small-molecule inhibitor that is able to completely ablate translation, as they instead stabilize mitoribosomes at different steps in translation. We have considered alternative ways of capturing a background rRNA and tRNA digestion pattern; however, these have their own drawbacks. Dissociation with EDTA prior to digestion or carrying out library prep on the small and large subunits may capture mitoribosomes no longer in the process of translation; however, dissociated subunits would have different surfaces now available for digestion and may not capture tRNAs.

      Regarding the statement, "While the ORF on MT-TS1 is longer, MRPF density was low and we did not observe read phasing and thus it is likely not translated (not shown)," the data should not be excluded unless a clear explanation is provided for why translation would not occur from this specific RNA.

      We have included this value in the graph as well as in Supplementary Figure 1.

      The graph in Fig. 5B shows the periodicity of only the putative RNR1 ORF, but not that of the other proposed ORFs. What is the reason for this?

      We have included the MT-TS1 putative ORF in Figure 5 and Figure S1. Other ORFs did not have density in the ORF. If these are real mitoribosome footprints at these start codons, it may be due to them being transient binding events that never result in elongation. Alternatively, they may be due to tRNA degradation during library preparation.
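
      Read phasing of the kind discussed here is commonly summarized as the fraction of footprints falling in each of the three reading frames of a candidate ORF. The sketch below assumes that P-site positions have already been assigned to reads; the function name, ORF start, and positions are hypothetical toy values rather than the authors' method or data.

```python
from collections import Counter


def frame_fractions(psite_positions, orf_start):
    """Fraction of assigned P-site positions in frame 0, 1 and 2 of an ORF."""
    frames = Counter((p - orf_start) % 3 for p in psite_positions)
    total = sum(frames.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [frames[f] / total for f in (0, 1, 2)]


# Toy positions for a putative ORF starting at nucleotide 7:
# seven reads in frame 0 and one read in frame 1.
positions = [7, 10, 13, 16, 19, 22, 25, 11]
print(frame_fractions(positions, orf_start=7))
```

      A strong bias toward one frame is consistent with translating ribosomes, whereas random nuclease digestion of abundant ncRNAs would be expected to give roughly equal fractions; with few reads the estimate is unreliable.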

      The assumption that the UUG codon can serve as a start site for mitochondrial translation has not been substantiated. Recent data have identified translation initiation events from non-ATG/ATA codons (near-cognate and sub-cognate) using retapamulin, but UUG was not among them. Can the authors detect such events in their ribosome profiling data collected in the presence of retapamulin, tiamulin, or josamycin?

      The report of translation initiation at non-ATG/ATA codons strongly disagrees with our findings. We report that sites of translation initiation observed within annotated coding regions in mitochondria occur at the annotated start sites, while the other report finds frequent alternative initiation events. We have looked for those arrest sites and did not observe them.

      In the section "Mitoribosome profiling reveals novel translation events," the title may be misleading given the preliminary nature of the results. To support such a claim, the authors should provide experimental evidence demonstrating that the proposed translation events genuinely exist and result in the synthesis of previously unidentified polypeptides. Alternatively, the interpretation should be approached with greater caution and more clearly indicated as preliminary.

      We agree with the reviewers that a distinction should be made between reporting truly novel translation events, like the recently reported MT-ND5-dORF, and sites we suspect mitoribosomes may be binding and that require detailed follow-up. We altered the section title to suggest that this may be showing novel translation events. Additionally, we included a statement in the discussion that these MRPFs may be simply tRNA digestion by RNase I.

      Although located at the 5' end of RNR1, the newly identified ORF is situated 79 nt downstream. According to current knowledge, this appears to be a lengthened 5' UTR that may hinder mitoribosome loading. The authors should speculate on potential initiation mechanisms.

      The start of the putative ORF is not located 79 nts down, but at the 8th nucleotide. The reviewer may be including the tRNA-Phe in their calculation, which is cleaved from MT-RNR1. This start site is closer to the 5’ end than our findings with MT-ND5.

      To enhance the interpretation of the mitoribosome profiling data, the authors could complement their findings with classical metabolic labeling using (35)S-methionine. This approach would allow for a different assessment of the stringency of the inhibition under the tested experimental conditions.

      We are currently working on these experiments using mito-funcats. A future direction we are taking this work is to understand how the cell responds to different mechanisms of translation inhibition. For example, we are trying to understand if telithromycin, which appears highly selective, only partially inhibits translation of the mtDNA-encoded proteome.

      Reviewer #2 (Recommendations for the authors):

      Other small editorial comments:

      Line 24: "translate proteins"?

      Revised for clarity

      Line 24: The sentence describing mitochondrial translation as "closely resembling the one in prokaryotes" could be reformulated. While the core of the mitoribosome is conserved, the entire apparatus has many mitochondria-specific features.

      Since this is the abstract, we simplified the point by saying that mitoribosomes are more similar to prokaryotic than cytosolic ribosomes.

      Clarified to highlight that the mitochondrial system is more similar to the bacterial system than the eukaryotic system.

      Line 33: "novel" or "previously unrecognized" ?

      Rewritten for clarity.

      Lines 33-35: The claim made here is not shown in the paper.

      We removed the more aspirational goal of this paper and focused on the main findings of the paper.

      Lines 44, 47, 89 (and elsewhere): "cytoplasmic" or "cytosolic" ?

      Both terms are used in the literature. We opted for cytoplasmic as it can also include ribosomes not free in the cytosol, such as those bound to the ER.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should state why they chose these antibiotics for mitoribosome profiling analysis over other antibiotics from the same group. Did they screen multiple antibiotics to determine the candidates for the next step?

      We selected antibiotics that had a known stalling motif in bacteria (initiation or context-dependent elongation inhibitors). In addition, we carried out mitoribosome profiling with erythromycin, azithromycin, thiostrepton, and kanamycin in this work. However, we did not see any effect from these drugs in mitoribosome profiling. We are currently testing other inhibitors, such as doxycycline and tigecycline, and looking at optimizing treatment conditions to identify stalling motifs in samples that previously showed no difference.

      (2) What is the reason for choosing the concentrations of the antibiotics retapamulin, tiamulin and josamycin? Are these the IC50 values or above them? Moreover, none of this information has been provided for the antibiotics in the next part. The authors should provide a biochemical study of the effect of these antibiotics on cell survival and/or protein translation, such as an S35 assay or the steady-state level of mtDNA-encoded proteins upon cell treatment with these antibiotics.

      Prior to mitoribo-seq, we carried out time and concentration assays with all antibiotics. 100 µg/ml and a 30-minute treatment was tolerable for all antibiotics except retapamulin. We aimed to treat cells with a relatively high concentration of inhibitor in order to capture actively translating mitoribosomes. We were concerned that longer treatments may lead to decreased translation initiation, leading to the capture of fewer mitoribosomes. These concentrations were nearly identical to contemporary conditions carried out in Bibel et al, RNA 2025.

      (3) Why did the authors choose MT-CYB as the representative for further analysis in the second and third parts of the manuscript?

      We chose MT-CYB because its length allowed for easy visualization. Some mitochondrial genes, such as MT-ND6, had a propensity for stronger stalling at initiation. While coverage was throughout the genes, it was difficult to visualize the changes within the ORF. Also, MT-CYB was less visually complex than polycistronic transcripts. All wigs were uploaded to GEO.

      (4) Page 11, line 233-234: the authors state that telithromycin induces stalling at the R/K/A-X-K motif. The authors should further analyze the mitochondrial genome to determine which proteins contain this motif. Furthermore, as in comment 2, the authors should confirm by 35S assay or WB which mtDNA-encoded proteins are affected.

      We have included a supplementary figure showing which mitochondrial genes contain these motifs.

      (5) The results and conclusion from the fourth paragraph are very interesting. The authors suggest alternative start codons for two mtDNA-encoded proteins, ND1 and ND5, based on ribosome profiling analysis. Again, I have several comments on this part:

      (a) For the accumulation of the alternative start codon of ND1 and ND5 as suggested in the manuscript, do the authors observe this trend with the initiation inhibitors used in the second paragraph of the manuscript?

      We did not observe similar read lengths with retapamulin, tiamulin, or josamycin, which produced read lengths that were consistent with other RNase I footprinted samples.

      (b) This observation was further confirmed by MS with a peptide from the ND1 protein. The authors should show the MS peak indicating the MW of the peptide and the MS/MS data for the peptide that support this hypothesis.

      We are including the MS/MS report for this peptide.

      (c) Interestingly, several high-resolution structures of mammalian complex I have been reported so far (PMID: 7614227, 10396290, 38870289), in which the ND1 and ND5 proteins show full sequences with fMet at the distal N-terminus. This differs from the suggestion in the manuscript. Could the authors discuss or comment on that?

      This point was brought up by another reviewer. We have carefully analyzed the density map of PMID: 28844695. We sat down with an expert in cryo-EM and reviewed the figure. We downloaded the density map and reviewed the N-termini of MT-ND1 and MT-ND5. We only observed the density of the N-terminus of MT-ND1 at low confidence. At an RMSD of 2, we could not observe density for the sidechains of Met and Pro, and there is a gap in density for what is modeled as the main chain. The assignment of these residues may have been overlooked due to the expectation that they should be present in the peptide.

      For MT-ND5, we did observe some density that could be part of the main chain; however, it did not fill out until we reduced the stringency, and we did not observe density mapping to side chain residues. To summarize, we do not confidently see density for either the side chain or the main chain for either peptide.

      Minor comments:

      The methods should be written more precisely so that other groups can easily repeat the experiments. For example:

      (1) The authors should indicate the exact HEK293 cell line used in this study.

      We have indicated the exact cell line.

      (2) Page 22, line 471: which fractions (by number) were collected? The Western blot analysis shown in Figure 1A should be repeated with proteins from both the small and large subunits.

      We have repeated the Western blot with antibodies for large and small subunits. We took fractions 8 and 9, which are now indicated in the text and figure.

      (3) Page 23, line 502: is the number of cells used for the MS experiment correct? Or is this the number of cells per mL?

      This is correct and is based on the kit protocol. It is not cells per mL. We have clarified the kit being used in the methods.

    1. eLife Assessment

      This work provides an important resource identifying 72 proteins as novel candidates for plasma membrane and/or cell wall damage repair in budding yeast, and describes the temporal coordination of exocytosis and endocytosis during the repair process. The data are convincing; however, additional experimental validation will better support the claim that repair proteins shuttle between the bud tip and the damage site.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yamazaki et al. conducted multiple microscopy-based GFP localization screens, from which they identified proteins that are associated with PM/cell wall damage stress response. Specifically, the authors identified that bud-localized TMD-containing proteins and endocytotic proteins are associated with PM damage stress. The authors further demonstrated that polarized exocytosis and CME are temporally coupled in response to PM damage, and CME is required for polarized exocytosis and the targeting of TMD-containing proteins to the damage site. From these results, the authors proposed a model that CME delivers TMD-containing repair proteins between the bud tip and the damage site.

      Strengths:

      Overall, this is a well-written manuscript, and the experiments are overall well-conducted. The authors identified many repair proteins and revealed the temporal coordination of different categories of repair proteins. Furthermore, the authors demonstrated that CME is required for targeting of repair proteins to the damage site, as well as cellular survival in response to stress related to PM/cell wall damage. Although the roles of CME and bud-localized proteins in damage repair are not completely new to the field, this work does have conceptual advances by identifying novel repair proteins and proposing the intriguing model that the repairing cargoes are shuttled between the bud tip and the damaged site through coupled exocytosis and endocytosis.

      Weaknesses:

      While the results presented in this manuscript are convincing, they might not be sufficient to support some of the authors' claims. Especially in the last two results sections, the authors claimed CME delivers TMD-containing repair proteins from the bud tip to the damage site. The model is no doubt highly plausible based on the data, but caveats still exist. For example, the repair proteins might not be transported from one location to another, but instead be degraded and re-synthesized. Although the Gal-induced expression system can further support the model to some extent, I think more direct verification (such as FLIP or photo-convertible fluorescence tags to distinguish between pre-existing and newly synthesized proteins) would significantly improve the strength of evidence.

      Review on revised version:

      The authors addressed most of the concerns that were originally raised, primarily by revising the text and figures and expanding the discussion, which improves the clarity of the manuscript. Although the authors did not address my major concern on the shuttling/trafficking model experimentally, I do understand the limitation of resources and time. The authors noted that they plan to do these experiments in future work, and such studies would be more definitive evaluations of the proposed model. Overall, I think this is a very interesting and well-conducted study, and I enjoyed reading this manuscript. I look forward to their follow-up research.

    3. Reviewer #2 (Public review):

      This paper reports the identification of plasma membrane repair proteins and reveals spatiotemporal cellular responses to plasma membrane damage. The study uses a combination of sodium dodecyl sulfate (SDS) and laser damage to identify and characterize proteins involved in plasma membrane (PM) repair in Saccharomyces cerevisiae. Of the 80 PM repair proteins identified, 72 were novel. The use of both proteomic and microscopy approaches revealed the spatiotemporal coordination of exocytosis and clathrin-mediated endocytosis (CME) during repair. Interestingly, the authors were able to demonstrate that exocytosis dominates early and CME later, with CME also playing an essential role in trafficking transmembrane-domain (TMD)-containing repair proteins between the bud tip and the damage site.

      Weaknesses/limitations:

      - Still, there is a lack of clarity about mentioning Pkc1 as the best characterized repair protein, or why is Pkc1 mentioned only as it is changing the localization?!

      - The use of a C-terminal GFP-tagged library for the laser damage assay may have limited the identification of proteins whose localization or function depends on an intact N-terminus. N-terminal regions might contain targeting or regulatory elements; therefore, some relevant repair factors may have been missed. Analysis of endogenously N-terminally tagged strains, at least for selected candidates, could help address this limitation.

      - The authors appropriately discuss the limitations of SDS- and laser-induced plasma membrane damage, including the possibility that these approaches may not capture proteins involved in other forms of membrane injury, such as mechanical or osmotic stress.

    4. Reviewer #3 (Public review):

      Summary:

      This work aims to understand how cells repair damage to the plasma membrane (PM). This is important, as failure to do so will result in cell lysis and death. Therefore, this is an important fundamental question with broad implications for all eukaryotic cells. Despite this importance, there are relatively few proteins known to contribute to this repair process. This study expands the number of experimentally validated PM repair proteins from 8 to 80. Further, they use precise laser-induced damage of the PM/cell wall and use live-cell imaging to track the recruitment of repair proteins to these damage sites. They focus on repair proteins that are involved in either exocytosis or clathrin-mediated endocytosis (CME) to understand how these membrane remodeling processes contribute to PM repair. Through these experiments, they find that while exocytosis and CME both occur at the sites of PM damage, exocytosis predominates in the early stages of repair, while CME predominates in the later stages. Lastly, they propose that CME is responsible for diverting repair proteins localized to the growing bud cell to the site of PM damage.

      Strengths:

      The manuscript is very well written, and the experiments presented flow logically. The use of laser-induced damage and live-cell imaging to validate the proteome-wide screen using SDS-induced damage strengthens the role of the identified candidates in PM/cell wall repair.

      Comments on revisions:

      The authors have very nicely addressed my previous comments and I have no further concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This work provides an important resource identifying 72 proteins as novel candidates for plasma membrane and/or cell wall damage repair in budding yeast, and describes the temporal coordination of exocytosis and endocytosis during the repair process. The data are convincing; however, additional experimental validation will better support the claim that repair proteins shuttle between the bud tip and the damage site.

      We thank the editors and reviewers for their positive assessment of our work and the constructive feedback to improve our manuscript. We agree with the assessment that additional validation of repair protein shuttling between the bud tip and the damage site is required to further support the model.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yamazaki et al. conducted multiple microscopy-based GFP localization screens, from which they identified proteins that are associated with PM/cell wall damage stress response. Specifically, the authors identified that bud-localized TMD-containing proteins and endocytotic proteins are associated with PM damage stress. The authors further demonstrated that polarized exocytosis and CME are temporally coupled in response to PM damage, and CME is required for polarized exocytosis and the targeting of TMD-containing proteins to the damage site. From these results, the authors proposed a model that CME delivers TMD-containing repair proteins between the bud tip and the damage site.

      Strengths:

      Overall, this is a well-written manuscript, and the experiments are well-conducted. The authors identified many repair proteins and revealed the temporal coordination of different categories of repair proteins. Furthermore, the authors demonstrated that CME is required for targeting of repair proteins to the damage site, as well as cellular survival in response to stress related to PM/cell wall damage. Although the roles of CME and bud-localized proteins in damage repair are not completely new to the field, this work does have conceptual advances by identifying novel repair proteins and proposing the intriguing model that the repairing cargoes are shuttled between the bud tip and the damaged site through coupled exocytosis and endocytosis.

      Weaknesses:

      While the results presented in this manuscript are convincing, they might not be sufficient to support some of the authors' claims. Especially in the last two results sections, the authors claimed CME delivers TMD-containing repair proteins from the bud tip to the damage site. The model is no doubt highly plausible based on the data, but caveats still exist. For example, the repair proteins might not be transported from one location to another, but instead be degraded and re-synthesized. Although the Gal-induced expression system can further support the model to some extent, I think more direct verification (such as FLIP or photo-convertible fluorescence tags to distinguish between pre-existing and newly synthesized proteins) would significantly improve the strength of evidence.

      Major experiment suggestions:

      (1) The authors may want to provide more direct evidence for "protein shuttling" and for excluding the possibility that proteins at the bud are degraded and synthesized de novo near the damage site. For example, if the authors could use FLIP to bleach bud-localized fluorescent proteins, and the damaged site does not show fluorescent proteins upon laser damage, this will strongly support the authors' model. Alternatively, the authors could use photo-convertible tags (e.g., Dendra) to differentiate between pre-existing repair proteins and newly synthesized proteins.

      We thank the reviewer for evaluating our work and giving us important feedback. We agree that the FLIP and photo-convertible experiments would further confirm our model. Here, due to time and resource constraints, we decided not to perform these experiments. Instead, we have discussed this limitation in lines 363-366. Our proposed model of repair protein shuttling should be further tested in our future work.

      (2) In line with point 1, the authors used Gal-inducible expression, which supported their model. However, the authors may need to show protein abundance in galactose, glucose, and upon PM damage. Western blot would be ideal to show the level of full-length proteins, or whole-cell fluorescence quantification can also roughly indicate the protein abundance. Otherwise, we cannot assume that the tagged proteins are only expressed when the cells are growing in galactose-containing media.

      Thank you very much for raising the concern and suggesting the important experiments. We agree that the Western blot experiment to confirm the mNG-Snc1 expression in each medium will further strengthen our conclusion. Along with point (1), further investigation of repair protein shuttling between the bud tip and the damage site and the mechanisms underlying it will be an important future direction. As described above, we have discussed this limitation in lines 363-366.

      (3) Similarly, for Myo2 and Exo70 localization in CME mutants (Figure 4), it might be worth doing a western or whole-cell fluorescence quantification to exclude the caveat that CME deficiency might affect protein abundance or synthesis.

      We thank the reviewer for suggesting the point. Following the reviewer’s suggestion, we quantified the whole-cell fluorescence of WT and CME mutants and verified that the effect of the CME deletion on the expression levels of Myo2-sfGFP and Exo70-mNG is minimal (Figure S6). We added the description in lines 211-212.

      (4) From the authors' model in Figure 7, it looks like the repair proteins contribute to bud growth. Does laser damage to the mother cell prevent bud growth due to the reduction of TMD-containing repair proteins at the bud? If the authors could provide evidence for that, it would further support the model.

      Thank you very much for raising the important point. We speculate that the reduction of TMD-containing proteins at the bud by CME is one of the causes of cell growth arrest after PM damage (1). This is because TMD-containing repair proteins at the bud tip, including phospholipid flippases (Dnf1/Dnf2), Snc1, and Dfg5, are involved in polarized cell growth (2-4). This will be an important future direction as well.

      (5) Is the PM repair cell-cycle-dependent? For example, would the recruitment of repair proteins to the damage site be impaired when the cells are under alpha-factor arrest?

      Thank you for raising this interesting point. Indeed, the senior author Kono previously performed this experiment when she was in David Pellman’s lab. The preliminary results suggest that Pkc1 can be targeted to the damage site, without any impairment, under alpha-factor arrest. A more comprehensive analysis in the future will contribute to concluding the relation between PM repair and the cell cycle.

      Reviewer #2 (Public review):

      This paper reports the identification of plasma membrane repair proteins and reveals spatiotemporal cellular responses to plasma membrane damage. The study uses a combination of sodium dodecyl sulfate (SDS) and laser damage to identify and characterize proteins involved in plasma membrane (PM) repair in Saccharomyces cerevisiae. Of the 80 PM repair proteins identified, 72 were novel. The use of both proteomic and microscopy approaches revealed the spatiotemporal coordination of exocytosis and clathrin-mediated endocytosis (CME) during repair. Interestingly, the authors were able to demonstrate that exocytosis dominates early and CME later, with CME also playing an essential role in trafficking transmembrane-domain (TMD)-containing repair proteins between the bud tip and the damage site.

      Weaknesses/limitations:

      (1) Why are the authors saying that Pkc1 is the best characterized repair protein? What is the evidence?

      We would like to thank the reviewer for taking his/her time to evaluate our work and for valuable suggestions. We described Pkc1 as “best characterized” because it was the first protein reported to accumulate at the laser damage site in budding yeast (5). However, as the reviewer suggested, we do not have enough evidence to describe Pkc1 as “best characterized”. We therefore used “one of the known repair proteins” to mention Pkc1 in the manuscript (lines 90-91).

      (2) It is unclear why the authors chose to continue with the laser damage assay using exclusively the C-terminal GFP-tagged library. Potentially, this could have missed N-terminal tag-dependent localizations and functions and may have excluded functionally important repair proteins.

      Thank you very much for the comments. We decided to use the C-terminal GFP-tagged library for the laser damage assay because we intended to evaluate the proteins at endogenous expression levels. The N-terminal sfGFP-tagged library is expressed from the NOP1 promoter, while the C-terminal GFP-tagged library is expressed from the endogenous promoters. We clarified these points in lines 114-118. We agree with the reviewer that we may have missed some repair proteins with N-terminal-dependent localization and functions by this approach. Therefore, in our manuscript, we discussed these limitations in lines 281-289.

      (3) The use of SDS and laser damage may bias toward proteins responsive to these specific stresses, potentially missing proteins involved in other forms of plasma membrane injury (mechanical, osmotic, etc.). SDS stress is known to indirectly induce oxidative stress and heat-shock responses.

      Thank you very much for raising this point. We agree that the combination of SDS and laser may be biased to identify PM repair proteins. Therefore, in the manuscript, we discussed this point as a limitation of this work in lines 292-298.

      (4) It is unclear what the scale bars of Figures 3, 5, and 6 are. These should be included in the figure legend.

      We apologize for the missing scale bars. We added them to the legends of the figures in the manuscript.

      (5) Figure 4 should be organized to compare WT vs. mutant, which would emphasize the magnitude of impairment.

      Thank you for raising this point. Following the suggestion, we updated Figure 4 to compare WT vs. mutant and clarified this in the figure legend in the manuscript.

      (6) It would be interesting to expand on possible mechanisms for CME-mediated sorting and retargeting of TMD proteins, including a speculative model.

      Thank you very much for this important suggestion. We think it will be very important to characterize the mechanism of CME-mediated TMD protein trafficking between the bud tip and the damage site. In the manuscript, we discussed the possible mechanism for CME activation at the damage site in lines 328-333. We speculate that the activation of the CME may facilitate the retargeting of the TMD proteins from the damage site to the bud tip.

      We do not have a model of how CME is activated at the bud tip to sort and target the TMD proteins to the damage site. One possibility is that the cell cycle arrest after PM damage (1) may affect the localization of CME proteins because the cell cycle affects the localization of some of the CME proteins (6). We will work on the mechanism of repair protein sorting from the bud tip to the damage site in our future work.

      Reviewer #3 (Public review):

      Summary:

      This work aims to understand how cells repair damage to the plasma membrane (PM). This is important, as failure to do so will result in cell lysis and death. Therefore, this is an important fundamental question with broad implications for all eukaryotic cells. Despite this importance, there are relatively few proteins known to contribute to this repair process. This study expands the number of experimentally validated PM repair proteins from 8 to 80. Further, they use precise laser-induced damage of the PM/cell wall and use live-cell imaging to track the recruitment of repair proteins to these damage sites. They focus on repair proteins that are involved in either exocytosis or clathrin-mediated endocytosis (CME) to understand how these membrane remodeling processes contribute to PM repair. Through these experiments, they find that while exocytosis and CME both occur at the sites of PM damage, exocytosis predominates in the early stages of repair, while CME predominates in the later stages. Lastly, they propose that CME is responsible for diverting repair proteins localized to the growing bud cell to the site of PM damage.

      Strengths:

      The manuscript is very well written, and the experiments presented flow logically. The use of laser-induced damage and live-cell imaging to validate the proteome-wide screen using SDS-induced damage strengthens the role of the identified candidates in PM/cell wall repair.

      Weaknesses:

      (1) Could the authors estimate the fraction of their candidates that are associated with cell wall repair versus plasma membrane repair? Understanding how many of these proteins may be associated with the repair of the cell wall or PM may be useful for thinking about how these results are relevant to systems that do or do not have a cell wall. Perhaps this is already in their GO analysis, but I don't see it mentioned in the manuscript.

      We would like to thank the reviewer for taking his/her time to evaluate our work and valuable suggestions. We agree that this is important information to include. Although it may be difficult to completely distinguish the PM repair and cell wall repair proteins, we have identified at least six proteins involved in cell wall synthesis (Flc1, Dfg5, Smi1, Skg1, Tos7, and Chs3). We included this information in lines 142-146 in the manuscript.

      (2) Do the authors identify actin cable-associated proteins or formin regulators associated with sites of PM damage? Prior work from the senior author (reference 26) shows that the formin Bnr1 relocalizes to sites of PM damage, so it would be interesting if Bnr1 and its regulators (e.g., Bud14, Smy1, etc) are recruited to these sites as well. These may play a role in directing PM repair proteins (see more below).

      Thank you for the suggestion. We identified several Bnr1-interacting proteins, including Bud6, Bil1, and Smy1 (Table S2), although Bnr1 itself was not identified in our screening. This could be attributed to the fact that (1) C-terminal GFP fusion impaired the function of Bnr1, and (2) a single GFP fusion is not sufficient to visualize the weak signal at the damage site. Indeed, in reference 26, 3GFP-Bnr1 (N-terminal 3xGFP fusion) was used.

      (3) Do the authors suspect that actin cables play a role in the relocalization of material from the bud tip to PM damage sites? They mention that TMD proteins are secretory vesicle cargo (lines 134-143) and that Myo2 localizes to damage sites. Together, this suggests a possible role for cable-based transport of repair proteins. While this may be the focus of future work, some additional discussion of the role of cables would strengthen their proposed mechanism (steps 3 and 4 in Figure 7).

      Thank you very much for the suggestion. We agree that actin cables may play a role in the targeting of vesicles and repair proteins to the damage site. Following the reviewer’s suggestion, we discussed the roles of Bnr1 and actin cables for repair protein trafficking in lines 309-313 in the manuscript.

      (4) Lines 248-249: I find the rationale for using an inducible Gal promoter here unclear. Some clarification is needed.

      Thank you for raising this point. We clarified this as much as we could in lines 249-255 in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The N-terminal GFP collection screen is interesting but seems irrelevant to the rest of the results. The authors discussed that in the discussion part, but it might be worth showing how many hits from the laser damage screen (in Figure 2) overlap with the N-terminal GFP screen hits.

      Thank you for the suggestion. We found that 48 out of 80 repair proteins are hits in the N-terminal GFP library (Table S1 and S2). This result suggested that the N-terminal library is also a useful resource for identifying repair proteins. In the manuscript, we discussed it in lines 288-289.

      (2) SDS treatment seems a harsh stressor. As the authors mentioned, the overlapped hits from the N- and C-terminal GFP screen might be more general stress factors. Thus, I think Line 84 (the subtitle) might be overclaiming, and the authors might need to tone down the sentence.

      Thank you for the suggestion. Following the reviewer’s suggestion, we changed the sentence to “Proteome-scale identification of SDS-responsive proteins” in the manuscript. We believe that the new sentence describes our findings more precisely.

      (3) Lines 103-106: it does not seem obvious to me that the protein puncta in the cytoplasm are due to endocytosis. The authors might need to provide more experimental evidence for the conclusion, or at least provide more reasoning/references on that aspect (e.g., several specific protein hits belonging to that group have been shown to be endocytosed).

      Thank you very much for raising this point. We agree with the reviewer and deleted the description that these puncta are due to endocytosis in the manuscript.

      (4) For Figure 1D and S1C, the authors annotated some of the localization changes clearly, but some are confusing to me. For example, "from bud tip/neck" to where? And from where to "Puncta/foci"? A clearer annotation might help the readers to understand the categorization.

      Thank you very much for the suggestion. These annotations were defined because it is difficult to conclusively describe the protein localization after SDS treatment. To convincingly identify the destination of the GFP fusion proteins, dual-color imaging of proteins with organelle markers or deep learning-based localization estimation is required. We feel that this might be out of the scope of this work. Therefore, as criteria, we used the protein localization in normal/non-stressed conditions reported in (7) and the Saccharomyces Genome Database (SGD). We clarified this annotation definition in the manuscript (lines 413-436).

      (5) For localization in Figure 2C, as I understand, does it refer to the "before damage/normal" localization? If so, I think it would be helpful to state that these localizations are based on the untreated/normal conditions in the text.

      Yes, it refers to the “before damage/normal localization”. Following the reviewer’s suggestion, we stated that these localizations are based on these conditions in the manuscript (line 130).

      (6) The authors mentioned "four classes" in Line 120, but did not mention the "PM to cytoplasm" class in the text. It would be helpful to discuss/speculate why these transporters might contribute to PM damage repair.

      Thank you very much for this suggestion. We speculated that these transporters are endocytosed after PM damage because endocytosis of PM proteins contributes to cell adaptation to environmental stress (8). We mentioned it in the manuscript (lines 120-122).

      (7) Lines 175-180: My understanding of the text is that the signals of Exo70-mNG/Dnf1-mNG peak before the Ede1-mSc-I signal peaks. They occur simultaneously, but their dominant phases are different. It is clearer when looking at the data, but I find the concluding sentences themselves confusing. The authors might consider rewriting the sentences to make them more straightforward.

      Thank you very much for pointing this out. Following the reviewer’s suggestion, we revised the sentence (lines 177-182 in the manuscript).

      Reviewer #2 (Recommendations for the authors):

      It would be interesting to expand on the functional characterization of the 72 novel candidates and explore possible mechanisms for CME-mediated sorting and retargeting of TMD proteins by including a speculative model.

      Thank you very much for the comment. We agree that the further characterization of novel repair proteins and exploration of the possible mechanisms for CME-mediated TMD protein sorting and retargeting are truly important. This should be our important future direction.

      Reviewer #3 (Recommendations for the authors):

      The x-axis in Figure 1C is labeled 'Ratio' - what is this a ratio of?

      Thank you for raising this point. It is the ratio of the number of proteins associated with a GO term to the total number of proteins in the background. We clarified it in the legend of Figure 1C in the manuscript.

      References

      (1) K. Kono, A. Al-Zain, L. Schroeder, M. Nakanishi, A. E. Ikui, Plasma membrane/cell wall perturbation activates a novel cell cycle checkpoint during G1 in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 113, 6910-6915 (2016).

      (2) A. Das et al., Flippase-mediated phospholipid asymmetry promotes fast Cdc42 recycling in dynamic maintenance of cell polarity. Nat Cell Biol 14, 304-310 (2012).

      (3) M. Adnan et al., SNARE Protein Snc1 Is Essential for Vesicle Trafficking, Membrane Fusion and Protein Secretion in Fungi. Cells 12 (2023).

      (4) H.-U. Mösch, G. R. Fink, Dissection of Filamentous Growth by Transposon Mutagenesis in Saccharomyces cerevisiae. Genetics 145, 671-684 (1997).

      (5) K. Kono, Y. Saeki, S. Yoshida, K. Tanaka, D. Pellman, Proteasomal degradation resolves competition between cell polarization and cellular wound healing. Cell 150, 151-164 (2012).

      (6) A. Litsios et al., Proteome-scale movements and compartment connectivity during the eukaryotic cell cycle. Cell 187, 1490-1507.e1421 (2024).

      (7) W.-K. Huh et al., Global analysis of protein localization in budding yeast. Nature 425, 686-691 (2003).

      (8) T. López-Hernández, V. Haucke, T. Maritzen, Endocytosis in the adaptation to cellular stress. Cell Stress 4, 230-247 (2020).

    1. Reviewer #2 (Public review):

      Summary:

      Feng, Jing-Xin et al. studied the hemogenic capacity of the endothelial cells in the adult mouse bone marrow. Using the Cdh5-CreERT2 in vivo inducible system, they characterized a rare subset of transplantable endothelial cells expressing hematopoietic markers. They suggested that the endothelial cells need the support of stromal cells to acquire blood-forming capacity ex vivo. These endothelial cells were transplantable, contributed to hematopoiesis with ca. 1% chimerism in a stress hematopoiesis condition (5-FU), and were recruited to the peritoneal cavity upon thioglycolate treatment. Ultimately, the authors detailed the blood lineage generation of the adult endothelial cells at single-cell resolution, suggesting predominantly HSPC-independent blood formation by adult bone marrow endothelial cells, in addition to the discovery of Col1a2+ endothelial cells with blood-forming potential, consistent with their high Runx1 expression.

      The conclusion regarding the characterization of hematopoietic-related endothelial cells in adult bone marrow is well supported by data. However, the paper would be more convincing, if the function of the endothelial cells were characterized more rigorously.

      (1) Ex vivo culture of CD45-VE-Cadherin+ZsGreen EC cells generated CD45+ZsGreen+ hematopoietic cells. However, given that FACS sorting can never achieve 100% purity, there is a concern that the hematopoietic cells might arise from contaminating cells carried into the culture at the time of sorting. The sorting purity and a time-course analysis of the ex vivo culture should be shown to exclude this possibility.

      (2) Although it was mentioned in the text that the experimental mice survived up to 12 weeks after lethal irradiation and transplantation, the time-course kinetics of donor cell repopulation (>12 weeks) would add a precise and convincing evaluation. This would be absolutely needed as the chimerism kinetics can allow us to guess what repopulation they were (HSC versus progenitors). Moreover, data on either bone marrow chimerism assessing phenotypic LT-HSC and/or secondary transplantation would dramatically strengthen the manuscript.

      (3) The conclusion by the authors, which says "Adult EHT is independent of pre-existing hematopoietic cell progenitors", is not fully supported by the experimental evidence provided (Figure 4 and Figure S3). More recipients with ZsGreen+ LSK must be tested.

      Strengths:

      The authors used multiple methods to characterize the blood-forming capacity of genetically and phenotypically defined endothelial cells from several reporter mouse systems. The polylox barcoding method used to trace the contribution of adult bone marrow endothelial cells to hematopoiesis is a strong approach for estimating lineage contribution.

      Weaknesses:

      It is unclear what the biological significance is of the blood cells generated de novo from the adult bone marrow endothelial cells. Moreover, since their frequency is very low (<1% of bone marrow and peripheral blood CD45+ cells), more data regarding their identity (function, morphology, and markers) are needed to clearly exclude the possibility of contamination or mosaicism in the reporter mouse system used.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Feng et al. uses mouse models to study the embryonic origins of HSPCs. Using multiple types of genetic lineage tracing, the authors aimed to identify whether BM-resident endothelial cells retain hematopoietic capacity in adult organisms. Through an important mix of various labeling methodologies (and various controls), they reach the conclusion that BM endothelial cells contribute up to 3% of hematopoietic cells in young mice.

      Strengths:

      The major strength of the paper lies in the combination of various labeling strategies, including multiple Cdh5-CreER transgenic lines, different CreER lines (col1a2), and different reporters (ZsGreen, mTmG), including a barcoding-type reporter (PolyLox). This makes it highly unlikely that the results are driven by a rare artifact due to one random Cre line or one leaky reporter. The transplantation control (where the authors show no labeling of transplanted LSKs from the Cdh5 model) is also very supportive of their conclusions.

      Weaknesses:

      We believe that the work of ruling out alternative hypotheses, though initiated, was left incomplete. We specifically think that the authors need to properly consider whether there is specific, sparse labeling of HSPCs (in their native, non-transplant, model, in young animals). Polylox experiments, though an exciting addition, are also incomplete without additional controls. Some additional killer experiments are suggested.

    3. eLife Assessment

      The study proposes the existence of hemogenic endothelium in adult BM using lineage tracing. Though the study is potentially valuable, the data are incomplete due to a lack of controls and insufficient analysis. There is potential for the study to be improved by further revision.

    1. eLife Assessment

      This important study demonstrates that paternal diet influences not only testicular morphology but also placental and fetal development, supporting a role for paternal contributions to offspring health. The authors combine transcriptomic and histological analyses across multiple tissues, and the evidence supporting the central conclusions is convincing. While aspects of the paternal gut phenotype remain largely descriptive, and the paternal and fetoplacental findings are discussed separately, clearer integration of these elements and additional methodological clarification would strengthen interpretation.

    2. Reviewer #1 (Public review):

      Summary:

      Morgan et al. studied how paternal dietary alteration influenced testicular phenotype, placental and fetal growth using a mouse model of paternal low protein diet (LPD) or Western Diet (WD) feeding, with or without supplementation of methyl-donors and carriers (MD). They found diet- and sex-specific effects of paternal diet alteration. All experimental diets decreased paternal body weight and the number of spermatogonial stem cells, while fertility was unaffected. WD males (irrespective of MD) showed signs of adiposity and metabolic dysfunction, abnormal seminiferous tubules, and dysregulation of testicular genes related to chromatin homeostasis. Conversely, LPD induced abnormalities in the early placental cone, fetal growth restriction, and placental insufficiency, which were partly ameliorated by MD. The paternal diets changed the placental transcriptome in a sex-specific manner and led to a loss of sexual dimorphism in the placental transcriptome. These data provide a novel insight into how paternal health can affect the outcome of pregnancies, which is often overlooked in prenatal care.

      Strengths:

      The authors have performed a well-designed study using commonly used mouse models of paternal underfeeding (low protein) and overfeeding (Western diet). They performed comprehensive phenotyping at multiple timepoints, including the fathers, the early placenta, and the late gestation feto-placental unit. The inclusion of both testicular and placental morphological and transcriptomic analysis is a powerful, non-biased tool for such exploratory observational studies. The authors describe changes in testicular gene expression revolving around histone (methylation) pathways that are linked to altered offspring development (H3.3 and H3K4), which is in line with hypothesised paternal contributions to offspring health. The authors report sex differences in control placentas that mimic those in humans, providing potential for translatability of the findings. The exploration of sexual dimorphism (often overlooked) and its absence in response to dietary modification is novel and contributes to the evidence-base for the inclusion of both sexes in developmental studies.

      Weaknesses:

      The data are overall consistent with the conclusions of the authors. The paternal and pregnancy data are discussed separately, instead of linking the paternal phenotype to offspring outcomes. Some clarifications regarding the methods and the model would improve the interpretation of the findings.

      (1) The authors insufficiently discuss their rationale for studying methyl-donors and carriers as micronutrient supplementation in their mouse model. The impact of the findings would be better disseminated if their role were explained in more detail.

      (2) It is unclear from the methods exactly how long the male mice were kept on their respective diets at the time of mating and culling. Male mice were kept on the diet between 8 and 24 weeks before mating, which is a large window in which the males undergo a considerable change in body weight (Figure 1A). If males were mated at 8 weeks but phenotyped at 24 weeks, or if there were differences between groups, this complicates the interpretation of the findings and the extrapolation of the paternal phenotype to changes seen in the fetoplacental unit. The same applies to paternal age, which is an important known factor affecting male fertility and offspring outcomes.

      (3) The male mice exhibited lower body weights when fed experimental diets compared to the control diet, even when placed on the hypercaloric Western Diet. As paternal body weight is an important contributor to offspring health, this is an important confounder that needs to be addressed. This may also have translational implications; in humans, consumption of a Western-style diet is often associated with weight gain. The cause of the weight discrepancy is also unaddressed. It is mentioned that the isocaloric LPD was fed ad libitum, while it is unclear whether the WD was also fed ad libitum, or whether males under- or over-ate on each experimental diet.

      (4) The description and presentation of certain statistical analyses could be improved.

      (i) It is unclear what statistical analysis has been performed on the time-course data in Figure 1A (if any). If one-way ANOVA was performed at each timepoint (as the methods and legend suggest), this is an inaccurate method to analyse time-course data.

      (ii) It is unclear what methods were used to test the relative abundance of microbiome species at the family level (Figure 2L), whether correction was applied for multiple testing, and what the stars represent in the figure.

      (iii) Mentioning whether siblings were used in any analyses would improve transparency, and if so, whether statistical correction needed to be applied to control for confounding by the father.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated the effects of a low-protein diet (LPD) and a high sugar- and fat-rich diet (Western diet, WD) on paternal metabolic and reproductive parameters and feto-placental development and gene expression. They did not observe significant effects on fertility; however, they reported gut microbiota dysbiosis, alterations in testicular morphology, and severe detrimental effects on spermatogenesis. In addition, they examined whether the adverse effects of these diets could be prevented by supplementation with methyl donors. Although LPD and WD showed limited negative effects on paternal reproductive health (with no impairment of reproductive success), the consequences on fetal and placental development were evident and, as reported in many previous studies, were sex-dependent.

      Strengths:

      This study is of high quality and addresses a research question of great global relevance, particularly in light of the growing concern regarding the exponential increase in metabolic disorders, such as obesity and diabetes, worldwide. The work highlights the importance of a balanced paternal diet in regulating the expression of metabolic genes in the offspring at both fetal and placental levels. The identification of genes involved in metabolic pathways that may influence offspring health after birth is highly valuable, strengthening the manuscript and emphasizing the need to further investigate long-term outcomes in adult offspring.

      The histological analyses performed on paternal testes clearly demonstrate diet-induced damage. Moreover, although placental morphometric analyses and detailed histological assessments of the different placental zones did not reveal significant differences between groups, their inclusion is important. These results indicate that even in the absence of overt placental phenotypic changes, placental function may still be altered, with potential consequences for fetal programming.

      Weaknesses:

      Overall, this manuscript presents a rich and comprehensive dataset; however, this has resulted in the analysis of paternal gut dysbiosis remaining largely descriptive. While still valuable, this raises questions regarding why supplementation with methyl donors was unable to restore gut microbial balance in animals receiving the modified diets.

    4. Author response:

      Public Reviews: 

      Reviewer #1 (Public review):

      Summary:

      Morgan et al. studied how paternal dietary alteration influenced testicular phenotype, placental and fetal growth using a mouse model of paternal low protein diet (LPD) or Western Diet (WD) feeding, with or without supplementation of methyl-donors and carriers (MD). They found diet- and sex-specific effects of paternal diet alteration. All experimental diets decreased paternal body weight and the number of spermatogonial stem cells, while fertility was unaffected. WD males (irrespective of MD) showed signs of adiposity and metabolic dysfunction, abnormal seminiferous tubules, and dysregulation of testicular genes related to chromatin homeostasis. Conversely, LPD induced abnormalities in the early placental cone, fetal growth restriction, and placental insufficiency, which were partly ameliorated by MD. The paternal diets changed the placental transcriptome in a sex-specific manner and led to a loss of sexual dimorphism in the placental transcriptome. These data provide a novel insight into how paternal health can affect the outcome of pregnancies, which is often overlooked in prenatal care.

      Strengths:

      The authors have performed a well-designed study using commonly used mouse models of paternal underfeeding (low protein) and overfeeding (Western diet). They performed comprehensive phenotyping at multiple timepoints, including the fathers, the early placenta, and the late gestation feto-placental unit. The inclusion of both testicular and placental morphological and transcriptomic analysis is a powerful, non-biased tool for such exploratory observational studies. The authors describe changes in testicular gene expression revolving around histone (methylation) pathways that are linked to altered offspring development (H3.3 and H3K4), which is in line with hypothesised paternal contributions to offspring health. The authors report sex differences in control placentas that mimic those in humans, providing potential for translatability of the findings. The exploration of sexual dimorphism (often overlooked) and its absence in response to dietary modification is novel and contributes to the evidence-base for the inclusion of both sexes in developmental studies.

      Weaknesses:

      The data are overall consistent with the conclusions of the authors. The paternal and pregnancy data are discussed separately, instead of linking the paternal phenotype to offspring outcomes. Some clarifications regarding the methods and the model would improve the interpretation of the findings.

      (1) The authors insufficiently discuss their rationale for studying methyl-donors and carriers as micronutrient supplementation in their mouse model. The impact of the findings would be better disseminated if their role were explained in more detail.

      We acknowledge the Reviewer’s comments regarding the amount of detail in support of the inclusion of methyl carriers and donors within our diet. Therefore, we will revise the manuscript to include more justification, especially within the Introduction section, for their inclusion.

      (2) It is unclear from the methods exactly how long the male mice were kept on their respective diets at the time of mating and culling. Male mice were kept on the diet between 8 and 24 weeks before mating, which is a large window in which the males undergo a considerable change in body weight (Figure 1A). If males were mated at 8 weeks but phenotyped at 24 weeks, or if there were differences between groups, this complicates the interpretation of the findings and the extrapolation of the paternal phenotype to changes seen in the fetoplacental unit. The same applies to paternal age, which is an important known factor affecting male fertility and offspring outcomes.

      We thank the Reviewer for their comments regarding the ages of the males analysed. We will provide more detailed descriptions of the males in our manuscript. However, all male ages were balanced across all groups.

      (3) The male mice exhibited lower body weights when fed experimental diets compared to the control diet, even when placed on the hypercaloric Western Diet. As paternal body weight is an important contributor to offspring health, this is an important confounder that needs to be addressed. This may also have translational implications; in humans, consumption of a Western-style diet is often associated with weight gain. The cause of the weight discrepancy is also unaddressed. It is mentioned that the isocaloric LPD was fed ad libitum, while it is unclear whether the WD was also fed ad libitum, or whether males under- or over-ate on each experimental diet.

      We agree with the Reviewer that the general trend towards a lighter body weight for our experimental animals is unexpected. We can confirm that all diets were fed ad libitum. However, as males were group housed, we were unable to measure food consumption for individual males. We also observed that males fed the high-fat diets often shredded significant quantities of their diet rather than eating it, which prevented accurate measurement of food intake.

      We also agree with the Reviewer that body weight can be a significant confounder for many paternal and offspring parameters. However, while the experimental males did become lighter, there were no statistical differences between groups in mean body weight. As such, body weight was not included as a variable within our statistical analysis.

      (4) The description and presentation of certain statistical analyses could be improved.

      (i) It is unclear what statistical analysis has been performed on the time-course data in Figure 1A (if any). If one-way ANOVA was performed at each timepoint (as the methods and legend suggest), this is an inaccurate method to analyse time-course data.

      (ii) It is unclear what methods were used to test the relative abundance of microbiome species at the family level (Figure 2L), whether correction was applied for multiple testing, and what the stars represent in the figure.

      (iii) Mentioning whether siblings were used in any analyses would improve transparency, and if so, whether statistical correction needed to be applied to control for confounding by the father.

      We apologize for the lack of clarity regarding the statistical analyses. Going forward, we will revise the manuscript and include a more detailed description of the different analyses, the inclusion of siblings, and the correction for multiple testing.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated the effects of a low-protein diet (LPD) and a high sugar- and fat-rich diet (Western diet, WD) on paternal metabolic and reproductive parameters and fetoplacental development and gene expression. They did not observe significant effects on fertility; however, they reported gut microbiota dysbiosis, alterations in testicular morphology, and severe detrimental effects on spermatogenesis. In addition, they examined whether the adverse effects of these diets could be prevented by supplementation with methyl donors. Although LPD and WD showed limited negative effects on paternal reproductive health (with no impairment of reproductive success), the consequences on fetal and placental development were evident and, as reported in many previous studies, were sex-dependent.

      Strengths:

      This study is of high quality and addresses a research question of great global relevance, particularly in light of the growing concern regarding the exponential increase in metabolic disorders, such as obesity and diabetes, worldwide. The work highlights the importance of a balanced paternal diet in regulating the expression of metabolic genes in the offspring at both fetal and placental levels. The identification of genes involved in metabolic pathways that may influence offspring health after birth is highly valuable, strengthening the manuscript and emphasizing the need to further investigate long-term outcomes in adult offspring.

      The histological analyses performed on paternal testes clearly demonstrate diet-induced damage. Moreover, although placental morphometric analyses and detailed histological assessments of the different placental zones did not reveal significant differences between groups, their inclusion is important. These results indicate that even in the absence of overt placental phenotypic changes, placental function may still be altered, with potential consequences for fetal programming.

      Weaknesses:

      Overall, this manuscript presents a rich and comprehensive dataset; however, this has resulted in the analysis of paternal gut dysbiosis remaining largely descriptive. While still valuable, this raises questions regarding why supplementation with methyl donors was unable to restore gut microbial balance in animals receiving the modified diets.

      We thank the Reviewer for their considered thoughts on the gut dysbiosis induced in our models and the minimal impact of the methyl donors and carriers. We will include additional text within the Discussion to acknowledge this. However, at this point in time, we are unsure as to why the methyl donors had minimal impact. It could be that the macronutrients (i.e. protein, fat, carbohydrates) have more of an influence on gut bacterial profiles than micronutrients. Alternatively, due to the prolonged nature of our feeding regimens, any initial influences of the methyl donors may become diluted out over time. We will amend the text to reflect these potential factors.

    1. eLife Assessment

      This valuable paper describes the regulation of the association of meiotic chromosome axis proteins on chromosome ends with sub-telomeric elements in budding yeast. The genome-wide analyses of binding of chromosome components as well as chromatin regulators, complemented with the mapping of meiotic DNA double-strand breaks on chromosome ends, provided incomplete evidence to support the authors' conclusion. The results in the paper are of interest to researchers in meiotic recombination and the structure of genomes and chromosomes.

    2. Reviewer #1 (Public review):

      Meiotic recombination at chromosome ends can be deleterious, and its initiation, the programmed formation of DSBs, has long been known to be suppressed. However, the underlying mechanisms of this suppression remained unclear. A bottleneck has been the repetitive sequences embedded within chromosome ends, which make them challenging to analyze using genomic approaches. The authors addressed this issue by developing a new computational pipeline that reliably maps ChIP-seq reads and other genomic data, enabling exploration of previously inaccessible yet biologically important regions of the genome.

      In budding yeast, chromosome ends (~20 kb) show depletion of axis proteins (Red1 and Hop1) important for recruiting DSB-forming proteins. Using their newly developed pipeline, the authors reanalyzed previously published datasets and data generated in this study, revealing heretofore unseen details at chromosome ends. While axis proteins are depleted at chromosome ends, the meiotic cohesin component Rec8 is not. Y' elements play a crucial role in this suppression. The suppression does not depend on the physical chromosome ends but on cis-acting elements. Dot1 suppresses Red1 recruitment at chromosome ends but promotes it in interior regions. Sir complex renders subtelomeric chromatin inaccessible to the DSB-forming machinery.

      The high-quality data and extensive analyses provide important insights into the mechanisms that suppress meiotic DSB formation at chromosome ends. To fully realise this value, several aspects of data presentation and interpretation should be clarified to ensure that the conclusions are stated with appropriate precision and that remaining future issues are clearly articulated.

      (1) To assess the effects of chromosome fusion on overall subtelomeric suppression, the authors should guide readers on how to interpret the data presented in Figure 2b-c. Based on the authors' definition of the terminal 20 kb as the suppressed region, SK1 chrIV-R and S288c chrI-L would be affected by the chromosome fusion, if any. In addition, I find it somewhat challenging to draw clear conclusions from inspecting profiles to compare subtelomeric and internal regions. Perhaps a quantitative approach, such as a bootstrap-based analysis similar to those presented earlier, would be easier to interpret.

      (2) The relationship between coding density and Red1 signal needs clarification. An important conclusion from Figure 3 is that the subtelomeric depletion of Red1 primarily reflects suppression of the Rec8-dependent recruitment pathway, whereas Rec8-independent recruitment appears similar between ends and internal regions. Based on the authors' previous papers (references 13, 16), I thought coding (or nucleosome) density primarily influences the Rec8-independent pathway. However, the correlations presented in Figure 2d-e (also implied in Figure 3a) appear opposite to my expectation. Specifically, differences in axis protein binding between chromosome ends and internal regions (or within chromosome ends), where the Rec8-dependent pathway dominates, correlate with coding density. In contrast, no such correlation is evident in rec8Δ cells, where only the Rec8-independent pathway is active and end-specific depletion is absent. One possibility is that masking coding regions within Y' elements influences the correlation analysis. Additional analysis and a clearer explanation would be highly appreciated.

      (3) The Dot1-Sir3 section starting from L266 should be clarified. I found this section particularly difficult to follow. It begins by stating that dot1∆ leads to Sir complex spreading, but then moves directly to an analysis of Red1 ChIP in sir3∆ without clearly articulating the underlying hypothesis. I wonder if this analysis is intended to explain the differences observed between dot1∆ and H3K79R mutants in the previous section. I also did not follow the concluding statement that Dot1 counteracts Sir3 activity. As sir3Δ alone does not affect subtelomeric suppression, it is unclear what Dot1 counteracts. Perhaps explicitly stating the authors' working model at the outset of this section would greatly clarify the rationale, results, and conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Raghavan and his colleagues sought to identify cis-acting elements and/or protein factors that limit meiotic crossover at chromosome ends. This is important for avoiding chromosome rearrangements and preventing chromosome missegregation.

      By reanalyzing published ChIP datasets, the researchers identified a correlation between low levels of protein axis binding - which are known to modulate homologous recombination - and the presence of cis-acting elements such as the subtelomeric element Y' and low gene density. Genetic analyses coupled with ChIP experiments revealed that the differential binding of the Red1 protein in subtelomeric regions requires the methyltransferase Dot1. Interestingly, Red1 depletion in subtelomeric regions does not impact DSB formation. Another surprising finding is that deleting DOT1 has no effect on Red1 loading in the absence of the silencing factor Sir3. Unlike Dot1, Sir3 directly impacts DSB formation, probably by limiting promoter access to Spo11. However, this explains only a small part of the low levels of DSBs forming in subtelomeric regions.

      Strengths:

      (1) This work provides intriguing observations, such as the impact of Dot1 and Sir3 on Red1 loading and the uncoupling of Red1 loading and DSB induction in subtelomeric regions.

      (2) The separation of axis protein deposition and DSB induction observed in the absence of Dot1 is interesting because it rules out the possibility that the binding pattern of these proteins is sufficient to explain the low level of DSB in subtelomeric regions.

      (3) The demonstration that Sir3 suppresses the induction of DSBs by limiting the openness of promoters in subtelomeric regions is convincing.

      Weaknesses:

      (1) The impact of the cis-encoded signal is not demonstrated. Y' containing subtelomeres behave differently from X-only, but this is only correlative. No compelling manipulation has been performed to test the impact of these elements on protein axis recruitment or DSB formation.

      (2) The mechanism by which Dot1 and Sir3 impact Red1 loading is missing.

      (3) Sir3's impact on DSB induction is compelling, yet it only accounts for a small proportion of DSB depletion in subtelomeric regions. Thus, the main mechanisms suppressing crossover close to the ends of chromosomes remain to be deciphered.

    4. Reviewer #3 (Public review):

      Summary:

      The paper by Raghavan et al. describes pathways that suppress the formation of meiotic DNA double-strand breaks (DSBs) for interhomolog recombination at the ends of chromosomes. Previously, the authors' group showed that meiotic DSB formation is suppressed in a ~20 kb region adjacent to the telomeres in S. cerevisiae by suppressing the binding of meiosis-specific axis proteins such as Red1 and Hop1. In this study, by precise genome-wide analysis of binding sites of axis proteins, the authors showed that the binding of Red1 and Hop1 to sub-telomeric regions with X and Y' elements is dependent on Rec8 (cohesin) and/or Hop1's chromatin-binding region (CBR). Furthermore, Dot1 functions in a histone H3K79 trimethylation-independent manner, and the silencing proteins Sir2/3 also regulate the binding of Red1 and Hop1 as well as the distribution of DSBs in sub-telomeres.

      Strengths:

      The experiments were conducted with high quality and included nice bioinformatic analyses, and the results were mostly convincing. The text is easy to read.

      Weaknesses:

      The paper did not provide any new mechanistic insights into how DSB formation is suppressed at sub-telomeres.

    1. eLife Assessment

      This important study reveals intriguing connections between chromosome breakage and DNA elimination during programmed genome rearrangement in the ciliate Tetrahymena thermophila. By developing a novel FISH approach that distinguishes germline and somatic telomeres, the authors provide compelling evidence that chromosome breakage removes germline telomeres along with hundreds of kilobases of germline-limited sequences. By disrupting a single chromosome breakage site, they further showed that DNA elimination was globally affected, which opens up a new direction for mechanistic studies. Thus, this work reveals additional similarity between the programmed DNA elimination in ciliates and nematodes that underlies the transition from germline to somatic telomeres.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Nagao and Mochizuki examine the fate of germline chromosome ends during somatic genome differentiation in the ciliate Tetrahymena thermophila. During sexual reproduction, a new somatic genome is created from a zygotic, germline-derived genome by extensive programmed DNA elimination events. It has been known for some time that the termini of the germline chromosomes are eliminated, but the exact process and kinetics of the elimination events have not been thoroughly investigated. The authors first use germline-specific telomere probes to show that the loss of these chromosome ends occurs with similar timing as other DNA elimination events. By comparative analysis of the assembled germline and somatic genomes, the authors find that the ends of each of the germline chromosomes are composed of a few hundred kilobases of micronuclear limited sequences (MLS) that are removed starting around 14 hours after the start of conjugation, which initiates sexual development. They then develop an in situ hybridization assay to track the fate of one end of chromosome 4 while simultaneously following the adjacent macronuclear destined sequence (MDS) retained in the new somatic genome. This allows the authors to more clearly show that these adjacent chromosomal segments are initially amplified in the developing genome before the terminal MLS is eliminated. Finally, they mutate the chromosome breakage sequence (CBS) that normally separates the MLS terminus from the adjacent MDS region, to show that strains that develop with only one mutant chromosome can produce viable sexual progeny, but it appears that both the MLS and the MDS from the mutant chromosome are lost. If both chromosome copies have the CBS mutation, the cells arrest during development and do not eliminate many germline-limited sequences and fail to produce viable progeny. 

      Overall, this study provides many new insights into the fate of germline chromosome ends during somatic genome remodeling and suggests extensive coordination of different DNA elimination events in Tetrahymena.

      Strengths:

      Overall, the experiments were well executed with appropriate controls. The findings are generally robust. Importantly, the study provides several novel findings. First, the authors provide a fairly comprehensive characterization of the size of the MLS at the end of each germline chromosome. I'm not sure whether this has been published elsewhere. Second, the authors develop a novel method to study the fate of chromosome termini during development and use it to conclusively track the elimination of these termini. Third, the authors show that the elimination of these termini appears to occur concurrently with most other DNA elimination events during somatic genome differentiation. And fourth, the authors show that failure to separate these eliminated sequences from the normally retained chromosome alters the fate of these adjacent MDS and the loss of the cells' ability to produce viable progeny.

      Weaknesses:

      It appears the authors did extensive analysis of the MLS chromosome ends, but did not provide too much information related to their composition. If this has not been published elsewhere, it would be useful to describe the proportion of unique and repetitive sequences and provide more information about the general composition of the chromosome ends. Such information would help the reader understand the nature of these MLS and how they may or may not differ from other eliminated sequences. Although the development of the novel FISH probes for large chromosome ends allowed for these novel discoveries, the signal in several images was visible, but often quite faint. I'm not sure there is anything the authors could do to improve the signal-to-noise ratio, but one needs to stare at the images carefully to understand the findings. One main weakness in the opinion of this reviewer is that the authors did very little to understand why, when a terminal MLS and the adjacent MDS fail to get separated because of failure in chromosome breakage, both segments are eliminated. The authors propose that possibly essential genes in the MDS get silenced, and the resulting lack of gene expression is the issue, but this and other possibilities were not tested. The study would provide more mechanistic insight if they had tried to assess whether the MDS on the CBS mutant chromosome becomes enriched in silencing modifications (e.g., H3K9me3). Alternatively, the authors could have examined changes in gene expression for some of the loci on the neighbouring MDS. The other main weakness is that since the authors only mutated the end of one germline chromosome, it is not clear whether the elimination of the MDS adjacent to the terminal MLS on chromosome 4 when the CBS is mutated is a general phenomenon, i.e., would happen at all chromosome ends, or is unique to the situation at Chromosome 4R. 
      Knowing whether it is a general phenomenon or not would provide important insight into the authors' findings.

    3. Reviewer #2 (Public review):

      Summary:

      Nagao and Mochizuki investigated how the germline (MIC) telomere was removed during programmed genome rearrangement in the developing somatic nucleus (MAC). Using an optimized oligo-FISH procedure, the authors demonstrated that MIC telomeres were co-eliminated with a large region of MIC-limited sequences (MLS) demarcated on the opposite side by a sub-telomeric chromosome breakage site (CBS). This conclusion was corroborated by the latest assembly of the Tetrahymena MIC genome. They further employed CRISPR-Cas9 mutagenesis to disrupt a specific sub-telomeric CBS (4R-CBS). In uniparental progeny (mutant X WT), DNA elimination of the sub-telomeric MLS was not affected, but the adjacent MAC-destined sequence (MDS) may be co-eliminated. However, in biparental progeny (mutant X mutant), global DNA elimination was arrested, revealing previously unrecognized connections between chromosome breakage and DNA elimination. It also paves the way for future studies into the underlying molecular mechanisms. The work is rigorous, well-controlled, and offers important insights into how eukaryotic genomes demarcate genic regions (retained DNA) and regions derived from transposable elements (TE; eliminated DNA) during differentiation. The identification of chromosome breakage sequences as barriers preventing the spread of silencing (and ultimately, DNA elimination) from TE-derived regions into functional somatic genes is a key conceptual contribution.

      Strengths:

      New method development: Oligo-FISH in Tetrahymena. This allows high-resolution visualization of critical genome rearrangement events during MIC-to-MAC differentiation. This method will be a very powerful tool in this area of study.

      Integration of cytological and genomic data. The conclusion is strongly supported by both analyses.

      Rigorous genetic analysis of the role played by 4R-CBS in separating the fate of sub-telomeric MLS (elimination) and MDS (retention). DNA elimination in ciliates has long been regarded as an extreme form of gene silencing. Now, chromosome breakage sequences can be viewed as an extreme form of gene insulators.

      Weaknesses:

      The finding of global disruption of DNA elimination in 4R-CBS mutant progeny is highly intriguing, but it's mostly presented as a hypothesis in the Discussion. The authors propose that the failure to separate MLS from MDS allows aberrant heterochromatin spreading from the former into the latter, potentially silencing genes required for DNA elimination itself. While supported by prior literature on heterochromatin feedback loops, the specific targets silenced are not identified. While results from ChIP-seq and small RNA-seq can greatly strengthen the paper, the reviewer understands that direct molecular characterization may be beyond the scope of the current work.

    4. Reviewer #3 (Public review):

      Programmed DNA elimination (PDE) is a process that removes a substantial amount of genomic DNA during development. While it contradicts the genome constancy rule, an increasing number of organisms have been found to undergo PDE, indicating its potential biological function. Single-cell ciliates have been used as a prominent model system for studying PDE, providing important mechanistic insights into this process. Many of those studies have focused on the excision of internally eliminated sequences (IES) and the subsequent repair using non-homologous end joining (NHEJ). These studies have led to the identification of small RNAs that mark retained or eliminated regions and the transposons that generate double-strand breaks.

      In this manuscript, Nagao and Mochizuki examined the other type of breaks in ciliates that were healed with telomere addition. They specifically focused on the sequences at the ends of the germline (MIC) chromosomes, which have received relatively less attention due to the technical challenges associated with the highly repetitive nature of the sequences. The authors used the Tetrahymena model and developed a set of new tools. They used a novel FISH strategy that enables the distinction between germline and somatic telomeres, as well as the retained and eliminated DNA near the chromosome ends. This allows them to track these sequences at the cellular level throughout the development process, where PDE occurs. They also analyzed the more comprehensive germline and somatic genomes and determined at the sequence level the loss of subtelomeric and telomere sequences at all chromosome ends. Their result is reminiscent of the PDE observed in nematodes, where all germline chromosome ends are removed and remodeled. Thus, the finding connects two independent PDE systems, a protozoan and a metazoan, and suggests the convergent evolution of chromosome end removal and remodeling in PDE.

      The majority of sites (8/10) at the junctions of retained and eliminated DNA at the chromosome ends contain a chromosome breakage sequence (CBS). The authors created a set of mutants that modify the CBS at the ends of chromosome 4R. CBS regions are challenging for CRISPR due to their AT-rich sequences, making the creation of the 4R-CBS mutants a significant breakthrough. They used the FISH assay to determine if PDE still occurs in these mutant strains with compromised CBS. Surprisingly, they found that instead of blocking PDE, its adjacent retained DNA is now eliminated, suggesting a co-elimination event when the breakage is impaired. Furthermore, in biparental mutant crosses, no PDE occurred, and no viable progeny were produced, indicating that the removal of chromosome ends is crucial for proper PDE and sexual progeny development. Overall, the work demonstrates a critical role for 4R-CBS in separating retained and eliminated DNA.

    1. eLife Assessment

      This important study presents new insights into the post-transcriptional mechanisms that govern cortical development. Through state-of-the-art methodology to track neuronal birth order, the data provide compelling evidence that Imp1 (Igf2bp1/Zbp1) orchestrates radial glia fate transitions and cortical neurogenesis. The findings establish a new framework for understanding how post-transcriptional mechanisms integrate with transcriptional and epigenetic regulatory layers to control cortical temporal patterning.

    2. Reviewer #1 (Public review):

      Summary:

      A hallmark of cortical development is the temporal progression of lineage programs in radial glia progenitors (RGs) that orderly generate a large set of glutamatergic projection neuron types, which are deployed to the cortex in a largely inside-out sequence. This process is thought to contribute to the formation of proper cortical circuitry, but the underlying cellular and molecular mechanisms remain poorly understood. To a large extent, this is due to technical limitations in fate mapping RGs and their progeny with cell-type resolution and in manipulating gene expression with proper cell and temporal resolution. Building on the TEMPO technique developed by the Tsumin Lee group, Azur et al. show that the RNA-binding protein Imp1 functions as a dosage- and stage-dependent post-transcriptional mechanism that orchestrates developmental stage transitions in radial glial progenitors, and controls neuronal fate decisions and spatial organization of neuronal and glial cell progeny. Their results suggest that while transcriptional regulators define available cellular states and gate major transitions, post-transcriptional mechanisms like Imp1 provide an additional layer of control by modulating stage-specific transcript stability. Imp1 thus acts as a temporal coordinator whose dosage and timing determine whether developmental transitions are temporarily delayed or blocked. These findings establish a new framework for understanding how post-transcriptional mechanisms integrate with transcriptional and epigenetic regulatory layers to control cortical temporal patterning.

      Strengths:

      The authors apply a novel genetic fate mapping and gene manipulation technology (TEMPO) with cellular resolution. This reveals a dosage- and stage-dependent post-transcriptional mechanism that orchestrates developmental stage transitions in radial glial progenitors, and controls neuronal fate decisions and spatial organization of neuronal and glial cell/astrocyte progeny.

      Weaknesses:

      The endogenous developmental expression pattern of Imp1 and TEMPO-mediated overexpression are not well described or characterized with cellular resolution (whether only in radial glial cells or also in post-mitotic neurons). Thus, the interpretations of the overexpression phenotypes are not always clear.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Azur et al seek to determine the role of Imp1/Igf2bp1 in regulating the temporal generation of cortical neuron types. The authors showed that overexpression of Imp1 changes the laminar distribution of cortical neurons and suggest that Imp1 plays a temporal role in specifying cell fates.

      Strengths:

      The study uniquely used TEMPO to investigate the temporal effects of Imp1/Igf2bp1 in cortical development. The disrupted laminar distribution and delayed fate transition are interesting. The results are presented with proper quantification, they are generally well interpreted, and suggest important roles for Imp1.

      Weaknesses:

      (1) While the results suggest Imp1 is important in regulating cortical neurogenesis, it remains unclear when and where it is expressed to execute such temporal functions. For instance, where is Imp1 expressed in the developing brain? Is it specific to the radial glial cells or ubiquitous in progenitors and neurons? Does it show temporal expression in RGCs?

      (2) The advantage and interpretation of TEMPO need further clarification. TEMPO is an interesting method and appears useful in simultaneously labelling cells and controlling gene expression. Since the reporter, Cas9, and gRNA triggers are all driven by ubiquitous promoters and integrated into the genome using piggyBac, it appears logical that the color transition should happen in all cells over time. The color code appears to track the time when the plasmids got integrated instead of the birthday of neurons. Is this logically true? If the TEMPO system is introduced into postmitotic neurons and the CAG promoter is not silenced, would the tri-color transition happen?

      (3) The accumulation of neurons at the subplate region would benefit from showing larger views of the affected hemisphere. IUE is invasive. The glass pipette may consistently introduce focal damages and truncate RGCs. It is important to examine slices covering the whole IUE region.

    4. Reviewer #3 (Public review):

      Summary:

      The work by Azur and colleagues makes use of the TEMPO (Temporal Encoding and Manipulation in a Predefined Order) methodology to trace cortical neurogenesis in combination with overexpression of Imp1. Imp1 is a mammalian homologue of the Drosophila Imp, which has been shown to control temporal identity in a stem cell context. In their work, they show that overexpression of Imp1 in radial glia, which generate neurons and macroglia in a sequential manner during cortical development, leads to a disruption of faithful neuron/glia generation. They show that continuous overexpression leads to a distinct phenotypic outcome when compared to paradigms where Imp1 was specifically overexpressed in defined temporal windows, enabled by the unique TEMPO approach. Interestingly, the observed phenotype with 'ectopic' generation of mainly lower cortical layer neurons appears not to be due to migration deficits. Strikingly, the overexpression of Imp1 specifically at later stages also leads to ectopic glia-like foci throughout the developing cortical plate. Altogether, the new data provide new insights regarding the role of the post-transcriptional Imp1 regulator in controlling temporal fate in radial glia for the faithful generation of neurons and glia during cortical development.

      Strengths:

      The TEMPO approach provides excellent experimental access to probe Imp1 gene function at defined temporal windows. The data is very robust and convincing. The overexpression paradigm and its associated phenotypes match very well the expected outcome based on Imp1 loss-of-function. Overall, the study contributes significantly to our understanding of the molecular cues that are associated with the temporal progression of radial glia fate potential during cortical development.

      Weaknesses:

      The authors provide some experimental evidence, including live imaging, that deficits related to Imp1 overexpression and subsequent overabundance of lower-layer neurons, or accumulation at the subplate, appear to evolve independently of neuronal migration deficits. However, the analysis at the population level might not suffice to make the claim robust. To analyze neuronal migration in more depth, the authors could trace individual neurons and establish speed and directional parameters for comparison.
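
      The per-neuron comparison suggested above could be sketched roughly as follows. This is only an illustration and not the authors' pipeline: the function name `track_metrics`, the coordinates, and the frame interval are hypothetical, and it assumes each neuron has already been tracked to a series of (x, y) positions from the live imaging.

```python
import numpy as np

def track_metrics(xy, dt):
    """Mean speed and directionality ratio for one tracked neuron.

    xy: (T, 2) positions (e.g. in micrometres), one row per frame.
    dt: time between frames (e.g. in hours). Both are assumed inputs.
    """
    xy = np.asarray(xy, dtype=float)
    steps = np.diff(xy, axis=0)                 # frame-to-frame displacements
    step_len = np.linalg.norm(steps, axis=1)    # length of each step
    path_len = step_len.sum()                   # total distance travelled
    net = np.linalg.norm(xy[-1] - xy[0])        # start-to-end displacement
    mean_speed = path_len / (dt * len(steps))
    # Directionality ratio: 1 for a straight path, near 0 for wandering.
    directionality = net / path_len if path_len > 0 else 0.0
    return mean_speed, directionality

# Hypothetical tracks: one straight radial path, one back-and-forth path.
straight = [(0, 0), (0, 10), (0, 20), (0, 30)]
wandering = [(0, 0), (10, 0), (0, 0), (10, 0)]
speed_s, dir_s = track_metrics(straight, dt=0.5)
speed_w, dir_w = track_metrics(wandering, dt=0.5)
```

      Comparing the distributions of these two values between control and Imp1-overexpressing neurons would separate a slower-migration effect from a loss of directional persistence, which a population-level readout cannot do.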

      In their analysis, the authors mainly rely on temporal parameters/criteria to associate the generation of certain neuron fates. While two markers were used to identify the neuronal fate, the variance seems quite high. The authors could consider utilizing an antibody against Satb2, which would provide additional data points that could help to establish statistical significance in some of the analyses.

      The analysis of glia was done at postnatal day 10, although gliogenesis and, in particular, astrocyte maturation last at least until postnatal day 28. The authors could consider extending their analysis to capture the full spectrum of their astrocyte phenotype.

    1. eLife Assessment

      This is a well-executed intrathecal MRI tracer study that provides valuable early in vivo evidence for CSF drainage into human skull bone marrow and explores clinically relevant associations using robust imaging methodology and regional analyses. However, the evidence supporting the interpretation of early (4 h) tracer signal as impaired clearance is incomplete and appears difficult to reconcile with established CSF tracer kinetics. The reviewers also note that the reported links to sleep and cognitive performance are weakened by reliance on subjective, retrospective questionnaires rather than objective physiological measurements.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript examines the passage of an intrathecal CSF tracer into skull bone marrow, cortex, and venous compartments using serial MRI at multiple time points. The study builds on recent anatomical and imaging work suggesting direct communication between CSF spaces and bone marrow in the skull. It extends these observations to a larger, clinically heterogeneous human cohort. The imaging methodology is carefully executed, and the dataset is rich. The findings are potentially important for understanding CSF drainage pathways and their associations with inflammation, sleep quality, and cognition. However, key aspects of the interpretation - particularly regarding tracer kinetics and the definition of "clearance" - require clarification and, in my view, reconsideration.

      Strengths:

      (1) The study employs a well-established intrathecal contrast-enhanced MRI approach with multiple post-injection time points, enabling the assessment of regional tracer dynamics.

      (2) The analysis of skull bone marrow in distinct anatomical regions (near the superior sagittal sinus, lateral fissure, and cisterna magna) is novel and informative.

      (3) The cohort size is relatively large for an intrathecal tracer study in humans, and the authors make commendable efforts to relate imaging findings to clinical variables such as inflammation, sleep quality, and cognitive performance.

      (4) The manuscript is clearly written, the figures are informative, and the discussion is well grounded in recent anatomical and experimental literature on skull-meningeal connections.

      Weaknesses:

      The central interpretation that a higher percentage increase in skull bone marrow tracer signal at 4.5 hours reflects reduced clearance is not convincingly justified. Based on the existing CSF tracer literature, the 4-6 hour time window is generally considered an enrichment or inflow phase rather than a clearance phase. Later time points (15 and 39 hours) are more likely to reflect clearance or washout. An alternative interpretation - that a higher signal at 4.5 hours reflects more pronounced tracer entry - should be considered and discussed.

      Relatedly, the manuscript lacks a clear conceptual separation between tracer enrichment and clearance phases across time points. If 4.5 hours is intended to represent clearance, this assumption requires more rigorous justification and alignment with prior work.

      CSF passage via the nasal/olfactory pathway is insufficiently discussed. Previous human imaging studies have questioned the importance of peri-olfactory CSF clearance, yet the present findings suggest delayed enrichment in the nasal turbinates. This discrepancy should be explicitly addressed, including a discussion of potential methodological limitations (e.g., timing of acquisitions, ROI definition, or sensitivity to slow drainage pathways).

      More generally, given the descriptive nature of the study and the limited temporal sampling, some conclusions regarding directionality and efficiency of "drainage" may be overstated and would benefit from more cautious framing.

    3. Reviewer #2 (Public review):

      Summary

      Zhou et al. utilize longitudinal, intrathecal contrast-enhanced MRI to investigate a novel physiological pathway: the drainage of cerebrospinal fluid (CSF) into the human skull bone marrow. By mapping tracer enrichment across 87 patients at multiple time points, the authors identify regional variations in drainage speed and link these dynamics to systemic factors like aging, hypertension, and diabetes. Most notably, the study suggests that this drainage function serves as a significant mediator between sleep quality and cognitive performance.

      Strengths

      (1) The study provides a significant transition from murine models to human subjects, showing that CSF-to-marrow communication is a broader phenomenon in clinical cohorts.

      (2) The use of four imaging time points (0h to 39h) allows for a precise characterization of tracer kinetics, revealing that the parietal region near the superior sagittal sinus (SSS) is a rapid exit route.

      (3) The statistical finding that skull bone marrow drainage accounts for approximately 38% of the link between sleep and cognition provides a provocative new target for neurodegenerative research.

      Weaknesses

      (1) Figure 1: The figure relies on a single representative brain to illustrate a process that likely varies significantly across different skull anatomies and disease states. In the provided grayscale MRI scans, the tracer enrichment is essentially imperceptible to the naked eye. Without heatmaps or digital subtraction maps (Post-injection minus Baseline) for the entire cohort, it is difficult to substantiate the quantitative "percentage change" data visually.

      Reliance on a single, manually placed circular Region of Interest (ROI) is susceptible to sampling bias. A more robust approach would involve averaging multiple ROIs per region (multi-sampling) to ensure the signal is representative of the whole marrow compartment.

      (2) Methodological Rigor of Sleep Analysis: The study relies exclusively on the self-reported Pittsburgh Sleep Quality Index (PSQI), which is retrospective and highly prone to recall bias, particularly in a cohort with cognitive impairment. There is no objective verification of sleep (e.g., actigraphy or polysomnography). Since waste clearance is physiologically tied to specific stages, such as Slow-Wave Sleep, subjective scores cannot determine whether drainage is linked to sleep physiology or reflects a higher general disease burden. The MRI captures an acute state during hospitalization, whereas the sleep quality reported covers the month preceding admission. This mismatch complicates the claim that the current drainage function directly reflects historical sleep quality.
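
      Returning to point (1), the subtraction map and multi-ROI averaging proposed there could look roughly like the sketch below. It is a toy illustration under stated assumptions: the images are synthetic arrays standing in for co-registered baseline and post-injection scans, and the function name `multi_roi_percent_change` and the ROI masks are hypothetical rather than taken from the manuscript.

```python
import numpy as np

def multi_roi_percent_change(baseline, post, roi_masks):
    """Percentage signal change per ROI and its mean across ROIs.

    baseline, post: co-registered images of identical shape (assumed).
    roi_masks: list of boolean arrays, one per sampled ROI.
    """
    changes = np.array([
        100.0 * (post[m].mean() - baseline[m].mean()) / baseline[m].mean()
        for m in roi_masks
    ])
    return changes.mean(), changes

# Synthetic 4x4 "images": uniform baseline, two regions with different uptake.
baseline = np.full((4, 4), 100.0)
post = baseline.copy()
post[:2, :] = 110.0
post[2:, :] = 120.0

# Voxel-wise subtraction map (post-injection minus baseline).
subtraction_map = post - baseline

roi_a = np.zeros((4, 4), dtype=bool)
roi_a[:2, :] = True
roi_b = np.zeros((4, 4), dtype=bool)
roi_b[2:, :] = True
mean_change, per_roi = multi_roi_percent_change(baseline, post, [roi_a, roi_b])
```

      Reporting the per-ROI values alongside their mean would show how much a single manually placed ROI could differ from the regional average.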

      Appraisal and Impact

      The authors demonstrate the feasibility of monitoring CSF-to-skull marrow drainage in humans. However, the strength of the associations with sleep and cognition is currently attenuated by a lack of visual "proof" in the raw data and a reliance on subjective behavioral metrics. If these technical gaps are explicitly addressed through the use of population heatmaps and more rigorous multi-ROI sampling, this work will significantly advance our understanding of the brain's waste-clearance systems and their role in systemic health.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors injected a contrast agent into patients and followed the induced signal change with MRI. Doing so, they observed cerebrospinal fluid (CSF) drainage whose magnitude and dynamics varied by anatomical location and scaled with a range of cognitive and socio-demographic metrics, including sleep scores and sex.

      Strengths:

      I would first like to stress that I am not a specialist in the topic of that paper; so my comments should be taken with a grain of salt, and feedback from the other reviewers should also be carefully considered.

      I found the text concise and the figures straightforward to understand. Although they are manually defined, the authors compared drainage across different anatomical locations, which is a positive feature. Albeit purely correlative, the attempt to connect these otherwise 'peripheral' measures to cognitive variables is quite interesting. I also particularly liked the last paragraph of the discussion, which listed the main limitations of the study.

      Weaknesses:

      In the paragraph starting at line 446, the authors interpret poor sleep quality as being a cause and a consequence of impaired CSF clearance, but their approach is purely correlational. In other words, a third variable could be driving both of these parameters (correct?), thereby explaining their correlation. Later, they also proposed that therapeutically altering CSF clearance could improve cognitive symptoms, but, again, if there's a hidden cause of the correlation, that does not seem like a valid possibility. I believe there were other instances of this sort of inferential problem in the Discussion. It seems essential, particularly in clinical research, to precisely identify what the available evidence supports (correlation) and what is speculation (causation).

      Assuming I did not miss it, the approach for testing and reporting correlations is not specified. In particular, the authors report correlation with CSF drainage and a variety of other metrics. But how many tests did the authors perform? They solely mention that they used the Benjamini-Hochberg method to correct for multiple comparisons. How were the decisions to test for this or that effect determined? Or did they test all the metrics they had? Also, that particular correction method is limited when statistics are negatively correlated. It would be helpful to validate findings with another approach.
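
      To make the multiple-comparison point concrete, here is a minimal sketch comparing Benjamini-Hochberg with the more conservative Benjamini-Yekutieli adjustment, which remains valid under arbitrary dependence between tests. The p-values are made up for illustration and the function name `fdr_adjust` is hypothetical; this is not the authors' analysis.

```python
import numpy as np

def fdr_adjust(pvals, method="bh"):
    """Return FDR-adjusted p-values.

    method="bh": Benjamini-Hochberg (independence or positive dependence).
    method="by": Benjamini-Yekutieli (arbitrary dependence, more conservative).
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    scaled = p[order] * m / ranks
    if method == "by":
        scaled = scaled * np.sum(1.0 / ranks)   # harmonic-sum penalty
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

# Hypothetical raw p-values from four correlation tests.
raw = [0.01, 0.04, 0.03, 0.005]
bh = fdr_adjust(raw, "bh")
by = fdr_adjust(raw, "by")
```

      If the reported associations remain significant after the stricter adjustment, the concern about negatively correlated statistics would carry less weight. Stating the total number of tests performed is needed either way, since both adjustments depend on it.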

      I assume many of the metrics the authors use are also correlated with one another. Is it possible that a single principal component is driving the different correlations they see? Performing dimensionality reduction across available metrics and relating the resulting principal components to CSF drainage would help clarify the forces at play here.
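
      The dimensionality-reduction check described above could be sketched as follows. All data here are synthetic and the variable names are hypothetical: five correlated "metrics" are generated from one latent factor purely to illustrate the procedure, with the cohort size of 87 taken from the other review.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 87  # cohort size mentioned in the reviews; all values below are synthetic

# Five correlated metrics driven by one shared latent factor (assumption).
latent = rng.normal(size=n)
metrics = np.column_stack([latent + 0.5 * rng.normal(size=n) for _ in range(5)])
drainage = 0.8 * latent + 0.5 * rng.normal(size=n)

# Standardize each metric, then run PCA via singular value decomposition.
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0, ddof=1)
u, s, vt = np.linalg.svd(z, full_matrices=False)
explained = s**2 / np.sum(s**2)   # fraction of variance per component
scores = u * s                    # per-patient component scores

# Correlate the first component with the drainage measure.
r_pc1 = np.corrcoef(scores[:, 0], drainage)[0, 1]
```

      If one component explained most of the variance and carried most of the association with drainage, the separate correlations would be better read as one shared effect than as several independent findings.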

      In their interpretations, the authors claim that the CSF drainage they observe occurs through the bone marrow of the skull. How confident can we be in that claim? Is it that there are no other likely possibilities? It might be an unnecessary question, but given there seems to be no causal intervention (which is fine), and no consideration of alternatives, I am wondering whether this is because other possibilities are improbable or whether they were not adequately considered.

    1. eLife Assessment

      This valuable work describes a computational and experimental workflow that turns a moderately stable α-helical bundle into a very stable fold. The authors advance our understanding of α-helix stabilization and provide a convenient framework with implications for the protein design field. The main claims are supported by convincing evidence through sound and well-validated methods, yet further characterization would strengthen specific conclusions for the design of mechanically, thermally, and chemically stable α-helical bundles.

    2. Reviewer #1 (Public review):

      Summary:

      In the work from Qiu et al., a workflow aimed at obtaining the stabilization of a simple small protein against mechanical and chemical stressors is presented.

      Strengths:

      The workflow makes use of state-of-the-art AI-driven structure generation and couples it with more classical computational and experimental characterizations in order to measure its efficacy. The work is well presented, and the results are thorough and convincing.

      Weaknesses:

      I will comment mostly on the MD results due to my expertise.

      The Methods description is quite precise, but is missing some important details:

      (1) Version of GROMACS used.

      (2) The barostat used.

      (3) pH at which the system is simulated.

      (4) The pulling is quite fast (but maybe it is not a problem)

      (5) What was the value for the harmonic restraint potential? 1000 is mentioned for the pulling potential, but it is not clear if the same value is used for the restraint, too, during pulling.

      (6) The box dimensions.

      From this last point, a possible criticism arises: Do the unfolded proteins really still stay far enough away from themselves to not influence the result? This might not be the major influence, but for correctness, I would indicate the dimensions of the box in all directions and plot the minimum distance of the protein from copies of itself across the boundary conditions over time.
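
      The periodic-image check requested above can be illustrated with a brute-force sketch. It assumes an orthorhombic box and a molecule whose coordinates have been made whole; the function name `min_image_distance` and the toy coordinates are hypothetical. In practice a dedicated tool such as GROMACS's `gmx mindist` with its `-pi` option would likely be used on the real trajectories.

```python
import itertools
import numpy as np

def min_image_distance(coords, box):
    """Smallest distance between the molecule and any of its periodic images.

    coords: (N, 3) atom positions of one whole molecule (nm, assumed).
    box: (3,) orthorhombic box lengths (nm, assumed).
    Brute force over the 26 neighbouring images; fine for a small protein.
    """
    coords = np.asarray(coords, dtype=float)
    box = np.asarray(box, dtype=float)
    best = np.inf
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if not any(shift):
            continue  # skip the original (unshifted) copy
        image = coords + np.array(shift) * box
        d = np.linalg.norm(coords[:, None, :] - image[None, :, :], axis=-1)
        best = min(best, d.min())
    return best

# Toy example: a two-atom "chain" stretched to 4 nm in a 5 nm cubic box.
coords = [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
d_min = min_image_distance(coords, box=[5.0, 5.0, 5.0])
```

      Evaluating this for every frame of the pulling runs and plotting it against time would show whether the extended chain ever comes within the non-bonded cutoff of its own image.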

      Additionally, no time series are shown for the equilibration phases (e.g., RMSD evolution over time), which would empower the reader to judge the equilibration of the system before either steered MD or annealing MD is performed.
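
      As a sketch of the equilibration time series requested above, RMSD to a reference structure after optimal superposition (the Kabsch algorithm) can be computed per frame. The coordinates here are random stand-ins and the function names are hypothetical; for real trajectories a tool such as GROMACS's `gmx rms` would be the usual route.

```python
import numpy as np

def kabsch_rmsd(ref, mob):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    ref = np.asarray(ref, dtype=float)
    mob = np.asarray(mob, dtype=float)
    p = mob - mob.mean(axis=0)          # centre the mobile structure
    q = ref - ref.mean(axis=0)          # centre the reference
    u, _, vt = np.linalg.svd(p.T @ q)   # SVD of the covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # proper rotation only
    diff = (rot @ p.T).T - q
    return np.sqrt((diff**2).sum() / len(ref))

def rmsd_series(ref, frames):
    """RMSD of every frame to the reference, e.g. for an equilibration plot."""
    return np.array([kabsch_rmsd(ref, f) for f in frames])

# Synthetic "trajectory": identical, rigidly moved, and perturbed frames.
rng = np.random.default_rng(1)
ref = rng.normal(size=(20, 3))
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
frames = [
    ref,
    ref @ rot_z.T + np.array([1.0, 2.0, 3.0]),
    ref + 0.1 * rng.normal(size=ref.shape),
]
series = rmsd_series(ref, frames)
```

      A plateau in this series before the start of steered or annealing MD is the kind of evidence that would let readers judge whether the starting structures were equilibrated.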

    3. Reviewer #2 (Public review):

      Summary:

      Qiu, Jun et al. developed and validated a computational pipeline aimed at stabilizing α-helical bundles into very stable folds. The computational pipeline is a hierarchical computational methodology tasked to generate and filter a pool of candidates, ultimately producing a manageable number of high-confidence candidates for experimental evaluation. The pipeline is split into two stages. In stage I, a large pool of candidate designs is generated by RFdiffusion and ProteinMPNN and then narrowed by a series of filters (hydropathy score, foldability assessed by ESMFold and AlphaFold). The final set is chosen by running a series of steered MD simulations. This stage reached unfolding forces above 100 pN. In stage II, targeted tweaks are introduced - such as salt bridges and metal ion coordination - to further enhance the stability of the α-helical bundle. The constructs undergo validation through a series of biophysical experiments. Thermal stability is assessed by CD, chemical stability by chemical denaturation, and mechanical stability by AFM.

      Strengths:

      A hierarchical computational approach that begins with high-throughput generation of candidates, followed by a series of filters based on specific goal-oriented constraints, is a powerful approach for a rapid exploration of the sequence space. This type of approach breaks down the multi-objective optimization into manageable chunks and has been successfully applied for protein design purposes (e.g., the design of protein binders). Here, the authors nicely demonstrate how this design strategy can be applied to successfully redesign a moderately stable α-helical bundle into an ultrastable fold. This approach is highly modular, allowing the filtering methods to be easily swapped based on the specific optimization goals or the desired level of filtering.

      Weaknesses:

      Assessing the change in stability relative to the WT α-helical bundle is challenging because an additional helix has been introduced, resulting in a comparison between a three-helix bundle and a four-helix bundle. Consequently, the appropriate reference point for comparison is unclear. A more direct and informative approach would have been to redesign the original α-helical bundle of the human spectrin repeat R15, allowing for a more straightforward stability comparison.

      While the authors have shown experimentally that stage II constructs have increased the mechanical stability by AFM, they did not show that these same constructs have increased the thermal and chemical stabilities. Since the effects of salt bridges on stability are highly context dependent (orientation, local environment, exposed vs buried, etc.), it is difficult to assess the magnitude of the effect that this change had on other types of stabilities.

      The three constructs chosen are 60-70% identical to each other, either suggesting overconstrained optimization of the sequence or a physical constraint inherent to designing ultrastable α-helical bundles. It would be interesting to explore these possible design principles further.

      While the use of steered MD is an elegant approach to picking the top N most stable designs, its computational cost may become prohibitive as the number of designs increases or as the protein size grows, especially since it requires simulating a water box that can accommodate a fully denatured protein.
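      The scaling concern above can be illustrated with a back-of-the-envelope estimate. The sketch below is not taken from the manuscript: the contour length per residue (0.36 nm), the solvent padding, and the box cross-section are assumed values chosen only for illustration, and the function names are hypothetical. The water number density (about 33.4 molecules per nm³) follows from the molar concentration of liquid water.

```python
NM_PER_RESIDUE = 0.36   # assumed contour length per residue (nm)
WATERS_PER_NM3 = 33.4   # approximate number density of liquid water
PADDING_NM = 1.0        # assumed solvent padding at each end of the long axis


def extended_length_nm(n_residues):
    """Approximate end-to-end length of a fully stretched chain."""
    return n_residues * NM_PER_RESIDUE


def pulling_box(n_residues, cross_section_nm=6.0):
    """Rectangular box elongated along the pulling axis (dimensions in nm)."""
    long_axis = extended_length_nm(n_residues) + 2 * PADDING_NM
    return (long_axis, cross_section_nm, cross_section_nm)


def water_count(box_nm):
    """Rough number of water molecules needed to fill the box."""
    x, y, z = box_nm
    return round(x * y * z * WATERS_PER_NM3)


if __name__ == "__main__":
    for n in (100, 150, 300):
        box = pulling_box(n)
        print(n, "residues ->", box, "nm box,", water_count(box), "waters")
```

      Under these assumptions the solvent count grows linearly with chain length for an elongated box, and cubically if a cubic box is used instead, which is the cost the reviewer is pointing to.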

    4. Reviewer #3 (Public review):

      Summary:

      Qiu et al. present a hierarchical framework that combines AI and molecular dynamics simulation to design an α-helical protein with enhanced thermal, chemical, and mechanical stability. Strategically, chemical modification by incorporating an additional α-helix, site-specific salt bridges, and metal coordination further enhanced the stability. The experimental validation using single-molecule force spectroscopy and CD melting measurements provides fundamental physical chemical insights into the stabilization of α-helices. Together with the group's prior work on super-stable β strands (https://www.nature.com/articles/s41557-025-01998-3), this research provides a comprehensive toolkit for protein stabilization. This framework has broad implications for designing stable proteins capable of functioning under extreme conditions.

      Strengths:

      The study represents a complete framework for stabilizing the fundamental protein elements, α-helices. A key strength of this work is the integration of AI tools with chemical knowledge of protein stability. The experimental validation in this study is exceptional. The single-molecule AFM analysis provided a high-resolution look at the energy landscape of these designed scaffolds. This approach allows for the direct observation of mechanical unfolding forces (exceeding 200 pN) and the precise contribution of individual chemical modifications to global stability. These measurements offer new, fundamental insights into the physicochemical principles that govern α-helix stabilization.

      Weaknesses:

      (1) The authors report that appending an additional helix increases the overall stability of the α-helical protein. Could the authors provide a more detailed structural explanation for this? Why does the mechanical stability increase as the number of helices increases? Is there a reported correlation between the number of helices (or the extent of the hydrophobic core) and the stability?

      (2) The authors analyzed both thermal stability and mechanical stability. It would be helpful for the authors to discuss the relationship between these two parameters in the context of their design, since thermal melting probes equilibrium stability (ΔG), while mechanical stability probes the unfolding energy barriers along the pulling coordinate.

      (3) While the current study demonstrates a dramatic increase in global stability, the analysis focuses almost exclusively on the unfolding (melting) process. However, thermodynamic stability is a function of both folding (kf) and unfolding (ku) rates. It remains unclear whether the observed ultrastability is primarily driven by a drastic decrease in the unfolding rate (ku) or whether the design also maintains or improves the folding rate (kf).

      (4) The authors chose the spectrin repeat R15 as the starting scaffold for their design. R15 is a well-established model known for its "ultra-fast" folding kinetics, with folding rates (kf ~10⁵ s⁻¹) nearly three orders of magnitude faster than its homologues like R17 (Scott et al., Journal of molecular biology 344.1 (2004): 195-205). Does the newly designed protein, with its additional fourth helix and site-specific chemical modifications, retain the exceptionally high folding rate of the parent R15?
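      The distinction drawn in points (2) and (3) can be written out under standard textbook assumptions. The relations below assume a two-state folder and the Bell model for force-dependent unfolding; they are not taken from the manuscript. Here k_u^0 is the zero-force unfolding rate, Δx‡ the distance to the transition state along the pulling coordinate, and r the loading rate.

```latex
% Two-state folding: equilibrium stability from the two rate constants
\Delta G_{\mathrm{N \to D}} = -RT \ln\frac{k_u}{k_f} = RT \ln\frac{k_f}{k_u}

% Bell model: unfolding rate under an applied force F
k_u(F) = k_u^{0} \exp\!\left(\frac{F\,\Delta x^{\ddagger}}{k_B T}\right)

% Most probable unfolding force at constant loading rate r
F^{*} = \frac{k_B T}{\Delta x^{\ddagger}}
        \ln\!\left(\frac{r\,\Delta x^{\ddagger}}{k_u^{0}\,k_B T}\right)
```

      In this simplified picture the equilibrium stability depends on both k_f and k_u, whereas the most probable unfolding force depends only on k_u^0 and Δx‡, so a design can gain mechanical stability without a matching change in ΔG, and vice versa.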

    1. eLife Assessment

      This study provides valuable findings on how the activity of the E3 ubiquitin ligase Highwire (Hiw/Phr1) is regulated and its impact on synaptic growth. The authors propose that impaired endocytosis leads to condensation of Hiw, resulting in increased synaptic growth. They also integrate such a mechanism within the known JNK (c-JUN N-terminal Kinase) and BMP (Bone Morphogenetic Protein) signalling pathways involved in synapse regulation. While the work raises an interesting mechanistic framework, several aspects of the experimental design and methodology are incomplete, and key conclusions, particularly those regarding the liquid-liquid phase separation of the E3 ubiquitin ligase, are not fully supported by the presented data.

    2. Joint Public Review:

      Pippadpally et al. investigate how the conserved E3 ubiquitin ligase Highwire (Hiw/Phr1), a well-established negative regulator of synaptic growth, is functionally and spatially regulated. Using a GFP-tagged Hiw transgene in Drosophila, the authors report that disruption of endocytosis via loss of AP-2, synaptojanin, or Rab11-mediated recycling endosome function leads to accumulation of Hiw in neuronal cell bodies as enlarged foci, altogether accompanied by synaptic overgrowth. Provided that the Hiw foci are sensitive to aliphatic alcohol treatment, the authors propose that impaired endocytosis promotes liquid-liquid phase separation of the E3 ubiquitin ligase, reducing its ability to degrade the MAPKKK Wallenda and thereby activating JNK signalling. Crosstalk with BMP signalling and roles for autophagy are also explored within this framework.

      Strengths

      The work provides a novel tool, the GFP-tagged Hiw transgene, to study the spatio-temporal regulation of the E3 ubiquitin ligase Highwire (Hiw/Phr1) in Drosophila, and its impact on synaptic growth. The results presented point to a potentially thought-provoking connection between endocytic defects, Hiw condensation, Hiw down-regulation and synaptic overgrowth. The specific effects of the endocytic mutants on the redistribution of the Hiw to the neuronal cell body and the genetic interactions between the endocytosis and JNK pathway mutants are convincing.

      Weaknesses

      Several conclusions are insufficiently supported at this point. For example, evidence that the Hiw foci represent bona fide liquid-liquid phase (LLP) separated condensates is limited. Sensitivity to 1,6-hexanediol is not definitive proof of their liquid condensate nature, and their recovery kinetics after 1,6-hexanediol wash-out and their morphology are inconsistent with a pure liquid behaviour. Furthermore, the claim that the Hiw foci are non-vesicular is not strongly supported, as it is only based on the lack of colocalization with a handful of endosomal proteins.

      Importantly, the appearance of the putative condensates is correlative rather than causative for synaptic overgrowth, and in the absence of a mechanistic link between endocytosis and Hiw condensation, the causality is difficult to address. Of note is that the putative condensates are already present (albeit to a lesser extent) in the absence of endocytic defects and that the conclusions rely heavily on overexpressed GFP-Hiw, which may perturb normal protein behaviour and artificially induce condensation or aggregation.

      The use of hypomorphic mutants in genetic experiments also introduces some ambiguity in their interpretation, as the results may reflect dosage effects from multiple pathways rather than pathway order. Finally, the manuscript would benefit from a more comprehensive reference to relevant literature on JNKKKs and BMP signalling, as well as on the recycling endosome function in synaptic growth and the regulation of the aforementioned pathways.

      Overall, while the work presents thought-provoking observations and a potentially interesting regulatory model, additional experimental rigor and broader contextualization are needed to substantiate the proposed mechanism and its biological relevance.

    3. Author response:

      Weaknesses:

      (1) Several conclusions are insufficiently supported at this point. For example, evidence that the Hiw foci represent bona fide liquid-liquid phase (LLP) separated condensates is limited. Sensitivity to 1,6-hexanediol is not definitive proof of their liquid condensate nature, and their recovery kinetics after 1,6-hexanediol wash-out and their morphology are inconsistent with a pure liquid behaviour. Furthermore, the claim that the Hiw foci are non-vesicular is not strongly supported, as it is only based on the lack of colocalization with a handful of endosomal proteins.

      We agree that, at the current stage of the manuscript, we have presented data only on Hiw foci in the VNC and shown that they are sensitive to 1,6-HD but not to 2,5-HD. To further provide definitive proof that these are bona fide condensates, we will now perform in vitro analysis of different domains of Hiw and the Hiw IDR region. In addition, we will also investigate the Hiw-GFP behavior in non-neuronal and transiently transfected cell lines using FRAP and other protocols previously applied to condensate-forming proteins.

      Finally, we will perform an in-depth analysis of the Hiw condensates for their colocalization with endocytic proteins and cellular compartments and determine whether they are part of any known vesicular structures.

      (2) Importantly, the appearance of the putative condensates is correlative rather than causative for synaptic overgrowth, and in the absence of a mechanistic link between endocytosis and Hiw condensation, the causality is difficult to address. Of note is that the putative condensates are already present (albeit to a lesser extent) in the absence of endocytic defects and that the conclusions rely heavily on overexpressed GFP-Hiw, which may perturb normal protein behaviour and artificially induce condensation or aggregation.

      To investigate the formation of condensates and their relation to synaptic growth, we will perform a time-course analysis of changes at the NMJ and correlate with the Hiw condensate appearance in the VNC of shi<sup>ts</sup> expressing GFP-Hiw, along with appropriate controls. The GFP transgene used is a functional transgene and well established for studying Hiw behaviour. The Hiw condensates do not form when expressed on an otherwise wild-type background. We will further assess the formation of Hiw condensates in other endocytic mutants with appropriate controls.

      (3) The use of hypomorphic mutants in genetic experiments also introduces some ambiguity in their interpretation, as the results may reflect dosage effects from multiple pathways rather than pathway order. Finally, the manuscript would benefit from a more comprehensive reference to relevant literature on JNKKKs and BMP signalling, as well as on the recycling endosome function in synaptic growth and the regulation of the aforementioned pathways.

      We will perform genetic analysis using homozygous mutants of the wit and saxophone genes to further support epistatic interactions between the BMP signaling pathway and synaptic growth. We will strengthen the discussion part.

    1. eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, “SNS-Seq”) to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

    2. Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data. Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

    3. Reviewer #2 (Public review):

      Summary:

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosome phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      Are the claims well substantiated?:

      My opinion on whether the authors' results support their conclusions depends on whether my concerns about the sites determined from the SNS-seq data can be dismissed. In the case that these concerns can be dismissed, I do think that the claims are compelling.

      Impact:

      If the origins of replication prove to be distributed as claimed, this study has the potential to be important for two fields. Firstly, in research focused on T. brucei as a disease agent, where essential processes that function differently than in mammals are excellent drug targets. Secondly, this study would impact basic research analyzing DNA replication over the evolutionary tree, where T. brucei can be used as an early-divergent eukaryotic model organism.

    4. Author response:

      eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, "SNS-Seq") to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

      We would like to clarify a few points regarding our study. Our primary objective was to characterise the topology and genome-wide distribution of short nascent-strand (SNS) enrichments. The stranded SNS-seq approach provides the high strand-specific resolution required to analyse origins. The observation that SNS-seq peaks (potential origins) are most frequently found in intergenic regions is not an artefact of analysing only part of the genome; rather, it is a result of analysing the entire genome.

      We agree that orthogonal validation is necessary. However, neither MFA-seq nor TbORC1/CDC6 ChIP-on-chip has yet been experimentally validated as definitive markers of origin activity in T. brucei, nor do they validate each other. 

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data.

      Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

      Regarding comparisons with previous work:

      Two other attempts to identify origins in T. brucei, ORC1/CDC6 binding sites (ChIP-on-chip, PMID: 22840408) and MFA-seq (PMID: 22840408, 27228154), were both produced by the McCulloch group. These methods do not validate each other; in fact, MFA-seq origins overlap with only 4.4% of the 953 ORC1/CDC6 sites (PMID: 29491738). Therefore, low overlap between SNS-seq peaks and ORC1/CDC6 sites cannot disqualify our findings. Similar low overlaps are observed in other parasites (PMID: 38441981, PMID: 38038269, PMID: 36808528) and in human cells (PMID: 38567819).

      We also would like to emphasize that the ORC1/CDC6 dataset originally published (PMID: 22840408) is no longer available; only a re-analysis by TritrypDB exists, which differs significantly from the published version (personal communication from Richard McCulloch). While the McCulloch group reported a predominant localization of ORC1/CDC6 sites within SSRs at transcription start and termination regions, our re-analysis indicates that only 10.3% of TbORC1/CDC6-12Myc sites overlapped with 41.8% of SSRs.

      MFA-seq does not map individual origins; rather, it detects replicated genomic regions by comparing DNA copy number between S- and G1-phases of the cell cycle (PMID: 36640769; PMID: 37469113; PMID: 36455525). The broad replicated regions (0.1–0.5 Mbp) identified by MFA-seq in T. brucei are likely to contain multiple origins, rather than just one. In that sense we disagree with McCulloch's group, who claimed that there is a single origin per broad peak. Our analysis shows that up to 50% of the origins detected by stranded SNS-seq locate within broad MFA-seq regions. Neither the methodology used by McCulloch's group to infer single origins from MFA-seq regions nor the precise positions of these regions have been published or made available, making direct comparison difficult.
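      The copy-number comparison described above can be sketched in a few lines. This is a simplified illustration of the general idea (a per-bin ratio of depth-normalized S-phase to G1-phase read counts), not the published MFA-seq pipeline; the function name, the pseudocount, and the toy counts are assumptions.

```python
def mfa_ratio(s_counts, g1_counts, pseudocount=1.0):
    """Per-bin S/G1 read-depth ratio after normalizing by library size.

    s_counts and g1_counts are read counts in the same genomic bins.
    Ratios above 1 mark bins that are over-represented in S phase,
    i.e. regions that have already been replicated in many cells.
    """
    if len(s_counts) != len(g1_counts):
        raise ValueError("both samples must use the same bins")
    s_total = sum(s_counts)
    g1_total = sum(g1_counts)
    ratios = []
    for s, g in zip(s_counts, g1_counts):
        s_norm = (s + pseudocount) / s_total
        g_norm = (g + pseudocount) / g1_total
        ratios.append(s_norm / g_norm)
    return ratios


if __name__ == "__main__":
    # Toy counts: the middle bin is enriched in the S-phase sample.
    print(mfa_ratio([10, 40, 10], [20, 20, 20]))
```

      Because the signal is a ratio averaged over bins and over a cell population, a broad enriched region is compatible with one origin or with several closely spaced ones, which is the point at issue here.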

      Finally, the genomic features we describe—poly(dA/dT) stretches, G4 structures and nucleosome occupancy patterns—are consistent with origin topology described in other organisms.

      On the concern that SNS-seq may map RNA-DNA hybrids rather than replication origins: Isolation and sequencing of short nascent strands (SNS) is a well-established and widely used technique for high-resolution origin mapping. This technique has been employed for decades in various laboratories, with numerous publications documenting its use. We followed the published protocol for SNS isolation (Cayrou et al., Methods, 2012, PMID: 22796403). RNA-DNA hybrids cannot persist through the multiple denaturation steps in our workflow, as they melt at 95°C (Roberts and Crothers, Science, 1992; PMID: 1279808). Even in the unlikely event that some hybrids remained, they would not be incorporated into libraries prepared using a single-stranded DNA protocol and therefore would not be sequenced (see Figure 1B and Methods).

      Furthermore, our analysis shows that only a small proportion (1.7%) of previously reported RNA-DNA hybrids overlap with SNS-seq origins. It is important to note that RNA-primed nascent strands naturally form RNA-DNA hybrids during replication initiation, meaning the enrichment of RNA-DNA hybrids near origins is both expected and biologically relevant.

      On the claim that our analysis focuses narrowly on inter-CDS regions and ignores other genomic compartments: this is incorrect. We mapped and analyzed stranded SNS-seq data across the entire genome of T. brucei 427 wild-type strain (Müller et al., Nature, 2018; PMID: 30333624), including both core and subtelomeric regions. Our findings indicate that most origins are located in intergenic regions, but all analyses were performed using the full set of detected origins, regardless of location.

      We did not ignore transcription start and stop sites (TSS/TTS). The manuscript already includes origin distribution across genomic compartments as defined by TriTrypDB (Fig. 2C) and addresses overlap with TSS, TTS and HT in the section “Spatial coordination between the activity of the origin and transcription”. While this overlap is minimal, we have included metaplots in the revised manuscript for clarity.

      Reviewer #2 (Public review):

      Summary: 

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      We sincerely thank you for this positive feedback.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      Thank you very much for this remark.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Thank you for appreciating our discussion.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      Thank you for asking these questions. As you correctly point out, replication forks progress in both directions from their origins and ultimately converge at termination sites. However, the SNS-seq method specifically isolates short nascent strands (SNSs) of 0.5–2.5 kb using a sucrose gradient. These short fragments are generated immediately after origin firing and mark the sites of replication initiation, rather than the entire replicated regions. Consequently: (i) SNS-seq does not capture long replication forks or termination regions, only the immediate vicinity of origins. (ii) The narrow peaks indicate the size of selected SNSs (0.5–2.5 kb) and the fact that many cells initiate replication at the same genomic sites, leading to localized enrichment. (iii) Regions without coverage refer to genomic areas that do not serve as efficient origins in the analyzed cell population. Thus, SNS-seq is designed to map origin positions, but not the entire replicated regions.

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      Maintaining the strandedness of the sequenced DNA fibres enabled us to filter the peaks, thereby increasing the probability that the filtered peak pairs corresponded to origins. Two SNS peaks must be oriented in a way that reflects the topology of the SNS strands within an active origin: the upstream peak must be on the minus strand and followed by the downstream peak on the plus strand.

      As suggested by the reviewer, we tested whether randomly placed plus and minus peaks could reproduce the number of filter-passing peaks using the same bioinformatics workflow. Only 1–6% of random peaks passed the filters, compared with 4–12% in our experimental data, resulting in about 50% fewer selected regions (origins). Moreover, the “origins” from random peaks showed 0% reproducibility across replicates, whereas the experimental data showed 7–64% reproducibility. These results indicate that the retained peaks are highly unlikely to arise by chance and support the specificity of our approach. Thank you for this suggestion.
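      The strand-orientation filter and the random-placement control described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' bioinformatics workflow: the `Peak` tuple, the `max_gap` threshold, the function names, and the toy coordinates are all hypothetical, and the real pipeline likely applies further criteria.

```python
import random
from typing import List, Tuple

Peak = Tuple[int, int]  # (start, end) on one chromosome; hypothetical representation


def pair_divergent_peaks(minus: List[Peak], plus: List[Peak],
                         max_gap: int) -> List[Tuple[Peak, Peak]]:
    """Pair each minus-strand peak with the nearest plus-strand peak that
    starts downstream of it, no more than max_gap bases away (assumed rule)."""
    plus_sorted = sorted(plus)
    pairs = []
    for m in sorted(minus):
        for p in plus_sorted:
            gap = p[0] - m[1]  # distance from minus-peak end to plus-peak start
            if 0 <= gap <= max_gap:
                pairs.append((m, p))
                break  # keep at most one partner per minus-strand peak
    return pairs


def randomize_peaks(peaks: List[Peak], chrom_len: int,
                    rng: random.Random) -> List[Peak]:
    """Place the same number of peaks, with the same widths, at random positions."""
    placed = []
    for start, end in peaks:
        width = end - start
        new_start = rng.randrange(0, chrom_len - width + 1)
        placed.append((new_start, new_start + width))
    return placed


# Toy coordinates invented for illustration; they are not data from the study.
minus_peaks = [(1000, 1500), (8000, 8400)]
plus_peaks = [(1700, 2300), (20000, 20500)]
observed = pair_divergent_peaks(minus_peaks, plus_peaks, max_gap=1000)

# Null distribution: how many pairs pass the same filter after random placement?
rng = random.Random(0)
null_counts = []
for _ in range(200):
    null_minus = randomize_peaks(minus_peaks, 100_000, rng)
    null_plus = randomize_peaks(plus_peaks, 100_000, rng)
    null_counts.append(
        len(pair_divergent_peaks(null_minus, null_plus, max_gap=1000)))

print(len(observed), sum(null_counts) / len(null_counts))
```

      Comparing the observed number of passing pairs with the distribution of `null_counts` is one way to express the "would random peaks pass by chance" question; reproducibility across replicates would need a separate comparison.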

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      The MFA-seq data for T. brucei were published in two studies by McCulloch’s group: Tiengwe et al. (2012) using TREU927 PCF cells, and Devlin et al. (2016) using PCF and BSF Lister427 cells. In Krasilnikova et al. (2025), previously published MFA-seq data from Devlin et al. were remapped to a new genome assembly without generating new MFA-seq data, which explains why we did not include that comparison.

      Clarifying the differences between MFA-seq and our stranded SNS-seq data is essential. MFA-seq and SNS-seq interrogate different aspects of replication. SNS-seq is a widely used, high-resolution method for mapping individual replication origins, whereas MFA-seq detects replicated regions by comparing DNA copy number between S and G1 phases. MFA-seq identified broad replicated regions (0.1–0.5 Mb) that were interpreted by McCulloch’s group as containing a single origin. We disagree with this interpretation and consider that there are multiple origins in each broad peak; theoretical considerations of replication timing indicate that far more origins are required for complete genome duplication during the short S-phase. Once this assumption is reconsidered, MFA-seq and SNS-seq results become complementary: MFA-seq identifies replicated regions, while SNS-seq pinpoints individual origins within those regions. Our analysis revealed that up to 50% of the origins detected by stranded SNS-seq were located within the broad MFA peaks. This pattern (broad MFA-seq regions containing multiple initiation sites) has also recently been found in Leishmania by McCulloch’s team using nanopore sequencing (PMID: 26481451). Nanopore sequencing showed numerous initiation sites within MFA-seq regions and numerous additional sites outside these regions in asynchronous cells, consistent with what we observed using stranded SNS-seq in T. brucei. We will expand our discussion and conclude that the discrepancy arises from methodological differences and interpretation. The two approaches provide complementary insights into replication dynamics, rather than ‘vastly different’ results.
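      The overlap statistic mentioned above (the share of SNS-seq origins that fall inside broad MFA-seq regions) amounts to a point-in-interval count. The sketch below shows that calculation on invented coordinates; the function name, the inclusive boundary convention, and the numbers are assumptions for illustration only.

```python
from typing import List, Tuple


def fraction_within(origins: List[int],
                    regions: List[Tuple[int, int]]) -> float:
    """Fraction of origin positions lying inside any (start, end) region,
    with both boundaries treated as inclusive (an assumed convention)."""
    if not origins:
        return 0.0
    inside = sum(
        any(start <= pos <= end for start, end in regions) for pos in origins)
    return inside / len(origins)


# Invented example: four origin midpoints and one broad 0.3 Mb region.
origins = [12_000, 150_000, 310_000, 620_000]
mfa_regions = [(100_000, 400_000)]
frac = fraction_within(origins, mfa_regions)
print(frac)
```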

      We recognize the importance of validating our results in future using an alternative mapping method and functional assays. However, it is important to emphasize that stranded SNS-seq is an origin mapping technique with a very high level of resolution. This technique can detect regions between two divergent SNS peaks, which should represent regions of DNA replication initiation. At present, no alternative technique has been developed that can match this level of resolution.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosome phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      It is important to note that the conditions used in our study differ significantly from those applied in Foulk et al. (Genome Res, 2015). We used SNS isolation and enzymatic treatments as described in previous reports (Cayrou, C. et al. Genome Res, 2015 and Cayrou, C. et al. Methods, 2012). Here, we enriched the SNS by size on a sucrose gradient and then treated this SNS-enriched fraction with repeated, high-dose λ-exonuclease treatments (100 U for 16 h at 37 °C; see Methods). In contrast, Foulk et al. used sonicated total genomic DNA for origin mapping, without enrichment of SNS on a sucrose gradient, and then performed a λ-exonuclease treatment. A previous study (Cayrou, C. et al. Genome Res, 2015, Figure S2, which can be found at https://genome.cshlp.org/content/25/12/1873/suppl/DC1) has shown that complete digestion of G4-rich DNA sequences is achieved under the conditions we used.

      Furthermore, the SNS-depleted control (without RNA) was included in our experimental approach. This control represents all molecules that are difficult to digest with lambda exonuclease, including G4 structures. Peak calling was performed against this background control, with the aim of removing false positive peaks resulting from undigested DNA structures. We have explained this step more clearly in the revised manuscript.

      The key benefit of our study is that the orientation of the enrichments (peaks) remains consistent throughout the sequencing process. We identified an enrichment of two divergent strands synthesised on complementary strands containing G4s. These two divergent strands themselves do not, however, contain G4s (see Fig. 8 for the model). Therefore, the enriched molecules detected in our study do not contain G4s. They are complementary to the strands enriched with G4s. This means that the observed enrichment of G4s cannot be an artefact of the enzymatic treatments used in this study. We added this part in the discussion of the revised manuscript.

      We also performed an additional control which is not mentioned in the manuscript. In parallel with replicating cells, we isolated the DNA from the stationary phase of growth, which primarily contains non-replicating cells. Following the three λ-exonuclease treatments, there was insufficient DNA remaining from the stationary phase cells to prepare the libraries for sequencing. This control strongly indicated that there was little to no contaminating DNA present with the SNS molecules after λ-exonuclease enrichment.

    1. eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

    3. Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).
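      The normative benchmark implied by this task description can be sketched as a two-state Bayesian update. The sketch assumes (these details are not stated in the review) that the sequence starts in the red regime, that a shift to the blue regime can occur with probability `q` before each draw, that a shift is irreversible, and that `d` is the proportion of the majority colour in each urn; the function and variable names are hypothetical.

```python
from typing import List


def shift_posteriors(balls: List[str], q: float, d: float) -> List[float]:
    """Posterior probability that the regime has shifted (red -> blue)
    after each observed ball, under the assumptions in the lead-in."""
    p_shift = 0.0  # before the first draw, no shift has occurred
    posteriors = []
    for ball in balls:
        # Prior for this draw: already shifted, or shifts right now.
        prior = p_shift + (1.0 - p_shift) * q
        # Likelihood of the observed colour under each regime.
        like_blue_regime = d if ball == "blue" else 1.0 - d
        like_red_regime = d if ball == "red" else 1.0 - d
        # Bayes' rule over the two regimes.
        numerator = prior * like_blue_regime
        p_shift = numerator / (numerator + (1.0 - prior) * like_red_regime)
        posteriors.append(p_shift)
    return posteriors


after_blue = shift_posteriors(["blue"] * 3, q=0.05, d=0.9)
after_red = shift_posteriors(["red"] * 3, q=0.05, d=0.9)
uninformative = shift_posteriors(["red", "blue", "red"], q=0.05, d=0.5)
print(after_blue[-1], after_red[-1], uninformative[-1])
```

      With `d = 0.5` the balls carry no information, so the posterior simply tracks the accumulating prior; this is the time dependence that the fMRI confound discussion below is concerned with.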

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).
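      One way to make the "attraction to the mean" account concrete is a linear shrinkage model, subjective = w × true + (1 − w) × μ, fitted to the three value pairs quoted above. This functional form is an assumption chosen for illustration; neither the reviewer nor the authors specify it. Under that assumption, the fitted slope estimates the weight w and the fixed point of the line estimates the attractor μ.

```python
from typing import List, Tuple

# Values quoted in the review: instructed versus subjective shift probabilities.
true_q = [0.01, 0.05, 0.10]
subj_q = [0.037, 0.052, 0.069]


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(true_q, subj_q)
# In the assumed model, slope = w and intercept = (1 - w) * mu,
# so the attractor mu is the point where the fitted line crosses y = x.
attractor = intercept / (1.0 - slope)
print(slope, attractor)
```

      A slope well below 1 with a fixed point near the block-average value is what a hyper-prior account would predict, but the same pattern is also what a reduced-sensitivity parameter produces, which is why three data points cannot separate the two accounts.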

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.
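      The time dependence raised here can be written out explicitly. Assuming a shift can occur independently with probability q before each sample and cannot reverse (an assumption about the task; whether this expression is exactly the authors' 'intertemporal prior' regressor is also assumed), the probability that a shift has occurred by sample t, before any ball colour is taken into account, is:

```latex
P(\text{shift by sample } t) = 1 - (1 - q)^{t}, \qquad t = 1, \dots, 10.
% Worked values at t = 10 for the three instructed levels of q:
%   q = 0.01:  1 - 0.99^{10} \approx 0.096
%   q = 0.05:  1 - 0.95^{10} \approx 0.401
%   q = 0.10:  1 - 0.90^{10} \approx 0.651
```

      Because this quantity rises monotonically with t for any q > 0, any regressor built on the posterior inherits a correlation with elapsed time unless the prior term is modelled separately.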

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

    4. Author response:

      The following is the authors’ response to the current reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we will focus on addressing Reviewer 3’s concern about the potential confound between posterior probability and time in the neuroimaging results. First, we will present whole-brain results of subjects’ probability estimates (their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we will compare the effect of probability estimates (Pt) on vmPFC and ventral striatum activity, which we found to correlate with Pt, with and without including the intertemporal prior in the GLM. Third, to address Reviewer 3’s comment that vmPFC and ventral striatum cannot be located from the Tables of Activation in the supplement, we will add slice-by-slice images of the whole-brain results on Pt to the Supplemental Information in addition to the Tables of Activation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we will focus on addressing your concern about the potential confound between posterior probability and time in the neuroimaging results. First, we will present whole-brain results of subjects’ probability estimates (Pt, their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we will compare the effect of probability estimates (Pt) on vmPFC and ventral striatum activity, which we found to correlate with Pt, with and without including the intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4).

      Finally, to address your comment that you were not able to locate vmPFC and ventral striatum from the Tables of Activation, we will add slice-by-slice images of the whole-brain results on Pt to the supplement in addition to the Tables of Activation.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system-neglect was likely not due to response mode (having to enter probability estimates, making binary predictions, etc.).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      We thank the reviewer for this comment. We do not disagree that there are alternative models that can describe over- and underreactions seen in the dataset. However, we do wish to point out that since we began with the normative Bayesian model, the natural progression in case the normative model fails to capture data is to modify the starting model. It is under this context that we developed the system-neglect model. It was a simple extension (a parameterized version) of the normative Bayesian model.

      Regarding the hyperprior idea, even if the participants have a hyperprior, there has to be some function that describes/implements attraction to the mean. Having a hyperprior itself does not imply attraction to this hyperprior. We therefore were not sure why the hyperprior itself can produce attraction to the mean.

      We did look further into the possibility of attraction to the mean. First, as suggested by the reviewer, we looked into another dataset with a different mean ground-truth value. In Massey and Wu (2005), the transition probabilities were [0.02 0.05 0.1 0.2], which differ from those in the current study [0.01 0.05 0.1], and over- and underreactions were found there as well. Second, we reason that for attraction to the mean to work, subjects need to know the mean of the system parameters. This knowledge would take time to develop because we did not tell subjects about the mean. If the behavior were caused by attraction to the mean, it would differ between the early stage of the experiment, when subjects had little idea about the mean, and the late stage, when they knew about it. We will further analyze and compare participants’ data at the beginning of the experiment with data at the end of the experiment.

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      We thank the reviewer for pointing out these potential explanations. Again, we do not disagree that any model in which participants don’t fully use the numerical information they were given would produce system neglect. It is hard to separate ‘not fully using numerical information’ from ‘lack of sensitivity to the numerical information’. We will respond in more detail to the four example reasons later.

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      Again, we do not disagree with the reviewer on the modeling statement. However, we also wish to point out that the system-neglect model we had is a simple extension of the normative Bayesian model. Had we gone to a non-Bayesian framework, we would have faced the criticism of why we simply do not consider a simple extension of the starting model. In response, we will add a section in Discussion summarizing our exchange on this matter.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      Thank you. In response, we will add a new figure, now Figure 4, showing the results of Pt and delta Pt from GLM-2 where we added the intertemporal prior as a regressor to control for temporal confounds. We compared Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2. We also will show the results of intertemporal prior on vmPFC and ventral striatum under GLM-2.

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe activity in the vmPFC, ventral striatum, and frontoparietal network as we did in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect estimates of the probability that the current regime is the blue regime, and therefore not be specifically about change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 to 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of P<sub>t</sub> did not particularly have to do with change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as we described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
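      The coding rule for the action-handedness regressor quoted above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `action_handedness` is hypothetical, and the digit-to-hand mapping (digits 1-5 on the left hand, 6-9 and 0 on the right) is an assumption inferred only from the three examples in the quoted passage ("23", "75", "90").

```python
# Assumed keypad layout (not stated in the source): left hand presses 1-5,
# right hand presses 6-9 and 0. This is consistent with the three examples
# given in the text but is otherwise a guess.
LEFT_HAND_DIGITS = set("12345")
RIGHT_HAND_DIGITS = set("67890")


def action_handedness(estimate: str) -> int:
    """Return -1 (both presses left), 0 (one each), or 1 (both presses right)."""
    if len(estimate) != 2 or any(d not in "0123456789" for d in estimate):
        raise ValueError("estimate must be a two-digit string such as '23'")
    right_presses = sum(d in RIGHT_HAND_DIGITS for d in estimate)
    # 0 right-hand presses -> -1, 1 -> 0, 2 -> 1
    return right_presses - 1


if __name__ == "__main__":
    for example in ("23", "75", "90"):
        print(example, action_handedness(example))
```

      Under the assumed mapping, the three examples from the text take the values -1, 0, and 1, which is the parametric coding the passage describes.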

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of the Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in glm2, this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for glm1 which has no effective control of the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that shows results from GLM-2. In this new figure, we show whole-brain results on Pt and delta Pt, and ROI results for vmPFC and ventral striatum on Pt, delta Pt, and the intertemporal prior.

      The reason we kept the results from GLM-1 (Figure 3) was primarily that we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all 3 experiments. In Experiments 1 and 2, Pt and delta Pt were, respectively, probability estimates and belief updates that the current regime was the Blue regime. In Experiment 3, Pt and delta Pt were simply the number subjects were instructed to press (Pt) and the change in number between successive periods (delta Pt).

      As a further point I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing these with images. For example I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      The vmPFC and ventral striatum were part of the cluster labeled as Central Opercular cortex. In response, we will provide the coordinates of the local maxima within the cluster. We will also add slice-by-slice images showing the effect of Pt.


      The following is the authors’ response to the original reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting distinct contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative task design, behavioral modeling, and model-based fMRI analyses provides a solid foundation for the conclusions; however, the neuroimaging results have several limitations, particularly a potential confound between the posterior probability of a switch and the passage of time that may not be fully controlled by including trial number as a regressor. The control experiments intended to address this issue also appear conceptually inconsistent and, at the behavioral level, while informing participants of conditional probabilities rather than requiring learning is theoretically elegant, such information is difficult to apply accurately, as shown by well-documented challenges with conditional reasoning and base-rate neglect. Expressing these probabilities as natural frequencies rather than percentages may have improved comprehension. Overall, the study advances understanding of belief updating under uncertainty but would benefit from more intuitive probabilistic framing and stronger control of temporal confounds in future work.

      We thank the editors for the assessment and we appreciate your efforts in reviewing the paper. The editors added several limitations in the assessment based on the new reviewer 3 in this round, which we would like to clarify below.

      With regard to temporal confounds, we clarified in the main text and response to Reviewer 3 that we had already addressed the potential confound between posterior probability of a switch and passage of time in GLM-2 with the inclusion of intertemporal prior. After adding intertemporal prior in the GLM, we still observed the same fMRI results on probability estimates. In addition, we did two other robustness checks, which we mentioned in the manuscript.

      With regard to response mode (probability estimation rather than choice or indicating natural frequencies), we wish to point out that previous research by Massey and Wu (2005), on which the current study was based, addressed the concern that participants show system-neglect tendencies because of the response mode, namely indicating beliefs by reporting probability estimates rather than through choice or another response mode. Massey and Wu (2005, Study 3) found the same biases when participants performed a choice task that did not require them to indicate probability estimates.

      With regard to the control experiments, they were in fact not intended to address the confound between posterior probability and passage of time. Rather, they aimed to address whether the neural findings were unique to change detection (Experiment 2) and to address visual and motor confounds (Experiment 3). These aims and the results of the control experiments are described on pages 18-19.

      We also wish to highlight that we had performed detailed model comparisons following reviewer 2’s suggestions. Although reviewer 2 was unable to re-review the manuscript, we believe this provides insight into the literature on change detection. See “Incorporating signal dependency into system-neglect model led to better models for regime-shift detection” (p.27-30). The model comparison showed that system-neglect models that incorporate signal dependency describe participants’ probability estimates better than the original system-neglect model. This suggests that people respond to change-consistent and change-inconsistent signals differently when judging whether the regime had changed. This was not reported in previous behavioral studies and was largely inspired by the neural finding on signal dependency in the frontoparietal cortex. It indicates that neural findings can provide novel insights into computational modeling of behavior.

      To better highlight and summarize our key contributions, we added a paragraph at the beginning of Discussion:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”    

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      We thank the reviewer for the comments.

      Weaknesses:

      The authors have adequately addressed most of my prior concerns.

      We thank the reviewer for recognizing our effort in addressing your concerns.

      My only remaining comment concerns the z-test of the correlations. I agree with the non-parametric test based on bootstrapping at the subject level, providing evidence for significant differences in correlations within the left IFG and IPS.

      However, the parametric test seems inadequate to me. The equation presented is described as the Fisher z-test, but the numerator uses the raw correlation coefficients (r) rather than the Fisher-transformed values (z). To my understanding, the subtraction should involve the Fisher z-scores, not the raw correlations.

      More importantly, the Fisher z-test in its standard form assumes that the correlations come from independent samples, as reflected in the denominator (which uses the n of each independent sample). However, in my opinion, the two correlations are not independent but computed within-subject. In such cases, parametric tests should take into account the dependency. I believe one appropriate method for the current case (correlated correlation coefficients sharing a variable [behavioral slope]) is explained here:

      Meng, X.-l., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111(1), 172-175. https://doi.org/10.1037/0033-2909.111.1.172

      It should be implemented here:

      Diedenhofen B, Musch J (2015) cocor: A Comprehensive Solution for the Statistical Comparison of Correlations. PLoS ONE 10(4): e0121945. https://doi.org/10.1371/journal.pone.0121945

      My recommendation is to verify whether my assumptions hold, and if so, perform a test that takes correlated correlations into account. Or, to focus exclusively on the non-parametric test.

      In any case, I recommend a short discussion of these findings and how the authors interpret that some of the differences in correlations are not significant.

      Thank you for the careful check. Yes, this was indeed a mistake on our part. We also agree that the two correlations are not independent. Therefore, we replaced the test with one that accounts for dependent correlations, following Meng et al. (1992) as suggested by the reviewer. We updated the Methods section on p.56-57:

      “In the parametric test, we adopted the approach of Meng et al. (1992) to statistically compare the two correlation coefficients. This approach specifically tests differences between dependent correlation coefficients according to the following equation

      $$Z = \left(z_{r_1} - z_{r_2}\right)\sqrt{\frac{N-3}{2\,(1-r_x)\,h}}$$

      Where N is the number of subjects, z<sub>ri</sub> is the Fisher z-transformed value of r<sub>i</sub> (r<sub>1</sub> = r<sub>blue</sub> and r<sub>2</sub> = r<sub>red</sub>), and r<sub>x</sub> is the correlation between the neural sensitivity at change-consistent signals and change-inconsistent signals. The computation of h is based on the following equations

      $$h = \frac{1 - f\,\overline{r^2}}{1-\overline{r^2}}, \qquad f = \frac{1-r_x}{2\,\left(1-\overline{r^2}\right)}$$

      Where $\overline{r^2}$ is the mean of the r<sub>i</sub><sup>2</sup>, and f should be set to 1 if f > 1.”

      We updated the Results section on p.29:

      “Since these correlation coefficients were not independent, we compared them using the test developed in Meng et al. (1992) (see Methods). We found that for two of the five ROIs in the frontoparietal network, namely the left IFG and left IPS, the difference in correlation was significant (one-tailed z test; left IFG: z = 1.8908, p = 0.0293; left IPS: z = 2.2584, p = 0.0049). For the remaining three ROIs, the difference in correlation was not significant (dmPFC: z = 0.9522, p = 0.1705; right IFG: z = 0.9860, p = 0.1621; right IPS: z = 1.4833, p = 0.0690).”
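      For readers who want to see the arithmetic of the dependent-correlation test, the following is a minimal sketch of the Meng, Rosenthal and Rubin (1992) z test for two correlations that share a variable. It is not the authors' code: the function names `meng_z_test` and `one_tailed_p` are hypothetical, and the input values in the usage example are placeholders, since the underlying correlation coefficients are not reported in this exchange.

```python
import math


def one_tailed_p(z: float) -> float:
    """Upper-tail probability of a standard normal variate at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def meng_z_test(r1: float, r2: float, rx: float, n: int):
    """Meng et al. (1992) z test for two dependent correlations.

    r1, r2: correlations of the shared variable with each of the two others
    rx:     correlation between the two non-shared variables
    n:      number of subjects
    Returns (z, one-tailed p for the hypothesis r1 > r2).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher z-transform
    r2_bar = (r1 ** 2 + r2 ** 2) / 2.0           # mean of the squared correlations
    f = min((1.0 - rx) / (2.0 * (1.0 - r2_bar)), 1.0)  # f is capped at 1
    h = (1.0 - f * r2_bar) / (1.0 - r2_bar)
    z = (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - rx) * h))
    return z, one_tailed_p(z)


if __name__ == "__main__":
    # Placeholder inputs for illustration only; not values from the study.
    z, p = meng_z_test(r1=0.6, r2=0.2, rx=0.4, n=30)
    print(f"z = {z:.4f}, one-tailed p = {p:.4f}")
```

      Unlike the independent-samples Fisher test, the denominator here depends on r<sub>x</sub>, so a stronger correlation between the two neural measures makes the same difference in correlations easier to detect.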

      We added a discussion of these results on p.41:

      “Interestingly, such sensitivity to signal diagnosticity was only present in the frontoparietal network when participants encountered change-consistent signals. However, while most brain areas within this network responded in this fashion, only the left IPS and left IFG showed a significant difference in coding individual participants’ sensitivity to signal diagnosticity between change-consistent and change-inconsistent signals. Unlike the left IPS and left IFG, we observed in dmPFC a marginally significant correlation with behavioral sensitivity at change-inconsistent signals as well. Together, these results indicate that while different brain areas in the frontoparietal network responded similarly to change-consistent signals, there was a greater degree of heterogeneity in responding to change-inconsistent signals.”

      Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual differences effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      We thank the reviewer for the overall descriptions of the manuscript.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Thank you for these assessments.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      We appreciate the reviewer’s concern on this issue. The concern was addressed in Massey and Wu (2005), where participants performed a choice task in which they were not asked to provide probability estimates (Study 3 in Massey and Wu, 2005). Instead, participants in Study 3 were asked to predict the color of the ball before seeing a signal. This was a more intuitive way of indicating their belief about a regime shift. The results from the choice task were identical to those found in the probability estimation task (Study 1 in Massey and Wu). We take this as evidence that the system-neglect behavior the participants showed was less likely to be due to the mode of information delivery.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      We thank the reviewer for this comment. It is true that the system-neglect model is not entirely inconsistent with regression to the mean, regardless of whether the implementation has a hyper-prior or not. In fact, our behavioral measure of sensitivity to transition probability and signal diagnosticity, which we termed the behavioral slope, is based on linear regression analysis. In general, the modeling approach in this paper is to start from a generative model that defines ideal performance and to consider modifying the generative model when systematic deviations of actual performance from the ideal are observed. In this approach, a generative Bayesian model with hyper-priors would be more complex to begin with, and a regression-to-the-mean idea by itself does not generate a priori predictions.
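      To make the notion of a regression-based slope concrete, the sketch below fits an ordinary least-squares line to the three pairs of values quoted in the reviewer's comment (ground-truth transition probabilities and the corresponding subjective values). It is only an illustration: the helper `least_squares_slope` is hypothetical, and the authors' actual definition of the behavioral slope (which variables are regressed, and whether it is computed per subject) is not given in this exchange.

```python
def least_squares_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    return s_xy / s_xx


# Values quoted in the reviewer's comment on transition probability.
objective = [0.01, 0.05, 0.10]       # ground-truth probability of regime shift
subjective = [0.037, 0.052, 0.069]   # values participants behaved as if they used

slope = least_squares_slope(objective, subjective)

if __name__ == "__main__":
    # A slope of 1 would mean full sensitivity to the instructed values;
    # a slope between 0 and 1 means the environments are under-differentiated.
    print(f"slope = {slope:.3f}")
```

      A slope well below 1 is what both accounts predict, which is why the slope alone cannot distinguish system neglect from attraction toward a global mean.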

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      Thank you for raising this point. The modeling principle we adopt is the following. We start from the normative model—the Bayesian model—that defined what normative behavior should look like. We compared participants’ behavior with the Bayesian model and found systematic deviations from it. To explain those systematic deviations, we considered modeling options within the confines of the same modeling framework. In other words, we considered a parameterized version of the Bayesian model, which is the system-neglect model and examined through model comparison the best modeling choice. This modeling approach is not uncommon in economics and psychology. For example, Kahneman and Tversky adopted this approach when proposing prospect theory, a modification of expected utility theory where expected utility theory can be seen as one specific model for how utility of an option should be computed.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, doesn't Pt always increase with sample number (as by the time of later samples, there have been more opportunities for a regime shift)? Unless this is completely linear, the effect won't be controlled by including trial number as a co-regressor (which was done).

      Thank you for raising this concern. Yes, Pt always increases with sample number regardless of evidence (seeing change-consistent or change-inconsistent signals). This is captured by the ‘intertemporal prior’ in the Bayesian model, which we included as a regressor in our GLM analysis (GLM-2), in addition to Pt. In short, GLM-1 had Pt and sample number. GLM-2 had Pt, intertemporal prior, and sample number, among other regressors. And we found that, in both GLM-1 and GLM-2, both vmPFC and ventral striatum correlated with Pt.

      To make this clearer, we updated the main text to further clarify this on p.18:

      “We examined the robustness of P<sub>t</sub> representations in these two regions in several follow-up analyses. First, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors (Fig. S7 in SI). Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of regime shift in the t-th period, where q is transition probability and t = 1,…,10 is the period (see Eq. 1 in Methods). It describes normatively how the prior probability of change increased over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> would clarify whether any effect of P<sub>t</sub> can otherwise be attributed to the intertemporal prior. Second, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln (P<sub>t</sub>/(1-P<sub>t</sub>)) (Fig. S8 in SI). Third, we implemented a GLM that examined P<sub>t</sub> separately on periods when change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S9 in SI). Each of these analyses showed the same pattern of correlations between P<sub>t</sub> and activation in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.”
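      The reviewer's point that a sample-number regressor only removes a linear trend can be illustrated with a short sketch of how such a prior grows over periods. The exact expression (Eq. 1 in Methods) is not reproduced in this exchange, so the form used here is an assumption: a regime shift can occur at most once per trial, with constant probability q in each period, so the probability that a shift has occurred by period t is 1 - (1 - q)^t. The function names are hypothetical.

```python
import math


def shift_probability(q: float, t: int) -> float:
    """Assumed prior probability that a regime shift has occurred by period t."""
    return 1.0 - (1.0 - q) ** t


def intertemporal_prior(q: float, t: int) -> float:
    """Natural log of the prior odds in favor of a shift by period t (assumed form)."""
    p = shift_probability(q, t)
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Transition probabilities quoted in the reviewer's comment.
    for q in (0.01, 0.05, 0.10):
        values = [intertemporal_prior(q, t) for t in range(1, 11)]
        steps = [b - a for a, b in zip(values, values[1:])]
        # The per-period increments shrink over time, so the prior is not a
        # linear function of period number.
        print(q, [round(v, 2) for v in values], round(steps[0], 3), round(steps[-1], 3))
```

      Under this assumed form the log odds rise steeply in early periods and flatten later, which is why entering the intertemporal prior as its own regressor controls for more than the period number does.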

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe activity in the vmPFC, ventral striatum, and frontoparietal network as we did in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect estimates of the probability that the current regime is the blue regime, and therefore not be specifically about change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to the probability of regime change), the effect of P<sub>t</sub> might not particularly have to do with change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’ (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
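
      The coding rule for the action-handedness regressor can be sketched as follows. This is an illustration, not the authors' implementation. The assignment of digits to hands is an assumption inferred only from the three examples in the quoted passage ("23" gives -1, "75" gives 0, "90" gives 1): digits 1 to 5 are taken to be pressed with the left hand and digits 6 to 9 and 0 with the right hand. The names `LEFT_HAND_DIGITS`, `RIGHT_HAND_DIGITS`, and `action_handedness` are hypothetical.

```python
# ASSUMPTION: digit-to-hand mapping inferred from the examples in the text
# ("23" -> -1, "75" -> 0, "90" -> 1); the actual button layout is not given.
LEFT_HAND_DIGITS = set("12345")
RIGHT_HAND_DIGITS = set("67890")


def action_handedness(estimate: str) -> int:
    """Parametric handedness code for a two-digit probability estimate.

    Returns -1 if both presses use the left hand, 0 if one press uses each
    hand, and 1 if both presses use the right hand.
    """
    if len(estimate) != 2 or not all(
        d in LEFT_HAND_DIGITS | RIGHT_HAND_DIGITS for d in estimate
    ):
        raise ValueError("estimate must be a two-digit string such as '75'")
    right_presses = sum(d in RIGHT_HAND_DIGITS for d in estimate)
    # 0, 1, or 2 right-hand presses map onto -1, 0, or 1.
    return right_presses - 1


codes = {e: action_handedness(e) for e in ("23", "75", "90")}
```

      Entering such a code as a parametric regressor alongside the probability estimate lets the GLM attribute variance tied to which hand responded to the handedness term rather than to the estimate itself.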

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of the Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Many of the figures are too tiny - the writing is very small, as are the pictures of brains. I'd suggest adjusting these so they will be readable without enlarging.

      Thank you. We apologize for the poor readability of the figures. We have enlarged the figures (Fig. 5 in particular) and their font size to make them more readable.

    1. eLife Assessment

      This article reports an algorithm for inferring the presence of synaptic connection between neurons based on naturally occurring spiking activity of a neuronal network. One key improvement is to combine self-supervised and synthetic approaches to learn to focus on features that generalize to the conditions of the observed network. This valuable contribution is currently supported by incomplete evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The authors proposed a new method to infer connectivity from spike trains whose main novelty relies on their approach to mitigate the problem of model mismatch. The latter arises when the inference algorithm is trained or based on a model that does not accurately describe the data. They propose combining domain adaptation with a deep neural architecture in a framework called DeepDAM. They apply DeepDAM to an in vivo ground-truth dataset previously recorded in mouse CA1, show that it performs better than methods without domain adaptation, and evaluate its robustness. Finally, they show that their approach can also be applied to a different problem, i.e., inferring biophysical properties of individual neurons.

      Strengths:

      (1) The problem of inferring connectivity from extracellular recording is a very timely one: as the yield of silicon probes steadily increases, the number of simultaneously recorded pairs does so quadratically, drastically increasing the possibility of detecting connected pairs.

      (2) Using domain adaptation to address model mismatch is a clever idea, and the way the authors introduced it into the larger architecture seems sensible.

      (3) The authors clearly put a great effort into trying to communicate the intuitions to the reader.
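
      The quadratic growth mentioned in strength (1) can be checked with a short calculation. This is a generic sketch rather than anything from the manuscript; the function names are hypothetical. With n simultaneously recorded neurons there are n(n-1)/2 unordered pairs, and n(n-1) ordered pairs if each direction of a putative connection is tested separately.

```python
def n_unordered_pairs(n_neurons: int) -> int:
    """Number of distinct neuron pairs, ignoring direction."""
    return n_neurons * (n_neurons - 1) // 2


def n_directed_pairs(n_neurons: int) -> int:
    """Number of candidate connections when direction matters."""
    return n_neurons * (n_neurons - 1)


# Doubling the yield roughly quadruples the number of candidate pairs.
pairs_100 = n_unordered_pairs(100)
pairs_200 = n_unordered_pairs(200)
```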

      Weaknesses:

      (1) The validation of the approach is incomplete: due to its very limited size, the single ground-truth dataset considered does not provide a sufficient basis to draw a strong conclusion. While the authors correctly note that this is the only dataset of its kind, the value of this validation is limited compared to what could be done by carefully designing in silico experiments.

      (2) Surprisingly, the authors fail to compare their method to the approach originally proposed for the data they validate on (English et al., 2017).

      (3) The authors make a commendable effort to study the method's robustness by pushing the limits of the dataset. However, the logic of the robustness analysis is often unclear, and once again, the limited size of the dataset poses major limitations to the authors.

      (4) The lack of details concerning both the approach and the validation makes it challenging for the reader to establish the technical soundness of the study.

      Although in the current form this study does not provide enough basis to judge the impact of DeepDAM in the broader neuroscience community, it nevertheless puts forward a valuable and novel idea: using domain adaptation to mitigate the problem of model mismatch. This approach might be leveraged in future studies and methods to infer connectivity.

    3. Reviewer #2 (Public review):

      The article is very well written, and the new methodology is presented with care. I particularly appreciated the step-by-step rationale for establishing the approach, such as the relationship between K-means centers and the various parameters. This text is conveniently supported by the flow charts and t-SNE plots. Importantly, I thought the choice of state-of-the-art method was appropriate and the choice of dataset adequate, which together convinced me of the large improvement reported. I thought that the crossmodal feature-engineering solution proposed was elegant and seems exportable to other fields. Here are a few notes.

      While the validation data set was well chosen and of high quality, it remains a single dataset and also remains a non-recurrent network. The authors acknowledge this in the discussion, but I wanted to chime in to say that for the method to be more than convincing, it would need to have been tested on more datasets. It should be acknowledged that the problem becomes more complicated in a recurrent excitatory network, and thus the method may not work as well in the cortex or in CA3.

      While the data is shown to work in this particular dataset (plus the two others at the end), I was left wondering when the method breaks. And it should break if the models are sufficiently mismatched. Such a question can be addressed using synthetic-synthetic models. This was an important intuition that I was missing, and an important check on the general nature of the method that I was missing.

      While the choice of state-of-the-art is good in my opinion, I was looking for comments on the methods prior to that. For instance, methods based on GLMs have been used by the Pillow, Paninski, and Truccolo groups. I could not find a decent discussion of these methods in the main text and thought that both their acknowledgement and rationale for dismissing were missing.

      While most of the text was very clear, I thought that page 11 was odd and missing much in terms of introductions. Foremost is the introduction of the dataset, which is never really done. Page 11 refers to 'this dataset', while the previous sentence was saying that having such a dataset would be important and is challenging. The dataset needs to be properly described: what's the method for labeling, what's the brain area, what were the spike recording methodologies, what is meant by two labeling methodologies, what do we know about the idiosyncrasies of the particular network the recording came from (like CA1 is non-recurrent, so which connections)? I was surprised to see 'English et al.' cited in text only on page 13 since their data has been hailed from the beginning.

      Further elements that needed definition are the Nsyn and i, which were not defined in the context of Equation 2-3: I was not sure if it referred to different samples or different variants of the synthetic model. I also would have preferred having the function f defined earlier, as it is defined for Equation 3, but appears in Equation 2.

      When the loss functions are described, it would be important to define 'data' and 'labels' here. This machine learning jargon has a concrete interpretation in this context, and making this concrete would be very important for the readership.

      While I appreciated that there was a section on robustness, I did not find that the features studied were the most important. In this context, I was surprised that the other datasets were relegated to supplementary, as these appeared more relevant.

      Some of the figures have text that is too small. In particular, Figure 2 has text that is way too small. It seemed to me that the pseudo code could stand alone, and the screenshot of the equations did not need to be repeated in a figure, especially if their size becomes so small that we can't even read them.

    4. Author response:

      General Response

      We thank the reviewers for their positive assessment of our work and for acknowledging the timeliness of the problem and the novelty of using domain adaptation to address model mismatch. We appreciate the constructive feedback regarding validation and clarity. In the revised manuscript, we will address these points as follows:

      (1) Systematic Validation: We will design and perform systematic in silico experiments to evaluate the method beyond the single in vivo dataset, including robustness tests regarding recording length and network synchrony.

      (2) Recurrent Networks & Failure Analysis: We will test our method on synthetic datasets generated from highly recurrent networks and analyze exactly when the method breaks as a function of mismatch magnitude.

      (3) Method Comparisons: We will report the Matthews Correlation Coefficient (MCC) for the approach by English et al. (2017) and expand our comparison and discussion of GLM-based methods.

      (4) Clarifications: We will rigorously define the dataset details (labeling, recording methodology), mathematical notation, and machine learning terminology ('data', 'labels').

      (5) Discussion of Limitations: We will explicitly discuss the challenges and limitations inherent in generalizing to more recurrently connected regions.

      Below are our more detailed responses:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The validation of the approach is incomplete: due to its very limited size, the single ground-truth dataset considered does not provide a sufficient basis to draw a strong conclusion. While the authors correctly note that this is the only dataset of its kind, the value of this validation is limited compared to what could be done by carefully designing in silico experiments.

      We thank the reviewer for acknowledging the scarcity of suitable in vivo ground-truth datasets and the limitations this poses. We agree that additional validation is necessary to draw strong conclusions. In the revised manuscript, we will systematically design and perform in silico experiments for evaluations beyond the single in vivo dataset.

      (2) Surprisingly, the authors fail to compare their method to the approach originally proposed for the data they validate on (English et al., 2017).

      We agree that this is an essential comparison. We will report the Matthews Correlation Coefficient (MCC) result of the approach by English et al. (2017) on the spontaneous period of the recording.

      (3) The authors make a commendable effort to study the method's robustness by pushing the limits of the dataset. However, the logic of the robustness analysis is often unclear, and once again, the limited size of the dataset poses major limitations to the authors.

      We appreciate the reviewer recognizing our initial efforts to evaluate robustness. In our original draft, we tested recording length, network model choices, and analyzed failure cases. However, we agree that the limited real data restricts the scope of these tests. To address this, we will perform more systematic robustness tests on the newly generated synthetic datasets in the revised version, allowing us to evaluate performance under a wider range of conditions.

      (4) The lack of details concerning both the approach and the validation makes it challenging for the reader to establish the technical soundness of the study.

      We will revise the manuscript thoroughly to better present the methodology of our framework and the validation pipelines. We will ensure that the figures and text clearly articulate the technical details required to assess the soundness of the study.

      Although in the current form this study does not provide enough basis to judge the impact of DeepDAM in the broader neuroscience community, it nevertheless puts forward a valuable and novel idea: using domain adaptation to mitigate the problem of model mismatch. This approach might be leveraged in future studies and methods to infer connectivity.

      We thank the reviewer again for acknowledging the novelty and importance of our work.

      Reviewer #2 (Public review):

      While the validation data set was well chosen and of high quality, it remains a single dataset and also remains a non-recurrent network. The authors acknowledge this in the discussion, but I wanted to chime in to say that for the method to be more than convincing, it would need to have been tested on more datasets. It should be acknowledged that the problem becomes more complicated in a recurrent excitatory network, and thus the method may not work as well in the cortex or in CA3.

      We will carefully revise our text to specifically discuss this limitation and the challenges inherent in generalizing to more recurrently connected regions. Furthermore, to empirically address this concern, we will test our method extensively on synthetic datasets generated from highly recurrent networks to quantify performance in these regimes.

      While the data is shown to work in this particular dataset (plus the two others at the end), I was left wondering when the method breaks. And it should break if the models are sufficiently mismatched. Such a question can be addressed using synthetic-synthetic models. This was an important intuition that I was missing, and an important check on the general nature of the method that I was missing.

      We thank the reviewer for this insight regarding the general nature of the method. While we previously analyzed failure cases regarding strong covariation and low spike counts, we agree that a systematic analysis of mismatch magnitude is missing. Building on our planned experiments with synthetic data, we will analyze and discuss exactly when the method breaks as a function of the mismatch magnitude between datasets.

      While the choice of state-of-the-art is good in my opinion, I was looking for comments on the methods prior to that. For instance, methods based on GLMs have been used by the Pillow, Paninski, and Truccolo groups. I could not find a decent discussion of these methods in the main text and thought that both their acknowledgement and rationale for dismissing were missing.

      As the reviewer noted, we extensively compared our method with a GLM-based method (GLMCC) and with CoNNECT, whose superiority over other GLM-based methods, such as the extended GLM method (Ren et al., 2020, J Neurophysiol), has already been demonstrated (Endo et al., Sci Rep, 2021). However, we acknowledge that the discussion of the broader GLM literature was insufficient. To make the comparison more thorough, we will conduct comparisons with additional GLM-based methods and include a detailed discussion of these approaches.

      Endo, D., Kobayashi, R., Bartolo, R., Averbeck, B. B., Sugase-Miyamoto, Y., Hayashi, K., ... & Shinomoto, S. (2021). A convolutional neural network for estimating synaptic connectivity from spike trains. Scientific Reports, 11(1), 12087.

      Ren, N., Ito, S., Hafizi, H., Beggs, J. M., & Stevenson, I. H. (2020). Model-based detection of putative synaptic connections from spike recordings with latency and type constraints. Journal of Neurophysiology, 124(6), 1588-1604.

      While most of the text was very clear, I thought that page 11 was odd and missing much in terms of introductions. Foremost is the introduction of the dataset, which is never really done. Page 11 refers to 'this dataset', while the previous sentence was saying that having such a dataset would be important and is challenging. The dataset needs to be properly described: what's the method for labeling, what's the brain area, what were the spike recording methodologies, what is meant by two labeling methodologies, what do we know about the idiosyncrasies of the particular network the recording came from (like CA1 is non-recurrent, so which connections)? I was surprised to see 'English et al.' cited in text only on page 13 since their data has been hailed from the beginning.

      Further elements that needed definition are the Nsyn and i, which were not defined in the context of Equation 2-3: I was not sure if it referred to different samples or different variants of the synthetic model. I also would have preferred having the function f defined earlier, as it is defined for Equation 3, but appears in Equation 2.

      When the loss functions are described, it would be important to define 'data' and 'labels' here. This machine learning jargon has a concrete interpretation in this context, and making this concrete would be very important for the readership.

      We thank the reviewer for these constructive comments on the writing. We will clarify the introduction of the dataset (labeling method, brain area, recording methodology) and ensure all mathematical terms (such as Nsyn, i, and function f) and machine learning terminology (definitions of 'data' and 'labels' in this context) are rigorously defined upon first use in the revised manuscript.

      While I appreciated that there was a section on robustness, I did not find that the features studied were the most important. In this context, I was surprised that the other datasets were relegated to supplementary, as these appeared more relevant.

      Robustness is an important aspect of our framework to demonstrate its applicability to real experimental scenarios. We specifically analyzed how synchrony between neurons, the number of recorded spikes and the choice of the network influence the performance of our method. We also agree that these aspects are limited by the one dataset we evaluated on. Therefore, we will test the robustness of our method more systematically on synthetic datasets.

      With more extensive analysis on synthetic datasets, we believe that the results on inferring biophysical properties of single neuron and microcircuit models should remain in the supplement, such that the main figures focus purely on synaptic connectivity inference.

      Some of the figures have text that is too small. In particular, Figure 2 has text that is way too small. It seemed to me that the pseudo code could stand alone, and the screenshot of the equations did not need to be repeated in a figure, especially if their size becomes so small that we can't even read them.

      We will remove the pseudo-code and equations from Figure 2 to improve readability. The pseudo-code will be presented as a distinct box in the main text.

    1. eLife Assessment

      This useful paper describes a software tool, "DrosoMating", which allows automated, high-throughput quantification of 6 common metrics of courtship and mating behaviors in Drosophila melanogaster. The validity of the tool is quite convincingly demonstrated by comparing expert human assessments with those made by DrosoMating. The work, however, does not address how DrosoMating compares with or advances on other existing tools for exactly the same purpose, whether it can be used for studies of other Drosophila species, and/or whether finer aspects of courtship response timing - which depend on proximal female signals to the male - could be extracted with more detailed analyses. Some additional statistical analyses would also help further strengthen the authors' current conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The study of Drosophila mating behaviors has offered a powerful entry point for understanding how complex innate behaviors are instantiated in the brain. The effectiveness of this behavioral model stems from how readily quantifiable many components of the courtship ritual are, facilitating the fine-scale correlations between the behaviors and the circuits that underpin their implementation. Detailed quantification, however, can be both time-consuming and error-prone, particularly when scored manually. Song et al. have sought to address this challenge by developing DrosoMating, software that facilitates the automated and high-throughput quantification of 6 common metrics of courtship and mating behaviors. Compared to a human observer, DrosoMating matches courtship scoring with high fidelity. Further, the authors demonstrate that the software effectively detects previously described variations in courtship resulting from genetic background or social conditioning. Finally, they validate its utility in assaying the consequences of neural manipulations by silencing Kenyon cells involved in memory formation in the context of courtship conditioning.

      Strengths:

      (1) The authors demonstrate that for three key courtship/mating metrics, DrosoMating performs virtually indistinguishably from a human observer, with differences consistently within 10 seconds and no statistically significant differences detected. This demonstrates the software's usefulness as a tool for reducing bias and scoring time for analyses involving these metrics.

      (2) The authors validate the tool across multiple genetic backgrounds and experimental manipulations to confirm its ability to detect known influences on male mating behavior.

      (3) The authors present a simple, modular chamber design that is integrated with DrosoMating and allows for high-throughput experimentation, capable of simultaneously analyzing up to 144 fly pairs across all chambers.

      Weaknesses:

      (1) DrosoMating appears to be an effective tool for the high-throughput quantification of key courtship and mating metrics, but a number of similar tools for automated analysis already exist. FlyTracker (CalTech), for instance, is a widely used software that offers a similar machine vision approach to quantifying a variety of courtship metrics. It would be valuable to understand how DrosoMating compares to such approaches and what specific advantages it might offer in terms of accuracy, ease of use, and sensitivity to experimental conditions.

      (2) The courtship behaviors of Drosophila males represent a series of complex behaviors that unfold dynamically in response to female signals (Coen et al., 2014; Ning et al., 2022; Roemschied et al., 2023). While metrics like courtship latency, courtship index, and copulation duration are useful summary statistics, they compress the complexity of actions that occur throughout the mating ritual. The manuscript would be strengthened by a discussion of the potential for DrosoMating to capture more of the moment-to-moment behaviors that constitute courtship. Even without modifying the software, it would be useful to see how the data can be used in combination with machine learning classifiers like JAABA to better segment the behavioral composition of courtship and mating across genotypes and experimental manipulations. Such integration could substantially expand the utility of this tool for the broader Drosophila neuroscience community.

      (3) While testing the software's capacity to function across strains is useful, it does not address the "universality" of this method. Cross-species studies of mating behavior diversity are becoming increasingly common, and it would be beneficial to know if this tool can maintain its accuracy in Drosophila species with a greater range of morphological and behavioral variation. Demonstrating the software's performance across species would strengthen claims about its broader applicability.

    3. Reviewer #2 (Public review):

      This paper introduces "DrosoMating," an integrated hardware and software solution for automating the analysis of male Drosophila courtship. The authors aim to provide a low-cost, accessible alternative to expensive ethological rigs by utilizing a custom acrylic chamber and smartphone-based recording. The system focuses on quantifying key temporal metrics, namely Courtship Index (CI), Copulation Latency (CL), and Mating Duration (MD), and is applied to behavioral paradigms involving memory mutants (orb2, rut).

      The development of open-source behavioral tools is a significant contribution to neuroethology, and the authors successfully demonstrate a system that simplifies the setup for large-scale screens. A major strength of the work is the specific focus on automating Copulation Latency and Mating Duration, metrics that are often labor-intensive to score manually.

      However, there are several limitations in the current analysis and validation that affect the strength of the conclusions:

      First, the statistical rigor requires substantial improvement. The analysis of multi-group experiments (e.g., comparing four distinct strains or factorial designs with genotype and training) currently relies on multiple independent Student's t-tests. This approach is statistically invalid for these experimental designs as it inflates the family-wise Type I error rate. To support the claims of strain-specific differences or learning deficits, the data must be analyzed using Analysis of Variance (ANOVA) to properly account for multiple comparisons and to explicitly test for interaction effects between genotype and training conditions.
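
      The inflation of the family-wise error rate described above can be quantified with a short calculation. This is a generic sketch, not an analysis of the manuscript's data, and it assumes the individual tests are independent (pairwise tests that share groups are not strictly independent, so the figure is an approximation). The function names are hypothetical.

```python
from math import comb


def n_pairwise_comparisons(n_groups: int) -> int:
    """Number of distinct pairwise comparisons among n_groups groups."""
    return comb(n_groups, 2)


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n_tests tests.

    ASSUMPTION: tests are independent and each is run at level alpha.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


# Four strains compared pairwise with uncorrected t-tests.
m = n_pairwise_comparisons(4)
fwer = familywise_error_rate(m)
```

      With four strains there are six pairwise comparisons, and under the independence assumption the chance of at least one false positive rises from 5% to roughly 26%, which is why an omnibus ANOVA followed by corrected post-hoc tests is the usual remedy.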

      Second, the biological validation using w1118 and y1 mutants entails a potential confound. The authors attribute the low Courtship Index in these strains to courtship-specific deficits. However, both strains are known to exhibit general locomotor sluggishness (due to visual or pigmentation/behavioral defects). Since "following" behavior is likely a component of the Courtship Index, a reduction in this metric could reflect a general motor deficit rather than a specific lack of reproductive motivation. Without controlling for general locomotion, the interpretation of these behavioral phenotypes remains ambiguous.

      Third, the benchmarking of the system is currently limited to comparisons against manual scoring. Given that the field has largely adopted sophisticated open-source tracking tools (e.g., Ctrax, FlyTracker, JAABA), the utility of DrosoMating would be better contextualized by comparing its performance - in terms of accuracy, speed, or identity maintenance - against these existing automated standards, rather than solely against human observation.

      Finally, the visual presentation of the data hinders the assessment of the system's temporal precision. While the system is designed to capture time-resolved metrics, the results are presented primarily as aggregate bar plots. The absence of behavioral ethograms or raster plots makes it difficult to verify the software's ability to accurately detect specific transitions, such as the exact onset of copulation.

    4. Author response:

      Thank you very much for the constructive feedback on our manuscript, "Simple Methods to Acutely Measure Multiple Timing Metrics among Sexual Repertoire of Male Drosophila," and for the opportunity to address the reviewers' comments. We appreciate the time and effort the reviewers have invested in evaluating our work, and we agree that their suggestions will significantly strengthen the manuscript.

      We are currently working diligently to address all the concerns raised in the public reviews and recommendations. Below is an outline of the major revisions we plan to implement in the revised version:

      (1) Statistical Rigor and Analysis

      We acknowledge the statistical limitations pointed out by Reviewer #2. We will re-analyze the multi-group data in Figures 3 and 4 using One-way and Two-way ANOVA with appropriate post-hoc tests (e.g., Tukey's HSD), respectively, to properly account for multiple comparisons and interaction effects between genotype and training conditions.

      (2) Comparison with Existing Tools

      As suggested by both reviewers, we will provide a detailed comparison of DrosoMating with established automated tracking systems (e.g., FlyTracker, JAABA, Ctrax), and highlight specific use cases where DrosoMating offers distinct advantages in terms of cost, accessibility, and ease of use for high-throughput screening.

      (3) Control for Locomotor Activity

      To address the potential confound of general locomotor deficits in w1118 and y1 mutants, we will calculate and present general locomotion metrics (e.g., average velocity, total distance traveled) from our tracking data to dissociate motor defects from specific courtship deficits.

      (4) Software Capabilities and Cross-Species Applicability

      We will clarify how DrosoMating handles fly identification during mating (including occlusion management). We will also discuss or test the software's applicability across different *Drosophila* species, as requested.

      (5) Minor Corrections

      We will address all textual errors, standardize terminology (e.g., "Mating Duration" vs. "Copulation Duration"), improve figure legibility, and provide complete statistical details for all figures.

      We believe these revisions will substantially improve the rigor, clarity, and utility of our manuscript. We aim to resubmit the revised version within the standard timeframe and will ensure the preprint is updated accordingly.

    1. eLife Assessment

      This valuable study provides convincing evidence that MgdE, a conserved mycobacterial nucleomodulin, downregulates inflammatory gene transcription by interacting with the histone methyltransferase COMPASS complex and altering histone H3 lysine methylation. This work will interest microbiologists as well as cell and cancer biologists.

    2. Reviewer #1 (Public review):

      Summary:

      This fundamental study identifies a new mechanism that involves a mycobacterial nucleomodulin manipulation of the host histone methyltransferase COMPASS complex to promote infection. Although other intracellular pathogens are known to manipulate histone methylation, this is the first report demonstrating the specific targeting of the COMPASS complex by a pathogen. The rigorous experimental design using state-of-the-art bioinformatic analysis, protein modeling, molecular and cellular interaction, and functional approaches, culminating with in vivo infection modeling, provides convincing, unequivocal evidence that supports the authors' claims. This work will be of particular interest to cellular microbiologists working on microbial virulence mechanisms and effectors, specifically nucleomodulins, and cell/cancer biologists that examine COMPASS dysfunction in cancer biology.

      Strengths:

      (1) The strengths of this study include the rigorous and comprehensive experimental design that involved numerous state-of-the-art approaches to identify potential nucleomodulins, define molecular nucleomodulin-host interactions, cellular nucleomodulin localization, intracellular survival, and inflammatory gene transcriptional responses, and confirmation of the inflammatory and infection phenotype in a small animal model.

      (2) The use of bioinformatic, cellular, and in vivo modeling that are consistent and support the overall conclusions is a strength of the study. In addition, the rigorous experimental design and data analysis, including the supplemental data provided, further strengthen the evidence supporting the conclusions.

      Weaknesses:

      (1) This work could be stronger if the MgdE-COMPASS subunit interactions that negatively impact COMPASS complex function were more well defined. Since the COMPASS complex consists of many enzymes, examining functional impact on each of the components would be interesting.

      (2) Examining the impact of WDR5 inhibitors on histone methylation, gene transcription and mycobacterial infection could provide additional rigor and provide useful information related to mechanisms and specific role of WDR5 inhibition on mycobacteria infection.

      (3) The interaction between MgdE and COMPASS complex subunit ASH2L is relatively undefined and studies to understand the relationship between WDR5 and ASH2L in COMPASS complex function during infection could provide interesting molecular details that are undefined in this study.

      (4) The AlphaFold prediction results for all the nuclear proteins examined could be useful. Since the interaction predictions with COMPASS subunits range from 0.77 for WDR5 and 0.47 for ASH2L, it is not clear how the focus on COMPASS complex over other nuclear proteins was determined.

      Comments on revisions:

      The authors have addressed the weaknesses that were identified by this reviewer by providing rational explanation and specific references that support the findings and conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Chen et al addresses an important aspect of pathogenesis for mycobacterial pathogens, seeking to understand how bacterial effector proteins disrupt the host immune response. To address this question the authors sought to identify bacterial effectors from M. tuberculosis (Mtb) that localize to the host nucleus and disrupt host gene expression as a means of impairing host immune function. Their revised manuscript has strengthened their observations by performing additional experiments with BCG strains expressing tagged MgdE.

      Strengths:

      The researchers conducted a rigorous bioinformatic analysis to identify secreted effectors containing mammalian nuclear localization signal (NLS) sequences, which formed the basis of quantitative microscopy analysis to identify bacterial proteins that had nuclear targeting within human cells. The study used two complementary methods to detect protein-protein interaction: yeast two-hybrid assays and reciprocal immunoprecipitation (IP). The combined use of these techniques provides strong evidence of interactions between MgdE and SET1 components and suggests the interactions are in fact direct. The authors also carried out rigorous analysis of changes in gene expression in macrophages infected with MgdE mutant BCG. They found strong and consistent effects on key cytokines such as IL6 and CSF1/2, suggesting that nuclear-localized MgdE does in fact alter gene expression during infection of macrophages. The revised manuscript contains additional biochemical analyses of BCG strains expressing tagged MgdE that further supports their microscopy findings.

      Weaknesses:

      There are some drawbacks in this study that limit the application of the findings to M. tuberculosis (Mtb) pathogenesis. Much of the study relies on transfected/overexpressed proteins in non-immune cells (HEK293T) or in yeast using 2-hybrid approaches, and pathogenesis is studied using the BCG vaccine strain rather than virulent Mtb. In addition, the magnitude of some of the changes they observe is quite small. However, overall the key findings of the paper - that MgdE interacts with COMPASS and alters gene expression - are well-supported.

      Comments on revisions:

      The authors have performed additional experiments that have addressed several important concerns from the original manuscript, and they now include an analysis of BCG strains expressing FLAG-tagged MgdE that supports their model. However, there are still a few areas where the data are difficult to interpret or do not support their claims.

    4. Reviewer #3 (Public review):

      In this study, Chen L et al. systematically analyzed the mycobacterial nucleomodulins and identified MgdE as a key nucleomodulin in pathogenesis. They found that MgdE enters the host cell nucleus through two nuclear localization signals, KRIR108-111 and RLRRPR300-305, and then interacts with COMPASS complex subunits ASH2L and WDR5 to suppress H3K4 methylation-mediated transcription of pro-inflammatory cytokines, thereby promoting mycobacterial survival.

      Comments on revisions:

      The authors have adequately addressed previous concerns through additional experimentation. The revised data robustly support the main conclusions, demonstrating that MgdE engages the host COMPASS complex to suppress H3K4 methylation, thereby repressing pro-inflammatory gene expression and promoting mycobacterial survival. This work represents a significant conceptual advance.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This fundamental study identifies a new mechanism that involves a mycobacterial nucleomodulin manipulation of the host histone methyltransferase COMPASS complex to promote infection. Although other intracellular pathogens are known to manipulate histone methylation, this is the first report demonstrating the specific targeting of the COMPASS complex by a pathogen. The rigorous experimental design using state-of-the art bioinformatic analysis, protein modeling, molecular and cellular interaction, and functional approaches, culminating with in vivo infection modeling, provides convincing, unequivocal evidence that supports the authors' claims. This work will be of particular interest to cellular microbiologists working on microbial virulence mechanisms and effectors, specifically nucleomodulins, and cell/cancer biologists that examine COMPASS dysfunction in cancer biology.

      Strengths:

      (1) The strengths of this study include the rigorous and comprehensive experimental design that involved numerous state-of-the-art approaches to identify potential nucleomodulins, define molecular nucleomodulin-host interactions, cellular nucleomodulin localization, intracellular survival, and inflammatory gene transcriptional responses, and confirmation of the inflammatory and infection phenotype in a small animal model.

      (2) The use of bioinformatic, cellular, and in vivo modeling that are consistent and support the overall conclusions is a strength of the study. In addition, the rigorous experimental design and data analysis, including the supplemental data provided, further strengthen the evidence supporting the conclusions.

      Weaknesses:

      (1) This work could be stronger if the MgdE-COMPASS subunit interactions that negatively impact COMPASS complex function were better defined. Since the COMPASS complex consists of many enzymes, examining the functional impact on each of the components would be interesting.

      We thank the reviewer for this insightful comment. Biochemical assays could help to determine the functional impact of the MgdE interaction on each component. However, purifying the full COMPASS complex is itself a difficult task because of its complexity, dynamic structural properties, and limited solubility.

      (2) Examining the impact of WDR5 inhibitors on histone methylation, gene transcription, and mycobacterial infection could provide additional rigor and provide useful information related to the mechanisms and specific role of WDR5 inhibition on mycobacterial infection.

      We thank the reviewer for the comment. A previous study showed that WIN-site inhibitors, such as compound C6, can displace WDR5 from chromatin, leading to a reduction in global H3K4me3 levels and suppression of immune-related gene expression (Hung et al., Nucleic Acids Res, 2018; Bryan et al., Nucleic Acids Res, 2020). These results closely mirror the functional effects we observed for MgdE, suggesting that MgdE may act as a functional mimic of WDR5 inhibition. This supports our proposed model in which MgdE disrupts COMPASS activity by targeting WDR5, thereby dampening host pro-inflammatory responses.

      (3) The interaction between MgdE and COMPASS complex subunit ASH2L is relatively undefined, and studies to understand the relationship between WDR5 and ASH2L in COMPASS complex function during infection could provide interesting molecular details that are undefined in this study.

      We thank the reviewer for the comment. In this study, we constructed single and multiple point mutants of MgdE at residues S<sup>80</sup>, D<sup>244</sup>, and H<sup>247</sup> to identify key amino acids involved in its interaction with ASH2L (Figure 5A and B; New Figure S4C). However, these mutations did not disrupt the interaction, suggesting that additional residues are involved.

      ASH2L and WDR5 function cooperatively within the WRAD module to stabilize the SET domain and promote H3K4 methyltransferase activity under physiological conditions (Couture and Skiniotis, Epigenetics, 2013; Qu et al., Cell, 2018; Rahman et al., Proc Natl Acad Sci U S A, 2022). ASH2L interacts with RbBP5 via its SPRY domain, whereas WDR5 bridges MLL1 and RbBP5 through the WIN and WBM motifs (Chen et al., Cell Res, 2012; Park et al., Nat Commun, 2019). The interaction status between ASH2L and WDR5 during mycobacterial infection could not be determined in our current study.

      (4) The AlphaFold prediction results for all the nuclear proteins examined could be useful. Since the interaction predictions with COMPASS subunits range from 0.77 for WDR5 and 0.47 for ASH2L, it is not clear how the focus on COMPASS complex over other nuclear proteins was determined.

      We thank the reviewer for the comment. We employed AlphaFold to predict the interactions between MgdE and the major nuclear proteins. This screen identified several subunits of the SET1/COMPASS complex as high-confidence candidates for interaction with MgdE (Figure S4A). This result is consistent with a proteomic study by Penn et al. which reported potential interactions between MgdE and components of the human SET1/COMPASS complex based on affinity purification-mass spectrometry analysis (Penn et al., Mol Cell, 2018).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Chen et al addresses an important aspect of pathogenesis for mycobacterial pathogens, seeking to understand how bacterial effector proteins disrupt the host immune response. To address this question, the authors sought to identify bacterial effectors from M. tuberculosis (Mtb) that localize to the host nucleus and disrupt host gene expression as a means of impairing host immune function.

      Strengths:

      The researchers conducted a rigorous bioinformatic analysis to identify secreted effectors containing mammalian nuclear localization signal (NLS) sequences, which formed the basis of quantitative microscopy analysis to identify bacterial proteins that had nuclear targeting within human cells. The study used two complementary methods to detect protein-protein interaction: yeast two-hybrid assays and reciprocal immunoprecipitation (IP). The combined use of these techniques provides strong evidence of interactions between MgdE and SET1 components and suggests that the interactions are, in fact, direct. The authors also carried out a rigorous analysis of changes in gene expression in macrophages infected with the mgdE mutant BCG. They found strong and consistent effects on key cytokines such as IL6 and CSF1/2, suggesting that nuclear-localized MgdE does, in fact, alter gene expression during infection of macrophages.

      Weaknesses:

      There are some drawbacks in this study that limit the application of the findings to M. tuberculosis (Mtb) pathogenesis. The first concern is that much of the study relies on ectopic overexpression of proteins either in transfected non-immune cells (HEK293T) or in yeast, using 2-hybrid approaches. Some of their data in 293T cells is hard to interpret, and it is unclear if the protein-protein interactions they identify occur during natural infection with mycobacteria. The second major concern is that pathogenesis is studied using the BCG vaccine strain rather than virulent Mtb. However, overall, the key findings of the paper - that MgdE interacts with SET1 and alters gene expression - are well-supported.

      We thank the reviewer for the comment. We agree that ectopic overexpression cannot completely reflect the natural state, although these approaches have been adopted in many similar experiments (Drerup et al., Molecular plant, 2013; Chen et al., Cell host & microbe, 2018; Ge et al., Autophagy, 2021). In addition, an MgdE localization experiment using Mtb-infected macrophages will be performed to strengthen the evidence under natural infection.

      We agree with the reviewer that the BCG strain cannot fully recapitulate the pathogenicity or immunological complexity of M. tuberculosis infection. We employed BCG as a biosafe surrogate model, as it has been accepted in many related studies (Wang et al., Nat Immunol, 2025; Wang et al., Nat Commun, 2017; Péan et al., Nat Commun, 2017; Li et al., J Biol Chem, 2020).

      Reviewer #3 (Public review):

      In this study, Chen L et al. systematically analyzed the mycobacterial nucleomodulins and identified MgdE as a key nucleomodulin in pathogenesis. They found that MgdE enters the host cell nucleus through two nuclear localization signals, KRIR<sup>108-111</sup> and RLRRPR<sup>300-305</sup>, and then interacts with COMPASS complex subunits ASH2L and WDR5 to suppress H3K4 methylation-mediated transcription of pro-inflammatory cytokines, thereby promoting mycobacterial survival. This study is potentially interesting, but there are several critical issues that need to be addressed to support the conclusions of the manuscript.

      (1) Figure 2: The study identified MgdE as a nucleomodulin in mycobacteria and demonstrated its nuclear translocation via dual NLS motifs. The authors examined MgdE nuclear translocation through ectopic expression in HEK293T cells, which may not reflect physiological conditions. Nuclear-cytoplasmic fractionation experiments under mycobacterial infection should be performed to determine MgdE localization.

      We thank the reviewer for this insightful comment. In the revised manuscript, we addressed this concern by performing nuclear-cytoplasmic fractionation experiments using M. bovis BCG-infected macrophages to assess the subcellular localization of MgdE (New Figure 2F) (Lines 146–155). Nuclear-cytoplasmic fractionation experiments showed that WT MgdE and the NLS single mutants (MgdE<sup>ΔNLS1</sup> and MgdE<sup>ΔNLS2</sup>) could be detected both in the cytoplasm and in the nucleus, while the double mutant MgdE<sup>ΔNLS1-2</sup> was detectable only in the cytoplasm. These findings strongly indicate that MgdE is capable of translocating into the host cell nucleus during BCG infection, and that this nuclear localization relies on the dual NLS motifs.

      (2) Figure 2F: The authors detected MgdE-EGFP using an anti-GFP antibody, but EGFP as a control was not detected in its lane. The authors should address this technical issue.

      We thank the reviewer for this question. In the revised manuscript, we have included the uncropped immunoblot images, which clearly show the EGFP band in the corresponding lane. These have been provided in the New Figure 2E.

      (3) Figure 3C-3H: The data showing that the expression of all detected genes in 24 h is comparable to that in 4 h (but not 0 h) during WT BCG infection is beyond comprehension. The issue is also present in Figure 7C, Figure 7D, and Figure S7. Moreover, since Il6, Il1β (pro-inflammatory), and Il10 (anti-inflammatory) were all upregulated upon MgdE deletion, how do the authors explain the phenomenon that MgdE deletion simultaneously enhanced these gene expressions?

      We thank the reviewer for the comment. A relative quantification method was used in our qPCR experiments to normalize the WT expression levels in Figure 3C–3H, Figure 7C, 7D, and New Figure S6.

      The concurrent induction of both types of cytokines likely represents a dynamic host strategy to fine-tune immune responses during infection. This interpretation is supported by previous studies (Podleśny-Drabiniok et al., Cell Rep, 2025; Cicchese et al., Immunological Reviews, 2018).
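
      The response above refers to relative quantification without naming the exact formula. A minimal sketch, assuming the widely used 2^-ΔΔCt calculation (an assumption, since the source does not specify the method), shows why a calibrator sample always evaluates to 1 regardless of its absolute expression. The function name and Ct values are hypothetical.

```python
# Sketch of relative qPCR quantification, ASSUMING the 2^-ddCt method.
# All Ct values below are invented for illustration only.

def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene in a sample relative to a calibrator."""
    # Normalize the target gene to the reference gene within each condition.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the sample with the calibrator condition.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# A sample whose target amplifies two cycles earlier than the calibrator.
fold_sample = relative_expression(24.0, 18.0, 26.0, 18.0)

# The calibrator compared with itself is 1 by construction.
fold_calibrator = relative_expression(26.0, 18.0, 26.0, 18.0)

print(fold_sample, fold_calibrator)
```

      Under this assumption, if the WT sample at each time point serves as its own calibrator, WT values would appear equal across time points even when absolute expression changes.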

      (4) Figure 5: The authors confirmed the interactions between MgdE and WDR5/ASH2L. How does the interaction between MgdE and WDR5 inhibit COMPASS-dependent methyltransferase activity? Additionally, the precise MgdE-ASH2L binding interface and its functional impact on COMPASS assembly or activity require clarification.

      We thank the reviewer for this insightful comment. We cautiously speculate that the MgdE interaction inhibits COMPASS-dependent methyltransferase activity by interfering with the integrity and stability of the COMPASS complex. Accordingly, we have incorporated the following discussion into the revised manuscript (Lines 303-315):

      “The COMPASS complex facilitates H3K4 methylation through a conserved assembly mechanism involving multiple core subunits. WDR5, a central scaffolding component, interacts with RbBP5 and ASH2L to promote complex assembly and enzymatic activity (Qu et al., 2018; Wysocka et al., 2005). It also recognizes the WIN motif of methyltransferases such as MLL1, thereby anchoring them to the complex and stabilizing the ASH2L-RbBP5 dimer (Hsu et al., Cell, 2018). ASH2L further contributes to COMPASS activation by interacting with both RbBP5 and DPY30 and by stabilizing the SET domain, which is essential for efficient substrate recognition and catalysis (Qu et al., Cell, 2018; Park et al., Nat Commun, 2019). Our work shows that MgdE binds both WDR5 and ASH2L and inhibits the methyltransferase activity of the COMPASS complex. Site-directed mutagenesis revealed that residues D<sup>224</sup> and H<sup>247</sup> of MgdE are critical for WDR5 binding, as the double mutant MgdE-D<sup>224</sup>A/H<sup>247</sup>A fails to interact with WDR5 and shows diminished suppression of H3K4me3 levels (Figure 5D).”

      Regarding the precise MgdE-ASH2L binding interface, we attempted to identify the key interaction site by introducing point mutations into ASH2L. However, these mutations did not disrupt the interaction (Figure 5A and B; New Figure S4C), suggesting that more residues are involved in the interaction.

      (5) Figure 6: The authors proposed that the MgdE-regulated COMPASS complex-H3K4me3 axis suppresses pro-inflammatory responses, but the presented data do not sufficiently support this claim. H3K4me3 inhibitor should be employed to verify cytokine production during infection.

      We thank the reviewer for the comment. We have now revised the description in lines 220-221 and lines 867-868 to read: "MgdE suppresses host inflammatory responses probably by inhibition of COMPASS complex-mediated H3K4 methylation."

      (6) There appears to be a discrepancy between the results shown in Figure S7 and its accompanying legend. The data related to inflammatory responses seem to be missing, and the data on bacterial colonization are confusing (bacterial DNA expression or CFU assay?).

      We thank the reviewer for the comment. New Figure S6 specifically addresses the effect of MgdE on bacterial colonization in the spleens of infected mice, which was assessed by quantitative PCR rather than by CFU assay.

      We have now revised the legend of New Figure S6 as below (Lines 986-991):

      “MgdE facilitates bacterial colonization in the spleens of infected mice. Bacterial colonization was assessed in splenic homogenates from infected mice (as described in Figure 7A) by quantifying bacterial DNA using quantitative PCR at 2, 14, 21, 28, and 56 days post-infection.”

      (7) Line 112-116: Please provide the original experimental data demonstrating nuclear localization of the 56 proteins harboring putative NLS motifs.

      We thank the reviewer for the comment. We will provide this data in the New Table S3.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      There are a few concerns about specific experiments:

      Major Comments:

      (1) Questions about the exact constructs used in their microscopy studies and the behavior of their controls. GFP is used as a negative control, but in the data they provide, the GFP signal is actually nuclear-localized (for example, Figure 1c, Figure 2a). Later figures do show other constructs with clear cytoplasmic localization, such as the delta-NLS-MgdE-GFP in Figure 2D. This raises significant questions about how the microscopy images were analyzed and clouds the interpretation of these findings. It is also not clear if their microscopy studies use the mature MgdE, lacking the TAT signal peptide after signal peptidase cleavage (the form that would be delivered into the host cell) or if they are transfecting the pro-protein that still has the TAT signal peptide (a form that would be present in the bacterial cell but that would not be found in the host cell). This should be clarified, and if their construct still has the TAT peptide, then key findings such as nuclear localization and NLS function should be confirmed with the mature protein lacking the signal peptide.

      We thank the reviewer for this question. EGFP protein can passively diffuse through nuclear pores due to its small size (Petrovic et al., Science, 2022; Yaseen et al., Nat Commun, 2015; Bhat et al., Nucleic Acids Res, 2015). However, upon transfection with EGFP-tagged wild-type MgdE and its NLS deletion mutants (MgdE<sup>ΔNLS1</sup>, MgdE<sup>ΔNLS2</sup>, and MgdE<sup>ΔNLS1-2</sup>), we observed significantly stronger nuclear fluorescence in cells expressing wild-type MgdE compared to the EGFP protein. Notably, the MgdE<sup>ΔNLS1-2</sup>-EGFP mutant showed almost no detectable nuclear fluorescence (Figure 2C, D, and E). These results indicate that (i) the MgdE-EGFP fusion protein cannot enter the nucleus by passive diffusion, and (ii) EGFP does not interfere with the nuclear targeting ability of MgdE.

      We did not construct a signal peptide-deleted MgdE for transfection assays. Instead, we performed an infection experiment using recombinant M. bovis BCG strains expressing Flag-tagged wild-type MgdE. The mature MgdE protein (signal peptide cleaved) can be detected in the nuclear fraction (New Figure 2F), suggesting that the signal peptide does not play a role in the nuclear localization of MgdE.

      (2) The localization of MgdE is not shown during actual infection. The study would be greatly strengthened by an analysis of the BCG strain expressing their MgdE-FLAG construct.

      We thank the reviewer for the comment. In the revised manuscript, we constructed M. bovis BCG strains expressing FLAG-tagged wild-type MgdE as well as NLS deletion mutants (MgdE<sup>ΔNLS1</sup>, MgdE<sup>ΔNLS2</sup>, and MgdE<sup>ΔNLS1-2</sup>). These strains were used to infect THP-1 cells, and nuclear-cytoplasmic fractionation was performed 24 hours post-infection.

      Nuclear-cytoplasmic fractionation experiments showed that WT MgdE and the NLS single mutants could be detected both in the cytoplasm and in the nucleus by immunoblotting, while the double mutant MgdE<sup>ΔNLS1-2</sup> was detectable only in the cytoplasm (New Figure 2F) (Lines 146–155). These findings indicate that MgdE is capable of entering the host cell nucleus during BCG infection, and that this nuclear localization depends on the presence of both its N-terminal and C-terminal NLS motifs.

      (3) Their pathogenesis studies suggesting a role for MgdE would be greatly strengthened by studying MgdE in virulent Mtb rather than the BCG vaccine strain. If this is not possible because of technical limitations (such as lack of a BSL3 facility), then at least a thorough discussion of studies that examined Rv1075c/MgdE in Mtb is important. This would include a discussion of the phenotype observed in a previously published study examining the Mtb Rv1075c mutant that showed a minimal phenotype in mice (PMID: 31001637) and would also include a discussion of whether Rv1075c was identified in any of the several in vivo Tn-Seq studies done on Mtb.

      We thank the reviewer for this insightful comment. In the revised manuscript, we have incorporated a more thorough discussion of prior studies that examined Rv1075c/MgdE in Mtb, including the reported minimal phenotype of an Mtb MgdE mutant in mice (PMID: 31001637) (Lines 288–294).

      In the latest TnSeq studies in M. tuberculosis, Rv1075c/MgdE was not classified as essential for in vivo survival or virulence (James et al., NPJ Vaccines, 2025; Zhang et al., Cell, 2013). However, this absence should not be interpreted as evidence of dispensability, since these datasets also failed to identify some well-characterized virulence factors, including Rv2067c (Singh et al., Nat Commun, 2023), PtpA (Qiang et al., Nat Commun, 2023), and PtpB (Chai et al., Science, 2022), which were demonstrated to be required for the virulence of Mtb.

      Minor Comments:

      (1) Multiple figures, including 3B and 7A, use axes with multiple discontinuities where a log scale or separate graphs would be more appropriate.

      We sincerely thank the reviewer for pointing this out. In the revised manuscript, we have updated Figure 3B and Figure 7A.

      (2) Figure 1C - Analysis of only nuclear MFI can be very misleading because it is affected by the total expression of each construct. Ratios of nuclear to cytoplasmic MFI are a more rigorous analysis.

      We thank the reviewer for this comment. We agree that analyzing the ratio of nuclear to cytoplasmic mean fluorescence intensity (MFI) provides a more rigorous quantification of nuclear localization, particularly when comparing constructs with different expression levels. However, the analysis presented in Figure 1C was intended as a preliminary qualitative screen to identify Tat/SPI-associated proteins with potential nuclear localization, rather than a detailed quantitative assessment.
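
      To make the reviewer's point concrete, the sketch below contrasts nuclear MFI with the nuclear-to-cytoplasmic MFI ratio. It is an illustration only: the function names, the flat pixel lists, the masks, and the intensity values are all hypothetical and do not represent the authors' image analysis pipeline.

```python
# Illustrative comparison of nuclear MFI and the nuclear/cytoplasmic ratio.
# Pixel values and masks are invented; they stand in for one segmented cell.

def mean_intensity(pixels, mask):
    """Mean of the pixel values selected by a boolean mask."""
    selected = [p for p, m in zip(pixels, mask) if m]
    if not selected:
        raise ValueError("mask selects no pixels")
    return sum(selected) / len(selected)


def nuclear_to_cytoplasmic_ratio(pixels, nuclear_mask, cell_mask, background=0.0):
    """Background-subtracted nuclear MFI divided by cytoplasmic MFI."""
    # Cytoplasm is defined here as the cell area outside the nucleus.
    cyto_mask = [c and not n for c, n in zip(cell_mask, nuclear_mask)]
    nuclear = mean_intensity(pixels, nuclear_mask) - background
    cytoplasmic = mean_intensity(pixels, cyto_mask) - background
    if cytoplasmic <= 0:
        raise ValueError("cytoplasmic signal is not above background")
    return nuclear / cytoplasmic


nuclear_mask = [False, False, True, True]
cell_mask = [True, True, True, True]

# Same subcellular distribution, tenfold difference in total expression.
low_expresser = [10.0, 10.0, 20.0, 20.0]
high_expresser = [100.0, 100.0, 200.0, 200.0]

for pixels in (low_expresser, high_expresser):
    print(mean_intensity(pixels, nuclear_mask),
          nuclear_to_cytoplasmic_ratio(pixels, nuclear_mask, cell_mask))
```

      In this toy case the nuclear MFI differs tenfold between the two cells while the ratio is identical, which is why the ratio is less sensitive to construct expression level.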

      (3) Figure 5C - Controls missing and unclear interpretation of their mutant phenotype. There is no mock or empty-vector control transfection, and their immunoblot shows a massive increase in total cellular H3K4me3 signal in the bulk population, although their prior transfection data show only a small fraction of cells are expressing MgdE. They also see a massive increase in methylation in cells transfected with the inactive mutant, but the reason for this is unclear. Together, these data raise questions about the specificity of the increasing methylation they observe. An empty vector control should be included, and the phenotype of the mutant explained.

      We thank the reviewer for this comment. In the revised manuscript, we transfected HEK293T cells with an empty EGFP vector and performed a quantitative analysis of H3K4me3 levels. The results demonstrated that, at the same time point, cells expressing MgdE showed significantly lower levels of H3K4me3 compared to both the EGFP control and the catalytically inactive mutant MgdE (D<sup>244</sup>A/H<sup>247</sup>A) (New Figure 5D) (Lines 213–216). These findings support the conclusion that MgdE specifically suppresses H3K4me3 levels in cells.

      (4) Figure S1A - The secretion assay is lacking a critical control of immunoblotting a cytoplasmic bacterial protein to demonstrate that autolysis is not releasing proteins into the culture filtrate non-specifically - a common problem with secretion assays in mycobacteria.

      We thank the reviewer for this comment. To address this concern, we examined FLAG-tagged MgdE and the secreted antigen Ag85B in the culture supernatants, and monitored the cytoplasmic protein GlpX as a control for autolysis. The absence of GlpX in the supernatant confirmed that there was no autolysis in the experiment. We could detect MgdE-Flag in the culture supernatant (New Figure S2A), indicating that MgdE is a secreted protein.

      (5) The volcano plot of their data shows that the proteins with the smallest p-values have the smallest fold-changes. This is unusual for a transcriptomic dataset and should be explained.

      We thank the reviewer for this comment. We are not sure whether the p-value is correlated with fold-change in the transcriptomic dataset. This relationship probably varies from dataset to dataset.

      Reviewer #3 (Recommendations for the authors):

      There are several minor comments:

      (1) Line 104-109: The number of proteins harboring NLS motifs and candidate proteins assigned to the four distinct pathways does not match the data presented in Table S2. Please recheck the details. Figure 1A and B, as well as Figure S1A and B, should also be corrected accordingly.

      We thank the reviewer for the comment. We have carefully checked the details and the numbers were confirmed and updated.

      (2) Please add the scale bar in all image figures, including Figure 1C, Figure 2D, Figure 5C, Figure 7B, and Figure S2.

      We thank the reviewer for this suggestion. We have now added scale bars to all relevant image figures in the revised manuscript, including Figure 1C, New Figure 2C, Figure 5C, Figure 7B, and New Figure S2B.

      (3) Please add the molecular marker in all immunoblotting figures, including Figure 2C, Figure 2F, Figure 4B, Figure 4C, Figure 5B, Figure 5D, and Figure S5.

      We thank the reviewer for this suggestion. We have now added the molecular marker in all immunoblotting figures in the revised manuscript, including New Figure 2E–F, Figure 4B–C, Figure 5B and D, Figure S2A, New Figure S2E and New Figure S4C.

      References

      Bryan AF, Wang J, Howard GC, Guarnaccia AD, Woodley CM, Aho ER, Rellinger EJ, Matlock BK, Flaherty DK, Lorey SL, Chung DH, Fesik SW, Liu Q, Weissmiller AM, Tansey WP (2020) WDR5 is a conserved regulator of protein synthesis gene expression Nucleic Acids Res 48:2924-2941.

      Chai Q, Yu S, Zhong Y, Lu Z, Qiu C, Yu Y, Zhang X, Zhang Y, Lei Z, Qiang L, Li BX, Pang Y, Qiu XB, Wang J, Liu CH (2022) A bacterial phospholipid phosphatase inhibits host pyroptosis by hijacking ubiquitin Science 378(6616):eabq0132.

      Chen C, Nguyen BN, Mitchell G, Margolis SR, Ma D, Portnoy DA (2018) The listeriolysin O PEST-like sequence co-opts AP-2-mediated endocytosis to prevent plasma membrane damage during Listeria infection Cell host & microbe 23: 786-795.

      Chen Y, Cao F, Wan B, Dou Y, Lei M (2012) Structure of the SPRY domain of human Ash2L and its interactions with RbBP5 and DPY30 Cell Res 22:598–602.

      Cicchese JM, Evans S, Hult C, Joslyn LR, Wessler T, Millar JA, Marino S, Cilfone NA, Mattila JT, Linderman JJ, Kirschner DE (2018) Dynamic balance of pro‐ and anti‐inflammatory signals controls disease and limits pathology Immunological Reviews 285: 147–167.

      Couture JF, Skiniotis G (2013) Assembling a COMPASS Epigenetics 8:349-54

      Drerup MM, Schlücking K, Hashimoto K, Manishankar P, Steinhorst L, Kuchitsu K, Kudla J (2013) The calcineurin B-like calcium sensors CBL1 and CBL9 together with their interacting protein kinase CIPK26 regulate the Arabidopsis NADPH oxidase RBOHF Molecular plant 6: 559-569.

      Ge P, Lei Z, Yu Y, Lu Z, Qiang L, Chai Q, Zhang Y, Zhao D, Li B, Pang Y, Liu C, Wang J (2021) M. tuberculosis PknG Manipulates Host Autophagy Flux to Promote Pathogen Intracellular Survival Autophagy 18: 576–94.

      Hung KH, Woo YH, Lin IY, Liu CH, Wang LC, Chen HY, Chiang BL, Lin KI (2018) The KDM4A/KDM4C/NF-κB and WDR5 epigenetic cascade regulates the activation of B cells Nucleic Acids Res 46:5547–5560.

      James KS, Jain N, Witzl K, Cicchetti N, Fortune SM, Ioerger TR, Martinot AJ, Carey AF (2025) TnSeq identifies genetic requirements of Mycobacterium tuberculosis for survival under vaccine-induced immunity NPJ Vaccines 10:103.

      Li X, Chen L, Liao J, Hui J, Li W, He ZG (2020) A novel stress-inducible CmtR-ESX3-Zn²⁺ regulatory pathway essential for survival of Mycobacterium bovis under oxidative stress J Biol Chem 295:17083–17099.

      Park SH, Ayoub A, Lee YT, Xu J, Kim H, Zheng W, Zhang B, Sha L, An S, Zhang Y, Cianfrocco MA, Su M, Dou Y, Cho US (2019) Cryo-EM structure of the human MLL1 core complex bound to the nucleosome Nat Commun 10:5540.

      Penn BH, Netter Z, Johnson JR, Von Dollen J, Jang GM, Johnson T, Ohol YM, Maher C, Bell SL, Geiger K (2018) An Mtb-human protein-protein interaction map identifies a switch between host antiviral and antibacterial responses Mol Cell 71:637-648.e5.

      Petrovic S, Samanta D, Perriches T, Bley CJ, Thierbach K, Brown B, Nie S, Mobbs GW, Stevens TA, Liu X, Tomaleri GP, Schaus L, Hoelz A (2022) Architecture of the linker-scaffold in the nuclear pore Science 376: eabm9798.

      Podleśny-Drabiniok A, Romero-Molina C, Patel T, See WY, Liu Y, Marcora E, Goate AM (2025) Cytokine-induced reprogramming of human macrophages toward Alzheimer's disease-relevant molecular and cellular phenotypes in vitro Cell Rep 44:115909.

      Qiang L, Zhang Y, Lei Z, Lu Z, Tan S, Ge P, Chai Q, Zhao M, Zhang X, Li B, Pang Y, Zhang L, Liu CH, Wang J (2023) A mycobacterial effector promotes ferroptosis-dependent pathogenicity and dissemination Nat Commun 14:1430.

      Qu Q, Takahashi YH, Yang Y, Hu H, Zhang Y, Brunzelle JS, Couture JF, Shilatifard A, Skiniotis G (2018) Structure and Conformational Dynamics of a COMPASS Histone H3K4 Methyltransferase Complex Cell 174:1117-1126.e12.

      Rahman S, Hoffmann NA, Worden EJ, Smith ML, Namitz KEW, Knutson BA, Cosgrove MS, Wolberger C (2022) Multistate structures of the MLL1-WRAD complex bound to H2B-ubiquitinated nucleosome Proc Natl Acad Sci U S A 119:e2205691119.

      Sharma G, Upadhyay S, Srilalitha M, Nandicoori VK, Khosla S 2015 The interaction of mycobacterial protein Rv2966c with host chromatin is mediated through non-CpG methylation and histone H3/H4 binding Nucleic Acids Res 43:3922-37.

      Singh PR, Dadireddy V, Udupa S, Kalladi SM, Shee S, Khosla S, Rajmani RS, Singh A, Ramakumar S, Nagaraja V (2023) The Mycobacterium tuberculosis methyltransferase Rv2067c manipulates host epigenetic programming to promote its own survival Nat Commun 14:8497.

      Wang J, Ge P, Qiang L, Tian F, Zhao D, Chai Q, Zhu M, Zhou R, Meng G, Iwakura Y, Gao GF, Liu CH (2017) The mycobacterial phosphatase PtpA regulates the expression of host genes and promotes cell proliferation Nat Commun 8:244.

      Wang J, Li BX, Ge PP, Li J, Wang Q, Gao GF, Qiu XB, Liu CH (2015) Mycobacterium tuberculosis suppresses innate immunity by coopting the host ubiquitin system Nat Immunol 16:237–245

      Wysocka J, Swigut T, Milne TA, Dou Y, Zhang X, Burlingame AL, Roeder RG, Brivanlou AH, Allis CD (2005) WDR5 associates with histone H3 methylated at K4 and is essential for H3 K4 methylation and vertebrate development Cell 121:859-72.

      Yaseen I, Kaur P, Nandicoori VK, Khosla S (2015) Mycobacteria modulate host epigenetic machinery by Rv1988 methylation of a non-tail arginine of histone H3 Nat Commun 6:8922.

      Zhang L, Kent JE, Whitaker M, Young DC, Herrmann D, Aleshin AE, Ko YH, Cingolani G, Saad JS, Moody DB, Marassi FM, Ehrt S, Niederweis M (2022) A periplasmic cinched protein is required for siderophore secretion and virulence of Mycobacterium tuberculosis Nat Commun 13:2255.

      Zhang YJ, Reddy MC, Ioerger TR, Rothchild AC, Dartois V, Schuster BM, Trauner A, Wallis D, Galaviz S, Huttenhower C, Sacchettini JC, Behar SM, Rubin EJ (2013) Tryptophan biosynthesis protects mycobacteria from CD4 T-cell-mediated killing Cell 155:1296-308.

    1. eLife Assessment

      This work describes a useful computational tool for automated morphometry of dynamic organelles from microscope images. However, the supporting evidence and novelty of the manuscript as presented are incomplete and could be improved. The work will be of interest to microscopists and bioimage analysts who are non-experts but wish to improve quantitative analysis of cellular structures.

    2. Reviewer #1 (Public review):

      Summary:

      The authors develop a Python-based analysis framework for cellular organelle segmentation, feature extraction, and analysis for live-cell imaging videos. They demonstrate that their pipeline works for two organelles (mitochondria and lysosomes) and provide a step-by-step overview of the AutoMorphoTrack package.

      Strengths:

      The authors provide evidence that the package is functional and can provide publication-quality data analysis for mitochondrial and lysosomal segmentation and analysis.

      Weaknesses:

      (1) I was enthusiastic about the manuscript as a good end-to-end cell/organelle segmentation and quantification pipeline that is open-source, and is indeed useful to the field. However, I'm not certain AutoMorphoTrack fully fulfills this need. It appears to stitch together basic FIJI commands in a Python script that an experienced user can put together within a day. The paper reads as a documentation page, and the figures seem to be individual analysis outputs of a handful of images. Indeed, a recent question on the image.sc forum prompted similar types of analysis and outputs as a simple service to the community, and with seemingly better results and integrated organelle identity tracking (which is necessary in my opinion for live imaging). I believe this is a better fit in the methods section of a broader work. https://forum.image.sc/t/how-to-analysis-organelle-contact-in-fiji-with-time-series-data/116359/5.

      (2) The authors do not discuss or compare against other pipelines that can accomplish similar analyses, such as Imaris or CellProfiler, nor do they integrate established segmentation options such as CellPose or StarDist.

      (3) Although LLM-based chatbot integration seems to have been added for novelty, the authors neither demonstrate it in the manuscript nor provide instructions that would make it easy to implement, even though it is presumably directed towards users who do not code.

    3. Reviewer #2 (Public review):

      Summary:

      AutoMorphoTrack provides an end-to-end workflow for organelle-scale analysis of multichannel live-cell fluorescence microscopy image stacks. The pipeline includes organelle detection/segmentation, extraction of morphological descriptors (e.g., area, eccentricity, "circularity," solidity, aspect ratio), tracking and motility summaries (implemented via nearest-neighbor matching using cKDTree), and pixel-level overlap/colocalization metrics between two channels. The manuscript emphasizes a specific application to live imaging in neurons, demonstrated on iPSC-derived dopaminergic neuronal cultures with mitochondria in channel 0 and lysosomes in channel 1, while asserting adaptability to other organelle pairs.
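
      For readers less familiar with the tracking step, the idea of nearest-neighbor matching between consecutive frames can be sketched as follows. This is an illustration only and not AutoMorphoTrack's code: the function name `link_nearest`, the `max_dist` gate, and the toy coordinates are assumptions, and a brute-force search stands in for the cKDTree lookup mentioned above, which would only speed up the neighbor query.

```python
import math

def link_nearest(prev_pts, next_pts, max_dist):
    """Greedy nearest-neighbor linking between two frames.

    Returns a sorted list of (i, j) pairs linking prev_pts[i] to next_pts[j].
    Each detection is used at most once, and pairs farther apart than
    max_dist are left unlinked.
    """
    # Collect every candidate pair inside the gating distance.
    candidates = []
    for i, p in enumerate(prev_pts):
        for j, q in enumerate(next_pts):
            d = math.dist(p, q)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()  # closest pairs are accepted first

    used_prev, used_next, links = set(), set(), []
    for d, i, j in candidates:
        if i not in used_prev and j not in used_next:
            links.append((i, j))
            used_prev.add(i)
            used_next.add(j)
    return sorted(links)

# Hypothetical centroids (x, y) in two consecutive frames.
frame0 = [(10.0, 10.0), (40.0, 40.0)]
frame1 = [(41.0, 39.0), (11.0, 12.0), (90.0, 90.0)]
links = link_nearest(frame0, frame1, max_dist=5.0)
print(links)
```

      The third detection in the second frame has no partner within the gate and stays unlinked, which is the kind of behavior (appearance, disappearance, crowding) that a validation against annotated tracks would need to probe.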

      The tool is positioned for cell biologists, including users with limited programming experience, primarily through two implemented modes of use: (i) a step-by-step Jupyter notebook and (ii) a modular Python package for scripted or batch execution, alongside an additional "AI-assisted" mode that is described as enabling analyses through natural-language prompts.

      The motivation and general workflow packaging are clear, and the notebook-plus-modules structure is a reasonable engineering choice. However, in its current form, the manuscript reads more like a convenient assembly of standard methods than a validated analytical tool. Key claims about robustness, accuracy, and scope are not supported by quantitative evidence, and the 'AI-assisted' framing is insufficiently defined and attributes to the tool capabilities that are provided by external LLM platforms rather than by AutoMorphoTrack itself. In addition, several figure, metric, and statistical issues, including physically invalid plots and inconsistent metric definitions, directly undermine trust in the quantitative outputs.

      Strengths:

      (1) Clear motivation: lowering the barrier for organelle-scale quantification for users who do not routinely write custom analysis code.

      (2) Multiple entry points: an interactive notebook together with importable modules, emphasizing editable parameters rather than a fully opaque black box.

      (3) End-to-end outputs: automated generation of standardized visualizations and tables that, if trustworthy, could help users obtain quantitative summaries without assembling multiple tools.

      Weaknesses:

      (1) "AI-assisted / natural-language" functionality is overstated.

      The manuscript implies an integrated natural-language interface, but no such interface is implemented in the software. Instead, users are encouraged to use external chatbots to help generate or modify Python code or execute notebook steps. This distinction is not made clearly and risks misleading readers.

      (2) No quantitative validation against trusted ground truth.

      There is no systematic evaluation of segmentation accuracy, tracking fidelity, or interaction/overlap metrics against expert annotations or controlled synthetic data. Without such validation, accuracy, parameter sensitivity, and failure modes cannot be assessed.

      (3) Limited benchmarking and positioning relative to existing tools.

      The manuscript does not adequately compare AutoMorphoTrack to established platforms that already support segmentation, morphometrics, tracking, and colocalization (e.g., CellProfiler) or to mitochondria-focused toolboxes (e.g., MiNA, MitoGraph, Mitochondria Analyzer). This is particularly problematic given the manuscript's implicit novelty claims.

      (4) Core algorithmic components are basic and likely sensitive to imaging conditions.

      Heavy reliance on thresholding and morphological operations raises concerns about robustness across varying SNR, background heterogeneity, bleaching, and organelle density; these issues are not explored.

      (5) Multiple figure, metric, and statistical issues undermine confidence.

      The most concerning include:

      (i) "Circularity (4πA/P²)" values far greater than 1 (Figures 2 and 7, and supplementary figures), which is inconsistent with the stated definition and strongly suggests a metric/label mismatch or computational error.

      (ii) A displacement distribution extending to negative values (Figure 3B). This is likely a plotting artifact (e.g., KDE boundary bias), but as shown, it is physically invalid and undermines confidence in the motility analysis.

      (iii) Colocalization/overlap metrics that are inconsistently defined and named, with axis ranges and terminology that can mislead (e.g., Pearson r reported for binary masks without clarification).

      (iv) Figure legends that do not match the displayed panels, and insufficient reporting of Ns, p-values, sampling units, and statistical assumptions.
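
      To make point (i) concrete, the bound on circularity can be checked with a few lines of arithmetic. The sketch below is illustrative and makes no claim about how AutoMorphoTrack computes the metric; the pixel-blob case is a hypothetical example of one way a value above 1 can arise, namely when area is counted in pixels while the perimeter is traced through boundary-pixel centers.

```python
import math

def circularity(area, perimeter):
    """Isoperimetric quotient 4*pi*A / P**2: 1 for a circle, below 1 otherwise."""
    return 4.0 * math.pi * area / perimeter ** 2

# Ideal continuous shapes: the value never exceeds 1.
r = 5.0
circle = circularity(math.pi * r ** 2, 2.0 * math.pi * r)
square = circularity(4.0, 8.0)    # 2 x 2 square
rod = circularity(10.0, 22.0)     # 10 x 1 rectangle

# Hypothetical digitized object: a 3 x 3 pixel blob with area 9 whose
# perimeter is measured along the centers of its border pixels
# (a 2 x 2 square path of length 8), which underestimates the outline.
pixel_blob = circularity(9.0, 8.0)

print(round(circle, 3), round(square, 3), round(rod, 3), round(pixel_blob, 3))
```

      Under these assumptions, small objects are the most affected, so values only slightly above 1 would point to perimeter estimation, whereas values far greater than 1 would more plausibly indicate a mislabeled or inverted metric.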

    4. Reviewer #3 (Public review):

      Summary:

      AutoMorphoTrack is a Python package for quantitatively evaluating organelle shape, movement, and colocalization in high-resolution live cell imaging experiments. It is designed to be a beginning-to-end workflow from segmentation through metric graphing, which is easy to implement. The paper shows example results from their images of mitochondria and lysosomes within cultured neurons, demonstrating how it can be used to understand organelle processing.

      Strengths:

      The text is well-written and easy to follow. I particularly appreciate Tables 1 and 2, which clearly define the goals of each module, the tunable parameters, and the inputs and outputs. I can see how the provided metrics would be useful to other groups studying organelle dynamics. Additionally, because the code is open-source, it should be possible for experienced coders to use this as a backbone and then customize it for their own purposes.

      Weaknesses:

      Unfortunately, I was not able to install the package to test it myself using any standard install method. This is likely fixable by the authors, but until a functional distribution exists, the utility of this tool is highly limited. I would be happy to re-review this work after this is fixed.

      The authors claim that there is "AI-Assisted Execution and Natural-Language Interface". However, this is never defended in any of the figures, and from quickly reviewing the .py files, there does not seem to be any built-in support or interface for this. Without significantly more instructions on how to connect this package to a (free) LLM, along with data to prove that this works reproducibly to produce equivalent results, this section should be removed.

      Additionally, I have a few suggestions/questions:

      (1) Red-green images are difficult for colorblind readers. I recommend that the authors change all raw microscopy images to a different color combination.

      (2) For all of the velocity vs displacement graphs (Figure 3C and subpart G of every supplemental figure), there is a diagonal line clearly defining a minimum limit of detected movement. Is this a feature of the dataset (drift, shakiness, etc.) or some sort of minimum movement threshold in the tracking algorithm? This should be discussed in the text.

      (3) Integrated Correlation Summary (Figure 5) - Pearson is likely the wrong metric for most of these metric pairs because even interesting relationships may be non-linear. Please replace with Spearman correlation, which is less dependent on linearity.
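
      The difference raised in point (3) can be seen on a small synthetic example. The sketch below uses made-up data and hand-written helpers (`pearson`, `ranks`, `spearman` are illustrative names, not functions from the package) to compare the two coefficients on a relationship that is perfectly monotonic but far from linear.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Synthetic data: y increases monotonically with x, but exponentially.
x = [float(v) for v in range(1, 11)]
y = [math.exp(v) for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

      Here the rank-based coefficient reports a perfect monotonic association while the linear coefficient is noticeably lower, which is the behavior motivating the suggestion above.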

    5. Author response:

      Reviewer #1

      We thank the reviewer for their thoughtful and constructive assessment of AutoMorphoTrack and for recognizing its potential utility as an open-source end-to-end workflow for organelle analysis.

      (1) Novelty and relationship to existing tools / FIJI workflows

      We appreciate this concern and agree that many of the underlying image-processing operations (e.g., thresholding, morphological cleanup, region properties) are well-established. Our goal with AutoMorphoTrack is not to introduce new segmentation algorithms, but rather to provide a curated, reproducible, and extensible end-to-end workflow that integrates segmentation, morphology, tracking, motility, and colocalization into a single, transparent pipeline tailored for live-cell organelle imaging.

      While an experienced user could assemble similar analyses ad hoc using FIJI or custom scripts, our contribution lies in:

      Unifying these steps into a single workflow with consistent parameterization and outputs,

      Generating standardized, publication-ready visualizations and tables at each step,

      Enabling batch and longitudinal analyses across cells and conditions, and

      Lowering the barrier for users who do not routinely write custom analysis code.

      We note that the documentation-style presentation of the manuscript is intentional, as it serves both as a methods paper and a practical reference for users implementing the workflow. We agree, however, that the manuscript currently overemphasizes step-by-step execution at the expense of positioning. In revision, we will more explicitly frame AutoMorphoTrack as a workflow integration and usability contribution, rather than a fundamentally new algorithmic advance.

      We will also cite and discuss the image.sc example referenced by the reviewer, clarifying conceptual overlap and differences in scope.

      (2) Comparison to existing pipelines (Imaris, CellProfiler, CellPose, StarDist)

      We agree and thank the reviewer for highlighting this omission. In the revised manuscript, we will expand the related-work and positioning section to explicitly compare AutoMorphoTrack with established commercial (e.g., Imaris) and open-source (e.g., CellProfiler, MiNA, MitoGraph) platforms, as well as learning-based segmentation tools such as CellPose and StarDist.

      Rather than claiming superiority, we will clarify trade-offs, emphasizing that AutoMorphoTrack prioritizes:

      Transparency and parameter interpretability,

      Lightweight dependencies suitable for small live-imaging datasets,

      Direct integration of morphology, tracking, and colocalization in a single workflow, and

      Ease of modification for domain-specific use cases.

      (3) AI / chatbot integration

      We appreciate this critique and agree that the current description is insufficiently precise. AutoMorphoTrack does not implement a native natural-language interface. Instead, our intent was to convey that the workflow can be executed and modified with assistance from external large language models (LLMs) in a notebook-based environment.

      In revision, we will revise this section to:

      Clearly distinguish AutoMorphoTrack’s functionality from that of external LLM tools,

      Remove any implication of a built-in AI interface, and

      Provide concrete, reproducible examples of how non-coding users may interact with the pipeline using natural-language prompts mediated by external tools.

      Reviewer #2

      We thank the reviewer for their detailed and technically rigorous evaluation. We appreciate the recognition of the workflow’s motivation and structure, and we agree that several aspects of validation, positioning, and quantitative reporting must be strengthened.

      (1) AI-assisted / natural-language functionality

      We agree with this critique. AutoMorphoTrack does not provide a native natural-language execution layer, and the manuscript currently overstates this aspect. In revision, we will explicitly scope any reference to AI assistance as external, optional support for code generation and parameter editing, with clearly documented examples and stated limitations.

      We agree that conflating external LLM capabilities with the software itself risks misleading readers, and we will correct this accordingly.

      (2) Lack of quantitative validation

      We fully agree that the current manuscript lacks formal quantitative validation. In the revised version, we will add a dedicated validation section including:

      Segmentation accuracy compared to expert annotations using overlap metrics (e.g., Dice / IoU),

      Tracking fidelity assessed using manually annotated tracks and/or synthetic ground truth,

      Sensitivity analyses for key parameters (e.g., thresholding and linking distance), and

      Explicit discussion of failure modes and quality-control indicators.

      We acknowledge that without such validation, claims of robustness are not sufficiently supported.
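
      As an illustration of the overlap metrics named above, a minimal sketch of Dice and IoU on binary masks is given here. The function name `dice_and_iou` and the toy 4 x 4 masks are assumptions made for the example; in a real validation the `truth` mask would come from expert annotation and the `pred` mask from the pipeline's segmentation.

```python
def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two same-shaped binary masks (lists of rows)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # pixel in both masks
            elif p:
                fp += 1      # pixel only in the prediction
            elif t:
                fn += 1      # pixel only in the annotation
    if tp + fp + fn == 0:
        return 1.0, 1.0      # both masks empty: treated as perfect agreement
    dice = 2.0 * tp / (2.0 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return dice, iou

# Toy masks; "truth" stands in for an expert annotation.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
dice, iou = dice_and_iou(pred, truth)
print(round(dice, 3), round(iou, 3))
```

      The two scores are related by Dice = 2·IoU / (1 + IoU), so reporting either one is sufficient as long as the choice is stated.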

      (3) Benchmarking and positioning relative to existing tools

      We agree and will substantially strengthen AutoMorphoTrack’s benchmarking and positioning relative to existing platforms. Rather than framing novelty algorithmically, we will clarify that the primary contribution is a reproducible, integrated workflow designed specifically for two-organelle live imaging in neurons, with transparent parameters and standardized outputs.

      We note that our goal is not to exhaustively benchmark against all available tools, but rather to provide representative comparisons that clarify operating regimes, assumptions, and trade-offs. We will add a comparative table and/or qualitative comparison highlighting strengths, assumptions, and limitations relative to existing tools.

      (4) Core algorithms and robustness

      We agree that reliance on threshold-based segmentation introduces sensitivity to imaging conditions. In revision, we will:

      Explicitly discuss the operating regime and assumptions under which AutoMorphoTrack performs reliably,

      Clarify that the framework is modular and can accept alternative segmentation backends, and

      Include guidance on when outputs should be treated with caution.

      (5) Figure, metric, and statistical issues

      We thank the reviewer for identifying several critical issues and agree that these undermine confidence. In revision, we will correct all figure, metric-definition, and reporting inconsistencies, including:

      Resolving circularity values exceeding 1 by correcting computation and/or labeling errors,

      Revising physically invalid displacement plots and clarifying kernel-density limitations,

      Ensuring colocalization metrics are consistently defined, named, and interpreted, with explicit clarification of whether calculations are intensity- or mask-based,

      Correcting figure legends to match displayed panels, and

      Clearly reporting sample size, sampling units, and statistical assumptions, including handling of multiple comparisons where applicable.
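
      The kernel-density limitation mentioned in this list can be reproduced in a few lines. The sketch below assumes, as Reviewer #2 suggested was likely, that the displacement plot was drawn with an unbounded Gaussian KDE; the data are synthetic, and the reflection correction shown is one common remedy rather than the one the revised manuscript will necessarily adopt.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def reflected_kde(samples, bandwidth):
    """Boundary-corrected estimate for data supported on [0, inf)."""
    base = gaussian_kde(samples, bandwidth)

    def density(x):
        # Fold the mass that leaked below zero back onto the positive side.
        return base(x) + base(-x) if x >= 0.0 else 0.0

    return density

# Synthetic displacements (arbitrary units), all non-negative by construction.
displacements = [0.05, 0.1, 0.1, 0.2, 0.3, 0.4, 0.6, 0.9, 1.3, 2.0]

plain = gaussian_kde(displacements, bandwidth=0.3)
fixed = reflected_kde(displacements, bandwidth=0.3)

print(plain(-0.2) > 0.0)  # the plain KDE assigns density to a negative displacement
print(fixed(-0.2))        # the reflected estimate is zero below the boundary
```

      Plotting a histogram, or clipping the KDE support at zero with a correction of this kind, avoids showing density at physically impossible displacements.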

      (6) Value-added demonstration

      We agree that the manuscript would benefit from a clearer demonstration of value-added use cases. In revision, we will include at least one realistic example showing how AutoMorphoTrack enables a complete, reproducible analysis workflow with reduced setup burden compared to manually assembling multiple tools.

      (7) Editorial suggestions

      We agree and will streamline the Results section to reduce procedural repetition and focus more on validation, limitations, and quality-control guidance.

      Reviewer #3

      We thank the reviewer for their positive assessment of clarity and organization, and for the constructive practical feedback.

      Installation issues

      We appreciate the detailed report of installation failures and acknowledge that the current packaging and distribution are inadequate. Prior to revision, we will:

      Fix the package structure to support standard installation methods,

      Ensure all required files (e.g., setup configuration, README) are correctly included,

      Test installation on clean environments across platforms, and

      Correct broken links to notebooks and documentation.

      We agree that without a functional installation pathway, the utility of the tool is severely limited.

      AI-assisted claims

      We agree with the reviewer and echo our responses above. The AI-assisted description will be clarified and appropriately scoped in the revised manuscript.

      Additional suggestions

      Color accessibility: We will revise all figures to use colorblind-safe palettes.

      Velocity–displacement diagonal: We will explicitly explain the origin of this relationship, including whether it reflects dataset properties, tracking assumptions, or minimum detectable motion.

      Integrated correlation metric: We agree that Spearman correlation is more appropriate for many of these relationships and will replace Pearson correlations accordingly.

      Supplementary movies: We agree that providing raw movies would improve interpretability and will add representative examples as supplementary material.

    1. eLife Assessment

      This important study by Bartas and colleagues examined how patterns of monosynaptic input to specific cell types in the ventral tegmental area are altered by drugs of abuse. The authors applied a dimensionality reduction approach (principal component analysis) and showed that various drugs of abuse, and somewhat surprisingly the anesthesia alone (ketamine/xylazine), caused changes in the distribution of inputs labeled by the transsynaptic rabies virus. The evidence supporting the conclusions is overall convincing and provides foundational information, as well as a cautionary note on the interpretation of rabies virus-based tracing experiments.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors mapped afferent inputs to distinct cell populations in the ventral tegmental area (VTA) using dimensionality reduction techniques, revealing markedly different connectivity patterns under normal versus drug-treated conditions. They further showed that drug-induced changes in inputs were negatively correlated with the expression of ion channels and proteins involved in synaptic transmission. Functional validation demonstrated that knockdown of a specific voltage-gated calcium channel led to reduced afferent inputs, highlighting a causal link between gene expression and connectivity.

      The authors have clearly addressed the reviewers' previous comments. The study's earlier weaknesses were thoroughly discussed, and additional data were provided to strengthen the findings. Overall, the revised version incorporates more extensive datasets and analyses, resulting in a more robust and compelling study.

    3. Reviewer #2 (Public review):

      The application of rabies virus (RabV)-mediated transsynaptic tracing has been widely utilized for mapping cell-type-specific neural connectivities and examining potential modifications in response to biological phenomena or pharmacological interventions. Despite the predominant focus of studies on quantifying and analyzing labeling patterns within individual brain regions based on labeling abundance, such an approach may inadvertently overlook systemic alterations. There exists a considerable opportunity to integrate RabV tracing data with the global connectivity patterns and the transcriptomic signatures of labeled brain regions. In the present study, the authors take an important step towards achieving these objectives.

      Specifically, the authors conducted an intensive reanalysis of a previously generated large dataset of RabV tracing to the ventral tegmental area (VTA) using dimension reduction methods such as PCA and UMAP. This reaffirmed the authors' earlier conclusion that different cell types in the VTA, namely dopamine neurons (DA) and GABAergic neurons, exhibit quantitatively distinct input patterns, and a single dose of addictive drugs, such as cocaine and morphine, induced altered labeling patterns. Additionally, the authors illustrate that distinct axes of PCA can discriminate experimental variations, such as minor differences in the injection site of viral tracers, from bona fide alterations in labeling patterns caused by drugs of abuse. While the specific mechanisms underlying altered labeling in most brain regions remain unclear, whether involving synaptic strength, synaptic numbers, pre-synaptic activities, or other factors, the present study underscores the efficacy of an informatics approach in extracting more comprehensive information from the RabV-based circuit mapping data.

      Moreover, the authors showcased the utility of their previously devised bulk gene expression patterns inferred from the Allen Gene Expression Atlas (AGEA) and the "projection portrait" derived from bulk axon mapping data sourced from the Allen Mouse Brain Connectivity Atlas. The use of such bulk data is subject to several limitations. For instance, the collection of axon mapping data involves an arbitrary selection of both cell type-specific and non-specific data, which might overlook crucial presynaptic partners, and often includes contamination from neighboring undesired brain regions. Concerns arise regarding the quantitativeness of AGEA, which may also include the potential oversight of key presynaptic partners. Nevertheless, the authors conscientiously acknowledged these potential limitations associated with the dataset.

      Notably, building on the observation of a positive correlation between the basal expression levels of Ca2+ channels and the extent of drug-induced changes in RabV labeling patterns, the authors conducted a CRISPRi-based knockdown of a single Ca2+ channel gene. This intervention resulted in a reduction of RabV labeling, supporting a causal role for the observed gene expression patterns in RabV labeling efficiency. While a more nuanced discussion is necessary for interpreting this result (see below), overall I commend the authors for their efforts to leverage the existing dataset in a more meaningful way. This endeavor has the potential to contribute significantly to our understanding of the mechanisms underlying alterations in RabV labeling induced by drugs of abuse.

      Finally, drawing upon the aforementioned reanalysis of previous data, the authors underscored that a single administration of ketamine/xylazine anesthesia could induce enduring modifications in RabV labeling patterns for VTA DA neurons, specifically those projecting to the nucleus accumbens and amygdala. Given the potential impact of such alterations on motivational behaviors at a broader level, I fully agree that prudent consideration is warranted when employing ketamine/xylazine for the investigation of motivational behaviors in mice.

      Comments on revisions:

      In the re-revised version, the authors have addressed all of my previous comments. I no longer have any major concerns.

    4. Reviewer #3 (Public review):

      Summary:

      The authors mapped monosynaptic inputs to dopamine, GABA, and glutamate neurons in the ventral tegmental area (VTA) under different anesthesia methods and after drug administration (cocaine, morphine, methamphetamine, amphetamine, nicotine, fluoxetine). First, they propose an analysis method to separate the actual manipulation effects from the variability caused by experimental procedures. Using this method, they found differences in the anatomical location of monosynaptic inputs to dopamine neurons under different conditions, and identified some key brain areas for such separation. They also searched the database for gene expression patterns that are common across input brain areas, with some changes by anesthesia or drug administration.

      Strengths:

      The whole-brain approach to address drug effects is appealing, and their conclusion is clear. The methodology and motivation are clearly explained.

      Weaknesses:

      While gene expression analyses may not be related to their findings on the anatomical effects of drugs, this is a nice starting point for follow-up studies.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors distinguished afferent inputs to different cell populations in the VTA using dimensionality reduction approaches and found significantly distinct patterns between normal and drug treatment conditions. They also demonstrated negative correlations of the inputs induced by drugs with gene expression of ion channels or proteins involved in synaptic transmission and demonstrated the knockdown of one of the voltage-gated calcium ion channels caused decreased inputs.

      Weaknesses:

      (1) For quantification of brain regions in this study, boundaries were based on the Franklin-Paxinos (FP) atlas according to previous studies (Beier KT et al 2015, Beier KT et al 2019). Significant discrepancies have been reported between the anatomical labels in the FP atlas and the Allen Brain Atlas (ref: Chon U et al., Nat Commun 2019). Although a conversion summary is provided as a sheet, the authors need to describe how consistent the brain boundaries defined in the manuscript are with the Allen Brain Atlas by adding histology images. Also, I wonder how reliable the manual annotations were across more than a hundred animals. The authors should briefly explain this in the Materials and Methods section rather than only citing previous studies.

      We thank the reviewer for their attention to this point; indeed, neuroanatomical detail is often overlooked in modern neuroscience, occasionally leading to spurious conclusions. We acknowledge that there are significant discrepancies in brain region definitions across atlases, which can make cross-study comparisons difficult. Here, all cells were manually quantified by Dr. Kevin Beier, as in previous studies (Beier et al., Cell 2015; Nature 2017; Cell Reports 2019; Tian et al., Cell Reports 2022; Tian et al., Neuron 2024; Hubbard et al., Neuropsychopharmacology, 2025). As such, these studies are internally consistent in their definition of brain regions, which is critical here since the analysis in this manuscript relies on data quantified by a single individual. Several brain regions were quite easy to distinguish anatomically, such as the medial habenula and lateral habenula. Others, such as the extended amygdala area, are much more difficult. We have now provided example images in Figure S1 that detail the anatomical boundaries that we used, overlaid on images of Neurotrace blue (fluorescent Nissl stain).

      (2) Regarding the ellipsoids in the PC, although it's written in the manuscript that "Ellipsoids were centered at the average coordinate of a condition and stretched one standard deviation along the primary and secondary axes", it's intuitively hard to understand in some figures such as Figure 2O, P and Figure S1. The authors need to make their data analysis methods more accessible by providing source code to the public.

      The source code is now available to the public at https://github.com/ktbartas/Bartas_et_al_eLife_2024, which is noted in the Code Availability statement. The code for generating ellipsoids is in the first notebook, `0-dataexploration-master-euclidean.ipynb`, in the function `confidence_ellipse`, which is called from `make_pca_plots` and `umap_and_heatmap`. Example plots are rendered in the notebooks and can be viewed directly on GitHub.
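
      As an illustration of the quoted description ("centered at the average coordinate of a condition and stretched one standard deviation along the primary and secondary axes"), the following is a minimal sketch of how such an ellipse can be parameterized in two dimensions. It is not the authors' `confidence_ellipse` function; the name `one_sd_ellipse`, its return format, and the use of the sample covariance (n - 1 denominator) are assumptions made here for illustration.

```python
import math
from statistics import mean


def one_sd_ellipse(points):
    """Return (center, half_axes, angle_deg) for a 1-SD ellipse of 2-D points.

    Hypothetical helper: the center is the mean coordinate, the half-axes are
    the standard deviations along the primary and secondary axes of the data,
    and the angle is the orientation of the primary axis.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    cx, cy = mean(xs), mean(ys)

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - cx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - cy) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - cx) * (y - cy) for x, y in zip(xs, ys)) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix (variances along
    # the primary and secondary axes).
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    var_primary = half_trace + spread
    var_secondary = max(half_trace - spread, 0.0)

    # Orientation of the primary axis, in degrees from the x-axis.
    angle_deg = math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))

    half_axes = (math.sqrt(var_primary), math.sqrt(var_secondary))
    return (cx, cy), half_axes, angle_deg
```

      For plotting, these values could be passed to `matplotlib.patches.Ellipse` with `width` and `height` set to twice the half-axes, since that class expects full axis lengths.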

      (3) In histology images (Figure 1B and 3K), the authors need to add dashed lines or arrows to guide the reader's attention.

      Dashed lines have been added to these figure panels as requested.

      (4) In Figure 2A and G, apparently there are significant differences in other brain regions such as NAcMed or PBN. If they are also statistically significant, the authors should note them as well and draw asterisks (*).

      We appreciate the care in ensuring that statistics are being applied and shown appropriately. In panel A (now Figure 3A), the two-way ANOVA interaction term was not significant (p = 0.9365), so we did not find it justified to perform further comparisons. However, for Figure 3G, the interaction term was significant (p = 0.0001), and thus further pairwise comparisons were performed with Sidak's correction for multiple comparisons. When this was done, the only two brain regions that were significantly different were the DStr (p = 0.0051) and GPe (p = 0.0036). While the NAcMed and PBN visually look different, according to the corrected statistics, they were not significantly different (NAcMed p = 0.5037, PBN p = 0.8123). The notations in our original figure thus accurately reflected these statistics.
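
      The decision rule described here (post hoc comparisons only when the interaction term is significant, then Sidak's correction) can be sketched as follows. This is a generic illustration of the single-step Sidak adjustment, not the authors' analysis code; the function names, the 0.05 threshold, and the unadjusted p-values are hypothetical. Only the two interaction p-values are taken from the response above.

```python
def sidak_adjust(p_values):
    """Single-step Sidak adjustment: p_adj = 1 - (1 - p) ** m."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def posthoc_if_interaction(interaction_p, raw_p_by_region, alpha=0.05):
    """Return Sidak-adjusted p-values only if the interaction is significant."""
    if interaction_p >= alpha:
        return None  # no pairwise comparisons are performed
    regions = list(raw_p_by_region)
    adjusted = sidak_adjust([raw_p_by_region[r] for r in regions])
    return dict(zip(regions, adjusted))


# Hypothetical unadjusted p-values, for illustration only.
raw = {"region_1": 0.001, "region_2": 0.20, "region_3": 0.50}

print(posthoc_if_interaction(0.9365, raw))  # interaction not significant
print(posthoc_if_interaction(0.0001, raw))  # interaction significant
```

      Because the adjustment depends on the number of comparisons, a difference that looks large in one region can still fall short of significance once all regions are tested together.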

      (5) In Figure 2N about the spatial distribution of starter cells, the authors need to add histology images for each experimental condition (i.e. saline, fluoxetine, cocaine, methamphetamine, amphetamine, nicotine, and morphine) as supplement figures.

      We have now provided these as Figure S2.

      (6) In the manuscript, it is necessary to explain why Cacna1e was selected among other calcium ion channels.

      We have added a sentence to the "Functional validation of link between gene expression and RABV labeling" section (lines 722-724).

      Reviewer #2 (Public review):

      The application of rabies virus (RabV)-mediated transsynaptic tracing has been widely utilized for mapping cell type-specific neural connectivities and examining potential modifications in response to biological phenomena or pharmacological interventions. Despite the predominant focus of studies on quantifying and analyzing labeling patterns within individual brain regions based on labeling abundance, such an approach may inadvertently overlook systemic alterations. There exists a considerable opportunity to integrate RabV tracing data with the global connectivity patterns and the transcriptomic signatures of labeled brain regions. In the present study, the authors take an important step towards achieving these objectives. Specifically, the authors conducted an intensive reanalysis of a previously generated large dataset of RabV tracing to the ventral tegmental area (VTA) using dimension reduction methods such as PCA and UMAP. This reaffirmed the authors' earlier conclusion that different cell types in the VTA, namely dopamine neurons (DA) and GABAergic neurons, exhibit quantitatively distinct input patterns, and a single dose of addictive drugs, such as cocaine and morphine, induced altered labeling patterns. Additionally, the authors illustrate that distinct axes of PCA can discriminate experimental variations, such as minor differences in the injection site of viral tracers, from bona fide alterations in labeling patterns caused by drugs of abuse. While the specific mechanisms underlying altered labeling in most brain regions remain unclear, whether involving synaptic strength, synaptic numbers, pre-synaptic activities, or other factors, the present study underscores the efficacy of an informatics approach in extracting more comprehensive information from the RabV-based circuit mapping data.

      Moreover, the authors showcased the utility of their previously devised bulk gene expression patterns inferred by the Allen Gene Expression Atlas (AGEA) and "projection portrait" derived from bulk axon mapping data sourced from the Allen Mouse Brain Connectivity Atlas. The utilization of such bulk data rests upon several limitations. For instance, the collection of axon mapping data involves an arbitrary selection of both cell type-specific and non-specific data, which might overlook crucial presynaptic partners, and often includes contamination from neighboring undesired brain regions. Concerns arise regarding the quantitativeness of AGEA, which may also include the potential oversight of key presynaptic partners. Nevertheless, the authors conscientiously acknowledged these potential limitations associated with the dataset.

      Notably, building on the observation of a positive correlation between the basal expression levels of Ca2+ channels and the extent of drug-induced changes in RabV labeling patterns, the authors conducted a CRISPRi-based knockdown of a single Ca2+ channel gene. This intervention resulted in a reduction of RabV labeling, supporting that the observed gene expression patterns have causality in RabV labeling efficiency. While a more nuanced discussion is necessary for interpreting this result (see below), overall I commend the authors for their efforts to leverage the existing dataset in a more meaningful way. This endeavor has the potential to contribute significantly to our understanding of the mechanisms underlying alterations in RabV labeling induced by drugs of abuse.

      Finally, drawing upon the aforementioned reanalysis of previous data, the authors underscored that a single administration of ketamine/xylazine anesthesia could induce enduring modifications in RabV labeling patterns for VTA DA neurons, specifically those projecting to the nucleus accumbens and amygdala. Given the potential impact of such alterations on motivational behaviors at a broader level, I fully agree that prudent consideration is warranted when employing ketamine/xylazine for the investigation of motivational behaviors in mice.

      Specific Points:

      (1) Beyond advancements in bioinformatics, readers may find it insightful to explore whether the PCA/UMAP-based approach yields novel biological insights. For example, the authors are encouraged to discuss more functional implications of PBN and LH in the context of drugs of abuse, as their labeling abundance could elucidate the PC2 axis in Fig. 2M.

      Thank you for this suggestion: we added text (Lines 787-795) discussing the LH and PBN (and GPe) specifically, but also highlighted the importance of our approach in hypothesis-generating science.

      (2) While I appreciate the experimental data on Cacna1e knockdown, I am unclear about the rationale behind specifically focusing on Cacna1e. The logic behind the statement, "This means that expression of this gene is not inhibitory towards RABV transmission," is also unclear. Loss-of-function experiments only signify the necessity or permissive functions of a gene. In this context, Cacna1e expression levels are required for efficient RabV labeling, but this neither supports nor excludes the possibility that this gene expression instructively suppresses RabV labeling/transmission, which could be assessed through gain-of-function experiments.

      We thank the reviewer for their suggestions regarding this result, and agree that a gain-of-function experiment would be required to provide clearer evidence on this point. We therefore understand that our original phrasing may have been misleading. Thus, we have edited this section to the more conservative statement: “These results indicate that reduced levels of Cacna1e likely lower the number of RABV-labeled inputs from the NAcLat, and directly link the levels of Cacna1e and RABV input labeling” (lines 742-744). In this way, we refrain from over-interpreting the results. As mentioned above in response to R1, we added a sentence to explain the rationale behind focusing on Cacna1e (lines 722-724).

      Reviewer #3 (Public Review):

      Summary:

      Authors mapped monosynaptic inputs to dopamine, GABA, and glutamate neurons in VTA under different anesthesia methods, and under drugs (cocaine, morphine, methamphetamine, amphetamine, nicotine, fluoxetine). They found that input patterns under different conditions are separated, and identified some key brain areas to contribute to such separation. They also searched a database for gene expression patterns that are common across input brain areas with some changes by anesthesia or drug administration.

      Strengths:

      The whole-brain approach to address drug effects is appealing and their conclusion is clear. The methodology and motivation are clearly explained.

      Weaknesses:

      While gene expression analyses may not be related to their findings on the anatomical effects of drugs, this will be a nice starting point for follow-up studies. 

      We understand and agree with the suggestion that gene expression allows us to provide correlative observations between in situ hybridization datasets and rabies mapping datasets, and that these results do not show causality. As such, future studies would be needed to assess this in more detail. We have added a line in the discussion to this effect (lines 851-853).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) There are a couple of packages available for 3D whole-brain reconstructions based on Allen Brain Atlas (eg. https://github.com/tractatus/wholebrain, https://github.com/lahammond/BrainJ), which would be helpful to align with the gene expression or other data from Allen Institute.

      This comment is related to the weakness noted by R1 that we responded to earlier in this rebuttal (see comment 1), regarding the discrepancies between the Franklin-Paxinos atlas and Allen Brain Atlas. We agree that a systematic comparison of these two atlases using a tool like wholebrain or BrainJ would be valuable for the field. However, it would be a substantial amount of work, and likely would be an independent study in itself. We believe that the resolution of these atlases was sufficient to make our key conclusions here (e.g., identify gene expression patterns that relate to drug-induced changes in rabies virus labeling patterns, and develop a testable hypothesis for CRISPR-based gene editing). They are also based on the same atlases and region definitions that have been applied in our previous studies (e.g., Beier et al., Cell 2015; Beier et al., Nature 2017; Beier et al., Cell Reports 2019; Tian et al., Cell Reports 2022; Tian et al., Neuron 2024; Hubbard et al., Neuropsychopharmacology 2025, etc.). The expression of Cacna1e is relatively consistent across the NAc, as we have now detailed in Figure S13.

      (2) There are so far two kinds of rabies virus strains available in the neuroscience field (SAD-B19 or CVS-N2c). It is recommended to describe which strain was used in the Materials and Methods section because labeling efficiency and toxicity are quite different between the strains (Reardon TR et al., Neuron 2016).

      We have now noted that we used SAD B19 for all experiments (Lines 141-142).

      Minor corrections to the text and figures:

      (1) In Figure 1A, the color differences are not clear (i.e. light gray and dark gray). The figure can be simplified.

      In addition, generally, images/figures are recommended not to be overlapped with other figures/images (Figures 2A-F, 2G-L).

      (2) In Figures 7C and D, the authors could add enlarged views of starter cells in VTA and NAcLat.

      We have attempted to simplify schematics and figures throughout. High-magnification images of cells have been added as insets in what is now Figure 10 (formerly Figure 7).

      Reviewer #2 (Recommendations For the authors):

      The number of animals for each graph should be explicated within the figure legend. For example, Figure 1C and Figure 7E lack this information. It is also advisable to delineate the definition of error bars within the figure legend.

      We have now added mouse numbers to all figures and/or legends, as appropriate. We also indicated in the legend at the end of Figure 1 how error bars and asterisks are defined. Furthermore, we added a sentence to the methods saying that in UMAP and PCA plots each dot is an animal (lines 244-245).

      The visual representations, particularly in Figures 1 and 3, are overcrowded. Furthermore, the arrangement of figure subpanels does not consistently adhere to the sequence of explication in the main text, significantly compromising the readability of the text. The authors are encouraged to consider the possibility of segmenting dense figures into two if there exists no upper limit for the number of figure displays. To illustrate, in Figure 3Q, crucial details about experimental conditions are denoted by numerical references, owing to spatial constraints.

      We agree that the figure layout and its misalignment with a linear read of the text were not ideal. Therefore, we broke our figures, especially the original Figures 1-4, into multiple sub-figures, including both main and supplemental figures. This freed up space to rearrange the figure panels, allowing the story to be told in a linear fashion. All figures and panels should now be read in order.

      I am seeking clarification on how to interpret the term "overlap" at the bottom of figures illustrating Gene Ontology analysis.

      We have clarified the meaning of overlap in this context (lines 324-325): The ‘overlap’ term on the x-axis of these plots means the number of genes in the correlated gene lists that were also within the list of genes for the corresponding GO term.
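
      Read this way, the 'overlap' value is the size of a set intersection. The short sketch below illustrates that definition only; it is not the authors' pipeline, and the function name, gene symbols, and GO term memberships are placeholders invented for the example.

```python
def go_overlap(correlated_genes, go_term_genes):
    """Count genes shared by a correlated-gene list and a GO term's gene set."""
    return len(set(correlated_genes) & set(go_term_genes))


# Placeholder gene symbols and term memberships, for illustration only.
correlated = ["GeneA", "GeneB", "GeneC", "GeneD"]
go_terms = {
    "hypothetical term 1": {"GeneB", "GeneD", "GeneX"},
    "hypothetical term 2": {"GeneY", "GeneZ"},
}

overlaps = {term: go_overlap(correlated, genes) for term, genes in go_terms.items()}
print(overlaps)
```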

      The authors could provide Cacna1e gene expression patterns within the NAc from the AGEA data.

      Cacna1e expression data are now provided in Figure S13.

      Additionally, the meaning of "controls" in Figure 7F, along with the "No gRNA" condition, remains ambiguous. While the text mentions "no shRNA", the involvement of shRNA in this experiment lacks clarity.

      We now clarify that the control conditions are based on previously published data where no AAVs were injected into NAcLat. This is now clarified in the legend for Figure 10F (lines 1277-1578). We also corrected “shRNA” to “gRNA” in the text.

    1. eLife Assessment

      This important work shows that corticotrophin-releasing factor is delivered monosynaptically to dorsal striatal cholinergic interneurons from the central amygdala and bed nucleus of the stria terminalis. CRF increases cholinergic interneuron firing and release of acetylcholine, and this action is attenuated by pre-exposure to ethanol, suggesting a potential role in stress- and alcohol use disorders. This revision addressed prior concerns, presented convincing evidence supporting the conclusions, and set the stage for additional studies.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that corticotropin-releasing factor (CRF) neurons in the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) monosynaptically target cholinergic interneurons (CINs) in the dorsal striatum of rodents. Functionally, activation of CRFR1 receptors increases CIN firing rate, and this modulation was reduced by pre-exposure to ethanol. This is an interesting finding, with potential significance for alcohol use disorders.

      Strengths:

      Well-conceived circuit mapping experiments identify a novel pathway by which the CeA and BNST can modulate dorsal striatal function by controlling cholinergic tone. Important insight into how CRF, a neuropeptide that is important in mediating aspects of stress, affective/motivational processes and drug-seeking, modulates dorsal striatal function.

      Weaknesses:

      (1) Tracing and expression experiments were performed both in mice and rats (often in non-overlapping ways). While these species are similar in many ways, differences do exist. The authors address this important point in their final text.

      (2) As the authors point out, CRF likely modulates CIN activity in both direct and indirect ways. As justified, exploration of the network-level modulation of CINs by CRF (and how these processes may interact with direct modulation via CRFR1 on CINs) is left for future studies.

    3. Reviewer #2 (Public review):

      Summary:

      Essoh and colleagues present a thorough and elegant study identifying the central amygdala and BNST as key sources of CRF input to the dorsal striatum. Using monosynaptic rabies tracing and electrophysiology, they show direct connections to cholinergic interneurons. The study builds on previous findings that CRF increases CIN firing, extending them by measuring acetylcholine levels in slices and applying optogenetic stimulation of CRF+ fibers. It also uncovers a novel interaction between alcohol and CRF signaling in the striatum, likely to spark significant interest and future research.

      Strengths:

      A key strength is the integration of anatomical and functional approaches to demonstrate these projections and assess their impact on target cells, striatal cholinergic interneurons.

      Comments on revisions:

      No further concerns or recommendations.

    4. Reviewer #3 (Public review):

      Summary:

      The authors demonstrate that CRF neurons in the extended amygdala form GABAergic synapses onto cholinergic interneurons and that CRF can excite these neurons. The evidence is strong; however, the authors do not make a compelling connection showing that CRF released from these extended amygdala neurons mediates any of these effects. Further, they show that acute alcohol appears to modulate this action, although the effect size is not particularly robust.

      Strengths:

      This is an exciting connection from the extended amygdala to the striatum that provides a new direction for how these regions can modulate behavior. The work is rigorous and well done.

      Weaknesses:

      The effects of acute ethanol are modest but consistent, and their potential role has yet to be determined. Further, the optogenetic stimulation experiments were conducted in an Ai32 mouse, so it is impossible to determine whether the response arises from the CeA and BNST or from another population of CRF-containing neurons. This is an important caveat that was acknowledged.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the reviewers’ insightful comments. In response, we conducted three new experiments, summarized in Author response table 1. After the table, we provide detailed responses to each comment.

      Author response table 1.

      Summary of new experiments and results.

      Reviewer #1 (Public review):

      The authors show that corticotropin-releasing factor (CRF) neurons in the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) monosynaptically target cholinergic interneurons (CINs) in the dorsal striatum of rodents. Functionally, activation of CRFR1 receptors increases CIN firing rate, and this modulation was reduced by pre-exposure to ethanol. This is an interesting finding, with potential significance for alcohol use disorders, but some conclusions could use additional support.

      Strengths:

      Well-conceived circuit mapping experiments identify a novel pathway by which the CeA and BNST can modulate dorsal striatal function by controlling cholinergic tone. Important insight into how CRF, a neuropeptide that is important in mediating aspects of stress, affective/motivational processes, and drug-seeking, modulates dorsal striatal function.

      Weaknesses:

      (1) Tracing and expression experiments were performed both in mice and rats (in a mostly nonoverlapping way). While these species are similar in many ways, some conclusions are based on assumptions of similarities that the presented data do not directly show. In most cases, this should be addressed in the text (but see point number 2).

      In the revised manuscript, we have clarified this limitation in the first paragraph of the Methods and the third paragraph of the Discussion and avoid cross-species claims, limiting our conclusions to the species in which each assay was performed. Specifically, we now state that while mice and rats share many conserved amygdalostriatal components, our tracing and expression studies were performed in a species-specific manner, and direct cross-species comparisons of CRF–CIN connectivity and CRFR1 expression were not assessed. We further note that future studies will be needed to determine the extent to which these observations are conserved across species as more tools become available.

      (2) Experiments in rats show that CRFR1 expression is largely confined to a subpopulation of striatal CINs. Is this true in mice, too? Since most electrophysiological experiments are done in various synaptic antagonists and/or TTX, it does not affect the interpretation of those data, but non-CIN expression of CRFR1 could potentially have a large impact on bath CRF-induced acetylcholine release.

      To address whether CRFR1 expression in striatal CINs is conserved across species, we performed new histological experiments using CRFR1-GFP mice. Striatal sections were immunostained with anti-ChAT, and we found that approximately 10% of CINs express CRFR1 (new Fig. 4D, 4E). This result indicates that, similar to rats, a subset of CINs in mice express CRFR1. However, the proportion of CRFR1<sup>+</sup> CINs is lower than the proportion of CRF-responsive CINs observed during electrophysiology experiments, suggesting that CRF may also modulate CIN activity indirectly through network or synaptic mechanisms. We have also noted in the revised Discussion that while CRFR1 expression is confirmed in a subset of CINs, the broader distribution of CRFR1 among other striatal cell types remains to be determined (third paragraph of Discussion).

      In our study, bath application of CRF increased striatal ACh release. Because striatal ACh is released primarily from CINs, and CRFR1 is an excitatory receptor, this effect is most likely mediated by CRF activation of CRFR1 on CINs, leading to enhanced CIN activity and ACh release. Although CRFR1 may also be expressed on other striatal neurons, these cell types—medium spiny neurons and GABAergic interneurons—are inhibitory. If CRF were to activate CRFR1 on these GABAergic neurons, the resulting increase in GABA release would suppress CIN activity and consequently reduce, rather than enhance, ACh release. Given that most CINs responded functionally while only a small subset expressed CRFR1, these findings imply that indirect mechanisms, such as CRF modulation of local circuits influencing CIN excitability, may also contribute to the observed increase in ACh release. Together, these data support a model in which CRF primarily enhances ACh release via activation of CRFR1-expressing CINs, while indirect network effects may further amplify this response.

      (3) Experiments in rats show that about 30% of CINs express CRFR1. Did only a similar percentage of CINs in mice respond to bath application of CRF? The effect sizes and error bars in Figure 5 imply that the majority of recorded CINs likely responded. Were exclusion criteria used in these experiments?

      We thank the reviewer for this insightful question. In our mouse cell-attached recordings, ~80% of CINs increased firing during CRF bath application, and all recorded cells were included in the analysis (no exclusions based on response direction/magnitude; cells were only required to meet standard recording-quality criteria such as stable baseline firing and seal).

      Using a CRFR1-GFP reporter mouse, we found that ~10% of striatal CINs are GFP+, suggesting that the high proportion of CRF-responsive CINs cannot be explained solely by somatic reporter-labeled CRFR1 expression. Importantly, the CRF-induced increase in CIN firing is blocked by the selective CRFR1 antagonist NBI 35695 (Fig. 5B–C), supporting a CRFR1-dependent mechanism at the circuit level. We now discuss several non-mutually exclusive explanations for this apparent discrepancy: (i) reporter lines (e.g., CRFR1-GFP) may underestimate functional CRFR1 expression, particularly for low-level or compartmentalized receptor pools; (ii) bath-applied CRF may act indirectly via CRFR1 on presynaptic afferents, thereby enhancing excitatory drive onto CINs; and (iii) electrical coupling among CINs could allow direct effects in a subset of CINs to propagate through the CIN network (Ren, Liu et al. 2021). We added this discussion to the revised manuscript (fourth paragraph of the Discussion).

      (4) The conclusion that prior acute alcohol exposure reduces the ability of subsequent alcohol exposure to suppress CIN activity in the presence of CRF may be a bit overstated. In Figure 6D (no ethanol preexposure), ethanol does not fully suppress CIN firing rate to baseline after CRF exposure. The attenuated effect of CRF on CIN firing rate after ethanol pre-treatment (6E) may just reduce the maximum potential effect that ethanol can have on firing rate after CRF, due to a lowered starting point. It is possible that the lack of significant effect of ethanol after CRF in pre-treated mice is an issue of experimental sensitivity. Related to this point, does pre-treatment with ethanol reduce the later CIN response to acute ethanol application (in the absence of CRF)?

      In the revised manuscript, we have tempered our interpretation in the final Results section and throughout the Discussion to emphasize that ethanol pre-exposure attenuates, rather than abolishes, the CRF-induced increase in CIN firing. We also note the reviewer’s important point that in Figure 6D, ethanol does not fully suppress firing to baseline after CRF exposure, consistent with a partial effect. Regarding the reviewer’s question, our experiments were specifically designed to test interactions between CRF and ethanol, so we did not assess whether ethanol pre-treatment alters subsequent responses to ethanol alone. We now explicitly acknowledge CRF-dependent and CRF-independent effects of ethanol on CIN activity as an important point for future studies to disentangle (sixth paragraph of the Discussion). For example, comparing ethanol responses with and without prior ethanol exposure, in the absence of CRF, could resolve this question.

      (5) More details about the area of the dorsal striatum being examined would be helpful (i.e., a-p axis).

      We now provide more detail regarding the anterior–posterior axis of the dorsal striatum examined. Most recordings and imaging were performed in the posterior dorsomedial striatum (pDMS), corresponding to coronal slices posterior to the crossing of the anterior commissure and anterior to the tail of the striatum (starting around 0.62 mm and ending at −1.3 mm relative to the Bregma). While our primary focus was on posterior slices, some anterior slices were included to increase the sample size. These details have been added to the Methods (Last sentence of the ‘Histology and cell counting’ section and of the ‘Slice electrophysiology’ section).

      Reviewer #2 (Public review):

      Essoh and colleagues present a thorough and elegant study identifying the central amygdala and BNST as key sources of CRF input to the dorsal striatum. Using monosynaptic rabies tracing and electrophysiology, they show direct connections to cholinergic interneurons. The study builds on previous findings that CRF increases CIN firing, extending them by measuring acetylcholine levels in slices and applying optogenetic stimulation of CRF+ fibers. It also uncovers a novel interaction between alcohol and CRF signaling in the striatum, likely to spark significant interest and future research.

      Strengths:

      A key strength is the integration of anatomical and functional approaches to demonstrate these projections and assess their impact on target cells, striatal cholinergic interneurons.

      Weaknesses:

      (1) The nature of the interaction between alcohol and CRF actions on cholinergic neurons remains unclear. Also, further clarification of the ACh sensor used and others is required.

      We have clarified the nature of the interaction between alcohol and CRF signaling in CINs and have provided additional details regarding the acetylcholine sensor used. These issues are addressed in detail in our responses to the specific comments below.

      Reviewer #2 (Recommendations for the authors):

      (1) The interaction between the effects of alcohol and CRF is a novel and important part of this study. When considering possible mechanisms underlying the findings in the discussion, there is no mention of occlusion. Given that incubation with alcohol produced a similar increase in firing of CINs as CRF, occlusion could be a parsimonious explanation for the observed interaction. Have the authors considered blocking the effects of alcohol on CINs with a CRF-R1 antagonist? Another experiment that could address the occlusion would be to test if alcohol also increases ACh levels as CRF did.

      We thank the reviewer for proposing occlusion as a potential mechanism underlying the interaction between alcohol and CRF. We agree that, in principle, alcohol-induced endogenous CRF release could occlude subsequent exogenous CRF-mediated potentiation of CIN firing, and we carefully considered this possibility.

      However, several observations from our data argue against occlusion driven by acute alcohol exposure or withdrawal in this preparation. First, as shown in Fig. 6A, bath application of alcohol transiently reduced CIN firing, and firing recovered to baseline levels after washout without any rebound increase. Second, in Fig. 6D–E, the baseline firing rates under control conditions and following alcohol pretreatment were comparable, indicating that acute alcohol exposure and short-term withdrawal did not produce a sustained increase in CIN excitability. Together, these results suggest that acute withdrawal in slices is less likely to trigger substantial endogenous CRF release capable of occluding subsequent exogenous CRF effects.

      While we and others have previously reported increased spontaneous CIN firing following prolonged in vivo alcohol exposure and extended withdrawal periods (e.g., 21 days), short-term withdrawal (e.g., 1 day) does not robustly alter baseline CIN firing (Ma, Huang et al. 2021, Huang, Chen et al. 2024). Consistent with these prior findings, the absence of a rebound or elevated baseline firing in the present slice experiments discouraged further pursuit of an endogenous CRF occlusion mechanism under acute conditions.

      We also considered experimentally testing occlusion by blocking CRFR1 signaling during alcohol pre-treatment. However, this approach is technically challenging in slice recordings, as CRFR1 antagonists require prolonged incubation (~1 hour) during alcohol exposure. Because it is unclear whether endogenous CRF release is triggered by alcohol incubation itself or by withdrawal, the antagonist would need to remain present throughout both the incubation and withdrawal periods. This leaves insufficient time for complete washout of the CRFR1 antagonist prior to subsequent bath application of exogenous CRF to assess its effects on CIN firing. Consequently, residual antagonist presence would confound the interpretation of the exogenous CRF response.

      Finally, regarding the possibility that alcohol increases acetylcholine release, we did not observe alcohol-induced increases in CIN firing in slices, arguing against elevated ACh signaling under these conditions. Consistent with prior work (Ma, Huang et al. 2021, Huang, Chen et al. 2024), alcohol-induced increases in CIN excitability and cholinergic signaling appear to depend on prolonged in vivo exposure and extended withdrawal rather than acute slice-level manipulations.

      We have now incorporated discussion of occlusion as a potential mechanism (seventh paragraph) and clarified why our data and technical considerations argue against it in the present study. We thank the reviewer for this wonderful suggestion, which we will test in future in vivo studies.

      (2) Retrograde monosynaptic tracing of inputs to CIN. Results state the finding of labeling in "all previously reported area...". Can the authors report these areas? A list in the text or a bar plot, if there is quantification, will suffice. This information will serve as important validation and replication of previous findings.

      We thank the reviewer for this constructive suggestion. We agree that summarizing the anatomical sources of CIN input provides important validation of our tracing results. In the revised Results, we now list the major input regions observed, including the striatum itself, cortex (e.g., cingulate cortex, motor cortex, somatosensory cortex), thalamus (e.g., parafascicular thalamic nucleus, centrolateral thalamic nucleus), globus pallidus, and midbrain (first paragraph of the Results). Quantitative analysis of relative input strength will be presented in a separate study that expands on these findings. Here, we limit the current manuscript to the functional characterization of CRF and alcohol modulation of CINs.

      (3) Given the difference in connectivity among striatal subregions, it would be important to describe in more detail the injection site in the results and figures. In the figure, for example, you might want to include the AP coordinates, given that it is such a zoomed-in image, it is hard to tell how anterior/posterior the site is. I imagine that the picture is a representative image of the injection site, but maybe having a side image with overlay of injection sites in all the animals used, would help.

      The anterior–posterior (AP) coordinates for representative images have been included in the panels and reiterated more clearly in the revised Results section and figure legends. In the legend for Figure 3B, a list of AP coordinates for each animal used for Figure 3A-3E has been added.

      (4) Figure 1D inset, there seem to be some double-labeled cells in the zoomed in BNST images. The authors might want to comment on this. It seemed far from the injection site. Do D1-MSN so far away show connectivity to CINs?

      Upon closer inspection of the BNST images, we noted that a small number of double-labeled cells were indeed present, consistent with our lab’s previous report of a subset of D1R-expressing neurons (~10%) in the BNST, with the majority being D2R-expressing neurons (Lu, Cheng et al. 2021). Given the BNST’s anatomical proximity to the dorsal striatum, it is plausible that some D1R-expressing neurons in this region provide monosynaptic input to CINs, highlighting a potential ventral-to-dorsal connection that merits further study.

      (5) Can the author provide quantification of the onset delay of the optogenetic evoked CRF+ axon responses onto CINs? The claim of monosynaptic connectivity is well supported by the TTX/4AP experiment but additional information on the timing will strengthen that conclusion.

      We thank the reviewer for this insightful suggestion. Quantifying the onset latency of optogenetically evoked CRF<sup>+</sup> axon responses onto CINs provides valuable confirmation of monosynaptic connectivity. To address this, we performed new latency measurements under the same recording conditions as the TTX/4-AP experiments. The average onset latency from the start of the optical stimulation was 5.85 ± 0.37 ms (new Figure 3J), consistent with direct monosynaptic transmission.

      As an additional reference, we analyzed latency data from a separate project in which we optogenetically stimulated cholinergic interneurons and recorded synaptic responses in medium spiny neurons. This circuit, known to involve disynaptic transmission from CINs to MSNs via nAChR-expressing interneurons (Author response image 1) (English, Ibanez-Sandoval et al. 2011), exhibited a significantly longer latency (18.34 ± 0.70 ms; t<sub>(29)</sub> = 10.3, p < 0.001) compared to CRF⁺ CeA/BNST inputs to CINs (5.85 ± 0.37 ms). Together, these results further support that CRF⁺ axons form direct functional synapses onto CINs.

      Author response image 1.

      Latency of disynaptic transmission from CINs to MSNs via interneurons. A) Schematic illustrating optogenetic stimulation of Chrimson-expressing CINs, leading to excitation of nAChR-expressing interneurons that release GABA onto recorded MSNs. B) Sample trace of disynaptic transmission (left) and bar graph summarizing onset latency (right) from light stimulation to synaptic response onset (n = 23 neurons from 3 mice).
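
      The latency comparison reported above can be checked from the summary statistics alone. The following is a minimal sketch, not the authors' analysis: it assumes the ± values are SEMs, that the test was a pooled-variance (Student's) two-sample t-test, and that the CRF⁺ input group contained 8 neurons, a number inferred here from the reported 29 degrees of freedom and the 23 neurons in the disynaptic group. The function name is hypothetical. Under those assumptions the statistic comes out close to the reported value of 10.3.

```python
import math


def pooled_t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Student's two-sample t statistic (pooled variance) from means, SEMs and sample sizes."""
    # Convert each SEM back to a sample variance: SD^2 = SEM^2 * n
    var1 = sem1 ** 2 * n1
    var2 = sem2 ** 2 * n2
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Disynaptic CIN-to-MSN latency: 18.34 ± 0.70 ms, n = 23 (from the image legend)
# CRF+ input latency: 5.85 ± 0.37 ms; n = 8 is an ASSUMPTION inferred from df = 29
t_stat, df = pooled_t_from_summary(18.34, 0.70, 23, 5.85, 0.37, 8)
print(f"t({df}) = {t_stat:.2f}")
```

      If the original analysis used Welch's correction or a different group size, the statistic and degrees of freedom would differ from this sketch.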

      (6) The ACh sensor reported is "AAV-GRABACh4m" but the reference is for GRAB-ACh3.0. Also, BrainVTA has GRAB-ACh4.3. Is this the vector? Could you please check the name of the construct and report the corresponding reference, as well as clarify the meaning of the additional "m". They have a mutant version of the GRAB-ACH that researchers use for control, and of course, you want to use it as a control, but not for the test experiment.

      GRAB-ACh4m is the correct acetylcholine sensor used in this study. The ACh4 series (including ACh4h, ACh4m, and ACh4l; personal communication with Dr. Yulong Li’s lab) represents an updated generation following GRAB-ACh3.0. Although the ACh4 family has not yet been formally published, these constructs are publicly available through BrainVTA (https://www.brainvta.tech/plus/view.php?aid=2680).

      The suffix “m” does not indicate a mutant control; rather, it denotes a medium-affinity variant within the ACh4 sensor family. Importantly, the mutant (non-responsive) control sensor is only available for GRAB-ACh3.0 (ACh3.0mut) and does not exist for the ACh4 series.

      Our laboratory has previously used GRAB-ACh4m in multiple peer-reviewed publications (Huang, Chen et al. 2024, Gangal, Iannucci et al. 2025, Purvines, Gangal et al. 2025), and its use has also been reported by independent groups in recent preprints (Potjer, Wu et al. 2025, Touponse, Pomrenze et al. 2025). We have now clarified the construct name and its relationship to GRAB-ACh3.0 in the Methods ‘Reagents’ section, and we have corrected the reference accordingly.

      (7) Are CRF-R1+ CINs equally abundant in the DMS and DLS? From the image in Figure 4, it seems that a larger percentage of CINs are CRFR1+ in the DLS than in DMS. Is this true? The authors probably already have this data, or it should be easy to get, and it could be additional information that was not studied before.

      We did not perform a quantitative comparison of CRFR1+ CIN abundance between the DMS and DLS in the present study. While the representative images in Figure 4 may appear to suggest regional differences, these panels were selected to illustrate labeling quality rather than relative density and should not be interpreted as evidence of unequal distribution. We have clarified this point in the revised Discussion (last sentence of the third paragraph) and note that future studies will be needed to systematically evaluate potential regional differences in CRFR1 expression, which could have important implications for dorsal striatal function.

      (8) The manuscript states several times that there are no CRF+ neurons in the dorsal striatum. At the same time, there are reports of the CRF+ neuron in the ventral striatum and its role in learning. Could the authors include mention of the studies by the Lemos group (10.1016/j.biopsych.2024.08.006)

      We have revised the Discussion section to clarify that our findings pertain specifically to the dorsal striatum and now acknowledge the presence and functional relevance of CRF+ neurons in the ventral striatum, citing the Lemos group’s study (fifth paragraph of the Discussion).

      (9) For the histology analysis, please express cell counts as "density", not just number of cells, by providing an area (e.g., "number of cell/ µm2").

      In the revised manuscript, all histological outcomes have been recalculated as cell density (cells/mm<sup>2</sup>) by normalizing raw cell counts to the measured area of each region of interest (ROI). Figures that previously displayed absolute counts now present densities (cells/mm<sup>2</sup>), with corresponding updates made to figure legends and text. We note one exception in Figure 4B, where the comparison between the total number of CINs and CRFR1+ CINs is best represented as cell counts rather than normalized values, as the counting was conducted in the same area (within the same ROI) of the dorsostriatal subregion.
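
      The normalization described above amounts to dividing each raw count by its ROI area and converting units. As a minimal sketch (the function name and the example numbers are hypothetical and are not taken from the manuscript), assuming ROI areas are measured in µm²:

```python
def cell_density_per_mm2(cell_count, roi_area_um2):
    """Convert a raw cell count and an ROI area in square micrometres to cells/mm^2."""
    if roi_area_um2 <= 0:
        raise ValueError("ROI area must be positive")
    # 1 mm^2 = 1e6 um^2
    roi_area_mm2 = roi_area_um2 / 1e6
    return cell_count / roi_area_mm2


# Hypothetical example: 50 cells counted in a 2,000,000 um^2 (2 mm^2) ROI
print(cell_density_per_mm2(50, 2_000_000))
```

      Counts taken within the same ROI, as in the Figure 4B exception, share the same denominator, so comparing them as raw counts or as densities gives the same ratio.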

      (10) Figure 2C, we can see there are some labeled fibers in the striatum cut. Would it be possible to get a better confocal image?

      Figure 2C has been replaced with a higher-quality confocal image captured at the same magnification and scale. The updated image provides improved clarity and resolution, ensuring accurate visualization of labeled CRF+ fibers, but not cell bodies, within the striatum.

      (11) The ACh measurements in the slice are very informative and an important addition. I first thought that these experiments with the GRAB-ACh sensor were performed in ChAT-eGFP mice. After reading more carefully, I realized they were done in wild-type mice. Would you include the wildtype label in the figure as well? The ChAT-eGFP BAC transgenic line was reported to have enhanced ACh packaging and increased ACh release, which could have magnified the signals. So, it is important to highlight the experiments were done in wildtype mice.

      We now label with ‘WT mice’ and note in the legend that all GRAB-ACh experiments were performed in wild-type mice, not ChAT-eGFP, to avoid confounds in ACh release. We thank the reviewer for this important suggestion.

      Reviewer #3 (Public review):

      The authors demonstrate that CRF neurons in the extended amygdala form GABAergic synapses onto cholinergic interneurons and that CRF can excite these neurons. The evidence is strong, however, the authors fail to make a compelling connection showing CRF released from these extended amygdala neurons is mediating any of these effects. Further, they show that acute alcohol appears to modulate this action, although the effect size is not particularly robust.

      Strengths:

      This is an exciting connection from the extended amygdala to the striatum that provides a new direction for how these regions can modulate behavior. The work is rigorous and well done.

      Weaknesses:

      (1) While the authors show that opto stim of these neurons can increase firing, this is not shown to be CRFR1 dependent. In addition, the effects of acute ethanol are not particularly robust or rigorously evaluated. Further, the opto stim experiments are conducted in an Ai32 mouse, so it is impossible to determine if that is from CEA and BNST, vs. another population of CRF-containing neurons. This is an important caveat.

      We added recordings with the CRFR1 antagonist antalarmin. Light-evoked increases in CIN firing were abolished under CRFR1 blockade, linking the effect to CRFR1 (Figure 5J, 5K). We also clarify that CRF-Cre;Ai32 does not isolate CeA versus BNST sources, so we temper regional claims and highlight this as a limitation. The acute ethanol effects are modest but consistent; we expanded the discussion of dose and preparation constraints in acute slice physiology and note that in vivo studies will be needed to define the network-level impact.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors could bring some of this data together by examining CRFR1 dependence of optical stimulation-induced increases in firing. Further, the authors have devoted significant effort to exploring how the BNST and CEA project to the CIN, yet their ephys does not explore site-specific infusion of ChR2 into either region. How are we to be sure it is not some other population of CRF neurons mediating this effect? The alcohol data does not appear particularly robust, but I think if the authors wanted to, they could explore other concentrations. Mostly I think it is important to discuss the limitations of acute alcohol on a brain slice.

      We thank the reviewer for these thoughtful comments, which helped us strengthen the mechanistic interpretation of the CRF-CIN interaction. In the revised manuscript, we have addressed each point as follows:

      - CRFR1 dependence of optogenetically evoked responses: We performed new recordings in which optogenetic stimulation of CRF⁺ terminals in the dorsal striatum was conducted in the presence of the CRFR1 antagonist antalarmin. The increase in CIN firing evoked by light stimulation was abolished under CRFR1 blockade, confirming that this effect is mediated through CRFR1 activation (new Figure 5J, 5K, third paragraph of the corresponding Result section). These results directly link the functional effects of CRF⁺ terminal activation to CRFR1 signaling on CINs.

      - CeA vs. BNST projection specificity: The reviewer is correct that CeA and BNST projections were not analyzed separately. Because these pathways were previously uncharacterized, our experiments were designed to first establish the monosynaptic connections from CeA/BNST CRF neurons to striatal CINs. Future studies will explore the specific contribution of each site. However, our data exclude the possibility of other CRF neurons, as we selectively infused Cre-dependent opsins into both CeA and BNST of CRF-Cre mice (Figure 3G-3J).

      - Limitations of acute slice experiments: We have expanded the Discussion (sixth paragraph) to acknowledge that acute slice physiology cannot fully capture the dynamic and network-level effects of ethanol observed in vivo. While this preparation enables mechanistic precision, factors such as washout, diffusion constraints, and the absence of systemic feedback may underestimate ethanol’s impact on CINs. We now explicitly note this limitation and highlight the need for in vivo studies to examine behavioral and circuit-level implications of CRF–alcohol interactions.

      Collectively, these revisions clarify the CRFR1 dependence of CRF<sup>+</sup> terminal effects and reaffirm that both CeA and BNST projections contribute to CIN modulation while addressing the methodological limitations of the slice preparation.

      Reviewer #4 (Public review):

      This manuscript presents a compelling and methodologically rigorous investigation into how corticotropin-releasing factor (CRF) modulates cholinergic interneurons (CINs) in the dorsal striatum, a brain region central to cognitive flexibility and action selection, and how this circuit is disrupted by alcohol exposure. Through an integrated series of anatomical, optogenetic, electrophysiological, and imaging experiments, the authors uncover a previously uncharacterized CRF⁺ projection from the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) to dorsal striatal CINs.

      Strengths:

      Key strengths of the study include the use of state-of-the-art monosynaptic rabies tracing, CRF-Cre transgenic models, CRFR1 reporter lines, and functional validation of synaptic connectivity and neurotransmitter release. The finding that CRF enhances CIN excitability and acetylcholine (ACh) release via CRFR1, and that this effect is attenuated by acute alcohol exposure and withdrawal, provides important mechanistic insight into how stress and alcohol interact to impair striatal function. These results position CRF signaling in CINs as a novel contributor to alcohol use disorder (AUD) pathophysiology, with implications for relapse vulnerability and cognitive inflexibility associated with chronic alcohol intake. The study is well-structured, with a clear rationale, thorough methodology, and logical progression of results. The discussion effectively contextualizes the findings within broader addiction neuroscience literature and suggests meaningful future directions, including therapeutic targeting of CRFR1 signaling in the dorsal striatum.

      Weaknesses:

      (1) Minor areas for improvement include occasional redundancy in phrasing, slightly overlong descriptions in the abstract and significance sections, and a need for more concise language in some places. Nevertheless, these do not detract from the manuscript's overall quality or impact. Overall, this is a highly valuable contribution to the fields of addiction neuroscience and striatal circuit function, offering novel insights into stress-alcohol interactions at the cellular and circuit level, which requires minor editorial revisions.

      We have streamlined the abstract and significance statement, reduced redundancy, and improved conciseness throughout the text. We appreciate the reviewer’s feedback, which has helped us further strengthen the clarity and readability of the manuscript.

      Reviewer #4 (Recommendations for the authors):

      (1) Line 29-30: Slightly verbose. Consider: "Alcohol relapse is associated with corticotropin-releasing factor (CRF) signaling and altered reward pathway function, though the precise mechanisms are unclear."

      The sentence has been revised as recommended to improve clarity and conciseness in the introductory section (Lines 31-32).

      (2) Lines 39-43: Good synthesis, but could better emphasize the novelty of identifying a CRF-CIN pathway.

      The abstract has been revised to more clearly emphasize the novelty of identifying a CRF-CIN pathway and its functional significance (Lines 42-43).

      (3) Lines 66-68: Consider integrating clinical relevance more directly, e.g., "AUD affects over 14 million adults in the U.S., with relapse often triggered by stress...".

      The introduction has been revised to more directly emphasize the clinical relevance of alcohol use disorder, including its high prevalence and the role of stress in relapse, thereby underscoring the translational significance of our findings (Lines 68-69).

      (4) Line 83: Repetition of "goal-directed learning, habit formation, and behavioral flexibility" appears multiple times; consider variety.

      We have varied the phrasing in the Introduction to avoid redundancy. Specifically, in place of repeating “goal-directed learning, habit formation, and behavioral flexibility,” we now use alternative terms such as “action selection,” “habitual responding,” and “cognitive flexibility,” depending on the context.

      (5) Lines 107-116: Clarify why both rats and mice were used. Do they serve different experimental purposes?

      We now explain that each species was used for complementary experimental purposes. Rats were used for histological validation of CRFR1 expression using the CRFR1-Cre-tdTomato line, which has been extensively characterized in this species. Mice were used for the majority of electrophysiological, optogenetic, and GRAB-ACh sensor experiments due to the availability of well-established transgenic CRF-Cre-driver lines. This division allowed us to leverage the most appropriate tools in each species to address different aspects of the study. We have clarified this rationale in the Methods (first paragraph of the “Animals” section) and Discussion (third paragraph).

      (6) Electrophysiology section: The distinction between acute exposure vs. withdrawal could be further emphasized.

      To better highlight the distinction between acute alcohol exposure and withdrawal, we have clarified the timing and context of each condition within the Results section for Figure 6. Specifically, we now distinguish the immediate suppressive effects of alcohol observed during bath application (acute exposure) from the subsequent changes in CIN firing measured after washout (withdrawal). These revisions clarify the temporal dynamics and functional implications of CRF–alcohol interactions in our experimental design.

      (7) Lines 227-229: Reword for clarity: "Significantly more BNST neurons projected to CINs compared to the CeA...".

      The sentence has been reworded to clarify as recommended (Lines 247-248).

      (8) Lines 373-374: Consider connecting the CRF-CIN circuit to behavioral inflexibility in AUD more directly.

      We have modified the sentence (Lines 390-395) to more explicitly link alcohol-induced dysregulation of the CRF–CIN circuit to behavioral inflexibility in AUD, consistent with the established role of CINs in action selection and cognitive flexibility.

      (9) Lines 387-389: This is an excellent point about stress resilience; consider expanding with examples or potential implications.

      We thank the reviewer for this insightful suggestion. In the revised Discussion (sixth paragraph), we expanded this section to more directly connect alcohol-induced disruption of CRF–CIN signaling with impaired stress resilience and behavioral inflexibility. Specifically, we now note that such dysregulation may compromise stress resilience mechanisms mediated by CRF–cholinergic interactions in the striatum and related corticostriatal circuits. We further discuss how impaired CIN responsiveness could blunt adaptive behavioral adjustments under stress, biasing animals toward habitual or compulsive alcohol seeking. This addition highlights the broader implication that alcohol-induced alterations in CRF–CIN signaling may contribute to relapse vulnerability by undermining adaptive stress coping.

      References

      English, D. F., O. Ibanez-Sandoval, E. Stark, F. Tecuapetla, G. Buzsaki, K. Deisseroth, J. M. Tepper and T. Koos (2011). "GABAergic circuits mediate the reinforcement-related signals of striatal cholinergic interneurons." Nat Neurosci 15(1): 123–130.

      Gangal, H., J. Iannucci, Y. Huang, R. Chen, W. Purvines, W. T. Davis, A. Rivera, G. Johnson, X. Xie, S. Mukherjee, V. Vierkant, K. Mims, K. O'Neill, X. Wang, L. A. Shapiro and J. Wang (2025). "Traumatic brain injury exacerbates alcohol consumption and neuroinflammation with decline in cognition and cholinergic activity." Transl Psychiatry 15(1): 403.

      Huang, Z., R. Chen, M. Ho, X. Xie, H. Gangal, X. Wang and J. Wang (2024). "Dynamic responses of striatal cholinergic interneurons control behavioral flexibility." Sci Adv 10(51): eadn2446.

      Lu, J. Y., Y. F. Cheng, X. Y. Xie, K. Woodson, J. Bonifacio, E. Disney, B. Barbee, X. H. Wang, M. Zaidi and J. Wang (2021). "Whole-Brain Mapping of Direct Inputs to Dopamine D1 and D2 Receptor-Expressing Medium Spiny Neurons in the Posterior Dorsomedial Striatum." Eneuro 8(1).

      Ma, T., Z. Huang, X. Xie, Y. Cheng, X. Zhuang, M. J. Childs, H. Gangal, X. Wang, L. N. Smith, R. J. Smith, Y. Zhou and J. Wang (2021). "Chronic alcohol drinking persistently suppresses thalamostriatal excitation of cholinergic neurons to impair cognitive flexibility." J Clin Invest 132(4): e154969.

      Potjer, E. V., X. Wu, A. N. Kane and J. G. Parker (2025). "Parkinsonian striatal acetylcholine dynamics are refractory to L-DOPA treatment." bioRxiv.

      Purvines, W., H. Gangal, X. Xie, J. Ramos, X. Wang, R. Miranda and J. Wang (2025). "Perinatal and prenatal alcohol exposure impairs striatal cholinergic function and cognitive flexibility in adult offspring." Neuropharmacology 279: 110627.

      Ren, Y., Y. Liu and M. Luo (2021). "Gap Junctions Between Striatal D1 Neurons and Cholinergic Interneurons." Front Cell Neurosci 15: 674399.

      Touponse, G. C., M. B. Pomrenze, T. Yassine, V. Mehta, N. Denomme, Z. Zhang, R. C. Malenka and N. Eshel (2025). "Cholinergic modulation of dopamine release drives effortful behavior." bioRxiv.

    1. eLife Assessment

      The authors investigate how dominance hierarchy shapes defensive strategies in mice under two naturalistic threats: a transient visual looming stimulus and a sustained live rat. This study provides important insights into how social context and dominance hierarchy modulate innate defensive behaviors across distinct naturalistic threats. The strength of evidence is convincing, with detailed classification and analysis of behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents an interesting behavioral paradigm and reveals interactive effects of social hierarchy and threat type on defensive behaviors. However, addressing the aforementioned points regarding methodological detail, rigor in behavioral classification, depth of result interpretation, and focus of the discussion is essential to strengthen the reliability and impact of the conclusions in a revised manuscript.

      Strengths:

      The paper is logically sound, featuring detailed classification and analysis of behaviors, with a focus on behavioral categories and transitions, thereby establishing a relatively robust research framework.

      Weaknesses:

      Several points require clarification or further revision.

      (1) Methods and Terminology Regarding Social Hierarchy:

      The study uses the tube test to determine subordinate status, but the methodological description is quite brief. Please provide a more detailed account of the experimental procedure and the criteria used for determination.

      The dominance hierarchy is established based on pairs of mice. However, the use of terms like "group cohesion" - typically applied to larger groups - to describe dyadic interactions seems overstated. Please revise the terminology to more accurately reflect the pairwise experimental setup.

      (2) Criteria and Validity of Behavioral Classification:

      The criteria for classifying mouse behaviors (e.g., passive defense, active defense) are not sufficiently clear. Please explicitly state the operational definitions and distinguishing features for each behavioral category.

      How was the meaningfulness and distinctness of these behavioral categories ensured to avoid overlap? For instance, based on Figure 3E, is "active defense" synonymous with "investigative defense," involving movement to the near region followed by return to the far region? This requires clearer delineation.

      The current analysis focuses on a few core behaviors, while other recorded behaviors appear less relevant. Please clarify the principles for selecting or categorizing all recorded behaviors.

      (3) Interpretation of Key Findings and Mechanistic Insights:

      Looming exposure increased the proportion of proactive bouts in the dominant zone but decreased it in the subordinate zone (Figure 4G), with a similar trend during rat exposure. Please provide a potential explanation for this consistent pattern. Does this consistency arise from shared neural mechanisms, or do different behavioral strategies converge to produce similar outputs under both threats?

      (4) Support for Claims and Study Limitations:

      The manuscript states that this work addresses a gap by showing defensive responses are jointly shaped by threat type and social rank, emphasizing survival-critical behaviors over fear or stress alone. However, it is possible that the behavioral differences stem from varying degrees of danger perception rather than purely strategic choices. This warrants a clear description and a deeper discussion to address this possibility.

      The Discussion section proposes numerous brain regions potentially involved in fear and social regulation. As this is a behavioral study, the extensive speculation on specific neural circuitry involvement, without supporting neuroscience data, appears insufficiently grounded and somewhat vague. It is recommended to focus the discussion more on the implications of the behavioral findings themselves or to explicitly frame these neural hypotheses as directions for future research.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate how dominance hierarchy shapes defensive strategies in mice under two naturalistic threats: a transient visual looming stimulus and a sustained live rat. By comparing single versus paired testing, they report that social presence attenuates fear and that dominant and subordinate mice exhibit different patterns of defensive and social behaviors depending on threat type. The work provides a rich behavioral dataset and a potentially useful framework for studying hierarchical modulation of innate fear.

      Strengths:

      (1) The study uses two ecologically meaningful threat paradigms, allowing comparison across transient and sustained threat contexts.

      (2) Behavioral quantification is detailed, with manual annotation of multiple behavior types and transition-matrix level analysis.

      (3) The comparison of dominant versus subordinate pairs is novel in the context of innate fear.

      (4) The manuscript is well-organized and clearly written.

      (5) Figures are visually informative and support major claims.

      Weaknesses:

      Lack of neural mechanism insights.

    4. Reviewer #3 (Public review):

      Summary:

      This study examines how dominance hierarchy influences innate defensive behaviors in pair-housed male mice exposed to two types of naturalistic threats: a transient looming stimulus and a sustained live rat. The authors show that social presence reduces fear-related behaviors and promotes active defense, with dominant mice benefiting more prominently. They also demonstrate that threat exposure reinforces social roles and increases group cohesion. The work highlights the bidirectional interaction between social structure and defensive behavior.

      Strengths:

      This study makes a valuable contribution to behavioral neuroscience through its well-designed examination of socially modulated fear. A key strength is the use of two ethologically relevant threat paradigms, a transient looming stimulus and a sustained live predator, which enables a nuanced comparison of defensive behaviors. The experimental design is robust, systematically comparing animals tested alone versus with their cage mate to cleanly isolate social effects. The behavioral analysis is sophisticated, employing detailed transition maps that reveal how social context reshapes behavioral sequences, going beyond simple duration measurements. The finding that social modulation is rank-dependent adds significant depth, linking social hierarchy to adaptive defense strategies. Furthermore, the demonstration that threat exposure reciprocally enhances social cohesion provides a compelling systems-level perspective. Together, these elements establish a strong behavioral framework for future investigations into the neural circuits underlying socially modulated innate fear.

      Weaknesses:

      The study exhibits several limitations. The neural mechanism proposed is speculative, as the study provides no causal evidence.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      This study presents an interesting behavioral paradigm and reveals interactive effects of social hierarchy and threat type on defensive behaviors. However, addressing the aforementioned points regarding methodological detail, rigor in behavioral classification, depth of result interpretation, and focus of the discussion is essential to strengthen the reliability and impact of the conclusions in a revised manuscript. 

      Strengths: 

      The paper is logically sound, featuring detailed classification and analysis of behaviors, with a focus on behavioral categories and transitions, thereby establishing a relatively robust research framework. 

      Weaknesses: 

      Several points require clarification or further revision. 

      (1) Methods and Terminology Regarding Social Hierarchy: 

      The study uses the tube test to determine subordinate status, but the methodological description is quite brief. Please provide a more detailed account of the experimental procedure and the criteria used for determination. 

      We will add more details about how the tube test was performed in the revised manuscript.

      The dominance hierarchy is established based on pairs of mice. However, the use of terms like "group cohesion" - typically applied to larger groups - to describe dyadic interactions seems overstated. Please revise the terminology to more accurately reflect the pairwise experimental setup.

      Thanks for the comment. We agree that the term “group cohesion” can be misleading and will replace it with “social engagement”.

      (2) Criteria and Validity of Behavioral Classification: 

      The criteria for classifying mouse behaviors (e.g., passive defense, active defense) are not sufficiently clear. Please explicitly state the operational definitions and distinguishing features for each behavioral category. 

      Passive defense was defined as an immobility-based defensive strategy characterized by suppression of locomotor activity. This category included freezing and tail rattling, which in our study involved minimal body displacement aside from rapid tail vibration. Active defense was defined as a movement- or posture-dependent defensive strategy, encompassing behaviors that involved locomotor engagement or spatial repositioning relative to the threat, including approach, investigation, withdrawal, and stretch-attend. We will clarify this in the revised manuscript.

      How was the meaningfulness and distinctness of these behavioral categories ensured to avoid overlap? For instance, based on Figure 3E, is "active defense" synonymous with "investigative defense," involving movement to the near region followed by return to the far region? This requires clearer delineation. 

      Defensive behaviors in the rat exposure paradigm were grouped into two categories: passive and active defense, each comprising distinct behaviors. All the manually annotated behaviors were mutually exclusive; that is, each video frame was assigned a single behavioral label to avoid overlap across behaviors. Active defense includes four behaviors: approach, investigation, withdrawal, and stretch-attend. We will clarify these points in the revised manuscript.
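
      The labeling scheme described in this response can be illustrated with a minimal Python sketch. It assumes per-frame string labels with exactly one label per frame; the label names, the `CATEGORY` mapping, and the helper functions are hypothetical and are not taken from the authors' annotation pipeline.

```python
from collections import Counter

# Hypothetical mapping from annotated behaviors to the two defense categories
# named in the response; the label strings are illustrative only.
CATEGORY = {
    "freezing": "passive",
    "tail_rattling": "passive",
    "approach": "active",
    "investigation": "active",
    "withdrawal": "active",
    "stretch_attend": "active",
}


def to_bouts(frame_labels):
    """Collapse per-frame labels into bouts (runs of one repeated label)."""
    bouts = []
    for label in frame_labels:
        if not bouts or bouts[-1] != label:
            bouts.append(label)
    return bouts


def transition_counts(frame_labels):
    """Count transitions between consecutive bouts."""
    bouts = to_bouts(frame_labels)
    return Counter(zip(bouts, bouts[1:]))


def category_fractions(frame_labels):
    """Fraction of frames in each category; unmapped labels count as 'other'."""
    counts = Counter(CATEGORY.get(label, "other") for label in frame_labels)
    total = len(frame_labels) or 1  # avoid division by zero on empty input
    return {cat: n / total for cat, n in counts.items()}


# Made-up example: one label per frame, so categories cannot overlap.
frames = ["freezing"] * 3 + ["approach"] * 2 + ["investigation"] * 4 + ["grooming"]
print(transition_counts(frames))
print(category_fractions(frames))
```

      Because each frame carries a single label, every frame falls into exactly one category, which is the property the response relies on to rule out overlap.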

      The current analysis focuses on a few core behaviors, while other recorded behaviors appear less relevant. Please clarify the principles for selecting or categorizing all recorded behaviors.

      Thank you for pointing this out. In the current study, we focused primarily on defensive and social behaviors. We also included several neutral solitary behaviors related to anxiety and defensive state, such as sniffing, grooming, and rearing, which were consistently expressed across animals and closely linked to our main findings. We will clarify this rationale in the revised manuscript.

      (3) Interpretation of Key Findings and Mechanistic Insights:

      Looming exposure increased the proportion of proactive bouts in the dominant zone but decreased it in the subordinate zone (Figure 4G), with a similar trend during rat exposure. Please provide a potential explanation for this consistent pattern. Does this consistency arise from shared neural mechanisms, or do different behavioral strategies converge to produce similar outputs under both threats?

      Thanks for bringing up this important question. The consistent increase in proactive bouts in dominant mice across both paradigms suggests a consistent rank-dependent reorganization of dyadic interaction under threats. We propose that this convergence reflects a shared neural mechanism that links defensive state with social-rank information, potentially mediated by overlapping hypothalamic and prefrontal circuits. We will expand the Discussion to incorporate this explanation.

      (4) Support for Claims and Study Limitations:

      The manuscript states that this work addresses a gap by showing defensive responses are jointly shaped by threat type and social rank, emphasizing survival-critical behaviors over fear or stress alone. However, it is possible that the behavioral differences stem from varying degrees of danger perception rather than purely strategic choices. This warrants a clear description and a deeper discussion to address this possibility.

      We thank the reviewer for this insightful comment. We agree that, in principle, behavioral differences could arise from variations in perceived danger rather than strategic choice. In humans, decisions can sometimes reflect value-based strategies that override perceived danger. In contrast, under naturalistic threat conditions, mice likely rely predominantly on danger perception to make behavioral decisions, and such responses are expected to be consistent with value-based strategies shaped by natural selection. In the revised manuscript, we will expand the Discussion to address the role of threat perception and its relationship to decision-making in our behavioral paradigms.

      The Discussion section proposes numerous brain regions potentially involved in fear and social regulation. As this is a behavioral study, the extensive speculation on specific neural circuitry involvement, without supporting neuroscience data, appears insufficiently grounded and somewhat vague. It is recommended to focus the discussion more on the implications of the behavioral findings themselves or to explicitly frame these neural hypotheses as directions for future research.

      We will revise the Discussion to focus more directly on behavioral findings and add explicit neural hypotheses as potential future directions.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate how dominance hierarchy shapes defensive strategies in mice under two naturalistic threats: a transient visual looming stimulus and a sustained live rat. By comparing single versus paired testing, they report that social presence attenuates fear and that dominant and subordinate mice exhibit different patterns of defensive and social behaviors depending on threat type. The work provides a rich behavioral dataset and a potentially useful framework for studying hierarchical modulation of innate fear.

      Strengths:

      (1) The study uses two ecologically meaningful threat paradigms, allowing comparison across transient and sustained threat contexts.

      (2) Behavioral quantification is detailed, with manual annotation of multiple behavior types and transition-matrix level analysis.

      (3) The comparison of dominant versus subordinate pairs is novel in the context of innate fear.

      (4) The manuscript is well-organized and clearly written.

      (5) Figures are visually informative and support major claims.

      Weaknesses:

      Lack of neural mechanism insights.

      The current study focused on behavior. In the revised manuscript, we will incorporate a discussion of potential neural mechanisms and highlight this as an important direction for future work.

      Reviewer #3 (Public review):

      Summary:

      This study examines how dominance hierarchy influences innate defensive behaviors in pair-housed male mice exposed to two types of naturalistic threats: a transient looming stimulus and a sustained live rat. The authors show that social presence reduces fear-related behaviors and promotes active defense, with dominant mice benefiting more prominently. They also demonstrate that threat exposure reinforces social roles and increases group cohesion. The work highlights the bidirectional interaction between social structure and defensive behavior.

      Strengths:

      This study makes a valuable contribution to behavioral neuroscience through its well-designed examination of socially modulated fear. A key strength is the use of two ethologically relevant threat paradigms, a transient looming stimulus and a sustained live predator, which enables a nuanced comparison of defensive behaviors. The experimental design is robust, systematically comparing animals tested alone versus with their cage mate to cleanly isolate social effects. The behavioral analysis is sophisticated, employing detailed transition maps that reveal how social context reshapes behavioral sequences, going beyond simple duration measurements. The finding that social modulation is rank-dependent adds significant depth, linking social hierarchy to adaptive defense strategies. Furthermore, the demonstration that threat exposure reciprocally enhances social cohesion provides a compelling systems-level perspective. Together, these elements establish a strong behavioral framework for future investigations into the neural circuits underlying socially modulated innate fear.

      Weaknesses:

      The study exhibits several limitations. The neural mechanism proposed is speculative, as the study provides no causal evidence.

      Establishing causal evidence for neural mechanisms is beyond the scope of the current behavioral study. We highlight this as an important direction for future work.

    1. eLife Assessment

      This valuable study tests whether prediction error or prediction uncertainty controls how the brain segments continuous experience into events. The paper uses validated models that predict human behavior to analyze multivariate neural pattern changes during naturalistic movie watching. The authors provide solid evidence that there are overlapping but partially distinct brain dynamics for each signal.

    2. Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      Weaknesses:

      Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. While the authors attempt to look at the unique variance, there is a limit to how effectively this can be done without experimentally dissociating prediction error and uncertainty.

      The authors report an average event length of ~20 seconds, and they also look +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretation of the reported timecourses.

    3. Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors use their recent computational model to estimate event boundaries based on prediction error vs. uncertainty and then examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in sharing their data, code, and stimuli, which benefits the field by enabling future examinations and reproduction of the findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background on how the brain represents events and uses a well-curated set of stimuli. Overall, the authors extend event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift with implications for episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      Weaknesses:

      (1) I am not fully satisfied with the authors' explanation of pattern shifts occurring 11.9s prior to event boundaries. The average length of time for an event was 21.4 seconds. The window around the identified event boundaries was 20 seconds on either side. The earliest identified pattern shift peaks occur at 11.9s prior to the actual event boundary. This would mean that, on average, a pattern shift is occurring approximately at the midway point of the event (11.9s prior to a boundary of a 21.4s event is approx. the middle of an event). The authors offer up an explanation in which top-down regions signal an update that propagates to lower-order regions closer to the boundary. To make this interpretation concrete, they added an example: "in a narrative where a goal is reached midway - for instance, a mystery solved before the story formally ends - higher-order regions may update the event representation at that point, and this updated model then cascades down to shape processing in lower-level regions". This might make sense in a one-off case of irregular storytelling, but it is odd to think this would generalize. If an event is occurring and a given collection of regions represents that event, it doesn't follow the accepted convention of multivariate representational analysis that that set of regions would undergo such a large shift in patterns in the middle of an event. The stabilization of these patterns taking so long is also odd to me. I suspect some of these findings may be due to the stimuli used in this experiment, and I am not confident they would generalize; I invite the authors to disagree and explain. In the case of the exercise routine video, I try to imagine going from the push-up event to the jumping jack event. The actor stops doing push-ups, stands up, and moves minimally for 16 seconds (these lulls are not uncommon). At that point they start doing jumping jacks. It is immediately evident from that moment on that jumping jacks will be the kind of event you are perceiving, which may explain the long delay in event pattern stabilisation. Then about 11.9s prior to the end of the event, when the person is still performing jumping jacks (at this point they have been performing jumping jacks for 6 seconds), I would expect the brain to still be expecting this "jumping jacks event". For some reason, at this point multivariate patterns in higher-order regions shift. I do not understand what kind of top-down processing is happening here, and the authors need to be more concrete in their explanation because as of right now it is ill-defined. I also recognize that being specific to jumping jacks is maybe unfair, but this would apply to the push-ups, granola bar eating, or table cleaning events in the same manner. I suspect one possibility is that the participants realize that the stereotyped action of jumping jacks is going to continue and, thus, mind-wander to other thoughts while waiting for novel, informative information to be presented. This explanation would challenge the more active top-down processing assumed by the authors.

      I had provided a set of concerns to the authors that were not part of the public review and were not addressed. I was unaware of the exact format of the eLife approach, but I think they are worth open discussion so I am adding them here for consideration. Apologies for any confusion.

      (2) Why did the authors not examine differences in event boundary activity magnitude between the uncertainty and error boundaries? I see that the authors have provided the data on OpenNeuro. However, it seems that the difference in activity maps would not only provide extra contextualization of the findings but also be fairly trivial to compute. Just by eyeballing the plots, it appears as though there may be activity differences between the two in the mPFC shortly after a boundary. Given this region's role in prediction error and schema, it would be important to understand whether this difference is merely due to thresholding effects or is statistically meaningful.

      (3) Further, the authors omitted all subcortical regions, some of which would be especially interesting, such as the hippocampus, basal ganglia, and ventral tegmental area. These regions have a rich and deep background in event boundary activity and prediction error. Univariate effects in these regions may provide interesting results that might contextualize some of the pattern shifts in the cortex.

      (4) I see that field maps were collected, but the fmriprep methods state that susceptibility distortion correction was not performed. Is there a reason to omit this?

      (5) How many events were present in the stimuli?

    4. Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) whether uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) whether uncertainty and error make unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary and (2) event boundaries are predicted best by the combination of uncertainty and error signals.

      Weaknesses:

      Regarding question 3, the results are less convincing. Although the analyses in Figure S1 show that there are some unique contributions of uncertainty and error, it is unclear to what extent the results in Figure 7 are driven by shared variance. Therefore, it is not clear to what extent the main claim in the abstract is due to shared or unique variance. More specific comments are provided below.

      The other issue is that the distance between events is short compared to the pre-onset effects that are observed. Halfway between two events, there are already neural signatures of change relating to the upcoming event boundary. I wonder if methodological issues could explain this effect and, if not, what could allow participants to notice the impending event boundary.

      Impact:

      If these comments can be addressed sufficiently, I expect that this work will impact the field in its thinking on what drives event boundaries and spur interest in understanding the mechanisms behind the temporal progression of neural activity around these boundaries.

      Comments

      (1) The correlation between uncertainty and prediction error is very high, which makes it challenging to disentangle the effects of both on the neural response. The analysis in Figure S1 shows that the two predictors indeed have dissociable contributions. However, the results mainly reported in the discussion section and abstract still rely on models where only one of these factors is included at a time. This makes it debatable whether these specific networks mentioned really reflect unique contributions of each of these components. I specifically refer to this statement in the abstract: "Error-driven boundaries were associated with early pattern shifts in ventrolateral prefrontal areas, followed by pattern stabilization in prefrontal and temporal areas. Uncertainty-driven boundaries were linked to shifts in parietal regions within the dorsal attention network, with minimal subsequent stabilization." I would encourage repeating all analyses (also the ones in Figure 7) with a model that includes both predictors and showing both results in the manuscript, so it is clear which regions really show unique variance related to one of the predictors. I also wonder why it is necessary to look at model comparisons between the combined and unique models, rather than simply reporting the significance of each predictor in the combined model.

      (2) The distance between event boundaries ranges between 20 and 30 seconds. The early pre-boundary effects that are observed in the manuscript occur at -12 seconds. This means that these effects occur roughly halfway between the previous and current event. This seems much earlier than expected. That is why I worry that the FIR analyses might not be able to distinguish effects of the previous event from effects of the upcoming event. What evidence is there that the FIR analyses can actually properly show the return to baseline? One way to address this might be to randomize the locations of the event boundaries while preserving the distance between them and rerun the models. This will give a null model with the same event distances and should be able to distinguish this temporal overlap from the true effects of event boundaries.

      (3) If the analyses in point 2 confirm that there is indeed an event-boundary related change that occurs 12 seconds before event onset, it is important to consider what might cause these changes. Are there cues in the movie that indicate that an event boundary is coming? It would be interesting to investigate whether uncertainty and error are higher than expected at 12 seconds pre-onset.
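
      The randomization check proposed in comment (2) could be sketched roughly as follows. This is one possible reading of "randomize the locations while preserving the distance between them": the inter-boundary intervals are shuffled and the whole sequence is placed at a random offset. The function name, the boundary times, and the run duration are all made up for illustration.

```python
import random


def shuffled_boundaries(boundaries, duration, rng):
    """Return surrogate boundary times with the same set of inter-boundary gaps.

    The gaps between consecutive boundaries are shuffled, and the sequence is
    placed at a random start time chosen so that it still fits in the run.
    """
    times = sorted(boundaries)
    intervals = [b - a for a, b in zip(times, times[1:])]
    span = sum(intervals)
    if span > duration:
        raise ValueError("boundaries span more time than the run duration")
    rng.shuffle(intervals)
    start = rng.uniform(0.0, duration - span)
    surrogate = [start]
    for gap in intervals:
        surrogate.append(surrogate[-1] + gap)
    return surrogate


rng = random.Random(0)                    # fixed seed for reproducibility
real = [20.0, 41.5, 66.0, 88.0, 115.5]    # hypothetical boundary times (s)
null = shuffled_boundaries(real, duration=140.0, rng=rng)
print(null)
```

      Refitting the same FIR model on many such surrogates would give a null time course with matched event spacing, against which the observed pre-boundary effects could be compared.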

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      (1) The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      (2) The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      (3) The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      (4) The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      We appreciate this reviewer’s recognition of the significance of this research problem, and of the value of the approach taken by this paper.

      Weaknesses:

      (1) While the paper raises the possibility that both prediction error and uncertainty could serve as control signals, it does not offer a strong theoretical rationale for why the brain would benefit from multiple (empirically correlated) signals. What distinct advantages do these signals provide? This may be discussed in the authors' prior modeling work, but is left too implicit in this paper.

      We added a brief discussion in the introduction highlighting the complementary advantages of prediction error and prediction uncertainty, and cited prior theoretical work that elaborates on this point. Specifically, we now note that prediction error can act as a reactive trigger, signaling when the current event model is no longer sufficient (Zacks et al., 2007). In contrast, prediction uncertainty is framed as proactive, allowing the system to prepare for upcoming changes even before they occur (Baldwin & Kosie, 2021; Kuperberg, 2021). Together, this makes clearer why these two signals could each provide complementary benefits for effective event model updating.

      "One potential signal to control event model updating is prediction error—the difference between the system’s prediction and what actually occurs. A transient increase in prediction error is a valid indicator that the current model no longer adequately captures the current activity. Event Segmentation Theory (EST; Zacks et al., 2007) proposes that event models are updated when prediction error increases beyond a threshold, indicating that the current model no longer adequately captures ongoing activity. A related but computationally distinct proposal is that prediction uncertainty (also termed "unpredictability") can serve as a control signal (Baldwin & Kosie, 2021). The advantage of relying on prediction uncertainty to detect event boundaries is that it is inherently proactive: the cognitive system can start looking for cues about what might come next before the next event starts (Baldwin & Kosie, 2021; Kuperberg, 2021). "

      (2) Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. The authors should consider whether they can leverage timepoints where the models make different predictions to make a stronger case for brain regions that are responsive to one vs the other.

      We addressed this concern by adding an analysis that explicitly tests the unique contributions of prediction error– and prediction uncertainty–driven boundaries to neural pattern shifts. In the revised manuscript, we describe how we fit a combined FIR model that included both boundary types as predictors and then compared this model against versions with only one predictor. This allowed us to identify the variance explained by each boundary type over and above the other. The results revealed two partially dissociable sets of brain regions sensitive to error- versus uncertainty-driven boundaries (see Figure S1), strengthening our argument that these signals make distinct contributions.
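
      As an illustration of the nested-model comparison described above, the following is a minimal sketch of a likelihood ratio test between a combined model and a model with one predictor set removed. It assumes ordinary least-squares fits with Gaussian errors, so that the test statistic reduces to n * ln(RSS_reduced / RSS_full). The function names and the residual sums of squares passed in are hypothetical; this is not the authors' analysis code, and their actual model specification may differ.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i < m} y^i / i!, with m = df / 2.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and apply the recurrence
    # Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    q = math.erfc(math.sqrt(y))
    a = 0.5
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(rss_reduced, rss_full, n_obs, extra_params):
    """Compare nested Gaussian least-squares models.

    rss_reduced:  residual sum of squares of the model with one boundary type
    rss_full:     residual sum of squares of the combined model
    extra_params: number of coefficients the combined model adds
    """
    stat = n_obs * math.log(rss_reduced / rss_full)
    return stat, chi2_sf(stat, extra_params)
```

      Under these assumptions, comparing the combined model against the error-only model asks whether the uncertainty-boundary coefficients explain variance beyond the error boundaries, and the reverse comparison asks the same about the error boundaries.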

      "To account for the correlation between uncertainty-driven boundaries and error-driven boundaries, we also fitted a FIR model that predicted pattern dissimilarity from both types of boundaries (combined FIR) for each parcel. Then, we performed two likelihood ratio tests: combined FIR to error FIR, which measures the unique contribution of uncertainty boundaries to pattern dissimilarity, and combined FIR to uncertainty FIR, which measures the unique contribution of error boundaries to pattern dissimilarity. The analysis also revealed two dissociable sets of brain regions associated with each boundary type (see Figure S1)."

      (3) The authors refer to a baseline measure of pattern dissimilarity, which their dissimilarity measure of interest is relative to, but it's not clear how this baseline is computed. Since the interpretation of increases or decreases in dissimilarity depends on this reference point, more clarity is needed.

      We clarified how the FIR baseline is estimated in the methods section. Specifically, we now explain that the FIR coefficients should be interpreted relative to a reference level, which reflects the expected dissimilarity when timepoints are far from an event boundary. This makes it clear what serves as the comparison point for observed increases or decreases in dissimilarity.
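
      To make the role of the baseline concrete, the sketch below builds a finite impulse response (FIR) design matrix under one common parametrization: an intercept column plus one indicator column per lag around a boundary. With this layout, timepoints far from any boundary are described by the intercept alone, so the intercept plays the role of the baseline and each lag coefficient is a deviation from it. The function name, the use of sample indices rather than seconds, and the parametrization itself are assumptions made for illustration; the source does not specify the authors' exact design.

```python
def fir_design_matrix(n_timepoints, boundaries, lags):
    """Return one row per timepoint: [intercept, one indicator per lag]."""
    boundary_set = set(boundaries)
    rows = []
    for t in range(n_timepoints):
        row = [1]  # intercept column: the baseline level
        for lag in lags:
            # Indicator is 1 when timepoint t lies `lag` samples after a
            # boundary (negative lags are timepoints before the boundary).
            row.append(1 if (t - lag) in boundary_set else 0)
        rows.append(row)
    return rows


# Hypothetical example: 12 timepoints, one boundary at index 5, lags -2..+2.
design = fir_design_matrix(12, boundaries=[5], lags=range(-2, 3))
```

      In this toy example, rows 0 to 2 and 8 to 11 contain only the intercept, so a regression fit to such a matrix estimates the baseline from exactly those far-from-boundary timepoints.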

      "The coefficients from the FIR model indicate changes relative to baseline, which can be conceptualized as the expected value when far from event boundaries."

      (4) The authors report an average event length of ~20 seconds, and they also look at +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretations of the reported time courses.

      This is related to Reviewer 2's comment and is addressed below.

      (5) The authors describe a sequence of neural pattern shifts during each type of boundary, but offer little setup of what pattern shifts we might expect or why. They also offer little discussion of what cognitive processes these shifts might reflect. The paper would benefit from a more thorough setup for the neural results and a discussion that comments on how the results inform our understanding of what these brain regions contribute to event models.

      We thank the reviewer for this advice on how better to set the context for the different potential outcomes of the study. We expanded both the introduction and discussion to better set up expectations for neural pattern shifts and to interpret what these shifts may reflect. In the introduction, we now describe prior findings showing that sensory regions tend to update more quickly than higher-order multimodal regions (Baldassano et al., 2017; Geerligs et al., 2021, 2022), and we highlight that it remains unclear whether higher-order updates precede or follow those in lower-order regions. We also note that our analytic approach is well-suited to address this open question. In the discussion, we then interpret our results in light of this framework. Specifically, we describe how we observed early shifts in higher-order areas such as anterior temporal and prefrontal cortex, followed by shifts in parietal and dorsal attention regions closer to event boundaries. This pattern runs counter to the traditional bottom-up temporal hierarchy view and instead supports a model of top-down updating, where high-level representations are updated first and subsequently influence lower-level processing (Friston, 2005; Kuperberg, 2021). To make this interpretation concrete, we added an example: in a narrative where a goal is reached midway—for instance, a mystery solved before the story formally ends—higher-order regions may update the event representation at that point, and this updated model then cascades down to shape processing in lower-level regions. Finally, we note that the widespread stabilization of neural patterns after boundaries may signal the establishment of a new event model.

      Excerpt from Introduction:

      “More recently, multivariate approaches have provided insights into neural representations during event segmentation. One prominent approach uses hidden Markov models (HMMs) to detect moments when the brain switches from one stable activity pattern to another (Baldassano et al., 2017) during movie viewing; these periods of relative stability were referred to as "neural states" to distinguish them from subjectively perceived events. Sensory regions like visual and auditory cortex showed faster transitions between neural states. Multi-modal regions like the posterior medial cortex, angular gyrus, and intraparietal sulcus showed slower neural state shifts, and these shifts aligned with subjectively reported event boundaries. Geerligs et al. (2021, 2022) employed a different analytical approach called Greedy State Boundary Search (GSBS) to identify neural state boundaries. Their findings echoed the HMM results: short-lived neural states were observed in early sensory areas (visual, auditory, and somatosensory cortex), while longer-lasting states appeared in multi-modal regions, including the angular gyrus, posterior middle/inferior temporal cortex, precuneus, anterior temporal pole, and anterior insula. Particularly prolonged states were found in higher-order regions such as lateral and medial prefrontal cortex.

      The previous evidence about evoked responses at event boundaries indicates that these are dynamic phenomena evolving over many seconds, with different brain areas showing different dynamics (Ben-Yakov & Henson, 2018; Burunat et al., 2024; Kurby & Zacks, 2018; Speer et al., 2007; Zacks, 2010). Less is known about the dynamics of pattern shifts at event boundaries (e.g. whether shifts observed in higher-order regions precede or follow shifts observed in lower-level regions), because the HMM and GSBS analysis methods do not directly provide moment-by-moment measures of pattern shifts. Both the spatial and temporal aspects of evoked responses and pattern shifts at event boundaries have the potential to provide evidence about two potential control processes (error-driven and uncertainty-driven) for event model updating.”

      Excerpt from Discussion:

      “We first characterized the neural signatures of human event segmentation by examining both univariate activity changes and multivariate pattern changes around subjectively identified event boundaries. Using multivariate pattern dissimilarity, we observed a structured progression of neural reconfiguration surrounding human-identified event boundaries. The largest pattern shifts were observed near event boundaries (~4.5 s before) in dorsal attention and parietal regions; these correspond with regions identified by Geerligs et al. (2022) as shifting their patterns on a fast to intermediate timescale. We also observed smaller pattern shifts roughly 12 seconds prior to event boundaries in higher-order regions within anterior temporal cortex and prefrontal cortex, and these are slow-changing regions identified by Geerligs et al. (2022). This is puzzling. One prevalent proposal, based on the idea of a cortical hierarchy of increasing temporal receptive windows (TRWs), suggests that higher-order regions should update representations after lower-order regions do (Chang et al., 2021). In this view, areas with shorter TRWs (e.g., word-level processors) pass information upward, where it is integrated into progressively larger narrative units (phrases, sentences, events). This proposal predicts neural shifts in higher-order regions to follow those in lower-order regions. By contrast, our findings indicate the opposite sequence. Our findings suggest that the brain might engage in top-down event representation updating, with changes in coarser-grain representations propagating downward to influence finer-grain representations (Friston, 2005; Kuperberg, 2021).
For example, in a narrative where the main goal is achieved midway—such as a detective solving a mystery before the story formally ends—higher-order regions might update the overarching event representation at that point, and this updated model could then cascade down to reconfigure how lower-level regions process the remaining sensory and contextual details. In the period after a boundary (around +12 seconds), we found widespread stabilization of neural patterns across the brain, suggesting the establishment of a new event model. Future work could focus on understanding the mechanisms behind the temporal progression of neural pattern changes around event boundaries.”

      Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors combine their recent computational model to estimate event boundaries that are based on prediction error vs. uncertainty and use this to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in terms of sharing their data, code, and stimuli, which is beneficial to the field for future examinations and to the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background to understand how the brain represents events and has used a well-curated set of stimuli. Overall, the authors extend the event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift that has implications in episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      We thank the reviewer for their support for our use of open science practices, and for their appreciation of the importance of incorporating prediction uncertainty into models of event comprehension.

      Weaknesses:

      The data presented is limited to the cortex, and subcortical contributions would be interesting to explore. Further, the temporal window around event boundaries of 20 seconds is approximately the length of the average event (21.4 seconds), and many of the observed pattern effects occur relatively distal from event boundaries themselves, which makes the link to the theoretical background challenging. Finally, while multivariate pattern shifts were examined at event boundaries related to either prediction error or prediction uncertainty, there was no exploration of univariate activity differences between these two different types of boundaries, which would be valuable.

      The fact that we observed neural pattern shifts well before boundaries was indeed unexpected, and we now offer a more extensive interpretation in the discussion section. Specifically, we added text noting that shifts emerged in higher-order anterior temporal and prefrontal regions roughly 12 seconds before boundaries, whereas shifts occurred in lower-level dorsal attention and parietal regions closer to boundaries. This sequence contrasts with the traditional bottom-up temporal hierarchy view and instead suggests a possible top-down updating mechanism, in which higher-order representations reorganize first and propagate changes to lower-level areas (Friston, 2005; Kuperberg, 2021). (See excerpt for Reviewer 1’s comment #5.)

      With respect to univariate activity, we did not find strong differences between error-driven and uncertainty-driven boundaries. This makes the multivariate analyses particularly informative for detecting differences in neural pattern dynamics. To support further exploration, we have also shared the temporal progression of univariate BOLD responses on OpenNeuro (BOLD_coefficients_brain_animation_pe_SEM_bold.html and BOLD_coefficients_brain_animation_uncertainty_SEM_bold.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) if uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) if uncertainty and error have unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary, and (2) event boundaries are predicted best by the combination of uncertainty and error signals.

      We thank the reviewer for their thoughtful and supportive comments, particularly regarding the use of the computational model and the analysis approaches.

      Weaknesses:

      (1) The current analysis of the neural data does not convincingly show that uncertainty and prediction error both contribute to the neural responses. As both terms are modelled in separate FIR models, it may be that the responses we see for both are mostly driven by shared variance. Given that the correlation between the two is very high (r=0.49), this seems likely. The strong overlap in the neural responses elicited by both, as shown in Figure 6, also suggests that what we see may mainly be shared variance. To improve the interpretability of these effects, I think it is essential to know whether uncertainty and error explain similar or unique parts of the variance. The observation that they have distinct temporal profiles is suggestive of some dissociation, but not as convincing as adding them both to a single model.

      We appreciate this point. It is closely related to Reviewer 1's comment 2; please refer to our response above.

      (2) The results for uncertainty and error show that uncertainty has strong effects before or at boundary onset, while error is related to more stabilization after boundary onset. This makes me wonder about the temporal contribution of each of these. Could it be the case that increases in uncertainty are early indicators of a boundary, and errors tend to occur later?

      We share the intuition that increases in uncertainty are early indicators of a boundary and that errors tend to occur later. If that were the case, we would expect a lag between prediction uncertainty and prediction error. We examined the lagged correlation between prediction uncertainty and prediction error, and the optimal lag was 0 for both the uncertainty-driven and error-driven models. This indicates that prediction uncertainty and prediction error rise at the same time rather than in sequence.

      Author response image 1.
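
      The lagged-correlation check described in this response can be sketched as follows. The code shifts one time series against the other, computes a Pearson correlation at each lag, and returns the lag with the highest correlation. The function names, the sign convention (a positive lag means error follows uncertainty), and the toy series are assumptions for illustration only; they are not the authors' data or code.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def best_lag(uncertainty, error, max_lag):
    """Return (lag, r) maximizing corr(uncertainty[t], error[t + lag])."""
    n = len(uncertainty)
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            u, e = uncertainty[: n - lag], error[lag:]
        else:
            u, e = uncertainty[-lag:], error[: n + lag]
        r = pearson(u, e)
        if best is None or r > best[1]:
            best = (lag, r)
    return best


# Toy series: `error` is `uncertainty` delayed by two samples.
toy_uncertainty = [0, 0, 1, 3, 1, 0, 0, 0, 2, 5, 2, 0, 0, 0]
toy_error = [0, 0] + toy_uncertainty[:-2]
lag, r = best_lag(toy_uncertainty, toy_error, max_lag=3)
```

      In the toy data the best lag is 2 because the error series was constructed as a delayed copy. An optimal lag of 0, as reported in the response, would correspond to the two signals peaking together.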

      (3) Given that there is a 24-second period during which the neural responses are shaped by event boundaries, it would be important to know more about the average distance between boundaries and the variability of this distance. This will help establish whether the FIR model can properly capture a return to baseline.

      We have added details about the distribution of event lengths. Specifically, we now report that the mean length of subjectively identified events was 21.4 seconds (median 22.2 s, SD 16.1 s). For model-derived boundaries, the average event lengths were 28.96 seconds for the uncertainty-driven model and 24.7 seconds for the error-driven model.

      "For each activity, a separate group of 30 participants had previously segmented each movie to identify fine-grained event boundaries (Bezdek et al., 2022). The mean event length was 21.4 s (median 22.2 s, SD 16.1 s). Mean event lengths for the uncertainty-driven and error-driven models were 28.96 s and 24.7 s, respectively (Nguyen et al., 2024)."

      (4) Given that there is an early onset and long-lasting response of the brain to these event boundaries, I wonder what causes this. Is it the case that uncertainty or errors already increase at 12 seconds before the boundaries occur? Or are there other markers in the movie that the brain can use to foreshadow an event boundary? And if uncertainty or errors do increase already 12 seconds before an event boundary, do you see a similar neural response at moments with similar levels of error or uncertainty, which are not followed by a boundary? This would reveal whether the neural activity patterns are specific to event boundaries or whether these are general markers of error and uncertainty.

      We appreciate this point; it is similar to reviewer 2’s comment 2. Please see our response to that comment above.

      (5) It is known that different brain regions have different delays of their BOLD response. Could these delays contribute to the propagation of the neural activity across different brain areas in this study?

      Our analyses use ±20 s FIR windows, and the key effects we report include shifts ~12 s before boundaries in higher-order cortex and ~4.5 s before boundaries in dorsal attention/parietal areas. Region-dependent BOLD delays reported in the literature (~1–2 s; Taylor et al., 2018) are much smaller than the temporal structure we observe, making it unlikely that HRF lag alone explains our multi-second, region-specific progression.

      (6) In the FIR plots, timepoints -12, 0, and 12 are shown. These long intervals preclude an understanding of the full temporal progression of these effects.

      To save space, we did not include all timepoints. We uploaded a brain animation of all timepoints and coefficients for each parcel to OpenNeuro (PATTERN_coefficients_brain_animation_human_fine_pattern.html and PATTERN_coefficients_lines_human_fine.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      References

      Taylor, A. J., Kim, J. H., & Ress, D. (2018). Characterization of the hemodynamic response function across the majority of human cerebral cortex. NeuroImage, 173, 322–331. https://doi.org/10.1016/j.neuroimage.2018.02.061

    1. eLife Assessment

      This study presents valuable analyses of single neuron activity in the subthalamic nucleus (STN) of monkeys performing a decision-making task that manipulates both perceptual evidence and reward. In particular, the study shows convincing evidence of multiple decision variables being represented in the STN. However, the evidence for sub-populations in STN with distinct involvements in decision-making is incomplete at this stage and requires either further efforts to provide stronger support or refinement of that conclusion.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript offers a careful and technically impressive dissection of how subpopulations within the subthalamic nucleus support reward‑biased decision‑making. The authors recorded from STN neurons in monkeys performing an asymmetric‑reward version of a visual motion discrimination task and combined single‑unit analyses, regression modeling, and drift‑diffusion framework fitting to reveal functionally distinct clusters of neurons. Each subpopulation demonstrated unique relationships to decision variables - such as the evidence‑accumulation rate, decision bound, and non‑decision processes - as well as to post‑decision evaluative signals like choice accuracy and reward expectation. Together, these findings expand our understanding of the computational diversity of STN activity during complex, multi‑attribute choices.

      Strengths:

      (1) The use of an asymmetric‑reward paradigm enables a clean separation between perceptual and reward influences, making it possible to identify how STN neurons blend these different sources of information.

      (2) The dataset is extensive and well‑controlled, with careful alignment between behavioral and neural analyses.

      (3) Relating neuronal cluster activity to drift‑diffusion model parameters provides an interpretable computational link between neural population signals and observed behavior.

      (4) The clustering analyses, validated across multiple parameters and distance metrics, reveal robust functional subgroups within STN. The differentiation of clusters with respect to both evidence and reward coding is an important advance over treating the STN as a unitary structure.

      (5) By linking neural activity to predicted choice accuracy and reward expectation, the study extends the discussion of the STN beyond decision formation to include outcome monitoring and post‑decision evaluation.

      Weaknesses:

      (1) The inferred relationships between neural clusters and specific drift‑diffusion parameters (e.g., bound height, scaling factor, non‑decision time) are intriguing but inherently correlational. The authors should clarify that these associations do not necessarily establish distinct computational mechanisms.

      (2) While the k‑means approach is well described, it remains somewhat heuristic. Including additional cross‑validation (e.g., cluster reproducibility across monkeys or sessions) would strengthen confidence in the four‑cluster interpretation.

      (3) The functional dissociations across clusters are clearly described, but how these subgroups interact within the STN or through downstream basal‑ganglia circuits remains speculative.

      (4) A natural next step would be to construct a generative multi‑cluster model of STN activity, in which each cluster is treated as a computational node (e.g., evidence integrator, bound controller, urgency or evaluative signal).

      (5) Such a low‑dimensional, coupled model could reproduce the observed diversity of firing patterns and predict how interactions among clusters shape decision variables and behavior.

      (6) Population‑level modeling of this kind would move the interpretation beyond correlational mapping and serve as an intermediate framework between single‑unit analysis and in‑vivo perturbation.

      (7) Causal inference gap - Without perturbation data, it is difficult to determine whether the identified neural modulations are necessary or sufficient for the observed behavioral effects. A brief discussion of this limitation - and how future causal manipulations could test these cluster functions - would be valuable.

    3. Reviewer #2 (Public review):

      This study uses monkey single-unit recordings to examine the role of the STN in combining noisy sensory information with reward bias during decision-making between saccade directions. Using multiple linear regressions and k-means clustering approaches, the authors overall show that a highly heterogeneous activity in the STN reflects almost all aspects of the task, including choice direction, stimulus coherence, reward context and expectation, choice evaluation, and their interactions. The authors report in particular how, here too, in a very heterogeneous way, four classes of neurons map to different decision processes evaluated via the fitting of a drift-diffusion model. Overall, the study provides evidence for functionally diverse populations of STN neurons, supporting multiple roles in perceptual and reward-based decision-making.

      This study follows up on work conducted in previous years by the same team and complements it. Extracellular recordings in monkeys trained to perform a complex decision-making task remain a remarkable achievement, particularly in brain structures that are difficult to target, such as the subthalamic nucleus. The authors conducted numerous rigorous and systematic analyses of STN activities, using sophisticated statistical approaches and functional computational modeling.

      One criticism I would make is that the authors sometimes seem to assume that readers are familiar with their previous work. Indeed, the motivation and choices behind some analyses are not clearly explained. It might be interesting to provide a little more context and insight into these methodological choices. The same is true for the description of certain results, such as the behavioral results, which I find insufficiently detailed, especially since the two animals do not perform exactly the same way in the task.

      Another criticism is the difficulty in following and absorbing all the presented results, given their heterogeneity. This heterogeneity stems from analytical choices that include defining multiple time windows over which activities are studied, multiple task-related or monkey behavioral factors that can influence them, multiple parameters underlying the decision-making phenomena to be captured, and all this without any a priori hypotheses. The overall impression is of an exploratory description that is sometimes difficult to digest, from which it is hard to extract precise information beyond the very general message that multiple subpopulations of neurons exist and therefore that the STN is probably involved in multiple roles during decision-making.

      It would also have been interesting to have information regarding the location of the different identified subpopulations of neurons in the STN and their level of segregation within this nucleus. Indeed, since the STN is one of the preferred targets of electrical stimulation aimed at improving the condition of patients suffering from various neurological disorders, it would be interesting to know whether a particular stimulation location could preferentially affect a specific subpopulation of neurons, with the associated specific behavioral consequences.

      Therefore, this paper is interesting because it complements other work from the same team and other studies that demonstrate the likely important role of the STN in decision-making. This will be of interest to the decision-making neuroscience community, but it may leave a sense of incompleteness due to the difficulty in connecting the conclusions of these different studies. For example, in the discussion section, the authors attempt to relate the different neuronal populations identified in their study and describe some relatively consistent results, but others less so.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate single neuron activity in the subthalamic nucleus (STN) of two monkeys performing a perceptual decision-making task in which both perceptual evidence and reward were manipulated. They find rich representations of decision variables (such as choice, perceptual evidence and reward) in neural activity, and following prior work, cluster a subset of these neurons into subpopulations with varying activity profiles. Further, they relate the activity of neurons within these clusters to parameters of drift diffusion models (DDMs) fit to animal behaviour on trial subsets by neural firing rates, finding heterogeneous and temporally varying relationships between different clusters and DDM parameters, suggesting that STN neurons may play multiple roles in decision formation and evaluation.

      Strengths:

      The behavioural task used by the authors is rich and affords disambiguation between decision variables such as perceptual evidence, value and choice, by independently manipulating stimulus strength and reward size. Both their monkeys show good performance on the task, and their population of ~150 neurons across monkeys reveals a rich repertoire of decision-related activity in single neurons, with individual neurons showing strong tuning to choice, stimulus strength and reward bias. There is little doubt that neurons in the STN are tuned to several decision variables and show heterogeneous tuning profiles.

      Weaknesses:

      The primary weakness of the paper lies in the claim that STN contains multiple sub-populations with distinct involvements in decision making, which is inadequately supported by the paper's methods and analyses.

      First, while it is clear that the ~150 recorded neurons across 2 monkeys (91, 59 respectively) display substantial heterogeneity in their activity profiles across time and across stimulus/reward conditions, the claim of sub-populations largely rests on clustering a *subset of less than half the population - 66 neurons (48, 15 respectively) - chosen manually by visual inspection*. The full population seems to contain far more decision-modulated neurons, whose response profiles seem to interpolate between clusters. Moreover, it is unclear if the 4 clusters hold for each of the 2 monkeys, and the choice of 4-5 clusters does not seem well supported by metrics such as silhouette score, etc, that peak at 3 (1 or 2 were not attempted). From the data, it is easier to draw the conclusion that the STN population contains neurons with heterogeneous response profiles that smoothly vary in their tuning to different decision variables, rather than distinct sub-populations.

      Second, assuming the existence of sub-populations, it is unclear how their time- and condition-varying relationship with DDM parameters is to be interpreted. These relationships are inferred by splitting trials based on individual neurons' firing rates in different task epochs and reward contexts, and regressing onto the parameters of separate DDMs fit to those subsets of trials. The result is that different sub-populations show heterogeneous relationships to different DDM parameters over time - a result that, while interesting, leaves the computational involvement of these sub-populations/implementation of the decision process unclear.
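
      For readers unfamiliar with the silhouette criterion raised in the first point above, the following is a minimal sketch of how a mean silhouette score is computed for a given cluster assignment. It uses Euclidean distance and the standard definition s = (b - a) / max(a, b), where a is the mean distance to the other members of a point's own cluster and b is the mean distance to the nearest other cluster. The function name and the toy points are hypothetical and unrelated to the recorded neurons.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy data: two tight, well-separated groups of 2-D points.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(toy_points, [0, 0, 1, 1])
bad = silhouette_score(toy_points, [0, 1, 0, 1])
```

      Scores near 1 indicate compact, well-separated clusters, and scores near or below 0 indicate points that sit between clusters. The score is undefined for a single cluster, so testing whether any clustering is warranted at all would need a different criterion.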

      Outlook:

      This is a paper with a rich dataset of neural activity in the STN in a rich perceptual decision-making task, and convincing evidence of heterogeneity in choice, value and evidence tuning across the STN, suggesting the STN may be involved in several aspects of decision-making. However, the authors' specific claims about sub-populations in the STN, each having distinct relationships to decision processes, are not adequately supported by their analyses.

    1. eLife Assessment

      This work represents a valuable finding of how single-trial functional connectivity may be used to infer different cognitive states involved in speech perception and production. Although the data and analyses are overall convincing, the theoretical advance and novelty of the finding are less clear. With a clearer idea of the functional significance of the connectivity data, the paper would be of interest to those interested in brain networks and communication.