  1. Last 7 days
    1. eLife Assessment

      This important study uses a feedback-driven recurrent neural network framework to explore the dynamics underlying learning of BCI decoder perturbations. With convincing evidence, the authors demonstrate that behavioral learning trajectories that match those of primates learning within-manifold and outside-manifold perturbations are likely tied to the dynamical controllability of the network and input-driven learning. This work is likely to motivate a new generation of BCI and learning experiments combining large-scale neural recordings with latent dynamical systems analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Gurnani et al. explore how dynamical properties of neural networks influence capacity for and mechanisms of learning. Specifically, they focus on Brain Computer Interface (BCI) learning, in which manipulations are applied to a decoder that maps neural activity onto computer cursors. This paradigm was introduced by Sadtler et al. 2014, and has become an influential part of the neuroscience motor learning literature. A particularly fascinating outcome of that body of work is the observation that "within-manifold" perturbations (WMPs), which preserve covariance structure in the neural population, are easier to learn than "outside-manifold" perturbations (OMPs), which break this. Since deep network parameter access is challenging (to say the least) in monkey experiments, the intuition for this split in learnability is ripe for modeling and theory work. Indeed, the authors here introduce a feedback-driven recurrent neural network model whose output drives a simulation of a neural decoder commonly used in BCI studies like the Sadtler paper. While there have now been several modeling studies exploring how neural networks could solve this task, the feedback control perspective gives the authors' new model an interesting niche. Overall, this is a thoroughly done and well-written modeling study, and a solid contribution to the literature on within- and outside-manifold perturbations.

      Strengths:

      Reframing the OMP and WMP learning from a feedback-driven dynamical systems perspective, not just a geometric one, is an interesting take. The controllability analysis (along with the clear difference in input-driven and recurrence-driven learning) is quite a cool result that helps better frame what might be happening in the primate brain during similar tasks.

      Weaknesses:

      Some of the more interesting aspects, especially the controllability analysis and the differences between input-driven and recurrence-driven learning, could be further developed, either by showing more analyses or running more comparisons. A few sections could benefit from some additional clarity on the strength and significance of results.

    3. Reviewer #2 (Public review):

      Summary:

      The constraints on learning in the brain remain elusive. Using BCIs, Sadtler et al. demonstrated that the brain can rapidly learn new decoders that lie within the intrinsic neural manifold (short-term adaptation), while showing substantial difficulty learning decoders that lie outside the manifold. This finding suggests that neural manifolds impose constraints on learning. However, even among within-manifold decoders, there was considerable variability in learning rates that could not be explained solely by geometric factors.

      Here, Gurnani et al. propose that, in addition to manifold structure, neural dynamics (i.e., the flow field across states) impose critical constraints on learning. To test this idea, the authors trained RNNs that received real-time feedback (e.g., position error signals) during a BCI task in which the network controlled a cursor. The authors showed that short-term adaptation to a new decoder is facilitated by plasticity in sensory inputs, and that pre-existing dynamics influence the speed of adaptation across different decoders. These findings may explain previously unresolved constraints observed in BCI learning and suggest an important role for neural dynamics in constraining sensorimotor learning in the brain.

      Strengths:

      Overall, the work is highly impactful and is likely to motivate a new generation of BCI and learning experiments combining large-scale neural recordings with latent dynamical systems analyses. The paper is clearly written, and I only have minor comments, primarily for clarification.

      Weaknesses:

      There are no major weaknesses. Please see below for minor comments.

      (1) If I understand correctly, most analyses do not distinguish between the preparatory phase and the movement phase. Given that the preparatory phase is largely controlled by feedforward input, I suspect that most of the dynamical constraints underlying learning variability arise during the movement phase. Is this correct? If so, could the authors clarify or directly test this distinction?

      (2) P4: Position vs. velocity decoders: It would be helpful to describe whether and how the choice of velocity versus position decoders influences whether perturbations are learnable, and whether input-driven constraints arising in this task are similar.

      (3) The variance criteria used to screen decoder perturbations may themselves covary with learning rate, behavioral asymmetry, and overlap with controllable subspaces. A quantification of this relationship would help contextualize the findings and inform the design of future BCI experiments.

      (4) To support the comparison between Figures 3 and 7, and the conclusion that Figure 3 better matches the experimental data, which is an important point of the manuscript, could the authors provide quantitative values from the experimental data (e.g., how large is the change in variance within oPCs, etc)?

      (5) Figure 8h: Is the variability in learning rates in models with different controller networks explained by the same dynamical constraints described in Figure 6? Demonstrating consistent dynamical constraints across model architectures would strengthen the paper's central conclusion.

      (6) Figure 8f: Why does feedforward controllability differ between conditions? This is mentioned in the text, but no explanation is provided.

    4. Author response:

      We thank the reviewers for such positive and constructive feedback, and for their enthusiasm about our use of controllability and dynamical systems perspectives to understand learning variability. We are glad to see that they believe this work will be “highly impactful” and “directly motivate new learning experiments”. We agree that these findings suggest new experimental tests of dynamical constraints on learning, in BCIs and motor control as well as other computations that depend on neural dynamics, such as decision-making tasks. Combined with new tools for data-driven identification of latent dynamics, we are excited to see how dynamical constraints can help understand learning outcomes across different tasks, brain areas, and individuals.

      Based on reviewer comments, we identified three sets of analyses that will improve the clarity and strength of evidence for our primary conclusions.

      (1) As the reviewers identified, a central contribution of this study is to show that continuous within-class variability becomes explainable by considering underlying dynamical structure. We realize this was insufficiently emphasized in Figure 6. All regression models included group-specific intercepts, so improvements from dynamical features reflect prediction beyond class-level differences. To quantify this directly, we compared against an intercept-only model and evaluated prediction of within-class residual variability (mean-subtracted). Geometric features did not improve performance beyond class means, whereas dynamical features significantly improved prediction (p < 10<sup>-5</sup> for both behavioral measures). Moreover, only dynamical features predicted within-class residual variability (cross-validated R<sup>2</sup> = 0.19 and 0.30 for learning speed and hit-rate change, respectively; p < 10<sup>-8</sup>). We will add these analyses and revise the text to clarify this point.

      Author response image 1.

      Cross-validated R<sup>2</sup> for (left) learning speed and (right) change in hit rate, for true behavioral outcomes (total variability, blue) and after subtracting class means for OMPs and WMPs (residual variability, orange).
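      The comparison described in point (1) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the plain least-squares fit, the 5-fold split, and the use of full-sample class means when scoring residuals are all assumptions made for illustration. It fits a regression with one intercept per perturbation class (WMP or OMP) plus optional features, and scores held-out predictions against the variability that remains after class means are subtracted. Passing an empty feature matrix gives the intercept-only baseline.

```python
import numpy as np

def design(features, is_omp):
    # Columns: WMP intercept, OMP intercept, then the candidate features.
    return np.column_stack([1.0 - is_omp, is_omp, features])

def cv_predict(features, y, is_omp, n_splits=5, seed=0):
    # K-fold cross-validated predictions from ordinary least squares.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    y_hat = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta, *_ = np.linalg.lstsq(
            design(features[train], is_omp[train]), y[train], rcond=None
        )
        y_hat[test] = design(features[test], is_omp[test]) @ beta
    return y_hat

def residual_r2(y, y_hat, is_omp):
    # R^2 relative to within-class (class-mean-subtracted) variability.
    # Simplification: class means are taken over the full sample.
    class_mean = np.where(is_omp == 1, y[is_omp == 1].mean(), y[is_omp == 0].mean())
    within_class_ss = np.sum((y - class_mean) ** 2)
    return 1.0 - np.sum((y - y_hat) ** 2) / within_class_ss
```

      Under this scoring, an intercept-only model cannot do better than roughly zero, so any positive value reflects prediction beyond class-level differences.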

      (2) We appreciate the reviewers’ comments to clarify what changes in neural structure are small, and to provide a quantitative comparison to changes observed in the primate BCI experiments.

      We referred to published analyses of within-manifold perturbations (WMPs) in the primate BCI experiments, which reported <10% reduction in fractional variance within the intrinsic manifold for most sessions (Golub et al., 2017). (No comparable analysis was reported for OMP sessions.) For adaptation to WMPs, changes in variance within the intrinsic manifold in RNN models with input plasticity closely matched experimental observations (75th percentile: 94% of pre-learning variance in the model versus 90% in data), whereas recurrent plasticity RNN models produced substantially larger departures (78%). In fact, the entire distribution with recurrent plasticity was shifted to larger changes than those observed in most primate WMP sessions. A second comparison based on covariance changes along BCI dimensions (Figure 5 in [1]) yielded a similar conclusion. The authors estimated ~5-20% changes in covariance along both the intuitive and perturbed decoder dimensions during WMP sessions. For our RNN models trained with input plasticity, we observed similar changes: changes along the perturbed decoder were <10% although changes along the intuitive decoder were ~40%. We borrowed the terminology of “small” from the experimental findings in [1], where comparisons were made to alternative learning hypotheses (with predicted changes as >10-fold higher). These analyses now provide more quantitative evidence that neural reorganization under input plasticity is largely consistent with primate neural data. We will add these comparisons as a supplementary figure in the revised manuscript.

      Author response image 2.

      Proportion of maps with normalized variance in intrinsic manifold (IM) above a certain minimum value. Results with training RNNs on WMPs, with either input plasticity (blue) or recurrent plasticity (orange), overlaid on primate data from Golub et al, 2017 (black). Dashed lines indicate the 75th percentile value.
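      For readers unfamiliar with the quantity plotted here, the following is a hedged sketch of one way to compute it. It assumes the intrinsic manifold is given as an orthonormal basis (for example, orthonormalized factor loadings) and that "normalized" means the post-learning fraction divided by the pre-learning fraction. Both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def fraction_variance_in_manifold(activity, basis):
    # activity: (samples, neurons); basis: (neurons, k) with orthonormal columns.
    centered = activity - activity.mean(axis=0)
    total = np.sum(centered ** 2)
    within = np.sum((centered @ basis) ** 2)
    return within / total

def normalized_im_variance(pre, post, basis):
    # Post-learning fraction expressed relative to the pre-learning fraction.
    return fraction_variance_in_manifold(post, basis) / fraction_variance_in_manifold(pre, basis)
```

      A value near 1 means activity stayed within the manifold about as much as before learning. Smaller values indicate that a larger share of variance moved outside it.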

      We agree with reviewers that under input plasticity, both statistical and dynamical changes are relatively modest, particularly when compared to the behavioral changes. Rather than focusing on the magnitude of these changes, our regression analyses in Figure 6 highlight that the dynamical changes are a better predictor of continuous variability of behavioral outcomes. Moreover, OMPs are misaligned with both the intrinsic manifold and the controllable subspace. Thus, mean OMP learning performance alone cannot disentangle the contribution of these different sources of misalignment. By showing that variability within each class is explained by considering dynamics (Figure 4, Figure 6), and using the dissociation between task manifold and controllable subspace by varying controller architecture (Figure 8), we provide evidence that dynamical constraints provide a more comprehensive picture of learning variability, beyond categorical differences.
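      The notion of a decoder being misaligned with a controllable subspace can be made concrete with a generic textbook construction. The sketch below is not taken from the manuscript: it assumes linearized discrete-time dynamics x[t+1] = A x[t] + B u[t], a finite-horizon controllability Gramian, and an alignment score defined as the fraction of decoder weight norm lying in the top-k Gramian eigenvectors. All names and the horizon value are hypothetical.

```python
import numpy as np

def controllability_gramian(A, B, horizon=50):
    # W = sum_{k=0}^{horizon-1} A^k B B^T (A^T)^k
    A = np.asarray(A, dtype=float)
    Ak_B = np.asarray(B, dtype=float).copy()
    W = np.zeros((A.shape[0], A.shape[0]))
    for _ in range(horizon):
        W += Ak_B @ Ak_B.T
        Ak_B = A @ Ak_B
    return W

def controllable_subspace(W, k):
    # Eigenvectors of the Gramian with the k largest eigenvalues.
    _, vecs = np.linalg.eigh(W)  # eigh returns eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

def decoder_alignment(D, U):
    # D: (outputs, neurons) decoder weights; U: (neurons, k) orthonormal basis.
    D = np.asarray(D, dtype=float)
    return np.sum((D @ U) ** 2) / np.sum(D ** 2)
```

      In this toy formulation, a decoder can lie inside the manifold spanned by observed activity yet still score low on alignment if the inputs cannot easily drive activity along its readout dimensions.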

      (3) Finally, we tested whether the same dynamical features explain learning variability across the alternative controller architectures in Figure 8. They remained predictive of learning speed (cross-validated R<sup>2</sup> of 0.35 and 0.33 for low-D and high-D controller networks respectively), supporting the generality of the proposed dynamical constraints. We will add this analysis to the revised manuscript.

      As per reviewer suggestions, we will also perform additional analyses to examine the relationship of learning outcomes to initial behavioral metrics for different decoders, assess flowfield changes during the preparatory phase, report the relevant statistics for stated comparisons, and clarify that learning with only one set of inputs (either feedforward or feedback) was poorer. We will also clarify several points raised by the reviewers, including:

      (i) the compatibility of overlapping confidence intervals of WMP/OMP learning outcomes with prior experimental data in Sadtler et al, 2014;

      (ii) the distinction between flow-field changes in the full neural state space (Figure 5D) and along behavioral readout dimensions (Figure 5E);

      (iii) that autonomous dynamics contribute to controllability and how differences in pre-trained autonomous dynamics across controller architectures could indirectly vary feedforward controllability (Figure 8); and

      (iv) the relationship between controllability and reachable manifolds in position-decoder BCIs.

      References:

      (1) [Golub et al, 2017]   Golub, M.D., Sadtler, P.T., Oby, E.R., Quick, K.M., Ryu, S.I., Tyler-Kabara, E.C., Batista, A.P., Chase, S.M. and Yu, B.M., 2018. Learning by neural reassociation. Nature neuroscience, 21(4), pp.607-616.

      (2) [Sadtler et al, 2014]   Sadtler, P.T., Quick, K.M., Golub, M.D., Chase, S.M., Ryu, S.I., Tyler-Kabara, E.C., Yu, B.M. and Batista, A.P., 2014. Neural constraints on learning. Nature, 512(7515), pp.423-426.

    1. eLife Assessment

      This important study employed a multi-stage behavioural paradigm of increasing cognitive complexity to investigate the role of inhibitory interneurons in the medial prefrontal cortex (mPFC) in avoidance behaviour in mice. The authors used imaging and optogenetic techniques combined with this behavioural task to show that mPFC interneurons are necessary for encoding but not executing avoidance under threat. The evidence supporting these claims is compelling, and findings will be of interest to researchers in behavioural and systems neurosciences.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the role of the medial prefrontal cortex (mPFC) in generating goal-directed actions under threat, using a progressive behavioral paradigm, neural recordings, and optogenetic inhibition in mice. The authors demonstrate that while mPFC GABAergic neurons strongly encode cues, actions, and errors, particularly under high cognitive demand, this neural activity is not causally required for executing avoidance behaviors. By rigorously controlling for movement and arousal, the researchers found that much of the observed mPFC signaling actually reflects baseline behavioral states rather than the generation of the actions themselves. This dissociation between encoding and causality challenges traditional views of mPFC as an executive controller of action and provides a nuanced understanding of its role in evaluative and contextual processing.

      Strengths:

      The behavioral paradigm employed in this study is one of its greatest strengths, offering a rigorous, progressive, and well-controlled framework to dissect the neural mechanisms underlying avoidance under threat. This three-phase task design is particularly well-suited to tease apart the contributions of learning, discrimination, and cognitive load to both behavior and neural activity.

      By tracking movement (speed, rotations) and including it as a covariate in statistical models, the authors also underscore the need to control for movement and baseline activity when interpreting cortical signals, which is relevant for all studies of brain-behavior relationships, ensuring that behavioral changes are not due to general arousal or motor activity.

      Finally, the study combines multiple advanced techniques, including fiber photometry, single-cell calcium imaging (miniscopes), and two distinct optogenetic inhibition methods, to provide a comprehensive look at both neural encoding and causal necessity.

      Comments on revised version.

      The authors adequately addressed all of the reviewers' comments and made great improvements to the manuscript, particularly enhancing the methods and figures to significantly improve clarity and readability.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Sajid et al. describes a comprehensive behavioral, imaging and optogenetic dataset investigating the role of the mPFC in avoidance and escape behaviors. Although many movement- and task-related variables are encoded by mPFC GABAergic neurons, the main conclusion is that they are unlikely to control behavioral output.

      Strengths:

      The manuscript is generally well executed and plausible in its conclusions. It provides an alternative viewpoint to many articles describing the involvement of mPFC in behavior, based on a complex multi-stage behavioral paradigm acquired and analyzed in an unbiased way.

      Weaknesses:

      This reviewer sees two weaknesses.

      (1) In some cases, the explained variance, marginal and conditional, is low, suggesting the models only modestly capture the complexity in the data.

      (2) The manuscript is challenging to read due to the comprehensive and unbiased presentation style.

      Comments on revised version.

      The authors did a good job at addressing the reviewers' comments. One minor additional suggestion is to add references for the statement in the last paragraph of the discussion for the mPFC lesion studies.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors conclude that mPFC is not required for avoidance, based on the minimal behavioral effects of optogenetic inhibition. While this interpretation is supported by the data, the choice of viral constructs could lead to an underestimation of the mPFC's role for several reasons. Specifically, the efficacy of eArch3.0 inhibition was not verified beyond histology, and its non-cell-type-specific nature could lead to disinhibition or compensatory activity in downstream regions. Although the authors' use of visual cortex (VI) inhibition as a control suggests that broad cortical inhibition does not impair avoidance, subcortical compensation cannot be ruled out. Additionally, Vgat-ChR2 targets only GABAergic neurons, potentially missing glutamatergic contributions. Addressing these limitations in the Discussion section would strengthen the manuscript.

      We thank the reviewer for these points. First, although we did not perform direct electrophysiological verification of eArch3.0 efficacy in mPFC in the present study, this construct has been extensively validated in prior work and is widely used to produce robust neuronal inhibition. In our experiments, the lack of behavioral effect with eArch3.0 inhibition converged with the results obtained using the independent Vgat-ChR2 approach, which we directly validated, supporting the conclusion that mPFC inhibition does not impair avoidance under these conditions. Our results are also consistent with previous studies showing that mPFC lesions do not impair avoidance behavior.

      Second, we agree that manipulating mPFC activity will necessarily influence downstream circuits, including subcortical regions, given the interconnected nature of these networks. Our goal was to test whether inhibiting mPFC activity alters avoidance behavior, not to isolate it from its targets. In this context, the absence of behavioral effects indicates that avoidance behavior can be supported without mPFC activity. While compensation is always a possibility, some impairment is usually visible while compensation occurs, and we did not observe such effects. Our results are consistent with the idea that subcortical circuits normally mediate these behaviors.

      Finally, regarding Vgat-ChR2, activating GABAergic neurons is a well-established approach to suppress cortical activity, as these interneurons provide strong inhibition onto local glutamatergic neurons. Thus, this manipulation is expected to broadly reduce excitatory output in cortex. Indeed, the robust suppression of cortical activity we observed with GABAergic activation makes it unlikely that major glutamatergic contributions were missed.

      These points are in the paper, including the Discussion.

      Reviewer #2 (Public review):

      (1) There are few details on the linear mixed models in the methods. This section could be improved by including a mathematical description. More importantly, the reader never learns how accurately the models capture the data. Given that most conclusions rely on the models, it seems central to address this point carefully. For example, what is the explained variance, marginal, and conditional? Were the nested models compared to non-nested ones (e.g., AIC), what are the specific outputs of the likelihood ratio tests briefly mentioned in the methods?

      Model structure was defined a priori by the experimental design and hypotheses rather than selected through model comparison, but we verified the contribution of key model components (e.g., covariates, interactions, and random effects) using likelihood ratio tests comparing models. Regarding model performance, we now report for each model the marginal and conditional R<sup>2</sup> values (Nakagawa), which quantify variance explained by fixed effects alone and by the full mixed model including random effects. In addition, likelihood ratio test results for all fixed effects and interactions (χ<sup>2</sup> statistics) were already reported in the manuscript.
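      For reference, the marginal and conditional R<sup>2</sup> of Nakagawa and Schielzeth are commonly written as below for a Gaussian linear mixed model. The notation is an assumption introduced here, not taken from the manuscript: \sigma_f^2 is the variance of the fixed-effect predictions, \sigma_\alpha^2 is the summed variance of the random effects, and \sigma_\varepsilon^2 is the residual variance.

```latex
R^2_{\text{marginal}} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2},
\qquad
R^2_{\text{conditional}} = \frac{\sigma_f^2 + \sigma_\alpha^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}
```

      The gap between the two values indicates how much of the explained variance is attributable to the random effects (for example, animal or session identity) rather than to the fixed effects of interest.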

      (2) For several figures, there is a disconnect with the main text, in the sense that it is difficult to understand how statements in the main text connect with specific figure panels or bars in their graphs. This is particularly the case for the most complex figures, e.g., Figures 3, 4, and their supplements. It would be beneficial to introduce subfigure labels (A1, etc) and state explicitly in the main text what figure panel is described (in parentheses). Alternatively, break down the figures into multiple ones, decreasing ambiguity. This is important because it will help the reader better assess the strength of the results.

      We have significantly revised the manuscript to reduce ambiguity and thank the reviewer for each of their (28) requests, which we have implemented in full. We also added additional figure references to the Results to assist with readability. This has significantly improved clarity and readability.

      (3) It does not appear that the code and data used to produce the figures are made available. That would be very beneficial, given the complexity of the analysis and dataset collection procedures. It would also help readers better understand the results and probe their validity.

      As usual, we will share the full dataset in the VOR at Dryad after the revision is completed.

      Reviewer #3 (Public review):

      The main weakness, in my view, lies in the Results section. In the figures, the authors do not present any raw data, and the plots are shown as mean ± SEM without displaying the distribution of individual data points.

      We thank the reviewer for the recommendations. Individual data points are shown where appropriate (e.g., Fig. 1). However, most of our analyses involve repeated-measures, hierarchical data with multiple levels (cells and sessions nested within animals), where simple point overlays can be misleading or difficult to interpret without explicit linking across levels. We therefore use mean ± SEM visualizations for clarity in these summary figures, while preserving the full hierarchical structure in the statistical analysis through mixed-effects models. All data will be made available in the VOR to allow full inspection of the underlying distributions.

      It is both a strength and a weakness that the authors do not attempt to guide the reader through the Results section and instead present the findings with very little emphasis on the key outcomes of the GLM. While this approach is arguably the most transparent way to report results, it also makes the section quite difficult to follow and may discourage readers.

      I would recommend rewriting the Results section to make it more accessible to a broader audience. A similar issue applies to the figures: presenting all plots reflects a commendable commitment to transparency, but it would greatly benefit from a clearer narrative. As it stands, it is difficult to grasp the message of each figure by simply browsing through them.

      The full description (complexity) of the models is entirely in the legends and supplemental figures. This was done to make the results easier to follow. We have made all the changes noted above to facilitate readability while assuring there is enough transparency to assess the data. We think readability has significantly improved.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Below are a few specific suggestions related to the main weaknesses mentioned above.

      (1) P4 L9: The sentence starting with "However, most ..." sounds more like a statement than a contrast with the previous sentence. Therefore, please delete "However" and please add references to justify the statement.

      Done.

      (2) P8: Definition of movement peaks. It would be great to have three videos illustrating the mouse behavior in the three different movement peaks. This would allow the reader to better understand the differences between no peaks 3 sec prior, more than 5 seconds, and one example that does not fit these two categories. In addition, what percentage of all peaks do the "no peaks 3 sec prior" and "more than 5 sec" categories represent?

      We added the percentages. The “3 sec prior” represent ~23% and the “5 sec” represent ~31%. However, we do not think adding a single video of one movement per these 3 cases would be useful as the dataset is composed of thousands of these movements.

      (3) P8: Last paragraph. When you state that you performed a linear fit between DF/F and movement, do you mean speed? In addition, the statement "integrating both signals over a 200 ms window" is incomplete. How is the window selected? Is the window 200 ms around movement onset or movement peak speed?

      Yes, the movement variable used in the linear fit corresponds to speed. Regarding the 200 ms window, this analysis does not focus on specific behavioral events such as movement onset or peak speed. Instead, both ΔF/F and speed signals were segmented into consecutive 200 ms windows across the entire recording session, and the linear relationship was computed across these paired segments. Thus, the analysis captures the overall relationship between neural activity and ongoing movement, rather than event-aligned dynamics. We have revised the text to clarify both the use of speed and the implementation of the 200 ms window.
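      The windowing procedure described in this response can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the function name, the sampling-rate argument `fs`, and the use of a rectangle-rule integral within each window are assumptions.

```python
import numpy as np

def windowed_linear_fit(dff, speed, fs, window_s=0.2):
    # Split both signals into consecutive, non-overlapping windows,
    # integrate each window, then fit dff_integral ~ speed_integral.
    n = int(round(window_s * fs))              # samples per window
    n_win = min(len(dff), len(speed)) // n     # drop any incomplete final window
    dff_w = np.asarray(dff, dtype=float)[: n_win * n].reshape(n_win, n).sum(axis=1) / fs
    speed_w = np.asarray(speed, dtype=float)[: n_win * n].reshape(n_win, n).sum(axis=1) / fs
    slope, intercept = np.polyfit(speed_w, dff_w, 1)
    r = np.corrcoef(speed_w, dff_w)[0, 1]
    return slope, intercept, r
```

      Because the windows tile the whole session rather than being locked to movement onset or peak speed, the fitted slope summarizes the overall coupling between activity and ongoing movement.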

      (4) P14: Discussion of AA19 and AA39 tasks: It would be helpful to clearly specify what percentage of actions you would expect given no learning, is it the 23% action dashed line indicated in the top panel of Figure 2B?

      The expected percentage of actions under no learning is not fixed, as it depends on the rate of spontaneous (non–cue-driven) crossings. In these tasks, we estimate this baseline using behavior during the noUS condition, where the action rate is ~23% (Fig. 2B). In the AA19 and especially AA39 tasks, this baseline decreases because spontaneous inter-trial crossings (ITCs) are progressively reduced, leading to lower expected action rates under no-learning conditions. Thus, the expected no-learning rate in the AA19/39 tasks is lower than the 23% derived from noUS. In other studies, we explicitly included NoCS (no-cue) trials to estimate chance performance; however, in the present design we rely on the noUS baseline and the observed changes in ITC rate. We have clarified this point in the text.

      (5) P15 L2: "Considering tone intensity (Fig. 2B), CS1 avoids latencies increased at medium and high intensities but not a low intensity." This is confusing. Are you referring to the AA39 triangles under CS1 in the middle panel, left? They are all above the dashed reference line. So the plot seems to contradict the statement. If you are referring to AA19, the red dots also seem to show the opposite of the statement.

      The dashed reference line reflects latency during the noUS condition and is included for visual reference; however, these values are not directly comparable to those in the AA tasks, as noUS latencies are largely unconstrained and reflect baseline behavior rather than learned responding. The statement in the text refers specifically to changes across AA conditions, consistent with our analysis approach throughout the manuscript, where values are compared to the immediately preceding condition. In this case, we are referring to AA39 (triangles) relative to AA19 (circles). Under this comparison, CS1 avoidance latencies increase at medium and high intensities, but not at low intensity, consistent with the statistical contrasts. We have revised the text to clarify the points.

      (6) P17: "Movement and neural measures subtract the baseline from the other three windows at a trial level." Do you mean to say that for each measure, the baseline was subtracted? How is baseline defined (over which time window)?

      The baseline is defined in that same paragraph as the −0.5 to 0 s pre-CS window. To improve clarity, we have revised the text to explicitly restate this definition in the sentence describing baseline subtraction.
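      A trial-level baseline subtraction of this kind might look like the sketch below. Only the −0.5 to 0 s pre-CS window comes from the response. The array layout, the function name, and the choice to include −0.5 s and exclude 0 s are assumptions.

```python
import numpy as np

def baseline_subtract(trials, t, baseline=(-0.5, 0.0)):
    # trials: (n_trials, n_samples); t: (n_samples,) in seconds relative to CS onset.
    # Subtract each trial's own mean over the pre-CS baseline window.
    mask = (t >= baseline[0]) & (t < baseline[1])
    return trials - trials[:, mask].mean(axis=1, keepdims=True)
```

      Means over later windows can then be taken from the returned array, so every measure is expressed relative to the same pre-CS baseline on that trial.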

      (7) P17: "Fig. 2-Supplement 2A,B shows model-derived marginal means of movement averaged across tone intensities." Some explanation needs to be provided, since the previous figures show a dependence of behavior on tone intensity. Are you doing this based on Fig. 2-S1?

      Yes, these results are derived from the same model of the full data shown in Fig. 2–S1. In this particular analysis, tone intensity was included in the model but not retained when computing marginal means and contrasts, effectively averaging across intensity levels. The rationale for this approach is that tone intensity was primarily used to increase behavioral variability, particularly error rates, which are otherwise low in this task. Averaging across intensity therefore improves statistical power and allows us to more clearly isolate the effects of the primary factors of interest. We have clarified this point in the text.

      (8) P18: "Orienting magnitude was strongly dependent on tone intensity...". However, in Figure 2-S2, there is no information about tone intensity. So how is the reader supposed to see this? Same issue on P19 when discussing the action window. Generally, the description of Figure 2-S1 and S2 is difficult to follow and should be improved. It is not clear that all panels are referred to in the text.

      We have revised the start of the Movement section to clarify how tone intensity is treated across analyses and figures. Specifically, tone intensity is included as a factor in all statistical models; however, for clarity of presentation, it is sometimes collapsed in figures to reduce dimensionality and to emphasize other task-related factors. This manipulation was introduced primarily to increase behavioral variability (particularly error rates), thereby improving sensitivity for estimating the effects of the other task variables.

      We have also clarified in the Fig. 2–S2 legend that, although intensity is not displayed in the figure for visualization purposes, it is included in the underlying model and its effects are reported in the supplement.

      (9) P22, 23: Windows are mentioned, but not defined or indicated in figures.

      We have clarified in the text that the same time windows defined for movement analyses (baseline, orienting, action, and from-action) were also used for the neural analyses.

      (10) P22: "Covariates were standardized within each window so that estimated marginal means reflected ΔF/F at average covariate values." It is unclear what was done exactly. What do you mean by "standardized"? Maybe give an example here and elaborate in the methods.

      By “standardized within each window,” we mean that covariates were z-scored within each analysis window (i.e., each covariate was transformed to have a mean of 0 and a standard deviation of 1 within that window). This ensures that estimated marginal means correspond to ΔF/F evaluated at the average covariate values within each window. We have clarified this in the Methods and Results.
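
      The standardization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `zscore_within_windows`, the example covariate values, and the use of the sample standard deviation (n - 1 denominator) are all assumptions made for the example.

```python
from statistics import mean, stdev


def zscore_within_windows(values, windows):
    """Z-score a covariate separately within each analysis window.

    `values` and `windows` are parallel sequences: one covariate value and
    one window label per observation. Uses the sample SD (an assumption).
    """
    # Group the covariate values by window label.
    groups = {}
    for v, w in zip(values, windows):
        groups.setdefault(w, []).append(v)

    # Per-window mean and standard deviation.
    stats = {w: (mean(vs), stdev(vs)) for w, vs in groups.items()}

    # Transform each observation using its own window's statistics.
    return [(v - stats[w][0]) / stats[w][1] for v, w in zip(values, windows)]


# Hypothetical covariate (e.g. speed) measured in two windows.
speed = [2.0, 4.0, 6.0, 10.0, 20.0, 30.0]
window = ["baseline"] * 3 + ["orienting"] * 3
print(zscore_within_windows(speed, window))
```

      After this transformation, a covariate value of 0 corresponds to the window's own average, which is why marginal means evaluated at 0 reflect ΔF/F at average covariate values.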

      (11) P24-25: Indicating spurious action on Figure 3-S2 (and in Figure 3) would help the reader follow the argument in the main text.

      We clarified this in the legends by indicating that actions not classified as AA, PA, Escape, or PA Error are spurious actions.

      (12) P25: "After controlling for ..., but this includes the effects of aversive stimulation." The second part of this sentence was not clear.

      We have clarified this sentence to indicate that avoidance errors are followed by aversive stimulation (i.e., errors are punished).

      (13) P34L3: "Classs" -> "Class".

      Fixed.

      (14) P42 top paragraph: There are two references to Figure 5-S1 panel D, but there is no panel D on the figure.

      Fixed.

      (15) P57: The sentence starting with "Random effects were specified ..." is very difficult to follow.

      We have revised this sentence to improve clarity by separating the description of the random-effects structure from the model syntax.

      (16) P57: The windows analyzed are finally defined at the bottom of this page. The information also needs to be included early in the results to improve comprehension.

      This is now included in the main text when windows are first used in the movement section.

      (17) P58: Several R packages are mentioned by name, but without specifying that they are R packages, which would facilitate reading.

      We added R.

      (18) P58 top paragraph: "Tuckey's correction", do you mean "Tukey's HSD test"?

      We thank the reviewer for noting this. We used Holm-adjusted p-values for multiple comparisons (as implemented in emmeans) and have revised the text.
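
      For readers unfamiliar with the Holm adjustment mentioned above, the standard step-down procedure can be sketched as below. This is an illustration of the general method only; it is not the emmeans implementation, and the function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise contrasts.
print(holm_adjust([0.01, 0.04, 0.03]))
```

      Unlike Tukey's method, which is built around all pairwise comparisons of means, the Holm procedure operates only on the set of p-values and so applies to any family of contrasts.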

      (19) P63: "features extracted from F/F" do you mean "DF/F"?

      Yes, fixed.

      (20) Figure 1B speed plots: it is not possible to visualize the lines at the movement peak because they overlap completely. You can either add an inset on the left of the peak (for each panel), magnifying that region, or play with the transparency of the traces to improve visibility. There is a similar issue in Figure 5A, B. (Alternatively, if it is not possible to solve the issue graphically, explicitly state that traces overlap.)

      We have fixed this by making some traces dashed in Figure 1 and 1-S1, which reveals the underlying traces. We also stated that the peak speed completely overlaps. In Figure 5, we stated that traces overlap as expected; transparency or dashing does not work well with the colors used in Figure 5 and in fact the overlap emphasizes the similarity of the movements.

      (21) Legend 1A: abbreviation CCF not defined. Is it anterior to the left? Abbreviation WM not defined. The right panels are unclear. The legend states that they show a schematic of the location of the optical fibers, but that was not clear. Do the dots indicate the location of the fibers? Is the green region indicative of V1? Same for dark gray in the mPFC panel. What are the lighter grey regions and the blue region? Does 'lateral' mean 'lateral from midline'? Please clarify these points.

      CCF is defined in Methods, and the typesetting process will adjust abbreviations as needed per the journal. We have defined MW and clarified all the other points in the legend.

      (22) 1B: "peaks taken at a fixed interval > 5 s", this is a bit confusing. If the interval is fixed, the exact time interval should be given. If it is > 5 s, then this suggests that it is not fixed. Do you mean "at intervals > 5 s"?

      Yes, fixed.

      (23) Figure 1-S1C: is the area the integral of the z-scored DF/F above zero DF/F? If so, it should have units of seconds (integral over dt of a dimensionless variable). Similarly, the Peak is a z-score value? In addition, is the time to peak in seconds? What is zero? Peak time of movement?

      We thank the reviewer for raising these points. We have clarified the terminology in the text and figure. Specifically, “area” was inaccurately labeled and refers to the mean z-scored ΔF/F within each analysis window (not a time integral). Peak values correspond to the maximum z-scored ΔF/F within the window, and time to peak is reported in seconds relative to the alignment point. We have also clarified the definition of time zero and included these definitions in Methods.
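
      The three window measures defined above (mean, peak, and time to peak) can be illustrated with a short sketch. The function name `window_metrics`, the inclusive window bounds, and the example trace are assumptions for illustration; sample times are taken to be already expressed in seconds relative to the alignment point.

```python
def window_metrics(times, zdff, start, end):
    """Mean, peak, and time to peak of a z-scored dF/F trace in [start, end].

    `times` are in seconds relative to the alignment point (time zero).
    Window bounds are treated as inclusive (an assumption).
    """
    samples = [(t, y) for t, y in zip(times, zdff) if start <= t <= end]
    if not samples:
        raise ValueError("no samples fall inside the window")

    values = [y for _, y in samples]
    # The sample with the largest z-scored dF/F gives the peak and its time.
    peak_time, peak = max(samples, key=lambda s: s[1])

    return {
        "mean": sum(values) / len(values),  # dimensionless, not a time integral
        "peak": peak,                       # maximum z-score in the window
        "time_to_peak": peak_time,          # seconds from the alignment point
    }


# Hypothetical trace sampled every 0.5 s around the alignment point.
times = [-0.5, 0.0, 0.5, 1.0, 1.5]
zdff = [0.0, 1.0, 3.0, 2.0, 0.5]
print(window_metrics(times, zdff, start=0.0, end=1.0))
```

      Because the first measure is a window mean rather than an integral over time, it carries the same (dimensionless) units as the z-scored trace, which addresses the reviewer's question about units.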

      (24) Figure 2-S1: It is not clear if this figure is obtained by averaging across all animals. Please explain in the legend.

      We clarified that values represent averages across mice.

      (25) Figure 2-S2: Are the speeds in A and B in units of cm/s (vertical axis)? This needs to be indicated.

      We have clarified in the figure legend that movement speed is expressed in cm/s.

      (26) Figure 5A, scale bar: It looks like a Delta is missing in front of F because the label reads 0.5 F/F instead of 0.5 DF/F. I am unclear why there are three colored traces for the speed panels. If the colors denote neuron classes, does this mean they were recorded in different sessions, allowing the authors to distinguish activation speed for each class separately?

      We fixed the scale bar typo. The speed traces in the bottom panels are shown to illustrate that movement is highly similar across activation types within each avoidance mode, indicating that the observed large differences in neural activity cannot be attributed to differences in movement. Minor differences in the speed traces arise because activation types are composed of neurons that can be recorded in the same or different sessions, and each activation type may not be present in every session. We added several sentences to this section that should fully clarify the issue.

      (27) Figure 4-S1 legend B: Please indicate why the two panels are missing for the PA case (for the confused reader).

      We have clarified in the legend that panels are not shown for correct CS2 passive avoids because these trials do not involve an action, and therefore from-action alignment cannot be defined.

      (28) Figure 5-S A, B: Units missing for speed.

      Fixed.

      Reviewer #3 (Recommendations for the authors):

      I cannot assess the scientific validity of the study design as it is too far away from my direct field of expertise. But I found the authors' arguments convincing, and the results sound pretty consistent with the little I know of the field. The recording methods are good and the statistical analysis robust. So my only recommendation for the authors would be to work on the figures to improve clarity.

      Thank you. We have introduced various changes that we hope will facilitate readability for a wider audience while preserving the necessary details.

    1. eLife Assessment

      This important study demonstrates that paternal diet influences not only testicular morphology but also placental and fetal development, supporting a role for paternal contributions to offspring health. The study also considers potential links between the microbiome and male reproductive health. By combining transcriptomic and histological analyses across multiple tissues, the evidence supporting the central conclusions of the study is convincing.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      Morgan et al. studied how paternal dietary alteration influenced testicular phenotype, placental and fetal growth using a mouse model of paternal low protein diet (LPD) or Western Diet (WD) feeding, with or without supplementation of methyl-donors and carriers (MD). They found diet- and sex-specific effects of paternal diet alteration. All experimental diets decreased paternal body weight and the number of spermatogonial stem cells, while fertility was unaffected. WD males (irrespective of MD) showed signs of adiposity and metabolic dysfunction, abnormal seminiferous tubules and dysregulation of testicular genes related to chromatin homeostasis. Conversely, LPD induced abnormalities in the early placental cone, fetal growth restriction and placental insufficiency, which was partly ameliorated by MD. The paternal diets changed placental transcriptome in a sex-specific manner and led to a loss of sexual dimorphism in the placental transcriptome. These data provide a novel insight on how paternal health can affect the outcome of pregnancies, which is often overlooked in prenatal care.

      Strengths:

      The authors have performed a well-designed study using commonly used mouse models of paternal underfeeding (low protein) and overfeeding (Western diet). They performed comprehensive phenotyping at multiple timepoints including of the fathers, the early placenta and late gestation feto-placental unit. The inclusion of both testicular and placental morphological and transcriptomic analysis is a powerful non-biased tool for such exploratory observational studies. The authors describe changes in testicular gene expression revolving around histone (methylation) pathways that are linked to altered offspring development (H3.3 and H3K4), which is in line with hypothesised paternal contributions to offspring health. The authors report sex differences in control placentas that mimic those in humans, providing potential for translatability of the findings. The exploration of sexual dimorphism (often overlooked) and its absence in response to dietary modification is novel and contributes to the evidence-base for the inclusion of both sexes in developmental studies.

      Comments on revised version:

      The authors have done a great job addressing my concerns. The description of the data analysis and the figures are now much clearer. The inclusion of the potential links between the microbiome and male reproductive fitness is informative and improves the flow of the discussion.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated the effects of a low-protein diet (LPD) and a high sugar- and fat-rich diet (Western diet, WD) on paternal metabolic and reproductive parameters and feto-placental development and gene expression. They did not observe significant effects on fertility; however, they reported gut microbiota dysbiosis, alterations in testicular morphology, and severe detrimental effects on spermatogenesis. In addition, they examined whether the adverse effects of these diets could be prevented by supplementation with methyl donors. Although LPD and WD showed limited negative effects on paternal reproductive health (with no impairment of reproductive success), the consequences on fetal and placental development were evident and, as reported in many previous studies, were sex-dependent.

      Strengths:

      This study is of high quality and addresses a research question of great global relevance, particularly in light of the growing concern regarding the exponential increase in metabolic disorders, such as obesity and diabetes, worldwide. The work highlights the importance of a balanced paternal diet in regulating the expression of metabolic genes in the offspring at both fetal and placental levels. The identification of genes involved in metabolic pathways that may influence offspring health after birth is highly valuable, strengthening the manuscript and emphasizing the need to further investigate long-term outcomes in adult offspring.

      The histological analyses performed on paternal testes clearly demonstrate diet-induced damage. Moreover, although placental morphometric analyses and detailed histological assessments of the different placental zones did not reveal significant differences between groups, their inclusion is important. These results indicate that even in the absence of overt placental phenotypic changes, placental function may still be altered, with potential consequences for fetal programming.

      Comments on revised version:

      The authors have adequately addressed all my previous comments.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Morgan et al. studied how paternal dietary alteration influenced testicular phenotype, placental and fetal growth using a mouse model of paternal low protein diet (LPD) or Western Diet (WD) feeding, with or without supplementation of methyl-donors and carriers (MD). They found diet- and sex-specific effects of paternal diet alteration. All experimental diets decreased paternal body weight and the number of spermatogonial stem cells, while fertility was unaffected. WD males (irrespective of MD) showed signs of adiposity and metabolic dysfunction, abnormal seminiferous tubules and dysregulation of testicular genes related to chromatin homeostasis. Conversely, LPD induced abnormalities in the early placental cone, fetal growth restriction and placental insufficiency, which was partly ameliorated by MD. The paternal diets changed placental transcriptome in a sex-specific manner and led to a loss of sexual dimorphism in the placental transcriptome. These data provide a novel insight on how paternal health can affect the outcome of pregnancies, which is often overlooked in prenatal care.

      Strengths:

      The authors have performed a well-designed study using commonly used mouse models of paternal underfeeding (low protein) and overfeeding (Western diet). They performed comprehensive phenotyping at multiple timepoints including of the fathers, the early placenta and late gestation feto-placental unit. The inclusion of both testicular and placental morphological and transcriptomic analysis is a powerful non-biased tool for such exploratory observational studies. The authors describe changes in testicular gene expression revolving around histone (methylation) pathways that are linked to altered offspring development (H3.3 and H3K4), which is in line with hypothesised paternal contributions to offspring health. The authors report sex differences in control placentas that mimic those in humans, providing potential for translatability of the findings. The exploration of sexual dimorphism (often overlooked) and its absence in response to dietary modification is novel and contributes to the evidence-base for the inclusion of both sexes in developmental studies.

      Comments on revised version:

      The authors have done a great job addressing my concerns. The description of the data analysis and the figures are now much clearer. The inclusion of the potential links between the microbiome and male reproductive fitness is informative and improves the flow of the discussion.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated the effects of a low-protein diet (LPD) and a high sugar- and fat-rich diet (Western diet, WD) on paternal metabolic and reproductive parameters and feto-placental development and gene expression. They did not observe significant effects on fertility; however, they reported gut microbiota dysbiosis, alterations in testicular morphology, and severe detrimental effects on spermatogenesis. In addition, they examined whether the adverse effects of these diets could be prevented by supplementation with methyl donors. Although LPD and WD showed limited negative effects on paternal reproductive health (with no impairment of reproductive success), the consequences on fetal and placental development were evident and, as reported in many previous studies, were sex-dependent.

      Strengths:

      This study is of high quality and addresses a research question of great global relevance, particularly in light of the growing concern regarding the exponential increase in metabolic disorders, such as obesity and diabetes, worldwide. The work highlights the importance of a balanced paternal diet in regulating the expression of metabolic genes in the offspring at both fetal and placental levels. The identification of genes involved in metabolic pathways that may influence offspring health after birth is highly valuable, strengthening the manuscript and emphasizing the need to further investigate long-term outcomes in adult offspring.

      The histological analyses performed on paternal testes clearly demonstrate diet-induced damage. Moreover, although placental morphometric analyses and detailed histological assessments of the different placental zones did not reveal significant differences between groups, their inclusion is important. These results indicate that even in the absence of overt placental phenotypic changes, placental function may still be altered, with potential consequences for fetal programming.

      Comments on revised version:

      The authors have adequately addressed all my previous comments.

      We would like to thank the Editor and Reviewers for their consideration and thoughtful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It was a little difficult seeing exactly what had changed in the manuscript without going back to the original version as not all changes were marked yellow in the revised version. In future, I would recommend clearly labelling all changes to aid the referee.

      We apologise to the reviewer for the difficulty in seeing where the changes had been made. We acknowledge their comments for subsequent manuscripts and thank them for their time, consideration and comments.

      Small comments:

      (1) I noted the description of the statistical analysis now includes the addition of paternal age/diet duration in the generalised mixed model for the late gestation cohort. Was this also done for the early gestation cohort? If not, why not?

      For the data presented in Figure 6, each data point was obtained from a separate male. As such, we were not able to factor in male effects, as no male sired more than one litter (Figure 6A). Additionally, only one conceptus per male was analysed for ECP area and development meaning paternal age effects could not be accounted for.

      (2) The legend of Figure 2 states that "Data were analysed using either a one-way ANOVA with Holm-Sidak post hoc tests for multiple comparison respectively". Is some text missing here?

      We thank the reviewer for spotting this typographical error. This has now been corrected and reads “Data were analysed using a one-way ANOVA with Holm-Sidak post hoc tests for multiple comparison”.

      (3) Figure 1 remains low resolution in the reviewer's copy. If possible, it would be good to upload a higher resolution figure during production of the article.

      We apologise that the resolution of this figure was still low for the Reviewer. We have checked the dpi and it is 300x300. However, we will ensure the quality is as high as possible during production.

      Reviewer #2 (Recommendations for the authors):

      One minor remaining issue: the caption of Figure 3 still contains the phrase "non-fasting metabolic status", which should be deleted from this sentence.

      We thank the reviewer for spotting this typographical mistake. This has now been corrected.

    1. eLife Assessment

      This study presents a valuable finding on the direct cytotoxic effects of DuoHexaBody-CD37 in diffuse large B-cell lymphoma through antibody clustering, independent of complement. The central findings are supported by solid evidence, although some mechanistic details, including the specific Fc receptor requirements for crosslinking-mediated cytotoxicity, remain unresolved. As the findings are based primarily on in vitro models, further validation would be required to support broader translational conclusions. The previous review comments were addressed by the authors and have improved the work.

    2. Joint Public Review:

      [Editor's Note: The previous reviewers' comments were felt to be addressed by the reviewers and myself and have improved the work.]

      In this study, the authors suggest that DuoHexaBody-CD37, a biparatopic CD37-targeting antibody, can induce direct cytotoxicity in diffuse large B-cell lymphoma (DLBCL) cells through antibody clustering and SHP-1 activation, independent of complement. They further propose that DuoHexaBody-CD37 inhibits cytokine-mediated pro-survival signalling, suggesting a broader role for CD37-directed therapy in disrupting tumour supportive signalling networks.

      A strength of the study is the systematic in vitro characterisation of signalling responses to DuoHexaBody-CD37 across both malignant and normal B-cells. The inclusion of phosphoproteomic profiling and mutant constructs provides mechanistic detail, and the findings may be of interest to researchers working on antibody therapeutics in lymphoma.

      However, the evidence supporting key mechanistic processes - particularly the specific subtype requirement for Fc receptor crosslinking - is incomplete and would benefit from further functional validation. While CD37 has been explored previously as a therapeutic target, this study does add mechanistic insight into direct cytotoxicity and cytokine modulation. Nevertheless, the exclusive reliance on in vitro systems makes the translational relevance unclear.

      Overall, the study provides valuable insight into CD37-mediated signalling in lymphoma cells, but the evidence remains incomplete to support broader conclusions about therapeutic impact. The additional mechanistic data included during revision are informative, but the precise basis of the observed cytotoxic effects remains incompletely defined.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Joint Public Review:

      In this study, the authors suggest that DuoHexaBody-CD37, a biparatopic CD37-targeting antibody, can induce direct cytotoxicity in diffuse large B-cell lymphoma (DLBCL) cells through antibody clustering and SHP-1 activation, independent of complement. They further propose that DuoHexaBody-CD37 inhibits cytokine-mediated pro-survival signalling, suggesting a broader role for CD37-directed therapy in disrupting tumour supportive signalling networks.

      A strength of the study is the systematic in vitro characterisation of signalling responses to DuoHexaBody-CD37 across both malignant and normal B-cells. The inclusion of phosphoproteomic profiling and mutant constructs provides mechanistic detail, and the findings may be of interest to researchers working on antibody therapeutics in lymphoma.

      However, the evidence supporting key mechanistic processes - particularly the role of SHP-1 in mediating cytotoxicity and the requirement for Fc receptor crosslinking - is incomplete and would benefit from further functional validation. While CD37 has been explored previously as a therapeutic target, this study does add mechanistic insight into direct cytotoxicity and cytokine modulation. Nevertheless, the exclusive reliance on in vitro systems makes the translational relevance unclear. Overall, the study provides valuable insight into CD37-mediated signalling in lymphoma cells, but the evidence remains incomplete to support broader conclusions about therapeutic impact.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In the manuscript, Singh and colleagues reveal a new mechanism via which DuoHexaBody-CD37 induces DLBCL cytotoxicity, which is independent of external factors, such as the effector cells and the complement system. As cited by the authors, the induction of B cell death has previously been demonstrated for antibodies directed against B cells, including anti-CD37 (otlertuzumab). Furthermore, the majority of these observations are made using in vitro systems, and it is not clear whether this phenomenon happens in vivo.

      Thank you for pointing this out. We would like to refer to previous reports that have demonstrated potent anti-tumor activity of DuoHexaBody-CD37 in vivo in cell line- and patient-derived xenograft models from different B-cell malignancy subtypes [PMID: 32341336]. Moreover, DuoHexaBody-CD37 ex vivo activity has been shown in primary tumor cell samples from a large cohort of newly diagnosed (ND) and relapsed/refractory (RR) patients with a broad range of B-cell malignancies, including chronic lymphocytic leukemia (CLL) and B-cell non-Hodgkin lymphoma, including diffuse large B-cell lymphoma (DLBCL) [PMID: 33324950]. We refer to these data in the introduction.

      The presented data suggest that DuoHexaBody-CD37 relies on Fc crosslinking for its optimal cytotoxic activity. Investigating which FcγR is needed for this purpose would have been useful, as FcγRIIb, for instance, has been shown to be important in supporting the therapeutic function of mAbs like anti-CD40.

      We thank you for this suggestion. To further investigate the role of specific FcγRs in effector cell-mediated Fc cross-linking, PBMC-mediated direct cytotoxicity was compared across various immune cell subsets: B cells (FcγRIIb), NK cells (FcγRIIIa, IIc), monocytes (FcγRI, IIa/b, IIIb), and T cells (no confirmed FcγR expression). Notably, all immune cell subsets expressing FcγRs exhibited similar or enhanced cytotoxicity against DLBCL cells compared to the total PBMC pool. These results indicate that DuoHexaBody-CD37-induced killing is independent of specific FcγR subtypes. We have added these new data to new Figure 1C.

      Specific comments:

      (1) Line 92:93: The authors should also cite the following reference for rituximab: https://pubmed.ncbi.nlm.nih.gov/19620786/ .

      We have added this reference to the revised paper (ref. 31).

      (2) Figure 1 and 2: Since cell death was only observed in the presence of crosslinking in Figure 1, Figure 2 should also investigate the clustering and internalization of CD37 in the presence of the same secondary antibody. It is likely that DuoHexaBody-CD37 will induce receptor internalization upon crosslinking.

      To further investigate internalization, we compared the surface availability of CD37 with and without Fc-mediated crosslinking of DuoHexaBody-CD37 across cell lines. Little to no decrease in the surface availability of CD37 upon Fc-mediated crosslinking (new Supplementary Figure 2) was observed.

      In addition, we performed cluster analysis studies in lymphoma cells treated with DuoHexabody-CD37 in the absence and presence of Fc-crosslinking (and respective isotype controls). We observed that DuoHexabody-CD37 by itself was already sufficient to induce CD37 clustering, which was further enhanced by Fc-crosslinking (new Figure 2A, B).

      (3) Figure 3A: the Y-axes should be clearly labelled.

      Done.

      (4) Figure 6: What is the reason for the selective use of different cell lines in Figure 6? Additionally, only 1 donor has been used for the IL-6 analysis.

      The reviewer is indeed correct in noticing that only one cell line has been used for the IL-6 analysis. We observed that HBL-1 was the only cell line that was sensitive to IL-6 treatment, in contrast to IL-4 and IL-21. We have added this sentence to the discussion to explain this better: “p-STAT3 downregulation upon DuoHexaBody-CD37 treatment in presence of IL-6 requires further investigation in additional IL-6-responsive cell lines, as HBL1 was the only IL-6-responsive lymphoma cell line tested in this study.”

      The data shown in Figure 6 are results from at least three independent experiments (each dot is an independent experiment, not a donor).

      Reviewer #2 (Recommendations for the authors):

      Singh et al uncover a novel mechanism of action for the DuoHexaBody-CD37 against DLBCL, whereby it is shown to induce direct cytotoxicity independent of complement and to activate the phosphatase SHP-1. DuoHexaBody-CD37 is also shown to reduce cytokine induced JAK/STAT signalling in DLBCL cells.

      Strengths:

      The authors provide novel insight into CD37 targeting across normal B cells, DLBCL and Burkitt lymphoma cells, which have the potential to inform clinical translation.

      Weaknesses:

      The mechanisms behind differences in signalling and apoptosis between normal B cells, Burkitt lymphoma, and DLBCL cells with CD37 targeting require further clarification. In particular, the contribution of SHP-1 to this effect is not clear and indeed is increased in both normal B cells and DLBCL cells.

      Key points that require addressing are below:

      (1) Viability of Burkitt lines was less affected than DLBCL in Figure 1- this should be compared with surface CD37 expression in these same lines to determine whether this accounts for the effect. This difference is a key finding for clinical translation.  

      We thank the reviewer for this suggestion and we have now performed flow cytometry analysis across DLBCL and Burkitt cell lines upon staining with two different anti-CD37 antibodies (WR17, M-B371) to quantify membrane CD37 expression (new Supplementary Figure 1B). These data show that CD37 expression levels are not directly related to DuoHexaBody-CD37 mediated cytotoxicity in the studied B cell lines. 

      (2) pSHP1 is increased in both normal B cells (lines 169-171, Figure 3C) and DLBCL and yet the authors state specific upregulation of pSHP1 in DLBCL as a reason for induced cytotoxicity in DLBCL (lines 183-185). This requires clarification and experimental confirmation. The authors should investigate normal B cells in the cytotoxicity assays as in Figure 1 for comparison. The authors should also confirm the importance of SHP-1 in this apoptosis process using specific SHP pharmacological agents, which are commercially available.

      To analyze the role of SHP1-mediated signaling in induced cytotoxicity of DLBCL, SHP1 knockouts (KO) were generated in HBL1 and OciLy7 cell lines using CRISPR-Cas9 technology (new Supplementary Figure 5A). The wild-type and SHP-1 KO cell lines were then compared for differences in cytotoxicity after treatment with DuoHexaBody-CD37 with and without Fc-crosslinker. No differences in cytotoxicity were observed between the wild-type and knockout cell lines (new Supplementary Figure 5B), indicating that DuoHexaBody-CD37-induced SHP1 signaling does not play a direct role in the increased cytotoxicity. We have added these new data to the results and rephrased the role of SHP-1 in the revised manuscript.

      (3) It would be informative to assess caspase activation and PARP cleavage across normal B cells, DLBCL and Burkitt under these conditions for clarity on apoptosis induction.

      We thank the reviewer and we agree it would be informative to confirm apoptosis induction in the cell lines upon DuoHexaBody-CD37 treatment. We addressed this question by flow cytometric analysis of different lymphoma cell lines stained with/without Annexin V (apoptosis marker) and 7AAD (late apoptotic/necrotic marker) in presence or absence of DuoHexaBody-CD37, with and without Fc-crosslinking. These experiments demonstrate that Fc-crosslinking DuoHexaBody-CD37 leads to the induction of apoptosis across DLBCL cell lines (new Supplementary Figure 1A).

      (4) The regulation of JAK/STAT signalling by SHP-1 should be mentioned in the introduction and discussion as this is a key finding of the manuscript.

      Based on the new data on the role of SHP-1 (Suppl. Fig. 5), we have rephrased the text on SHP1 in the discussion of the revised paper: “DuoHexaBody-CD37 treatment also led to an increase in SHP1 mediated signaling, however we could not confirm a direct role of SHP1 signaling in DuoHexaBody-CD37-mediated cytotoxicity. DLBCL cells may undergo signal rewiring upon SHP1 knockdown by altered levels of p‑AKT, p‑STAT3, and p‑STAT6, or SHP2 may compensate for the loss of SHP1. It is currently unclear what the biological implications are of the increased SHP1 signaling observed upon treatment with DuoHexaBody-CD37 in DLBCL cells.”

      (5) The authors state that DuoHexaBody-CD37 is particularly effective at downregulating STAT signalling in the presence of IL-6 (lines 302-303); however, this is not statistically significant in the results section. There is a trend for a reduction, but further experimental repeats would clarify this.

      We agree with the reviewer, and rewrote the text on IL-6 in the discussion: “p-STAT3 downregulation upon DuoHexaBody-CD37 treatment in presence of IL-6 requires further investigation in additional IL-6-responsive cell lines, as HBL1 was the only IL-6-responsive lymphoma cell line tested in this study.”

    1. eLife Assessment

      This valuable study re-evaluates a published simulation model on the role of heterozygote advantage in shaping MHC diversity. By modifying key modeling assumptions, the author argues that the original conclusions depend on a narrow and potentially unrealistic parameter range. While the work is in principle solid, the robustness of this claim is viewed differently by the reviewers. The manuscript further proposes an alternative modeling framework in which expansion of the MHC gene family allows homozygotes to outperform heterozygotes, thereby challenging the idea that heterozygote advantage alone can account for high allelic diversity at MHC loci. The topic is highly relevant for eco-immunology and evolutionary genetics, although it is not clear yet how well the model generalizes to other genes with different patterns of haplotype diversity in the population and different degrees of heterozygous advantage.

    2. Reviewer #1 (Public review):

      The manuscript "Heterozygote advantage cannot explain MHC diversity, but MHC diversity can explain heterozygote advantage" explores two topics. First, it is claimed that the conclusion recently published by Mattias Siljestam and Claus Rueffler (referred to below as [SR] for brevity), that heterozygote advantage explains MHC diversity, does not withstand even a very slight change in ecological parameters. Second, a modified model that allows expansion of the MHC gene family shows that homozygotes outperform heterozygotes. This is an important topic and could be of potential interest to the readership of eLife if the conclusions are valid and non-trivial.

      The resubmitted manuscript addresses several questions from my previous review. In particular, there is a more detailed description of how the code of Siljestam and Rueffler ([SR]) was used for the simulations and the calculation of the factor 2.7 x 10^43 that is the key to the alleged breakdown of the numerical reasoning presented in [SR].

      Yet I think that important aspects of my critique of the first statement of the manuscript about the flaws of the [SR] model remain unanswered. I guess the discussion becomes rather general about the universality and robustness of various types of models to parameter changes. My point is that none of the models is totally universal. The model in [SR] is not phenomenological as none of the parameters or functional forms were derived empirically. Instead, it is a proof of principle demonstration that inevitably grossly simplifies the actual immune response. The choice of constants and functions used in Eqs. (1-5) is dictated by mathematical convenience and works in a limited range of parameter values. It is shown in [SR] that for 3 pathogens and reasonable "virulence" \nu, the alleles branch. These conclusions are supported by the analytically derived Adaptive Dynamics branching criteria (7), which, contrary to the statement in the cover letter ("It is clear from Fig. 4 of Siljestam and Rueffler that the branching condition is far from sufficient for high MHC diversity."), are perfectly confirmed by the simulation data shown in Fig. 4.

      The mathematical simplicity of the [SR] model generates various artifacts, such as the reduction of the "condition" by an enormous factor of 2.7 x 10^43, mentioned by the author, and the resulting decrease in the "survival" induced by the addition of a new pathogen. This occurs at the very large value of \nu=20, whose effect is enormous due to the Gaussian form of (1), which, once again, was chosen for mathematical convenience. In reality, a new pathogen cannot reduce the "survival" by such a factor as it would wipe out any resident population. So to compensate for such an artifact, the additional factor c_max was introduced to buffer such an excess. There is no reason to fix c_max once for an arbitrary number of pathogens, because varying c_max basically reflects the observation that a well-adapted individual must have a reasonable survival probability. At the same time, there are many ways in which the numerical simulation may break down when the survival rates become of the order of 10^(-43) instead of one, so it comes as no surprise that the diversification, predicted by the adaptive dynamics, does not readily occur in the scenario with an addition or removal of the 8th pathogen with a very high virulence \nu=20.

      I have doubts that the reported breakdown of the [SR] model with fixed c_max remains observable with less extreme values of m and \nu (say, for \nu=7 and m=3 plus or minus 1 used in Fig. 3 in the manuscript).

      So I still find that the claim that "the phenomenon that leads to high diversity in the simulations of Siljestam and Rueffler depends on finely tuned parameter values" is not well substantiated.

    3. Reviewer #2 (Public review):

      Summary:

      This study addresses the population genetic underpinnings of the extraordinary diversity of genes in the MHC, which is widespread among jawed vertebrates. This topic has been widely discussed and studied, and several hypotheses have been suggested to explain this diversity. One of them is based on the idea that heterozygote genotypes have an advantage over homozygotes. While this hypothesis lost support early on, a recent study claimed that there is good support for this idea. The current study highlights an important aspect that allows us to see the results presented in the earlier published paper in a different light, strongly changing the conclusions of the earlier study, i.e., there is no support for a heterozygote advantage. This is a very important contribution to the field. Furthermore, this new study presents an alternative hypothesis to explain the maintenance of MHC diversity, which is based on the idea that gene duplications can create diversity without heterozygosity being important. This is an interesting idea, but not entirely new.

      Strength:

      (1) A careful re-evaluation of a published model, questioning a major assumption made by a previous study.

      (2) A convincing reanalysis of a model that, in light of the reanalysis, loses all support.

      (3) A convincing suggestion for an alternative hypothesis.

      Weakness:

      (1) The title of the study is catchy, but it is explained only at the very end of the paper.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      Yet I think that important aspects of my critique of the first statement of the manuscript about the flaws of the [SR] model remain unanswered.

      I believe that I have fully addressed the points in the earlier review. The reviewer had doubted that my results were correct, attributing them to “a poor setup of the model” on my part. The reviewer stated that if I were correct about the factor of >10^43 change in c_max, this would “naturally break down all the estimates and conclusions made in Siljestam and Rueffler” (S&R).

      It appears that the reviewer is now convinced that my results represent a faithful analysis of the models on which S&R based their claims. The reviewer now contends that these results, including the factor of >10^43, present no difficulties for the claims of S&R after all. In fact, this enormous factor of >10^43 is now claimed to support the conclusions of S&R by invalidating my conclusions. I respond to these new and very different arguments in what follows.

      As I stated in the first round of review, the issue is not the enormity of this factor per se, but the fact that the compensatory adjustment of c_max conceals the true effects of changes in other parameters. These effects are large; small changes to the parameter values mostly eliminate the diversity that the model is claimed to explain.

      The model in [SR] is not phenomenological as none of the parameters or functional forms were derived empirically. Instead, it is a proof of principle demonstration that inevitably grossly simplifies the actual immune response.

      The hidden sensitivity of the results of S&R to parameter values is sufficient to invalidate them as a proof of principle. The manuscript goes further and explains how the problem "is not specific to the details of the models of Siljestam and Rueffler, but is inherent in the phenomenon invoked to allow high diversity" because "any change that affects condition by as much as the difference between MHC heterozygotes and homozygotes will eliminate high equilibrium diversity". This general principle addresses all of the reviewer's points.

      In reality, a new pathogen cannot reduce the "survival" by such a factor as it would wipe out any resident population. So to compensate for such an artifact, the additional factor c_max was introduced to buffer such an excess. There is no reason to fix c_max once for an arbitrary number of pathogens, because varying c_max basically reflects the observation that a well-adapted individual must have a reasonable survival probability.

      This is not a legitimate reason for making compensatory, diversity-promoting adjustments to c_max when evaluating sensitivity to other parameters. If the number of pathogens or their virulence changes, c_max obviously does not automatically change along with it. If the population or species consequently goes extinct, then it goes extinct. If it persists, it does so with the same value of c_max.

      The possibility of extinction arguably puts a minimum value on c_max, but it does not restrict it to a range of values that conveniently leads to high MHC diversity. In the examples that I analyzed, slightly decreasing the number of pathogens or their virulence, which increases survivability, eliminates diversity. This phenomenon obviously cannot be dismissed on the grounds that survivability would be too low for the species to exist.

      S&R in effect assume that the condition of the most fit homozygote remains fixed, regardless of the number of pathogens, their virulence, and myriad other differences between species. It is this assumption that is without justification.

      At the same time, there are many ways in which the numerical simulation may break down when the survival rates become of the order of 10^(-43) instead of one

      I am not sure what is meant by “the numerical simulation may break down”. Numerical error is not a tenable explanation of the lack of diversity observed in that simulation. The outcome is exactly what is expected from purely theoretical considerations: conditions of all genotypes fall on the steep part of the curve, making the mechanism proposed by S&R largely inoperative, so a pair of alleles forming a fit heterozygote comes to predominate. The numerical simulation is actually superfluous.

      Low survival rates are completely irrelevant to the effect of decreasing the number of pathogens or their virulence, which does not lower survival rates, but does eliminate diversity.

      so it comes as no surprise that the diversification, predicted by the adaptive dynamics, does not readily occur in the scenario with an addition or removal of the 8th pathogen with a very high virulence \nu=20.

      Whether or not it is surprising, the lack of diversity is a problem for the claims of S&R, as there is no reason to expect the number of pathogens to have just the right value to produce high diversity. Furthermore, for many combinations of values of the other parameters (e.g., my v=19.5 and 20.5 examples), no number of pathogens leads to high diversity.

      Again, the general principle mentioned above makes the details that the reviewer refers to irrelevant. Nonetheless, some additional remarks are in order:

      (1) This comment ignores the fact that removal of a pathogen, or a slight decrease in “virulence”, eliminates diversity without lowering survival rates.

      (2) Small increases or decreases in v (virulence) eliminate diversity without having such large effects on condition.

      (3) In the example emphasized by the reviewer, mean survival rates are nowhere near as low as 10^(-43). Only homozygotes have such low fitness.

      (4) The adaptive dynamics predict the low diversity seen in the simulations, contrary to what the reviewer seems to suggest. Elimination of diversity is not an artifact of the simulation.

      (5) v=20 was chosen because it is most favorable to the model of S&R in that it yields the highest diversity. Indeed, S&R only observed realistically high diversity with the narrow Gaussians that the reviewer objects to. With lower values of v, diversity is much lower, but even this meager diversity is eliminated by small changes in parameter values (see below). If narrow Gaussians and large effects of pathogens somehow invalidate results, then they invalidate the high-diversity results of S&R.

      I have doubts that the reported breakdown of the [SR] model with fixed c_max remains observable with less extreme values of m and \nu (say, for \nu=7 and m=3 plus or minus 1 used in Fig. 3 in the manuscript).

      These doubts are unwarranted. With the suggested parameter values, for example, increasing or decreasing m by 1 reduces the effective number of alleles to around 1 or 2. This can easily be checked using the simulation code of S&R, as detailed in my initial response and now in a Supplementary Text. Even without this result, the general principle mentioned above tells us that considering other regions of parameter space cannot rescue the conclusions of S&R.
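
      For readers unfamiliar with the statistic, the effective number of alleles is commonly defined as the reciprocal of the sum of squared allele frequencies. Whether the manuscript and S&R use exactly this definition is an assumption here, and the function name and the frequencies below are made up for illustration. A minimal sketch:

```python
def effective_number_of_alleles(frequencies):
    """Reciprocal of the sum of squared allele frequencies.

    Assumed definition (hypothetical helper, not taken from the manuscript).
    Frequencies are normalized first, so raw counts also work.
    """
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    normalized = [f / total for f in frequencies]
    return 1.0 / sum(p * p for p in normalized)


# Two equally common alleles count as two effective alleles.
two_common = effective_number_of_alleles([0.5, 0.5])

# One dominant allele plus rare variants counts as barely more than one.
skewed = effective_number_of_alleles([0.9, 0.05, 0.05])

# One hundred equally common alleles count as one hundred.
even_hundred = effective_number_of_alleles([1] * 100)

print(two_common, skewed, even_hundred)
```

      On this definition, a value "around 1 or 2" corresponds to a population dominated by one allele or by a pair of alleles, whereas a value above 100 requires many alleles at comparable frequencies.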

      So I still find that the claim that "the phenomenon that leads to high diversity in the simulations of Siljestam and Rueffler depends on finely tuned parameter values" is not well substantiated.

      What is unsubstantiated is the claim of S&R that “For a large part of the parameter space, more than 100 and up to over 200 alleles can emerge and coexist”. As my manuscript illustrates, this is an illusion created by the adjustment of one parameter to compensate for changes in others.

      The reviewer even acknowledges that “the choice of constants and functions...works in a limited range of parameter values”. Furthermore, the manuscript explains why this problem is inherent to the general phenomenon, not specific to the details of the model or parameter values.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      It appears obvious that with little or no fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc., could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth. Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number. It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [SR]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler. I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of c_max conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct. The factor of 2.7 x 10^43 was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values. A simple way to determine this number is to have the simulation code print the value to which c_max is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values. I have in this way confirmed this factor using the simulation code written and used by Siljestam and Rueffler. A procedure for doing so is described in the new Supplementary Text S1. In addition, I now give a theoretical derivation of this factor in Supplementary Text S2.

      This begs the conclusion that the branching remains robust to changes in c_max that span 4 decades as well.

      That shows at most that the results are not extremely sensitive to c_max or K. They are, nonetheless, exquisitely sensitive to m and v. This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in c_max. It is evident from Fig. 4 of Siljestam and Rueffler that the level of diversity is not robust to these very large changes in c_max, which include, as noted above, a change of over 43 orders of magnitude.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7 x 10^43) pertains to the Gaussian model with v=20. As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v. This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions. Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.
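
      The size of this factor can be illustrated with a short calculation. The sketch below is only a rough consistency check, and it rests on assumptions that are not taken from the manuscript or from Supplementary Text S2: each pathogen is assumed to contribute a Gaussian term exp(-(v*d)^2/2) of width 1/v, the terms are assumed to multiply across pathogens, and the reference genotype is assumed to be a homozygote whose allele sits at the centroid of the simplex (the point that minimizes the summed squared distance to the vertices). All function names are hypothetical.

```python
import math


def log_condition_at_centroid(m: int, v: float, edge: float = 1.0) -> float:
    """Log of the product of m Gaussian terms exp(-(v*d)**2 / 2) at the centroid.

    Assumed functional form, for illustration only. The m pathogens sit at the
    vertices of a regular simplex with the given edge length.
    """
    # Squared centroid-to-vertex distance of a regular simplex with m vertices.
    r_squared = edge ** 2 * (m - 1) / (2 * m)
    # Each of the m pathogens contributes -(v**2) * r_squared / 2 to the log.
    return -m * (v * v) * r_squared / 2.0


def compensating_factor(m_before: int, m_after: int, v: float) -> float:
    """Ratio of the centroid products after vs. before changing the pathogen count."""
    return math.exp(
        log_condition_at_centroid(m_after, v) - log_condition_at_centroid(m_before, v)
    )


# Removing the 8th pathogen at v = 20 (unit edge length).
log_factor = log_condition_at_centroid(7, 20.0) - log_condition_at_centroid(8, 20.0)
factor = compensating_factor(8, 7, 20.0)
print(f"log factor = {log_factor:.3f}, factor = {factor:.3e}")
```

      Under these assumptions the log of the product simplifies to -v^2(m-1)/4, so adding or removing one pathogen changes it by v^2/4 whatever the value of m. For v=20 this is 100, and e^100 is about 2.7 x 10^43, which matches the order of magnitude quoted above.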

      ...the branching condition appears to be pretty robust with respect to reasonable changes in parameters.

      It is clear from Fig. 4 of Siljestam and Rueffler that the branching condition is far from sufficient for high MHC diversity.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SR].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable. I have addressed the reasons for this suggestion above. Furthermore, I have confirmed the main conclusion (the extreme sensitivity of the results of Siljestam and Rueffler to parameter values) using the code that they used for their simulations, indicating that my conclusions are not consequences of my having done a “poor setup of the model”. I now describe, in Supplementary Text S1, how anybody can verify my conclusions in this way.

      Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem. However, as I understand it, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c_max. Rather, they describe the adjustment of c_max (Appendix 7) as a “helpful” standardization that applies “without loss of generality”. Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7 x 10^43). In this sense there is no loss of generality, but the automatic adjustment of c_max obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I have expanded the end of the Discussion in the hope of clarifying the point expressed by the title.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to suggest to the author that they provide essential details about their simulations that would justify their claims, and to communicate with Mattias Siljestam and Claus Rueffler whether claims of the lack of robustness could be confirmed.

      The models simulated were modified versions of those of Siljestam and Rueffler. Thus, only the modifications were described in my manuscript. I have added a more detailed description of how c_max was set in the simulations concerned with sensitivity to parameter values. In addition, the new Supplementary Text S1, which describes confirmation of the lack of robustness using the code of Siljestam and Rueffler, should remove any doubt about this conclusion.

      Reviewer #2 (Recommendations for the authors):

      I have no further recommendations. The manuscript is well written and clear.

      Thank you.

      Reviewer #3 (Recommendations for the authors):

      (1) Since this is a full report and not just a letter to the editor, it would benefit from a bit more introduction of what the MHC actually is and what the current understanding of its evolution is. Currently, it assumes a lot of knowledge about these genes that might not be available to every reader of eLife.

      I have added some more information to the opening paragraph. I would also note that this report was submitted as a “Research Advance”, which may only need “minimal introductory material”.

      (2) Some more recent literature on MHC evolution should be added, e.g., the review by Radwan et al. 2020 TiG, a concrete case of MHC heterozygote advantage by Arora et al. 2020 MolBiolEvol, and a simulation of MHC CNV evolution by Bentkowski et al. 2019 PLOSCompBiol.

      I have cited some additional literature.

      (3) Since much of the criticism hinges on the c_max parameter, its biological meaning or role (or the lack thereof) could be discussed more.

      I am not sure what I can add to what is in the first paragraph of the Discussion.

      (4) I find it difficult to grasp how the v parameter, which is intended to define pathogen virulence, if I understand it correctly, can be used to amend the breadth of peptide presentation. Maybe this could be illustrated better.

      I have attempted to make this clearer. The parameter v actually controls the breadth of peptide detection conferred by an allele, which, if not identical to the breadth of presentation, is certainly affected by it. The basis of the “virulence” interpretation seems to be that narrower detection breadth can, according to the model, only decrease peptide detection probability, which increases the damage done by pathogens.

      (5) Please check sentences in lines 279ff on peptide detection and cost of . There seem to be words missing.

      There was an extraneous word, which I have removed. Thank you for pointing this out.

    1. eLife Assessment

      This study reports important findings by showing that two classes of kinase inhibitors, which stabilise the LRRK2 enzyme in either an active (Type I) or inactive state (Type II), have distinct effects on the formation of LRRK2 filaments and their association with cellular structures. Using correlative light microscopy, cryo-electron tomography and sub-tomogram averaging, the authors provide convincing evidence that a Type I inhibitor leads to the extensive decoration of microtubules with LRRK2 in a closed-kinase conformation, and that such decoration is not seen for a type-II inhibitor. The conclusions are consistent with previous work, although the physiological relevance of the work remains somewhat limited due to reliance on overexpression and the use of a rare mutation in a single cell type.

    2. Reviewer #1 (Public review):

      In this study, the authors set out to determine how two classes of kinase inhibitors, which stabilise a disease-relevant enzyme in either an active (Type I) or inactive state (Type II), influence its organisation and interactions with microtubule filaments in cells. Using the state-of-the-art in-cell structural imaging approaches, they examine how these compounds affect the formation of protein filaments and their association with microtubules, and succeed in defining the underlying structural basis for these differences.

      A major strength of the work is the application of in-cell cryo-electron tomography combined with correlative imaging, which enables direct visualisation of protein organisation in a near-native cellular context. The data convincingly demonstrate that the Type I inhibitor compound stabilising the active state promotes extensive LRRK2 filament formation and microtubule bundling, whereas compounds stabilising the inactive state markedly reduce these interactions. The structural analysis further provides insight into how conformational states relate to filament organisation, including modelling of previously unresolved regions of the protein.

      These findings are internally consistent and align well with prior biochemical and structural studies, many of which were performed by the same team.

      There are, however, some limitations that should be noted. The experiments rely on overexpression of the I2020T mutant form of the LRRK2 protein, which is a rare variant, in a single cell type (293T cells), which may not fully reflect endogenous behaviour or wild-type LRRK2 in a physiological context. In addition, while the imaging data are compelling, the functional consequences of the observed filament formation and microtubule association remain unclear.

      The study therefore provides strong descriptive and structural insight, but more limited evidence linking these observations to cellular or disease-relevant outcomes.

      Overall, the authors largely achieve their aims, and the results support their central conclusion that different classes of kinase inhibitors have distinct effects on protein organisation in cells. The work represents an important advance in understanding how small molecules can reshape protein architecture in a cellular environment, with potential implications for therapeutic strategies. The methodological approach will also be of broad interest to the field, as it highlights the power of in-cell structural biology to study dynamic protein assemblies that are difficult to capture using traditional approaches.

    3. Reviewer #2 (Public review):

      Summary:

      Mutations in Leucine-Rich Repeat Kinase 2 (LRRK2) are a major cause of Parkinson's disease. LRRK2 PD-related mutations all result in increased kinase activity. Therefore, LRRK2 has been the focus of the development of kinase inhibitors. So far, two classes of kinase inhibitors have been identified: type 1 LRRK2-specific inhibitors that stabilize LRRK2 in a closed active-like conformation and broad-range type 2 inhibitors that stabilize LRRK2 in an open inactive-like conformation. Here, Basiashvili et al. used in-cell structural biology to study the effect of both type 1 and type 2 inhibitors on the localization and structural conformation of LRRK2-I2020T.

      Strengths:

      They showed that Type 1 and not Type 2 inhibitors induce LRRK2 filament formation on microtubules. Furthermore, they were able to build a structural map of full-length LRRK2 I2020T bound to a Type 1 inhibitor in a closed kinase conformation. Together, this work thus confirms the data of previous studies that showed that LRRK2 Type 1 and 2 inhibitors differently affect filament formation.

      Weaknesses:

      All conclusions are fully supported by the provided data. However, as the authors indicated themselves, the physiological relevance of LRRK2 microtubule binding is questionable. Furthermore, although the authors used a full-length LRRK2 protein, like in previously published structures, the resolution of the N-terminal domains is rather poor. Therefore, it also remains unclear what we learn from this structure compared to the previously published structures.

    4. Reviewer #3 (Public review):

      Summary:

      This paper describes new insights into the effects of type-I and type-II LRRK2 inhibitors on HEK293T cells that over-express GFP-labeled LRRK2-I2020T. Using correlative light microscopy and cryo-electron tomography, a type-I inhibitor leads to the extensive decoration of microtubules with LRRK2, which is not seen for a type-II inhibitor. Subtomogram averaging reveals that LRRK2 binds to the microtubules in a closed-kinase conformation, with density for the N-terminal arms.

      Strengths:

      The paper is well written; the CLEM and cryo-ET appear to be done to a high standard. Consequently, I have only minor comments.

      Weaknesses:

      The resolution of the subtomogram averages is somewhat limited, but the authors have adequately limited the number of degrees of freedom in the fitting of their atomic models by only allowing rigid-body transformations of separate parts of LRRK2.

      The authors should include FSC curves between the rigid-body fitted atomic models and the various sub-tomogram average maps.

    1. eLife Assessment

      This solid paper reports on the use of artificial intelligence to assess bone marrow adipose tissue in the skull. The method employing MRI is novel and that approach allows for the identification of genetic loci that regulate this trait as well as others using data from the UK biobank. Overall this is an important contribution although the authors should consider several points: 1-validation of the T1-weighted MRI signal intensity; 2-further discussion of the sex differences; and 3-cross-trait linkage disequilibrium score regression (LDSC) for osteoporosis, Parkinson's disease, and cognitive function.

    2. Reviewer #1 (Public review):

      The authors of this study developed a method to quantify calvarial bone marrow from MRI head scans, enabling the study of its composition in large datasets of adults, usually collected to study the brain. Bone marrow intensity can be semi-quantitatively measured in T1-weighted MRI scans due to the greater signal intensity of fat than watery red marrow. This is an ingenious use of the MRI-produced information for other important phenotypes, such as bone structure and marrow content. Different head types were tested for complying with the model, which is notable.

      The model was also successfully validated using several publicly available MRI resources - real data - in (1) a dataset consisting of 30 individuals that were scanned 10 times each at 3-day intervals, and (2) the monozygotic (MZ) twin data from the Human Connectome Project cohort. Then the authors applied this validated method to head-MRI scans from the UK Biobank (n=33,042) to extract information on the spatial distribution of bone marrow adiposity (BMA) in the calvaria, allowing a GWAS to identify associated genes.

      The authors revealed high heritability and identified 41 genetic loci significantly associated with the BMA trait, including six sex-specific loci. Of note, statistics estimate that 99% of BMA trait-influencing variants are shared with BMD (497 of 500 variants), which may mean these results demonstrate the biological relevance to bone health. Some of the BMA genes were found related to the Wnt pathway, including WNT16, WNT4, NXN; this is a "positive control", since the Wnt/β-catenin signaling pathway was suggested as an important determinant of BMA. Also, associations in genes (BMP4, DLX5, LGR4, LRP4, SFRP4) that are known to specifically influence adiposity, are encouraging. Integrating mapped genes with bone marrow single-cell RNA-seq data revealed patterns of adipogenic lineage differentiation and lipid loading.

      The study also investigated the genetic overlap between BMA and twelve (or 13) "brain and body" traits and identified significant genetic correlations with BMI, cognitive ability, and Parkinson's disease.

      In sum, since MRI head scans present a hitherto unexplored opportunity to address unresolved aspects of bone marrow biology, this study is both timely and innovative.

      There are, however, some assumptions, findings, and their interpretation, which require more critical focus.

      Sex-specificity is well described and studied here. Men have higher BMA than women, but post-menopausal women catch up in the BMA values. The authors believe that calvarial marrow has a number of features that make it particularly well-suited to the study of the BMA process, which is clinically important in other bone sites. It has a simple "sandwiched" structure that they are able to model. This is true only to some extent: a condition called "Hyperostosis frontalis interna", of unknown etiology (described by Smith & Hemphill in 1956), is characterized by irregular, symmetric (bilateral) overgrowth of the inner table of the frontal bone. Although it is typically benign and not of clinical significance, studies report a prevalence of 12%; however, it is most common in postmenopausal women, in whom prevalences of up to 49% (in women over the age of 65) have been reported. Thus, sexual dimorphism is obvious and the effect of estrogen is likely shared with whichever bone - and marrow - age-related pathology. So, for women not using HRT, this new layer of the bone might interfere with the calvarial BMA readings and in turn, affect the BMA-related analyses. The authors suspect that the effect of BMA on BMD may be biased in women; they should comment on those "with low BMD and high BMA" given that hyperostosis frontalis might be an issue. A strong effect of SNPs in the ESR1 chromosomal region might be akin to the above concern.

      Then, there is a perfect overlap of the BMA SNPs that are shared with BMD (497 of 500 variants), which may prove a "face validity" of the MRI-derived BMA. However, the BMD in the study was heel-derived eBMD - which is a good proxy for osteoporosis and is mostly driven by trabecular bone. Thus, there might be a concern that the BMA metrics capture some trabecular BMD.

      Next, integrating mapped genes with existing bone marrow single-cell RNA-sequencing data revealed patterns of adipogenic lineage differentiation and lipid loading. The problem here is that the scRNAseq studies of the Bone Marrow niche are overwhelmingly mouse. The authors might wish to justify why they are relevant to humans (in the absence of the human-specific scRNAseq).

      For genetic correlation analysis, the authors selected 7 body and 6 brain traits. The latter traits reflect cognition (general cognitive ability and educational attainment) and brain-related disorders. This selection might seem arbitrary. The interpretation of genetic correlation with cognitive ability, education, and Parkinson's disease was attributed to the recently discovered vascular channels that link calvarial bone marrow to the meninges. This is a fascinating hypothesis, which requires functional proof. However, there might be simpler explanations. Thus, the diploe and the inner table of the calvarium are drained by the same veins as the dura. From the anatomy textbook, we know that diploic veins connect the pericranial and endocranial venous system through the skull.

    3. Reviewer #2 (Public review):

      Summary:

      This study develops a new artificial intelligence method for high-throughput analysis of skull bone marrow from MRI data, which may be useful for large-scale biological analyses. Using this method, the authors then attempt to estimate skull bone marrow adiposity (BMA) using T1-weighted signal intensity from MRI scans of ~33,000 people, followed by genome-wide association analysis; however, the approach is inadequate because T1-weighted signal intensity is not validated for measurement of bone marrow adiposity. If it could be validated, the study would be an important advance in understanding of bone marrow adiposity and skeletal biology.

      Strengths:

      This paper is well-written, and the figures are nicely presented. The neural network method used for analysing skull bone marrow is innovative, and the authors validate this through several approaches. Therefore, the authors have achieved the aim of developing a method for large-scale analysis of skull bone marrow from MRI data.

      The GWAS is reasonably well-powered and addresses potential ethnicity differences, with one GWAS done across white males and females, and a separate GWAS in non-white participants. The methodology also conforms to common GWAS standards, including for mapping genetic variants to candidate genes. Moreover, the study further investigates the biological roles of these genes by analysing their expression in single-cell RNA sequencing data.

      Weaknesses:

      The fundamental weakness is that T1-weighted MRI signal intensity (T1W) is used as an estimate of BMA, but it has never been validated for this. The authors show that this T1W parameter measures something that is heritable and can be compared between subjects, but they don't show that it actually measures (or even estimates) calvarial BMA. There is an attempt to do so by comparing the T1W parameter with data from quantitative T1 images: the authors show a reasonable correlation with some of the quantitative T1 image data. However, this still does not show that the parameter is measuring BMA; it could be measuring some other biological characteristic, but this remains unclear. So, there is a need to validate the T1W parameter against an established measure of BMA, such as the bone marrow fat-fraction or proton density fat fraction measured from multi-echo MRI analysis.

      Without validating this BMA measurement method, it is not possible to interpret the GWAS or other findings reported in the study.

      A less critical weakness is that the GWAS has been done only on a single cohort, without replicating the findings in a follow-up cohort. For example, the authors could repeat their analysis on the remaining ~50,000 UK Biobank imaging participants for whom MRI data is now available. However, this would be pointless without knowing what biological characteristic(s) the T1W parameter is actually reflecting.

      [UPDATE, June 2026: since writing this review in September 2024, the reviewer has changed their opinion and now has confidence in the reliability of the T1W method used to estimate BMA. The reviewer would like to explain that their original critiques were based largely on previous discussions with a colleague with expertise in magnetic resonance and medical physics, who was extremely negative about use of T1W signal intensity to estimate BMA; this colleague’s criticisms may not have been objective, and clouded the reviewer’s overall impression of the present study. The reviewer and others have since completed BMA analysis using dual-echo MRI data in the UK Biobank; the findings of these studies, both for genetic and pathophysiological associations, are largely consistent with the findings of the present study, underscoring the reliability of the T1W-based BMA estimates.]

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript, "Estimating bone marrow adiposity from head MRI and identifying its genetic architecture", brings together the groups of Drs. Kaufmann and Hughes in a tour de force work to develop an artificial neural network that localizes calvaria bone marrow in T1-weighted MRI head scans, with the goal of studying its composition in several large MRI datasets, and to model sex-dimorphic age trajectories, including the effect of menopause.

      Strengths:

      Bone marrow adiposity is a very active tissue with more far-reaching implications for tissue crosstalk and human health than we had initially recognized. Although MRI has been used to measure BM, studies such as the one by these two groups, in which very large datasets are analyzed using advanced AI machine learning tools coupled with genetic studies and a specific pathology, are still lacking. The groups had to develop new methods and new AI machine-learning tools for the imaging analyses.

      Weaknesses:

      The authors could add clarification on some aspects of the work.

      (1) Imaging Limitations: The authors provide an excellent overview and references supporting the use of MRI as a method for assessing marrow fat, particularly with some specific modifications. However, MRI images can be affected by various factors, including the presence of other tissues as well as specific MRI settings, which are much harder to precisely control when using different datasets.

      (2) The specific density of cranial bones as it relates to the types of bone marrow: Cranial bones are extremely dense structures, which naturally interfere with MRI imaging. While it is thought that cranial bones have mostly "red bone marrow", this is only true for a short time in humans. How sensitive is their system in differentiating between red and yellow BM?

      (3) Both items above are further complicated by aging, but aging is not a linear event as we have learned. There are specific bursts of aging in humans around the age of 45 and early 60s. How do the system and model predict or incorporate these peaks of aging? It seems from the data shown that aging is reflected more as a linear phenomenon. Is this because additional aging datasets are needed?

      (4) The authors describe in rich detail their AI learning programming and how it extracted the data from datasets. The authors also show some important correlations with specific genes and SNPs. What is not clear is how conditions such as anemia, for example, were accounted for. An expected finding would be that patients with chronic anemia have lower bone marrow (BM) signal intensity on MRI scans than healthy people. This is because the signal intensity of BM depends on the fat-to-cell ratio in the tissue. Furthermore, patients with a host of musculoskeletal disorders ranging from osteopenia to osteoporosis, sarcopenia, and osteosarcopenia will also have altered MRI scans. When using such large datasets, how did the authors control for or exclude these pathological conditions, or were all these conditions likely present?

      (5) Some of the genes and SNPs although significant showed very small correlations. What is their likely physiological significance?

      (6) The authors could use this excellent manuscript to expand their discussion to include the need for studies like theirs to be also complemented by multi-OMICS studies that will include proteomics and lipidomics of BM, bones, and muscles.

    1. eLife Assessment

      This study provides conditionally useful evidence that amino acid starvation and other stresses induce RNF25-dependent ubiquitination of RPS27A/eS31, extending this pathway beyond A-site-trapping conditions and implicating GCN1. However, incomplete and largely indirect evidence was provided to support key mechanistic claims, notably competition between RNF25 and GCN2 for GCN1 and a role in resolving ribosome collisions. Additional direct and orthogonal evidence is required to substantiate these conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate ubiquitylation of RPS27A/eS31 by the E3 ligase RNF25 in response to translational stress. Previous studies have identified RPS27A/eS31 ubiquitylation at Lys113 under conditions where translation factors are trapped in the ribosomal A-site. Here, the authors extend this work by testing whether additional translational stress conditions, including amino acid deprivation, induce RPS27A/eS31 ubiquitylation. They further show that GCN1 is required and explore a possible competition between RNF25 and GCN2 for GCN1.

      Strengths:

      This study expands on the range of stress conditions leading to RPS27A/eS31 ubiquitylation, reporting that it occurs in a variety of conditions associated with ribosome stalling, including amino acid deprivation. These observations are useful because they suggest that the RNF25 pathway may not require translation factors trapped in the ribosomal A-site, but may instead respond more broadly to translational perturbations associated with ribosome collisions.

      Weaknesses:

      The evidence supporting several of the major claims is incomplete, and additional controls and orthogonal approaches would greatly strengthen the evidence presented. In particular:

      (1) It is unclear whether the different conditions used to induce translational stress lead to ribosome stalling or collisions. The model presented by the authors seems to rely on ribosomal collisions, but this is not shown. In addition, further investigating amino acid deprivation beyond the removal of Arg or Lys would strengthen the paper.

      (2) Ubiquitylation of RPS27A/eS31 by RNF25 is used throughout the paper as a readout of RNF25 activity and is assumed to be on Lys113 based on previous work, but is not formally shown here.

      (3) Rescue experiments of the different mutants used in this study with wild-type and different domain deletions (i.e., ΔRWD for RNF25, ΔRWD-binding for GCN1) would help confirm specificity and strengthen the mechanistic claims.

      (4) The conclusion that RPS27A/eS31 ubiquitylation supports translation (Figure 4) is based entirely on polysome/monosome ratios, which are difficult to interpret without additional assays of translation output, elongation, or collision.

      (5) The idea that RNF25 competes with GCN2 for GCN1 binding is interesting, and related models have recently been proposed in RNA damage. The effect of GCN2 KO on RNF25-dependent ubiquitylation appears modest, and the data would be strengthened by rescue experiments with wild-type GCN2 and GCN2 mutants defective in GCN1 binding. The authors propose: "that the RNF25 pathway acts as a first line of defence to resolve ribosome collisions, outcompeted by GCN2 binding to GCN1 under acute stress." This model would suggest a further increase in RPS27A/eS31 ubiquitylation upon Arg/Lys deprivation in GCN2 KO cells, since this is the condition in which GCN2 is expected to be activated and engaged with GCN1 (i.e., when it would be competing with RNF25), but no further increase in RPS27A ubiquitylation is observed. It is therefore not clear that these data support the proposed model. Contributing to this may be the fact that many of these assays are performed in a USP16 KO background, which may make it difficult to assess changes in RPS27A/eS31 ubiquitylation.

      (6) Given that several RWD domain proteins can interact with GCN1, and that DRG2 KO appears to affect RPS27A/eS31 ubiquitylation (Figure S5), the data do not support the GCN2-specific title. The results are more consistent with a broader, incompletely characterized network of GCN1-associated RWD domain-containing proteins that seems to affect RNF25-dependent ubiquitylation rather than with a demonstrated RNF25-GCN2 competition mechanism. Further characterization of GCN2-dependent ISR activation (p-eIF2a and ATF4 WB) in the absence of RNF25 in Arg/Lys starvation will help shed light on the RNF25-GCN2 competition. The authors use K113R, but this is not shown to prevent RNF25 engagement with GCN1, so a RNF25 KO should be used.

      Overall, the study contains useful observations, but the mechanistic claims are not yet fully supported.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show that deprivation of Arginine and Lysine induces a ~50% increase in the ratio of ubi-RPS27A to RPS27A, and this induction requires E3 ubiquitin ligase RNF25. The authors show ZAKalpha and EDF1 are not required for steady state or ribosome stalling-induced ubi-RPS27A, while GCN1 is required. The ratio of polysomes to monosomes is increased in RNF25 knockdown cells or when translation is activated by ISRIB in a RPS27A K113R mutant cell line. GCN2 KO cells indicate elevated levels of ubi-RPS27A, and overexpression of the GCN2 RWD domain reduces levels of ubi-RPS27A.

      Strengths:

      (1) The authors identified a novel pathway to sense amino acid deprivation, indicated by ubi-RPS27A, previously implicated in ribosome stalling.

      (2) The authors find antagonism between two proteins known to act downstream of GCN1, giving insight into how signaling occurs from an upstream sensor of ribosome stalling to multiple downstream pathways.

      Weaknesses:

      (1) The authors suggest that, based on increased Polysome/Monosome ratios, there is more disome stalling in RNF25 KD cells and RPS27A K113R cells treated with ISRIB, but this readout is very indirect and could be driven by other changes in the cell other than ribosome stalling.

      (2) While the authors propose that GCN2 and RNF25 compete for binding to GCN1, no evidence was shown that RNF25 binds to GCN1 in cells, nor that the interaction increases when GCN2 is absent.

      (3) The use of USP16 to enhance the detection of ubi-RPS27A in many experiments brings the question of whether USP16 KO may alter the protein levels of any known regulators of ribosome collisions? (i.e. ZNF598, GCN1, EDF1, ZAKalpha, etc.) If USP16 KO causes changes in other important regulators of collisions, the authors could be identifying genetic interactions with USP16 in their experiments throughout the paper.

      (4) In Figure 5E, the expression level of the GCN2 3K RWD domain looks to be lower than the WT RWD domain; perhaps this could be what is driving the smaller decrease of ubi-RPS27A seen with GCN2 3K vs WT.

    4. Reviewer #3 (Public review):

      Summary:

      This study examines the role of RNF25 in translational quality control. Previous work indicated that RNF25 is activated by ribosomes stalled with defective elongation or termination factors bound in the A-site. Here, the authors provide evidence that RNF25 is activated by other treatments that evoke ribosome stalling, including amino acid starvation, where the A-site may be empty, leading to ubiquitination of RPS27A in a manner requiring the ISR collision sensor Gcn1, but not EDF1 and ZAKα, involved in the RQC and RSR surveillance pathways. They present some evidence from polysome profiling that RNF25 and its ubiquitination of RPS27A help resolve ribosome collisions and support translation elongation in basal conditions. They further show that KO of Gcn2 increases RPS27A ubiquitination in basal conditions, but not in amino acid-starved cells, and that RPS27A ubiquitination was reduced on overexpressing the WT RWD domain of Gcn2 but not a variant harboring substitutions of residues predicted to bind Gcn1. Based on these findings, they propose a model that, in response to ribosome stalling induced by various stresses, Gcn1 recruits RNF25 via the latter's RWD domain to ubiquitinate RPS27A and thereby resolve ribosome stalling and promote continued elongation. If collisions increase even further, GCN1 recruits GCN2 instead of RNF25 to elicit the ISR.

      Strengths:

      The data is convincing that a variety of triggers leading to diverse stalled ribosomal states, including amino acid limitation, can activate RNF25, suggesting that activation of this pathway does not require the presence of trapped protein factors in the ribosomal A-site but is a more general response to ribosome collisions. It is also convincing that Gcn1 is required for RNF25 activation under all of these conditions, which is consistent with previous findings that Gcn1 is required for RNF25 function in the presence of trapped elongation or termination factors. The finding that EDF1 and ZAK are not needed for RNF25 activation in amino acid starvation conditions is of interest for EDF1, given the recent claim that it is required for full ISR activation.

      Weaknesses:

      The evidence presented from polysome profiling that RNF25 helps resolve naturally occurring ribosome collisions in basal conditions is not compelling, as eliminating RNF25 could be increasing the rate of initiation rather than increasing stalled ribosomes as the means of increasing the P/M ratio. The Rps27A-K113R mutation could have the same effect of increasing initiation, which could have been obscured by inhibiting the ISR with ISRIB.

      The evidence that RNF25 competes with Gcn2 for Gcn1 binding is also not compelling. While it's convincing that Rps27A-Ubi is elevated in basal conditions on eliminating Gcn2, loss of GCN2 would be expected to increase ribosome loading on mRNAs, potentially elevating the frequency of collisions and thereby stimulating RNF25 activity indirectly.

      It's also quite puzzling and left unexplained why they observed no further increase in Rps27A-Ubi on -Arg/-Lys starvation in the cells lacking Gcn2. Why wouldn't -Arg/-Lys starvation lead to further stalling and RNF25 activation in the absence of Gcn2? (Since Gcn2 KO increases Rps27A-Ubi in +Arg/+Lys conditions, it can't be that Gcn2 is required for RNF25 function.) The same puzzling and unresolved observation was made in the cells lacking DRG2. One possible explanation for this conundrum is that low-level RNF25 abundance limits further activation.

      The quantitative effects of overexpressing the Gcn2 RWD domain on Rps27A-Ubi, constituting their other evidence presented to support the competition model, are quite small in magnitude.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate ubiquitylation of RPS27A/eS31 by the E3 ligase RNF25 in response to translational stress. Previous studies have identified RPS27A/eS31 ubiquitylation at Lys113 under conditions where translation factors are trapped in the ribosomal A-site. Here, the authors extend this work by testing whether additional translational stress conditions, including amino acid deprivation, induce RPS27A/eS31 ubiquitylation. They further show that GCN1 is required and explore a possible competition between RNF25 and GCN2 for GCN1.

      Strengths:

      This study expands on the range of stress conditions leading to RPS27A/eS31 ubiquitylation, reporting that it occurs in a variety of conditions associated with ribosome stalling, including amino acid deprivation. These observations are useful because they suggest that the RNF25 pathway may not require translation factors trapped in the ribosomal A-site, but may instead respond more broadly to translational perturbations associated with ribosome collisions.

      We wish to point out that our study in fact suggests that the RNF25 pathway is activated by translation factors in the A-site, in agreement with what has been previously proposed, and in addition by stalling conditions that are assumed to not trap translation factors in the A-site. We do not exclude that these conditions might be sampled by A-site binding quality control factors before recognition by RNF25.

      Weaknesses:

      The evidence supporting several of the major claims is incomplete, and additional controls and orthogonal approaches would greatly strengthen the evidence presented.

      We appreciate the suggestion to add more controls to further substantiate our novel findings. In the course of the revisions we will focus our work on those experiments that do not merely reproduce established facts in the field.

      In particular:

      (1) It is unclear whether the different conditions used to induce translational stress lead to ribosome stalling or collisions. The model presented by the authors seems to rely on ribosomal collisions, but this is not shown. In addition, further investigating amino acid deprivation beyond the removal of Arg or Lys would strengthen the paper.

      We thank the reviewer for the comment. It is correct that we don’t formally show collisions.

      However, the conditions we use have been previously established in the field to induce ribosome stalls and/or collisions, which we may not have pointed out clearly enough. In the revised version, we will include all relevant citations, i.e. for ternatin (Oltion et al., 2023): collisions, anisomycin (Juszkiewicz et al., 2018, Sinha et al., 2020): collisions, emetine (Sinha et al., 2020): collisions, didemnin B (Juszkiewicz et al., 2018, Stoneley et al., 2022): accumulation of ubi-eS10 and changes in polysome profiles indicative of collisions, MMS (Stoneley et al., 2022): changes in polysome profiles indicative of stalls or collisions, starvation -Arg/-Lys (Darnell et al., 2018, Stoneley et al., 2022): accumulation of collided ribosomes only upon GCN2 inhibition, indicative of collisions.

      Secondly, we do not claim to induce collisions when describing the inhibition data (Figure 1 and Figure S1) and were careful to say that we use ‘conditions that cause ribosome stalling’.

      Thirdly, we conclude on collisions when interpreting the data on amino acid starvation (and in our model (Figure 6)), based on our data demonstrating that RNF25 activity in RPS27A/eS31 ubiquitylation is dependent on GCN1 (Figure 3), an established sensor of collided disomes (Pochopien et al., 2021). This conclusion is thus based on the current knowledge in the field.

      We will carefully screen the text for potential points of overinterpretation or confusion between stalling and collisions.

      To address the request of further investigating amino acid deprivation beyond the removal of Arg or Lys, we will include an additional experiment in which we will deplete another amino acid.

      (2) Ubiquitylation of RPS27A/eS31 by RNF25 is used throughout the paper as a readout of RNF25 activity and is assumed to be on Lys113 based on previous work, but is not formally shown here.

      It is established that Lys113 is the main target of RNF25, not only by our work (Montellese et al., 2020), but also by recent work of other groups to which we had referred in our manuscript (Gurzeler et al., 2023, Oltion et al., 2023, Zhao et al., 2026).

      To experimentally address this point, we will add an experiment testing ubiquitylation of RPS27A/eS31 in cells carrying the K113R mutation.

      (3) Rescue experiments of the different mutants used in this study with wild-type and different domain deletions (i.e., ΔRWD for RNF25, ΔRWD-binding for GCN1) would help confirm specificity and strengthen the mechanistic claims.

      Minimally, we will include rescue experiments for RNF25 (using WT, ΔRWD and enzymatically dead mutant) and, if possible, also for GCN1, which might be more challenging due to its large size and anticipated problems with cloning, cell line generation and protein expression.

      (4) The conclusion that RPS27A/eS31 ubiquitylation supports translation (Figure 4) is based entirely on polysome/monosome ratios, which are difficult to interpret without additional assays of translation output, elongation, or collision.

      It is correct that we base our conclusion on polysome profiles and agree that these are an indirect measure of translation output. However, this assay is well established in the field to show dysregulation of polysome/monosome ratio upon ribosome stalling (Garzia et al., 2017), (Wu et al., 2020), (Chatterjee et al., 2024), (Gurzeler et al., 2023).

      Elongation defects would be expected to lead to stalls and/or collisions (which we conclude on). However, we cannot exclude that there is more initiation when RPS27A/eS31 carries the K113R mutation, although this is hard to rationalize mechanistically and experimentally challenging to exclude. Therefore, to address the point, we will add a sentence that we cannot exclude indirect effects on initiation but consider these unlikely.

      (5) The idea that RNF25 competes with GCN2 for GCN1 binding is interesting, and related models have recently been proposed in RNA damage. The effect of GCN2 KO on RNF25-dependent ubiquitylation appears modest, and the data would be strengthened by rescue experiments with wild-type GCN2 and GCN2 mutants defective in GCN1 binding. The authors propose: "that the RNF25 pathway acts as a first line of defence to resolve ribosome collisions, outcompeted by GCN2 binding to GCN1 under acute stress." This model would suggest a further increase in RPS27A/eS31 ubiquitylation upon Arg/Lys deprivation in GCN2 KO cells, since this is the condition in which GCN2 is expected to be activated and engaged with GCN1 (i.e., when it would be competing with RNF25), but no further increase in RPS27A ubiquitylation is observed. It is therefore not clear that these data support the proposed model. Contributing to this may be the fact that many of these assays are performed in a USP16 KO background, which may make it difficult to assess changes in RPS27A/eS31 ubiquitylation.

      We thank the reviewer for the comment. We measure on average a 50% increase in the level of ubiquitinated RPS27A/eS31 in GCN2 KO cells. Considering the large number of ribosomes in a cell (~10^7 per HeLa cell), this 50% increase (from 12.5 to 25% ubiquitinated RPS27A/eS31) amounts to an estimated 1.25 x 10^6 RPS27A/eS31 molecules that get additionally modified, which is clearly a substantial difference, especially compared to the naturally very low levels of RNF25 (in the range of 23,000 molecules (Itzhak et al., 2016)).
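
      The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration that uses only the figures quoted above (about 10^7 ribosomes per HeLa cell, 12.5% and 25% ubiquitinated RPS27A/eS31, roughly 23,000 RNF25 molecules). The variable names are hypothetical, and one RPS27A/eS31 copy per ribosome is an assumption made for the illustration.

```python
# Figures quoted in the response above; variable names are illustrative only.
# Assumption: one RPS27A/eS31 copy per ribosome.
ribosomes_per_cell = 1e7       # ~10^7 ribosomes per HeLa cell
ubiquitinated_before = 0.125   # 12.5% ubiquitinated RPS27A/eS31
ubiquitinated_after = 0.25     # 25% ubiquitinated RPS27A/eS31
rnf25_copies = 23_000          # approximate RNF25 molecules per cell

# Absolute number of RPS27A/eS31 molecules that are additionally modified.
additional_modified = (ubiquitinated_after - ubiquitinated_before) * ribosomes_per_cell

# Additionally modified molecules relative to the RNF25 copy number.
per_rnf25 = additional_modified / rnf25_copies

print(f"additional modified molecules: {additional_modified:.3g}")
print(f"per RNF25 molecule: {per_rnf25:.1f}")
```

      Under these assumptions, the difference of 12.5 percentage points corresponds to the 1.25 x 10^6 additionally modified molecules quoted above, or roughly 54 per RNF25 molecule.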

      We respectfully disagree that performing experiments in USP16 KO background makes it difficult to assess RPS27A/eS31 ubiquitination. On the contrary. The natural levels of RPS27A/eS31 ubiquitination in WT cells are very low, making quantification sensitive to background fluctuations (see Figure S1). Therefore, in our experience, the usage of USP16 KO makes the quantitative analysis of RPS27A/eS31 ubiquitination robust, allowing us to analyse both increase and decrease in the levels of ubiquitination. We agree that with increasing collisions, the level of ubiquitinated RPS27A/eS31 reaches a plateau in USP16 KO, which may limit the observable increase. Therefore, the substantial 50% increase might indeed underestimate the effect as compared to WT cells. Still, the measurable increase is substantial and robust.

      To experimentally address the point of the reviewer, we will try generating GCN2 KO cells in a WT background, i.e. in absence of USP16 KO, to strengthen our model.

      (6) Given that several RWD domain proteins can interact with GCN1, and that DRG2 KO appears to affect RPS27A/eS31 ubiquitylation (Figure S5), the data do not support the GCN2-specific title. The results are more consistent with a broader, incompletely characterized network of GCN1-associated RWD domain-containing proteins that seems to affect RNF25-dependent ubiquitylation rather than with a demonstrated RNF25-GCN2 competition mechanism. Further characterization of GCN2-dependent ISR activation (p-eIF2a and ATF4 WB) in the absence of RNF25 in Arg/Lys starvation will help shed light on the RNF25-GCN2 competition. The authors use K113R, but this is not shown to prevent RNF25 engagement with GCN1, so a RNF25 KO should be used.

      While we fully agree that our data point to a broader network of competition on GCN1, we wished to avoid overstating the role of pathways other than GCN2, since our experimental evidence on DRG2 is limited at the moment. As it stands, changing the title of the manuscript to a more general message would indeed fuel the view that our claims are incomplete. But we are glad to reconsider this suggestion if further supporting evidence can be obtained in the course of the revision work.

      The reviewer suggests experiments on competition of RNF25 with GCN2. In contrast to the expectation of the reviewer, we do not expect KO of RNF25 to manifest in defects in ISR activation due to the low expression levels of RNF25. In the revised manuscript, we will make clearer that our model refers to competition in the other direction, i.e., of GCN2 with RNF25, which our data supports. The reverse competition of RNF25 with GCN2 is expected to be inefficient to enable a robust activation of the ISR by GCN1 when needed. In addition, other pathways (such as DRG2) might also contribute to the resolution of collisions in the absence of RNF25, affecting the level of ISR activation.

      We feel that further working out these competitive relationships will be interesting to perform in future work. Currently, it is also not clear whether all involved RWD-containing factors bind GCN1 with the same affinity, which is important to consider for the effectiveness of a mutual competition model as suggested by the reviewer.

      Reviewer #2 (Public review):

      Summary:

      The authors show that deprivation of Arginine and Lysine induces a ~50% increase in the ratio of ubi-RPS27A to RPS27A, and this induction requires E3 ubiquitin ligase RNF25. The authors show ZAKalpha and EDF1 are not required for steady state or ribosome stalling-induced ubi-RPS27A, while GCN1 is required. The ratio of polysomes to monosomes is increased in RNF25 knockdown cells or when translation is activated by ISRIB in a RPS27A K113R mutant cell line. GCN2 KO cells indicate elevated levels of ubi-RPS27A, and overexpression of the GCN2 RWD domain reduces levels of ubi-RPS27A.

      Strengths:

      (1) The authors identified a novel pathway to sense amino acid deprivation, indicated by ubi-RPS27A, previously implicated in ribosome stalling.

      (2) The authors find antagonism between two proteins known to act downstream of GCN1, giving insight into how signaling occurs from an upstream sensor of ribosome stalling to multiple downstream pathways.

      Weaknesses:

      (1) The authors suggest that, based on increased Polysome/Monosome ratios, there is more disome stalling in RNF25 KD cells and RPS27A K113R cells treated with ISRIB, but this readout is very indirect and could be driven by other changes in the cell other than ribosome stalling.

      We thank the reviewer for this important comment. We intentionally used ISRIB in Figure 4F, G to avoid possible effects on initiation, and the results are consistent with our model. While we agree that ISRIB itself might have indirect consequences, these should be the same for the control (WT cells) and the assay condition (K113R cells). We also show the data without ISRIB, which show a similar trend but are less robust (Figure 4D, E). It is very hard to exclude other possible effects which would selectively affect K113R cells in presence of ISRIB.

      (2) While the authors propose that GCN2 and RNF25 compete for binding to GCN1, no evidence was shown that RNF25 binds to GCN1 in cells, nor that the interaction increases when GCN2 is absent.

      The idea of RNF25 binding to GCN1 is based on previously published work (Oltion et al., 2023, Seidel et al., 2026, Zhao et al., 2026). We will design additional experiments to potentially confirm the interaction between RNF25 and GCN1.

      (3) The use of USP16 to enhance the detection of ubi-RPS27A in many experiments brings the question of whether USP16 KO may alter the protein levels of any known regulators of ribosome collisions? (i.e. ZNF598, GCN1, EDF1, ZAKalpha, etc.) If USP16 KO causes changes in other important regulators of collisions, the authors could be identifying genetic interactions with USP16 in their experiments throughout the paper.

      Indeed, we can’t exclude the effect of USP16 KO on the expression levels of other collision sensors. We will experimentally confirm the levels of other ribosome collision sensors in USP16 KO cells.

      (4) In Figure 5E, the expression level of the GCN2 3K RWD domain looks to be lower than the WT RWD domain; perhaps this could be what is driving the smaller decrease of ubi-RPS27A seen with GCN2 3K vs WT.

      We thank the reviewer for pointing at this issue, which we will experimentally address in the revised version.

      Reviewer #3 (Public review):

      Summary:

      This study examines the role of RNF25 in translational quality control. Previous work indicated that RNF25 is activated by ribosomes stalled with defective elongation or termination factors bound in the A-site. Here, the authors provide evidence that RNF25 is activated by other treatments that evoke ribosome stalling, including amino acid starvation, where the A-site may be empty, leading to ubiquitination of RPS27A in a manner requiring the ISR collision sensor Gcn1, but not EDF1 and ZAKα, involved in the RQC and RSR surveillance pathways. They present some evidence from polysome profiling that RNF25 and its ubiquitination of RPS27A help resolve ribosome collisions and support translation elongation in basal conditions. They further show that KO of Gcn2 increases RPS27A ubiquitination in basal conditions, but not in amino acid-starved cells, and that RPS27A ubiquitination was reduced on overexpressing the WT RWD domain of Gcn2 but not a variant harboring substitutions of residues predicted to bind Gcn1. Based on these findings, they propose a model that, in response to ribosome stalling induced by various stresses, Gcn1 recruits RNF25 via the latter's RWD domain to ubiquitinate RPS27A and thereby resolve ribosome stalling and promote continued elongation. If collisions increase even further, GCN1 recruits GCN2 instead of RNF25 to elicit the ISR.

      Strengths:

      The data is convincing that a variety of triggers leading to diverse stalled ribosomal states, including amino acid limitation, can activate RNF25, suggesting that activation of this pathway does not require the presence of trapped protein factors in the ribosomal A-site but is a more general response to ribosome collisions. It is also convincing that Gcn1 is required for RNF25 activation under all of these conditions, which is consistent with previous findings that Gcn1 is required for RNF25 function in the presence of trapped elongation or termination factors. The finding that EDF1 and ZAK are not needed for RNF25 activation in amino acid starvation conditions is of interest for EDF1, given the recent claim that it is required for full ISR activation.

      Weaknesses:

      (1) The evidence presented from polysome profiling that RNF25 helps resolve naturally occurring ribosome collisions in basal conditions is not compelling, as eliminating RNF25 could be increasing the rate of initiation rather than increasing stalled ribosomes as the means of increasing the P/M ratio. The Rps27A-K113R mutation could have the same effect of increasing initiation, which could have been obscured by inhibiting the ISR with ISRIB.

      Our results indicate that P/M ratio increases upon ISRIB treatment of K113R cells compared to WT cells, aligning with the idea that ISRIB enhances initiation, causing increased loading of ribosomes on mRNA and consequent increased frequency of collisions. As outlined above, we agree that this experiment is indirect and results might be affected by secondary effects. However, we cannot rationalize how inhibition of the ISR by ISRIB would specifically obscure the effect for the K113R mutation but not the WT.

      (2) The evidence that RNF25 competes with Gcn2 for Gcn1 binding is also not compelling. While it's convincing that Rps27A-Ubi is elevated in basal conditions on eliminating Gcn2, loss of GCN2 would be expected to increase ribosome loading on mRNAs, potentially elevating the frequency of collisions and thereby stimulating RNF25 activity indirectly.

      We have not made sufficiently clear that we did not intend to claim that RNF25 efficiently competes with GCN2 (see also response to reviewer 1), which we do not expect due to the low levels of RNF25. Our manuscript is focussed on competition in the reverse direction, i.e. of GCN2 with RNF25.

      We agree that loss of GCN2 may increase ribosome loading on mRNA similar to ISRIB treatment, which could lead to more collisions by enhanced translation and hence increased Rps27A-Ubi. At the same time, however, this does not exclude that loss of GCN2 contributes more directly at the level of RNF25 recruitment. Therefore, the experiment also supports the competition model, and both effects together may contribute to the observed increase in ubiquitylated RPS27A/eS31. Without other evidence, the experiment would remain inconclusive.

      Therefore, to directly test the competition model, we had overexpressed the GCN1-binding RWD domain of GCN2, which leads to decreased levels of ubiquitinated RPS27A/eS31, lending direct support to the competition model of GCN2 with RNF25, which is consistent with similar models recently proposed by two other manuscripts (Seidel et al., 2026, Zhao et al., 2026).

      (3) It's also quite puzzling and left unexplained why they observed no further increase in Rps27A-Ubi on -Arg/-Lys starvation in the cells lacking Gcn2. Why wouldn't -Arg/-Lys starvation lead to further stalling and RNF25 activation in the absence of Gcn2? (Since Gcn2 KO increases Rps27A-Ubi in +Arg/+Lys conditions, it can't be that Gcn2 is required for RNF25 function.) The same puzzling and unresolved observation was made in the cells lacking DRG2. One possible explanation for this conundrum is that low-level RNF25 abundance limits further activation.

      Across all of our experiments, we have observed that RPS27A-Ubi reaches a plateau of about 30% to 35% of total RPS27A in the USP16 KO background (GCN2 deletion or amino acid starvation). This plateau indeed limits seeing further increases. We do not know the underlying reason but note that under these conditions about one third of 40S subunits carry ubiquitin on RPS27A/eS31. As the reviewer suggests, RNF25 is expressed at low levels (in the range of 23,000 molecules (Itzhak et al., 2016); see point 5 of reviewer 1), likely rendering it the limiting factor for further ubiquitination events.

      To circumvent the plateau issue, we will attempt to generate GCN2 KO cell lines in the WT background for the starvation experiments (see also response to reviewer 1, point 5).

      (4) The quantitative effects of overexpressing the Gcn2 RWD domain on Rps27A-Ubi, constituting their other evidence presented to support the competition model, are quite small in magnitude.

      We respectfully disagree with the reviewer’s comment concerning the magnitude of the effect. There is a ~27% decrease in ubiquitination, which is substantial considering the number of 40S ribosomal subunits and the possible consequences of such a change. It should also be noted that this is a transient transfection experiment that does not reach all cells of the population. We will repeat the experiment, optimizing the expression of the negative control construct.

      Cited literature:

      Chatterjee S, Naeli P, Onar O, Simms N, Garzia A, Hackett A, Coyle K, Harris Snell P, McGirr T, Sawant TN et al. (2024) Ribosome Quality Control mitigates the cytotoxicity of ribosome collisions induced by 5-Fluorouracil. Nucleic Acids Res 52: 12534-12548

      Darnell AM, Subramaniam AR, O'Shea EK (2018) Translational Control through Differential Ribosome Pausing during Amino Acid Limitation in Mammalian Cells. Mol Cell 71: 229-243 e11

      Garzia A, Jafarnejad SM, Meyer C, Chapat C, Gogakos T, Morozov P, Amiri M, Shapiro M, Molina H, Tuschl T et al. (2017) The E3 ubiquitin ligase and RNA-binding protein ZNF598 orchestrates ribosome quality control of premature polyadenylated mRNAs. Nat Commun 8: 16056

      Gurzeler LA, Link M, Ibig Y, Schmidt I, Galuba O, Schoenbett J, Gasser-Didierlaurant C, Parker CN, Mao X, Bitsch F et al. (2023) Drug-induced eRF1 degradation promotes readthrough and reveals a new branch of ribosome quality control. Cell Rep 42: 113056

      Itzhak DN, Tyanova S, Cox J, Borner GH (2016) Global, quantitative and dynamic mapping of protein subcellular localization. Elife 5

      Juszkiewicz S, Chandrasekaran V, Lin Z, Kraatz S, Ramakrishnan V, Hegde RS (2018) ZNF598 Is a Quality Control Sensor of Collided Ribosomes. Mol Cell 72: 469-481 e7

      Montellese C, van den Heuvel J, Ashiono C, Dorner K, Melnik A, Jonas S, Zemp I, Picotti P, Gillet LC, Kutay U (2020) USP16 counteracts mono-ubiquitination of RPS27a and promotes maturation of the 40S ribosomal subunit. Elife 9  

      Oltion K, Carelli JD, Yang T, See SK, Wang HY, Kampmann M, Taunton J (2023) An E3 ligase network engages GCN1 to promote the degradation of translation factors on stalled ribosomes. Cell 186: 346-362 e17

      Pochopien AA, Beckert B, Kasvandik S, Berninghausen O, Beckmann R, Tenson T, Wilson DN (2021) Structure of Gcn1 bound to stalled and colliding 80S ribosomes. Proc Natl Acad Sci U S A 118

      Seidel AS, Nemcekova L, Grønbæk-Thygesen M, Shi X, Ramalho S, Mordente KC, Bekker-Jensen S, Haahr P (2026) RNF25 restrains GCN2 hyperactivation to sustain protein synthesis and cell proliferation in response to RNA damage. bioRxiv

      Sinha NK, Ordureau A, Best K, Saba JA, Zinshteyn B, Sundaramoorthy E, Fulzele A, Garshott DM, Denk T, Thoms M et al. (2020) EDF1 coordinates cellular responses to ribosome collisions. Elife 9

      Stoneley M, Harvey RF, Mulroney TE, Mordue R, Jukes-Jones R, Cain K, Lilley KS, Sawarkar R, Willis AE (2022) Unresolved stalled ribosome complexes restrict cell-cycle progression after genotoxic stress. Mol Cell 82: 1557-1572 e7

      Wu CC, Peterson A, Zinshteyn B, Regot S, Green R (2020) Ribosome Collisions Trigger General Stress Responses to Regulate Cell Fate. Cell 182: 404-416 e14

      Zhao S, Palma-Chaundler CS, Engel CM, Cordes J, Nixdorf D, Luo MY, Kaya S, Suryo Rahmanto A, van den Heuvel D, Mackens-Kiani T et al. (2026) RNF25 confers mRNA damage tolerance by curbing activation of the integrated stress response. Mol Cell 86: 1275-1292 e12

    1. eLife Assessment

      Using a genetic screen in C. elegans, Benbow et al. identify mutations in alpha-tubulin genes that suppress Tau-induced neurodegenerative phenotypes. The results provide solid support for the authors' claim that the tubulin mutants protect against neurodegeneration without altering tau aggregation and hyperphosphorylation. While precise mechanisms of protection by tubulin mutants remain to be established, the results are valuable for understanding the underlying cellular mechanisms of Tauopathies and for the development of therapeutic interventions.

    2. Reviewer #1 (Public review):

      Summary:

      This study identifies mutations in alpha-tubulin that suppress Tau-induced neurodegeneration using the C. elegans model of Tauopathy, suggesting a potentially interesting role for microtubule properties in modulating Tau toxicity. These missense mutations cluster in the C-terminal Tau-interacting helix 12 region of alpha-tubulin genes (tba-1, tba-2, and mec-12). Further analysis, particularly using the strongest suppressor tba-2, shows that it rescues Tau-induced behavioral deficits and neuronal loss without significantly altering bulk tau-phosphorylation, aggregation, or binding to soluble tubulin. The authors suggest that altered microtubule properties underlie the neuroprotective effects, and manipulating microtubule properties may have therapeutic potential.

      Strengths:

      The study is conceptually interesting as it shows that Tau-induced neurotoxicity can, in this model, be partially uncoupled from canonical pathological hallmarks such as Tau-hyperphosphorylation and aggregation. The identification of multiple independent mutations in the same structural region of three alpha-tubulin genes provides support for the functional relevance of helix 12 in modulating Tau-induced toxicity. The authors demonstrate significant rescue of behavioral deficits (using motility and manual thrashing assays) and neuronal loss in both WT-tau and FTLD-associated TauV337M in combination with mutant alpha-tubulins, suggesting a general mechanism for tubulin-regulated modulation of Tau-toxicity. Moreover, the correlation between mutant tubulin expression levels and the extent of rescue supports a causal relationship.

      Weaknesses:

      One of the major claims of this manuscript is that altered microtubule properties suppress Tau toxicity. The only supporting evidence in this context provided by the authors is reduced taxol-stabilized microtubule mass, which does not fully explain neuronal loss or the rescue of behavioral deficits. What remains unclear is whether these mutations alter microtubule dynamics, catastrophe, lattice stability, or axonal transport.

      The authors show that mutant tba-2 reduces total tau levels by ~45%. This level of reduction is likely significant but underexplored in the manuscript. Why are the Tau levels reduced? How is Tau getting cleared: is autophagy enhanced or the ubiquitin-proteasome pathway upregulated in tba-2 + Tau animals? Or are one or more of the Tau species not detectable by the antibodies used in this study? The observation that the mec-12 mutant rescues Tau-induced phenotypes without altering Tau levels suggests that suppression can occur through Tau-independent mechanisms. This raises an important unresolved question regarding the extent to which suppression is Tau-dependent vs Tau-independent across different mutant alpha-tubulin genes, complicating the interpretation of the rescue phenotypes.

      Given that Tau primarily associates with the microtubule lattice in vivo, measuring interactions with soluble tubulin may not fully capture biologically relevant binding dynamics and therefore does not exclude the possibility that these mutations alter tau-microtubule interactions at the lattice level or may affect the binding of other MAPs/regulators, thereby altering stability or trafficking.

      A large body of conclusions is drawn from behavioral rescue and biochemical assays. This limits the understanding of how molecular changes in tubulin might affect cellular mechanisms of neuroprotection. Are there changes in the neuronal microtubule organization, Tau localization, or its redistribution in the mutant alpha-tubulin background? Are there differences in soluble vs oligomeric vs insoluble Tau in mutant tba-2 and mec-12 animals?

      The suppression of behavior in the co-pathology model is interesting but mechanistically insufficient, mainly because the underlying basis of suppression is not examined in these models. Moreover, it remains unclear whether tubulin-Tau genetically interacts with Aβ or TDP-43, and what cellular mechanisms account for the partial rescue observed in these co-pathology models.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Benbow et al. identifies, through a genetic screen, key tubulin mutants that, with high confidence, rescue tau-mediated ND phenotypes. This manuscript is well written, and the experimental results strongly support the authors' claims that these tubulin mutants can rescue ND-linked phenotypes in C. elegans while having little to no direct effect on Tau aggregation.

      Strengths:

      Benbow et al. use a relatively unbiased forward genetic screen to identify mutations associated with phenotypes that suppress tauopathy-related defects. The authors then logically focus on the various α-tubulin missense mutations identified in H12, which are known to localize to the external face of microtubules. The authors also carefully compare their established tauopathy-associated phenotypes in the WT TauH model, with and without specific α-tubulin mutations, using appropriate controls throughout. Lastly, the authors provide partial mechanistic insight into the α-tubulin mutant-mediated rescue, showing that these effects are independent of tau aggregation and tau phosphorylation, and instead suggest that the α-tubulin mutations may confer altered microtubule assembly properties based on the sedimentation assays.

      Weaknesses:

      While the claims are largely supported by the experimental outcomes, the authors at times do not provide enough detail in the text for readers to interpret the data sets independently. In addition, some claims appear to be slightly overstated relative to the data or the degree of error associated with those data.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study identifies mutations in alpha-tubulin that suppress Tau-induced neurodegeneration using the C. elegans model of Tauopathy, suggesting a potentially interesting role for microtubule properties in modulating Tau toxicity. These missense mutations cluster in the C-terminal Tau-interacting helix 12 region of alpha-tubulin genes (tba-1, tba-2, and mec-12). Further analysis, particularly using the strongest suppressor tba-2, shows that it rescues Tau-induced behavioral deficits and neuronal loss without significantly altering bulk tau-phosphorylation, aggregation, or binding to soluble tubulin. The authors suggest that altered microtubule properties underlie the neuroprotective effects, and manipulating microtubule properties may have therapeutic potential.

      Strengths:

      The study is conceptually interesting as it shows that Tau-induced neurotoxicity can, in this model, be partially uncoupled from canonical pathological hallmarks such as Tau-hyperphosphorylation and aggregation. The identification of multiple independent mutations in the same structural region of three alpha-tubulin genes provides support for the functional relevance of helix 12 in modulating Tau-induced toxicity. The authors demonstrate significant rescue of behavioral deficits (using motility and manual thrashing assays) and neuronal loss in both WT-tau and FTLD-associated TauV337M in combination with mutant alpha-tubulins, suggesting a general mechanism for tubulin-regulated modulation of Tau-toxicity. Moreover, the correlation between mutant tubulin expression levels and the extent of rescue supports a causal relationship.

      Weaknesses:

      One of the major claims of this manuscript is that altered microtubule properties suppress Tau toxicity. The only supporting evidence in this context provided by the authors is reduced taxol-stabilized microtubule mass, which does not fully explain neuronal loss or the rescue of behavioral deficits. What remains unclear is whether these mutations alter microtubule dynamics, catastrophe, lattice stability, or axonal transport.

      We agree with Reviewer #1’s critique that the evidence presented does not fully explain neuronal loss and requires further investigation. This first manuscript characterized the mutations discovered through forward genetic screening techniques and provided data to support the positive correlation between mutant expression and level of suppression. We believe the studies and data presented here help to formulate the next testable hypotheses and guide the next lines of experimentation. We are encouraged by Reviewer #1’s assessment that exploration of microtubule dynamics, catastrophe, lattice stability and axonal transport will be critical to testing the hypothesis that mutant tubulin drives suppression of tau toxicity through changes to microtubule properties. These suggestions are highly relevant and align with our priorities, as we recently submitted an application for a 5-year research award to support these key questions.

      To address this specifically, the reviewer recommended “The microtubule-dependent axonal transport should be examined in tubulin mutants and compared with mutant tubulin + Tau conditions. Imaging of mitochondrial or synaptic vesicle markers, along with appropriate quantifications (velocity or run length), may provide a functional readout linking microtubule changes to neuronal survival.”

      We agree with the reviewer that these experiments will be highly valuable to further understand the mechanisms underlying suppression, and we have planned to complete these experiments upon receipt of funding that would directly support the completion of these experiments.

      The authors show that mutant tba-2 reduces total tau levels by ~45%. This level of reduction is likely significant but underexplored in the manuscript. Why are the Tau levels reduced? How is Tau getting cleared: is autophagy enhanced or the ubiquitin-proteasome pathway upregulated in tba-2 + Tau animals? Or are one or more of the Tau species not detectable by the antibodies used in this study? The observation that the mec-12 mutant rescues Tau-induced phenotypes without altering Tau levels suggests that suppression can occur through Tau-independent mechanisms. This raises an important unresolved question regarding the extent to which suppression is Tau-dependent vs Tau-independent across different mutant alpha-tubulin genes, complicating the interpretation of the rescue phenotypes.

      We think the reviewer has addressed an important point that there may be both tau-dependent and tau-independent mechanisms at work here, and we will add greater nuance to this in our discussion. Additionally, we agree these two potential mechanistic pathways merit further exploration. To address this, we have planned to conduct experiments using reporter C. elegans lines crossed with our mutant tubulin/tau-transgenic lines to detect potential upregulation of these pathways as mechanisms for tau clearance.

      Given that Tau primarily associates with the microtubule lattice in vivo, measuring interactions with soluble tubulin may not fully capture biologically relevant binding dynamics and therefore does not exclude the possibility that these mutations alter tau-microtubule interactions at the lattice level or may affect the binding of other MAPs/regulators, thereby altering stability or trafficking.

      In the discussion we acknowledge the limitation of only examining the binding affinity between soluble tubulin and tau and intend to complete further studies with polymerized microtubules containing mutant α-tubulin. We will expand discussion of this in the text. Similar to reviewer 1, we have also concluded that the next line of experimentation will focus on mutant alpha-tubulin effects on the microtubule polymer such as changes to MAP interactions, stability and trafficking. We have applied for and hope to receive funding to address these questions in the near future.

      To address this concern specifically, we plan to conduct these experiments using C. elegans extracts to polymerize microtubules and subsequently test the binding of recombinant human tau. These co-sedimentation experiments are expected to be included in the revised manuscript.

      A large body of conclusions is drawn from behavioral rescue and biochemical assays. This limits the understanding of how molecular changes in tubulin might affect cellular mechanisms of neuroprotection. Are there changes in the neuronal microtubule organization, Tau localization, or its redistribution in the mutant alpha-tubulin background? Are there differences in soluble vs oligomeric vs insoluble Tau in mutant tba-2 and mec-12 animals?

      The reviewer raises relevant questions regarding elucidation of the mechanisms underlying mutant tubulin-mediated suppression at the cellular level. To address this concern we will analyze the cellular distribution of tau in neurons from mutant and non-mutant C. elegans.

      Ultimately, our goals are to identify and connect the underlying biochemical mechanisms with the observed prevention of cell death as Reviewer 1 has identified. Their suggestion to explore cellular-level changes such as mutant tubulin effects on tau distribution is highly relevant. We therefore plan to test this directly by imaging neurons in C. elegans strains expressing fluorescently labeled tau and/or immunohistochemical techniques to stain for tau in C. elegans neurons.

      The suppression of behavior in the co-pathology model is interesting but mechanistically insufficient, mainly because the underlying basis of suppression is not examined in these models. Moreover, it remains unclear whether tubulin-Tau genetically interacts with Aβ or TDP-43, and what cellular mechanisms account for the partial rescue observed in these co-pathology models.

      In agreement with Reviewer #1’s assessment, we have concluded these data, while interesting, do not substantially expand our understanding apart from the existing data. Without additional information regarding the underlying mechanisms, they do not provide substantial novel insights and we have therefore chosen to remove the co-pathology data sets from the revised version of the manuscript to refine the scope of the data and hypotheses discussed in this work.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Benbow et al. identifies, through a genetic screen, key tubulin mutants that, with high confidence, rescue tau-mediated ND phenotypes. This manuscript is well written, and the experimental results strongly support the authors' claims that these tubulin mutants can rescue ND-linked phenotypes in C. elegans while having little to no direct effect on Tau aggregation.

      Strengths:

      Benbow et al. use a relatively unbiased forward genetic screen to identify mutations associated with phenotypes that suppress tauopathy-related defects. The authors then logically focus on the various α-tubulin missense mutations identified in H12, which are known to localize to the external face of microtubules. The authors also carefully compare their established tauopathy-associated phenotypes in the WT TauH model, with and without specific α-tubulin mutations, using appropriate controls throughout. Lastly, the authors provide partial mechanistic insight into the α-tubulin mutant-mediated rescue, showing that these effects are independent of tau aggregation and tau phosphorylation, and instead suggest that the α-tubulin mutations may confer altered microtubule assembly properties based on the sedimentation assays.

      Weaknesses:

      While the claims are largely supported by the experimental outcomes, the authors at times do not provide enough detail in the text for readers to interpret the data sets independently. In addition, some claims appear to be slightly overstated relative to the data or the degree of error associated with those data.

      We appreciate the feedback regarding the need for additional clarity for independent analysis of the datasets. We will revise the figures and text to increase clarity for the readers. We will review statements and edit language in accordance with their degrees of error as appropriate.

      The authors measure tau binding affinities using soluble tubulin but do not assess tau binding to assembled microtubules. This is an important limitation, as the physiologically relevant interaction involves α/β-tubulin heterodimers, either free or incorporated into the microtubule lattice. Furthermore, the binding analysis appears to focus only on the D429N α-tubulin mutant, which further limits physiological relevance, as β-tubulin, which is also required for normal tau binding, is not explicitly considered.

      We acknowledge that only limited conclusions may be drawn from soluble tubulin interactions with tau and that additional analysis with polymerized microtubules will be useful in understanding tau-microtubule binding affinity. The analysis was completed with isolated pools of tubulin from C. elegans, not recombinant mutant tubulin, so this is a heterogeneous mixture of tubulin composed of α/β heterodimer subunits, and a mixture of the mutant isotype within the larger pool of wild type isotypes. While this further complicates the analysis, and is the likely source of variability, it incorporates the normal heterodimer subunit biochemistry.

      Given that tau prominently binds the microtubule lattice, we agree with the reviewers' assessment that experiments with polymerized microtubules containing mutant tubulin would offer a greater understanding of the effects of mutant alpha-tubulin on microtubule properties and potential mechanisms of toxic tau suppression. To test this directly, we intend to complete co-sedimentation experiments using extracts from C. elegans expressing wild type or mutant tubulin, incubated with recombinant human tau.

      In conclusion, the thoughtful commentary and suggestions from reviewers will help improve the manuscript. We plan to complete the following experiments to address their concerns.

      (1) Assess tau localization in mutant tba-2 and mec-12 C. elegans as compared to tau-transgenic C. elegans without tubulin mutations. We plan to use immunohistochemical techniques and/or imaging of Dendra2-labeled tau to assess the sub-compartmental distribution of tau in C. elegans neurons. This addresses Reviewer #1’s question of whether the mutant tubulin changes tau localization in neurons.

      (2) Assess mutant-tubulin-driven changes to tau affinity for polymerized microtubules. To address both reviewers' concerns regarding the limitations of binding experiments with tau and soluble tubulin, we plan to use C. elegans extracts to test whether microtubule polymers containing mutant alpha-tubulin alter tau-microtubule co-sedimentation.

      (3) Using C. elegans reporter lines we plan to assess whether tau clearance occurs in tba-2 mutant tubulin C. elegans through the upregulation of autophagy or ubiquitin degradation pathways.

      (4) Evaluate the neuroprotective effects of mutant alpha-tubulin in cholinergic neurons using a C. elegans strain expressing a fluorescent label specifically in cholinergic neurons.

      We plan to make textual revisions to increase clarity, aid in independent analysis of the presented datasets, and better address the possibility of both tau-dependent and tau-independent mechanisms. We appreciate the Reviewers' attentive reading and thoughtful feedback for the improvement of this manuscript.

    1. eLife Assessment

      This potentially valuable study describes the development of protein binders targeting DELE1, a protein involved in activating the integrated stress response when mitochondria are perturbed (the mitoISR pathway). The strategy appears to be successful, as several designed proteins were shown to bind DELE1, disrupt DELE1 oligomerization, and attenuate ISR activation. However, the demonstration of the utility of these inhibitory binders is incomplete, particularly given the limited biological outcomes examined in the current study, thus limiting the significance of the paper in its current form.

    2. Reviewer #1 (Public review):

      Summary:

      The protein DELE1 is a critical component to signal mitochondrial stress to the cytosol: under stress conditions, a truncated form of DELE1, termed DELE1(CTD) accumulates in the cytosol as an oligomer, binds the HRI kinase, which triggers the integrated stress response.

      Leveraging the structural knowledge of the DELE1(CTD) oligomer, this study attempts to interfere with the oligomerization process, using an AI-designed protein that binds to the DELE1(CTD) oligomerization interface. The starting hypothesis is that such a binder shall selectively inhibit the DELE1-signalled mitochondrial stress response. The authors use established AI pipelines (RFdiffusion) to make a series of such binders, characterize them with biochemical methods and a crystal structure of the binder in its free state. When over-expressing the binders in HEK293T cells, the authors report that mitochondrial stress - induced with a drug - does indeed not lead to triggering the stress response, confirming their starting hypothesis.

      The work is an elegant demonstration of how AI-designed proteins can specifically interfere with cellular mechanisms.

      The conclusions of the work are mostly well supported by data; there are some mechanistic gaps, however, about the interaction mechanisms.

      Strengths:

      The study is a nice combination of (i) a clear structure-derived hypothesis on how to interfere with a signalling mechanism, (ii) state-of-the-art protein design tools, (iii) a mostly robust biochemical characterization, and (iv) cellular experiments to demonstrate the effects of the binders.

      Weaknesses:

      The crystal structure of the binder5, while confirming its AlphaFold model, does not provide direct evidence of the binding mode to DELE1. Direct structure determination, using crystallography (which may require cleaving the MBP domain) would make their mechanistic arguments stronger.

      The demonstration that the binders do not inhibit the DELE1-HRI interaction is interesting; however, the underlying mechanism, in particular where the DELE1-HRI binding occurs, is not explored.

      While this study opens perspectives on how to interfere with DELE1-signalling, it is unlikely that these binders are actually useful for medical applications (compared to small-molecule drugs), as acknowledged in the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      Previous structural analyses of DELE1 by the authors revealed that the first α-helix within the TPR repeat domain provides the oligomeric interface of DELE1, and that DELE1 octamer formation is required for maximal ISR activation. Based on these findings, the authors designed peptides intended to bind this oligomeric interface and showed that these peptides interfere with DELE1 oligomerization in vitro and attenuate ISR activation in cultured cells.

      Strengths:

      The series of in-vitro data sets showing direct binding of the designed peptides to DELE1 and inhibitory effects on its oligomerization are convincing.

      Weaknesses:

      The physiological (or experimental) significance of inhibiting the DELE1-HRI-ISR pathway using these peptides has not been clearly demonstrated, particularly given the very limited cell biological outcomes tested in the current manuscript.

    4. Reviewer #3 (Public review):

      Significance of the findings and the strength of evidence:

      The article presented by Yang et al. describes the development of protein binders targeting the C-terminal domain of the protein DELE1, which is involved in the mitochondrial integrated stress response (mitoISR) pathway. It was shown earlier that DELE1 is imported into the mitochondria and cleaved by the inner mitochondrial membrane protease OMA1, resulting in an N-terminal and C-terminal domain, the latter being transported back into the cytosol, where it interacts with and activates the kinase HRI. HRI, in turn, phosphorylates eIF2α, resulting in selective translation of mRNAs encoding proteins involved in stress signalling, such as the transcription factor ATF4. ATF4 activates expression of genes involved in amino acid balance, redox homeostasis and proteostasis. The C-terminal domain of DELE1 (DELE1CTD) was structurally and functionally characterized earlier by cryo-EM by Jie Yang and co-workers. These studies suggest that it forms an octamer with D4 symmetry consisting of two tetramers arranged in a tail-to-tail arrangement. In this octamer, two interfaces were identified, one between the monomers in the tetramers and one connecting the tetramers to form the octamer. In this earlier work, it was also shown by mutational studies that interrupting the first interface has an impact on the OMA1-DELE1-HRI-eIF2α-ATF4 pathway upon mitochondrial stress in human cells. To this end, the authors concluded in the current manuscript that it might be interesting and also of therapeutic interest to develop a protein binder that binds DELE1 and disrupts oligomer formation. The authors set up a de novo protein design approach using RFdiffusion to design a protein scaffold and ProteinMPNN to design the side chains to create protein binders targeting the α-helix α1 in DELE1CTD that is directly involved in the formation of the first interface forming the tetramer. As I am not an expert in protein design, I cannot judge the quality of this data. 
The candidates were evaluated by AlphaFold3 to confirm complexes formed between the designs and DELE1CTD. In the end, 12 designed protein binders were selected for further analyses. These proteins were recombinantly produced in E. coli and purified. The proteins DELE1 full-length (DELE1fl) and DELE1CTD were produced as MBP-fusion proteins to improve solubility and stability. Co-expression studies with MBP-DELE1CTD revealed that 11 out of the 12 binders co-eluted with MBP-DELE1CTD from a size-exclusion chromatography column, indicating complex formation. Without the presence of the binders, MBP-DELE1CTD elutes as a higher oligomer, suggesting that the binders interfere with oligomerisation. Further analyses included the impact of the presence of selected binders on stress-induced ISR. The authors found that different binders had a slightly different impact on the outcome upon treatment with stressors, and also compared two different stressors. This was concluded by assessing the ATF4 protein level by immunoblotting. The interaction of selected binders with DELE1CTD was subsequently confirmed by co-immunoprecipitation experiments. To evaluate whether the impact of the binders is restricted to mitochondrial stress, the authors also elicited endoplasmic reticulum stress, under which the binders showed no effect on ATF4 levels. The presence of the binders furthermore impaired recovery of tubulated mitochondria following mitochondrial stress induction, resulting in more fragmented mitochondria. The authors determined a crystal structure of one binder at a resolution of 2.6 Å and performed AlphaFold3 predictions to model the complex between binders and DELE1CTD. The interface is characterized by many hydrophobic residues. From this data, they derived some interface mutants and tested them for their impact on the interaction. Indeed, mutation of these hydrophobic side chains to charged residues interfered with complex formation. 
Finally, the authors show that binder binding to DELE1CTD does not interfere with the binding of HRI kinase. Overall, the methodology applied is state-of-the-art, and the manuscript is well-written. The design of protein binders targeting DELE1 involved in mitochondrial stress signalling is interesting for basic science to study stress signalling, but also therapeutically. However, as ISR has a positive impact on disease development and ageing, but also a negative one, depending on the degree of activated ISR, a therapeutic use would need to be precisely applied. The study has some weaknesses, and particularly the structural data seems to have severe issues.

    1. eLife Assessment

      This study presents a valuable finding that coordinated changes in epigenetic modifications and three-dimensional chromatin architecture may drive primary trastuzumab resistance in HER2+ breast cancer. Moreover, this manuscript identifies SGK1 as a potential therapeutic target. The evidence supporting the claims of the authors is solid, although the inclusion of a more direct validation of the key findings using tumor samples from patients with clinical trastuzumab resistance would have strengthened the study. The work will be of interest to scientists or clinicians working in the field of breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates epigenetic and three-dimensional chromatin alterations associated with primary trastuzumab resistance in HER2-positive breast cancer using integrated CUT&Tag, RNA-seq, and Micro-C analyses in JIMT1 (resistant) and SKBR3 (sensitive) cell models. The authors identify widespread remodeling of histone modification landscapes, chromatin compartment organization, and promoter-enhancer looping, highlighting SGK1 as a candidate epigenetically activated mediator associated with intrinsic resistance. The manuscript provides a technically solid and extensive multi-omic resource for the study of HER2-positive breast cancer resistance states.

      Strengths:

      The study integrates multiple state-of-the-art epigenomic and chromatin conformation approaches, including CUT&Tag, RNA-seq, and Micro-C, generating a comprehensive dataset that will likely be valuable to the field. The analyses are generally technically rigorous and well executed, and the manuscript is overall clearly written. The integration of chromatin architecture, enhancer activity, transcriptional regulation, and histone modification profiling provides an informative overview of large-scale epigenomic remodeling associated with resistant versus sensitive HER2-positive breast cancer states. The identification of SGK1-associated chromatin activation and enhancer rewiring is particularly interesting and supported by multiple orthogonal datasets.

      The inclusion of both intrinsic and acquired trastuzumab resistance models also strengthens the study conceptually, even if the biological interpretation remains somewhat complex.

      Weaknesses:

      The major limitation of the study is that many of the central mechanistic conclusions remain largely correlative. Although coordinated changes in chromatin architecture, histone modifications, enhancer activity, and SGK1 expression are observed, direct evidence demonstrating that these epigenetic alterations causally drive SGK1 activation or trastuzumab resistance is currently lacking.

      In addition, the interpretation of SGK1 as a broader trastuzumab-resistance driver is somewhat weakened by the analyses in the acquired resistant SKBR3_HR model, where SGK1-associated chromatin and transcriptional changes appear largely absent. This raises the possibility that SGK1 dependency may reflect a lineage- or model-specific vulnerability intrinsic to JIMT1 cells rather than a generalizable resistance mechanism.

      The study also remains descriptive in several sections. Numerous chromatin interactions and compartment changes are cataloged without sufficient biological contextualization or mechanistic integration. As a result, parts of the manuscript currently read more as a comprehensive epigenomic profiling resource than a fully mechanistic study of resistance biology.

      Finally, the translational impact is limited by the lack of patient-level validation linking SGK1 activation to trastuzumab response or clinical outcome in HER2-positive breast cancer cohorts.

    3. Reviewer #2 (Public review):

      Summary:

      Duan, Hua et al. used CUT&Tag and Micro-C to show that, in primary trastuzumab-resistant HER2+ breast cancer cells, promoter H3K4me3 rather than H3K27me3 is strongly correlated with transcriptional activity. Resistant cells also exhibited more abundant promoter-enhancer loops and enriched cohesin at loop anchors, accompanied by shifts in A/B compartment status. Through multi-omics integration, the authors identified SGK1 as a key gene showing elevated promoter H3K4me3 levels, enhancer activation, strengthened chromatin loops, and upregulated transcription in resistant cells, and validated SGK1 as a potential therapeutic target. These findings reveal the coordinated interplay between three-dimensional chromatin architecture and epigenetic modifications, offering important insights into trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      Previous investigations into trastuzumab resistance have largely focused on genetic mutations or individual epigenetic modifications. In contrast, this study moves beyond genetic or single epigenetic views by integrating histone modifications and 3D chromatin architecture into a unified framework, proposing a synergistic model of promoter H3K4me3, enhancer activation, and chromatin looping that underlies non-genetic resistance. It provides a new conceptual basis for understanding non-genetic resistance mechanisms. Secondly, using high-resolution epigenomic and conformational mapping together with bidirectional in vitro and in vivo functional validation, it establishes a solid link between epigenetic changes and phenotypes, and demonstrates that SGK1 inhibition suppresses tumor growth in a xenograft model, revealing clear translational potential.

      Weaknesses:

      (1) All findings are based on a single pair of cell lines, JIMT1 and SKBR3, which does not allow exclusion of cell line‑specific effects. The authors did not examine SGK1 expression levels, promoter H3K4me3 status, or relevant chromatin loops in tumor tissues from patients with clinical trastuzumab resistance. Consequently, whether the conclusions can be extrapolated to actual patient populations remains unclear, which limits the clinical relevance of the findings. It is recommended that the authors directly validate the key findings using tumor samples from patients with clinical trastuzumab resistance or analyze the correlation between SGK1 expression levels and disease-free survival or pathological complete response using data from public databases for HER2+ breast cancer patients, which would help address the current limitation of lacking clinical sample validation and the uncertainty regarding the association of SGK1 with patient prognosis and treatment response.

      (2) In the Discussion, the authors propose that SGK1 may assume the role of AKT to sustain mTOR activation, thereby bypassing the dependence on HER2 signaling following trastuzumab inhibition. Although this hypothesis is supported by published literature, the present study provides no direct signaling evidence, such as examining phosphorylation changes of SGK1, AKT, mTOR, or their downstream effectors.

    1. eLife Assessment

      This valuable paper uses a mathematical model applied to a dataset of E. coli / ESBL carriage and transmission to infer drivers of drug resistance in France. The strength of support for the study findings is incomplete. While the research question is of importance, and the mathematical model has structural and methodological integrity, numerous issues are noted: insufficient description of the data, lack of included equations and code, definitions of antibiotic use that are not complete, low sensitivity of assays for carriage, technical issues with statistical prior selection and parameter identification, and application of non-regional ECDC surveillance data to France.

    2. Reviewer #1 (Public review):

      Summary:

      The authors used a large dataset evaluating gut carriage of Enterobacterales and ESBL organisms from children aged 6-24 months as the basis for a modeling study to investigate what factors are most important for determining the prevalence of ESBL resistance. The modeling incorporated travel, a simple model of carriage duration (short and long), fitness cost of resistance on transmission and clearance, and antibiotic use. They found that antibiotic use is the primary driver of resistance prevalence, with transmissibility of resistant strains also important for setting the prevalence. Travel, while important when prevalence is very low, plays less of a role in maintaining prevalence once it is established (in keeping with other recent work). They estimated the fitness cost of resistance (terming a reduction of 14% on the rate of transmission and an increase of 23% on the rate of clearance as "low"). While the extent of assumptions and simplifications makes me skeptical of the quantitative conclusions, the qualitative ones seem reasonable and reinforce the long-held principles of the field--reducing antibiotic pressure and interrupting transmission--and highlight the importance of understanding the biological factors that shape the duration of carriage and the likelihood of colonization.

      Strengths:

      This study incorporates many of the factors that might influence the carriage prevalence of ESBL Enterobacterales. This builds on the work led by this group, both in primary data collection and in theory. Overall, it's such a tough problem that I commend the authors for trying to tackle it. The authors take a thoughtful, rigorous approach, acknowledging simplifications and assumptions where they need to, so as to evaluate the various factors shaping ESBL prevalence.

      Weaknesses:

      Part of the reason it's such a tough problem is that we have limited data to structure and parameterize a complex model.

      (1) The data are not sufficiently described.

      The primary data source for this modeling exercise comes from a study of 6-24-month-old children who underwent rectal swabs and evaluation of the carriage prevalence of Enterobacterales, and then whether these Enterobacterales were ESBL; moreover, the study included data on travel and on antibiotic use. Could the authors please direct us to these primary data? Could the authors also justify the parameters in their models from these data--for example, could they please provide the distribution of antibiotic use and the associated timing? Could they also explain why they decided to treat all Enterobacterales as if they were E. coli (line 307)? Is there evidence that all Enterobacterales occupy the same niche and compete with each other?

      (2) The model should be more fully described and the limitations explored/explained.

      - The authors should point to the code and the ODEs.
      - I understand the focus on the pediatric population; the authors argue that this is reasonable because ESBL colonization is similar across age groups. But presumably, antibiotic use differs across age groups, and there is colonization pressure from within households.
      - The authors only consider resistance to extended-spectrum beta-lactams and use of beta-lactam antibiotics, but ESBL Enterobacterales are often resistant to other antibiotics as well. How much does the use of other antibiotics also select for Enterobacterales that happen to carry ESBL resistance? "One bug/one drug" modeling, as done here, neglects the complexities of the actual patterns of resistance and range of antibiotic use.
      - Do the data support the T3 or S3 compartments, which, if I understand correctly, means no exposure to antibiotics can happen during three months after either treatment or travel? What do the data say about the patterns of antibiotic use? I'd imagine that the likelihood of antibiotic use is not homogeneous, but instead, there are some who use repeated rounds of antibiotics.
      - Why do the authors exclude individuals who used antibiotics in the prior 7 days? What justifies that cutoff? The authors speculate that the impact of excluding these individuals is likely to be minimal; why exclude them, then? Did the authors evaluate the results if they were included?
      - What is the basis of "niche differentiation", as described starting on line 221? Why should clearance of one strain be slower when the strain co-occurs in a host with a strain of another type?

    3. Reviewer #2 (Public review):

      Overview:

      This study integrates several datasets into a unified modeling framework that incorporates several mechanisms thought to impact the spread of ESBL-resistant bacterial strains. The model accounts for tradeoffs between persistor and colonizer strains, travel rates, antibiotic treatment and strain clearance, direct competitive interactions, and, most importantly, a series of distinct costs associated with the carriage of ESBL resistance. The resulting 75-compartment model is internally consistent and structurally neutral. However, the parameter estimation is flawed in many ways, compromising the interpretations of the model.

      On the usage of the Swedish infant data set to estimate colonization and persistence:

      First, while other papers have taken similar approaches, the Swedish infant data set is fundamentally inadequate to estimate colonization and persistence rates. This is because very few colonies were typed per sampling event (2 to 6 colonies per event). The original authors themselves argued that strains of indistinguishable morphology would not be able to be differentiated by this method. They also provided data showing that strain identity was not directly related to colony morphology (same strain often displaying distinct morphologies).

      The consequence of this is that strains present in low abundance would be missed with a high likelihood. However, if they were to be stochastically sampled, this would count as a "colonization" event, and if they were missed in subsequent samplings, this would count as a "loss" event. In other words, the statistical methods described conflate within-host dynamics (which might lead to distinct within-host abundances) with between-host dynamics (colonization and loss).
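
      The sampling argument above can be illustrated with a minimal sketch. It assumes a strain that is continuously present at a fixed relative abundance and colonies that are picked independently at random; the 5% abundance used in the note below and the function names are hypothetical and are not taken from the manuscript or the Swedish data set.

```python
import random


def detection_probability(relative_abundance: float, colonies_typed: int) -> float:
    """Chance that at least one typed colony belongs to the strain.

    Assumes colonies are drawn independently, each belonging to the strain
    with probability equal to its relative abundance.
    """
    return 1.0 - (1.0 - relative_abundance) ** colonies_typed


def apparent_events(relative_abundance: float, colonies_typed: int,
                    n_visits: int, seed: int = 0) -> tuple[int, int]:
    """Count apparent gains and losses for a strain that never leaves the host.

    Every visit the strain is truly present; it is only 'observed' when at
    least one typed colony happens to belong to it.
    """
    rng = random.Random(seed)
    p_detect = detection_probability(relative_abundance, colonies_typed)
    observed = [rng.random() < p_detect for _ in range(n_visits)]
    pairs = list(zip(observed, observed[1:]))
    gains = sum(1 for before, after in pairs if not before and after)
    losses = sum(1 for before, after in pairs if before and not after)
    return gains, losses
```

      Under these assumptions, a strain at 5% relative abundance is detected in fewer than 10% of visits when 2 colonies are typed, so `apparent_events(0.05, 6, 1000)` reports many "colonization" and "loss" events even though the simulated strain is never actually gained or lost.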

      Beyond this conceptual issue, some technical aspects aren't particularly sound. The means of the inferred posteriors for the lambda and mu parameters are then used to calculate the beta, gamma, d, and epsilon parameters through a linear regression. The more technically correct way of doing this would be to directly infer these parameters from the data and obtain a full posterior for these parameters.

      This highlights another issue: these parameters are passed down to the next statistical model as point estimates, with no associated uncertainty. This artificially inflates the (already low) confidence of the estimates for the cost parameters.
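
      A toy sketch of this effect, under purely illustrative assumptions: a downstream quantity is taken to be an upstream parameter plus independent noise, both Gaussian. The function name and all numbers are hypothetical and do not correspond to the authors' model.

```python
import random
import statistics


def downstream_spread(upstream_mean: float, upstream_sd: float, noise_sd: float,
                      propagate: bool, n_draws: int = 5000, seed: int = 1) -> float:
    """Standard deviation of a downstream quantity (upstream value + noise).

    propagate=True  draws the upstream parameter from its (assumed Gaussian)
                    posterior on every iteration.
    propagate=False plugs in the posterior mean as a fixed point estimate.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        upstream = rng.gauss(upstream_mean, upstream_sd) if propagate else upstream_mean
        draws.append(upstream + rng.gauss(0.0, noise_sd))
    return statistics.stdev(draws)
```

      Comparing `downstream_spread(1.0, 0.5, 0.2, propagate=True)` with the same call using `propagate=False` shows a visibly narrower spread for the plug-in version, which is the artificial precision described above.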

      Finally, when this procedure generated parameters that were inconsistent with their expectations (clearance is too high to explain prevalence in France), they adjusted the parameters by discarding and recalculating their beta parameters to artificially enforce neutrality between their strains and enforce the expected prevalence. This is problematic because beta and gamma were jointly estimated, and there is no particular reason why some of them should be discarded. The more natural interpretation would be that parameters inferred from Swedish infants do not translate well to French adults, which should preclude their usage in this context.

      On the estimation of costs of ESBL resistance:

      The core of the second statistical model is to use prevalence data, travel data, and treatment data in conjunction with the previously inferred colonization and loss parameters to infer the costs of carrying antibiotic resistance. Therefore, the accuracy of this section is contingent on an accurate estimation of the previous parameters. However, these colonization and loss parameters are inherited with no uncertainty (just point estimates are passed down), which, as previously mentioned, generates an artificially precise posterior distribution for the resistance parameters.

      However, the most severe issue with the statistics lies in the choice of priors for the cost parameters. All of them are uniform in a positive range that implies a positive cost. Importantly, the average over a positive range will always be positive; therefore, this method will ALWAYS estimate a positive mean for the costs. Note that the posterior distribution of some cost parameters seems to peak around zero and abruptly decays with no mass to the left of zero. This is caused by the choice of prior. Had delta been allowed to be negative (i.e., antibiotic resistance carried a benefit, having the prior be uniform between -1 and 1), the posterior distribution would likely be much more symmetrical, and the confidence interval would have included 0.

      Restating, because the prior is a continuous function between 0 and 1, it contains infinitely more mass in the region that represents there being a cost (delta>0) than in the region representing no cost (delta=0). This means that it is a mathematical impossibility for this model to infer the absence of a cost.

      Therefore, the main finding of the paper ("We found that resistance is costly") is a mathematical artifact of the prior choice and of the model structure.
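
      The effect of the prior's support can be checked with a small grid approximation. This is only a sketch under stated assumptions: a single cost parameter delta, a Gaussian likelihood centered on an estimate of zero (that is, data that favor no cost), and a hypothetical standard error of 0.1. It is not the authors' model.

```python
import math


def posterior_mean(lower: float, upper: float, estimate: float = 0.0,
                   std_error: float = 0.1, n_grid: int = 20001) -> float:
    """Posterior mean of delta under a uniform prior on [lower, upper].

    The likelihood is assumed Gaussian in delta, centered on `estimate`
    with width `std_error`; the posterior is evaluated on an even grid.
    """
    step = (upper - lower) / (n_grid - 1)
    grid = [lower + i * step for i in range(n_grid)]
    weights = [math.exp(-0.5 * ((d - estimate) / std_error) ** 2) for d in grid]
    return sum(d * w for d, w in zip(grid, weights)) / sum(weights)


positive_only = posterior_mean(0.0, 1.0)   # prior restricted to delta >= 0
symmetric = posterior_mean(-1.0, 1.0)      # prior allowing a benefit as well
```

      With the prior restricted to [0, 1], the posterior mean is strictly positive (close to the half-normal mean, `std_error * sqrt(2 / pi)`) even though the likelihood peaks at zero; with the symmetric prior on [-1, 1], the posterior mean is essentially zero.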

    4. Reviewer #3 (Public review):

      Cotto and colleagues integrated data analysis with mathematical modeling to examine extended-spectrum beta-lactamase (ESBL)-producing E. coli in France. While ESBL prevalence has risen globally, it has stabilized at approximately 6-8% across Europe. Established risk factors for ESBL carriage include prior antibiotic exposure and travel to high-prevalence regions, most notably South-East Asia. The dataset incorporated information on ESBL-producing E. coli and travel history in young children, and the model was calibrated to ECDC surveillance data on ESBL across Europe, supplemented by literature-derived parameters on antibiotic use, E. coli biology, and transmission dynamics. The authors report that ESBL-carrying strains exhibit a 14% fitness cost in community transmission relative to susceptible bacteria, yet are cleared 23% less frequently. ESBL carriage was strongly associated with factors that prolong gut colonization. Both antibiotic treatment rates and transmission efficiency were identified as key determinants of community-level ESBL prevalence.

      Strengths:

      The study addresses a clinically and epidemiologically important topic. The integrated modeling approach is methodologically sound and well-suited to disentangling the relative contributions of transmission and antibiotic selection pressure.

      Weaknesses:

      Several concerns regarding the data used in this study warrant consideration. First, model calibration relied on ECDC surveillance data pooled across multiple European countries, several of which have substantially lower antibiotic consumption than France (ECDC ESAC-Net Annual Epidemiological Report, 2024). Given that antibiotic use is a primary driver of ESBL selection, ESBL prevalence is likely to be heterogeneous across these settings. Calibrating to a geographically diverse dataset risks introducing systematic bias into parameter estimates that may not be representative of the French context. The authors should repeat the analysis using France-specific data, or, where this is not feasible, restrict the calibration dataset to countries with comparable antibiotic consumption profiles. Second, the travel exposure data may be insufficient to adequately capture importation dynamics from South-East Asia, as the cohort consisted exclusively of young children, a demographic less likely to travel to high-prevalence regions than older age groups. This may result in an underestimation of travel-associated importation as a contributor to community ESBL prevalence, and the generalizability of these findings to the broader population should be interpreted with caution.

    1. eLife Assessment

      This manuscript provides a timely and important statistical re-evaluation of a paper by Epp et al., on the discordance of BOLD and CMRO2 measures. The authors present a convincing case based on rigorous re-analysis of the data that these previous results arise predominantly from uncertainty in measurement, rather than physiological features. These findings have implications that are of importance to all studies of brain function using BOLD FMRI.

    2. Reviewer #1 (Public review):

      The study by Epp et al. has indeed received a lot of attention. As so often in the fMRI literature, some voices blew the results out of proportion, as if they implied that fMRI cannot be trusted, even though informed researchers are well aware of the capabilities and challenges of BOLD as a measure of neural activity. The paper was discussed and criticized from various angles, e.g., with respect to the unestablished models used to estimate CMRO2, the overestimation of the 40% figure caused by the mask definition, and the expected neuronal and vascular effects underlying the discordance.

      The first publications arising from these discussions are now being shared, e.g., Chen et al. (https://doi.org/10.1038/s41593-026-02288-y). The manuscript at hand augments this discussion. Specifically, it provides a direct statistical refutation of the recently proposed widespread physiological sign reversal between BOLD and CMRO2.

      By reanalyzing a high-profile dataset, the authors demonstrate that the previously reported 40% discordance rate is an artifact of statistical uncertainty rather than a genuine physiological phenomenon. This critical re-evaluation restores some confidence in the canonical interpretation of BOLD signals that was recently challenged. It highlights the necessity of rigorous statistical validation in quantitative fMRI.

      The following points should be addressed:

      (1) Absence of evidence is taken as evidence of absence

      The group-level significance analysis, summarized in the horizontal bar chart and cortical surface maps, labels non-significant voxels as 'CMRO2 not reliable', and the discussion concludes that positive BOLD responses are predominantly concordant with metabolism.

      The paper treats voxels with non-significant CMRO2 effects as 'statistically uncertain' rather than as potentially reflecting genuine null metabolic changes, conflating absence of evidence with evidence of absence. Because the 77.2% of voxels shown as light orange could reflect either real null metabolism or insufficient power, the paper cannot distinguish between these. This ambiguity matters because a genuine null metabolic response to positive BOLD would itself be physiologically interesting and would not straightforwardly support 'predominant concordance'.
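
      The distinction drawn in this point can be sketched in code. The snippet below is a minimal illustration, not the authors' pipeline: it assumes a normal approximation (a t-distribution would be more appropriate for small groups) and a hypothetical smallest effect size of interest, `delta`, that would have to be chosen on physiological grounds. It separates voxels with a detectable effect, voxels whose effect is demonstrably small (two one-sided tests, implemented via the 1 - 2*alpha confidence interval), and voxels for which the data are simply uninformative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def classify_effect(samples, delta, alpha=0.05):
    """Label a per-voxel group effect as 'nonzero', 'equivalent', or 'inconclusive'.

    samples: per-subject effect estimates for one voxel (hypothetical input).
    delta:   hypothetical smallest effect size of interest (an assumption).
    Uses a normal approximation to keep the sketch dependency-free.
    """
    n = len(samples)
    m = mean(samples)
    se = stdev(samples) / sqrt(n)
    z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)  # test against zero
    z_one_sided = NormalDist().inv_cdf(1 - alpha)      # each one-sided test

    # A detectable difference from zero takes precedence in this sketch.
    if abs(m) > z_two_sided * se:
        return "nonzero"
    # Equivalence: the (1 - 2*alpha) interval lies entirely inside (-delta, delta).
    if m - z_one_sided * se > -delta and m + z_one_sided * se < delta:
        return "equivalent"
    # Neither test succeeds: absence of evidence, not evidence of absence.
    return "inconclusive"


if __name__ == "__main__":
    print(classify_effect([0.01, -0.01, 0.02, -0.02, 0.0, 0.01, -0.01, 0.0], delta=0.5))
    print(classify_effect([2.0, -1.8, 1.5, -2.2, 0.9, -0.4], delta=0.5))
```

      Under this kind of scheme, only voxels labeled "equivalent" would support a claim of a genuinely null metabolic response, while "inconclusive" voxels would correspond to insufficient power.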

      (2) Contextualization in other current literature

      I feel that the introduction could also embed the work in the current literature on biophysical processes in negatively responding areas.

      The negative responses have partly been discussed in the literature on quantitative physiology: e.g., Bohraus et al have been able to pinpoint the source of negative CMRO2 in positively activated voxels to large veins (https://doi.org/10.1016/j.celrep.2023.113341). Huber et al. have found that the neurovascular coupling (arterial venous weighting) is different in positively and negatively activated brain areas, making the interpretation of derived parameters on physiology hard.

      (3) Stylistic comments.

      In places, the tone of the language could be revised to ensure that it is perceived as making a constructive contribution to the discussion.

    3. Reviewer #2 (Public review):

      Summary:

      The rebuttal aims to provide a statistical re-evaluation of Epp et al. to investigate the effects of CMRO2 uncertainty on concordance/discordance analysis between BOLD signal responses and CMRO2 change estimates based on an R2 framework. The authors observe markedly higher variance in CMRO2 compared to BOLD, which raises concerns about sign classification purely based on group means/medians.
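
      The concern about sign classification under unequal variance can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reanalysis: every voxel is given the same positive true BOLD and CMRO2 effect, measurement noise is Gaussian and independent, and all numerical values are hypothetical rather than fitted to the dataset. The function name and parameters are introduced here for illustration only.

```python
import random


def apparent_discordance(true_bold, true_cmro2, sd_bold, sd_cmro2,
                         n_voxels=20000, seed=0):
    """Fraction of simulated voxels whose measured BOLD and CMRO2 signs disagree.

    The true effects are identical across voxels; any discordance in the
    output is therefore produced by measurement noise alone.
    """
    rng = random.Random(seed)
    discordant = 0
    for _ in range(n_voxels):
        bold = rng.gauss(true_bold, sd_bold)      # noisy BOLD estimate
        cmro2 = rng.gauss(true_cmro2, sd_cmro2)   # noisy CMRO2 estimate
        if (bold > 0) != (cmro2 > 0):
            discordant += 1
    return discordant / n_voxels


if __name__ == "__main__":
    # Same true effects, low versus high CMRO2 noise (illustrative values).
    print(apparent_discordance(1.0, 1.0, sd_bold=0.3, sd_cmro2=0.3))
    print(apparent_discordance(1.0, 1.0, sd_bold=0.3, sd_cmro2=3.0))
```

      With the true effects held concordant, raising only the CMRO2 noise moves the apparent discordance rate from near zero to a substantial fraction of voxels, which is the qualitative pattern at issue here.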

      Strengths:

      The study is well motivated, and the analytical pipeline is rigorous and has been provided. Overall, the manuscript provides several thoughtful and rigorous analyses that contribute meaningfully to the ongoing discussion surrounding neurovascular coupling and CMRO₂ estimation.

      Weaknesses:

      Some aspects of the analytical framework could be improved, as well as the discussion of the caveats of the methods of this and the original paper.

      (1) The binomial framework discussed on line 110 and described on line 321 reduces continuous ΔBOLD and ΔCMRO2 measurements to binary concordant/discordant labels, which may overemphasize unstable sign flips near zero effect sizes while discarding potentially meaningful magnitude information. The authors acknowledge that this overly strict approach yields very few meaningful voxels. A better justification or explanation of what we are meant to take away from this, other than the variability in the measurement, which is also explored elsewhere, would be helpful to the reader.

      (2) In the methods, in the section entitled: Voxel Selection: BOLD Activation Mask, the authors describe their more traditional univariate statistical method as compared to the PLS approach used in the Epp paper. While I appreciate why the authors chose this approach, which simplifies interpretation, is it possible that this led to a lower number of discordant voxels? If yes, then I would suggest this be also added in the discussion of how the original Epp paper's methodological choices led to the very large percentage of discordant voxels.

      (3) In the original paper, it looks to me like the discordant voxels have low CBF change and low rOEF. The gadolinium-based CBV measurement used to calculate OEF is a measure of total blood volume, while the blood volume that contributes to BOLD resides predominantly in veins and capillaries. Given the long PLD of the ASL acquisition and the total blood volume measurement, it seems to me that it is possible that discordant voxels may have high arterial blood volume, leading to overly large CBV measurement and an underestimation of CBF at this PLD (especially given their young age, for which I would expect ATT to be closer to 1-1.5s based on recent literature). While this is not currently discussed in this paper, it might be relevant to discuss how acquisition choices could bias some voxels towards erroneous CMRO2 estimates, which in turn would lead to these voxels being identified as discordant.

      (4) In the methods, on line 267, the authors describe how they calculated ΔCMRO2 and how it differs from the original paper. A short discussion of how this choice is likely to affect the variance estimates would be warranted, given that the original paper seems to have chosen their method for the explicit purpose of decreasing error propagation. Especially, I wonder if this difference could account for the observation that "77.2% of voxels showed no statistically significant group-level ΔCMRO₂ effect".

    1. eLife Assessment

      This useful study employs longitudinal widefield cortical imaging to investigate how bilateral vision loss reshapes spontaneous activity across the mouse cortex over time, revealing a state-dependent alteration in the locomotion-related modulation of visual cortical activity. The work provides solid support for its main findings and offers a thorough characterization of the large-scale reorganization of cortical dynamics following adult vision loss. However, the mechanistic interpretation remains limited, as the conclusions are based on a single abrupt and irreversible manipulation without sham controls and on a recording approach that cannot resolve the cell-type-specific mechanisms invoked in the discussion.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors perform longitudinal mesoscale calcium imaging of visual and other cortical areas following binocular enucleation (blinding through the removal of the eyes) in adult mice. The study is observational and exploratory, and analyzes changes in the frequency distribution of calcium signals during locomotion and quiescence as a function of time after enucleation. They also analyze correlations between calcium signals in different brain regions to ask how apparent connectivity between regions changes over time. The main conclusions are (1) that there are multiple timescales of plasticity; (2) that the coupling between locomotion and activity in visual areas flips sign after enucleation, and (3) that correlations between brain areas are modulated by this long-lasting plasticity. Overall, the data are likely to be useful to researchers studying the impact of injury and catastrophic loss of sensory inputs on brain reorganization, but it is hard to draw firm conclusions from the observations provided beyond the very general conclusions listed above.

      Strengths:

      (1) The longitudinal imaging of multiple brain areas simultaneously allows the investigators to follow plastic changes in the same animals over time, to address questions about how apparent connectivity and brain state modulation unfold after injury.

      (2) The data suggesting a flip in sign of the coupling between movement and "activity" in visual areas is interesting and potentially novel.

      Weaknesses:

      (1) The mesoscale imaging has limitations. In particular, the authors use words/phrases such as "activity" and "functional connectivity" without ever discussing what the measures they provide with this approach (frequency distribution of summed calcium fluctuations, and the correlation between this measure across brain areas) actually mean, or how they approximate spike-based measures or cellular-resolution Ca signals. The manuscript would benefit from an in-depth discussion of these limitations.

      (2) In general, the figures are difficult to follow. In many cases, what is being plotted is hard to extract without a lot of work, and metrics are not well-justified. For example, they calculate the R value between movement power and spectral power of the Ca signal to quantify changes across time in the coupling between movement and activity (Figure 2). But from the example given, this does not look like a continuous relationship, and though the R values are significant, it's not clear that this correlation is a good way of quantifying the change in sign they attempt to document. Figure 7 is impossible to read, and areas quantified are not indicated. The reader should not have to work this hard to figure out what they are plotting.

      (3) It would be reassuring to rule out an effect of repeated imaging on the metrics they describe here. Longitudinal imaging of the same duration without enucleation would be the best control. Alternatively, they do have multiple baseline measurements that they collapse into one value in most of their plots.

      (4) The discussion is very long. They spend a lot of time trying to relate their findings to the larger literature on visual deprivation, but because of differences in paradigms (enucleation, laser ablation, visual deprivation, binocular vs monocular) and differences in measures (see point 1), it's hard to draw conclusions. In my view, the manuscript would benefit from less speculation about plasticity mechanisms and more discussion of the strengths and weaknesses of their approach.

    3. Reviewer #2 (Public review):

      Summary:

      This study uses cortex-wide mesoscopic calcium imaging to investigate how adult vision loss induced by bilateral enucleation alters spontaneous cortical activity across behavioral states, including quiescence, locomotion, and anesthesia. The authors perform longitudinal imaging over two time scales, spanning days to weeks and weeks to months after enucleation, enabling them to track the changes of cortical reorganization.

      The main findings are that oscillatory activity in V1 undergoes a strong reversal in its relationship to behavioral state. Before enucleation, V1 activity is positively correlated with locomotion and negatively correlated with quiescence, whereas after vision loss, this pattern reverses. State-transition dynamics are similarly altered: locomotion onset shows reduced V1 activation, and cessation of locomotion is associated with increased activity after enucleation, whereas it caused suppression during baseline. In addition, the authors report an increase in slow-wave (0.1-4 Hz) activity in V1 after enucleation, starting in the first week and lasting over many weeks. Although these effects show partial recovery over time, many abnormalities persist for weeks to months.

      At the network level, the study reveals altered large-scale cortical organization, including reduced functional connectivity involving V1 that appears to remain impaired.

      Strengths:

      Overall, the work provides a thorough characterization of how adult vision loss reshapes cortical dynamics, particularly with respect to behavioral-state modulation.

      Weaknesses:

      However, there is also a lack of clarity due to the way the data are presented. Moreover, the study remains largely descriptive, as it does not address the mechanisms underlying these changes or their functional significance, making it difficult to interpret the broader implications of the observed cortical reorganization.

    4. Reviewer #3 (Public review):

      Summary:

      The authors track cortical activity across the dorsal cortex of head-fixed mice for up to ten weeks following bilateral eye removal, asking how the cortex reorganizes over an extended period after vision loss. They report a rapid and long-lasting reversal of the normal relationship between movement and visual cortex activity, together with a delayed, weeks-long window of enhanced slow-wave activity during rest and a persistent reorganization of large-scale cortical correlations.

      Strengths:

      The longitudinal scope is the work's strength. Tracking the same animals over a ten-week window after sensory loss is technically demanding and rarely done, and it yields a temporal picture that short studies cannot provide. The observation that the movement-related activation of the visual cortex inverts within a day and only partially recovers over weeks is striking and has not been documented at this timescale. The analysis is internally consistent across two protocols (short- and long-term) and frames the changes by behavioral state, focusing on rest versus movement. This is a useful analysis that the field has not systematically applied to studies of deprivation.

      Weaknesses:

      The manipulation is unusually severe: removing both eyes eliminates patterned vision, non-image-forming light input, and all residual retinal signals abruptly and irreversibly, in contrast to the milder and often reversible manipulations the discussion draws on. Without a sham-surgery control, the early effects cannot be cleanly separated from the surgery itself.

      The language of "plasticity" runs ahead of what the data actually measure, since the study quantifies spontaneous activity and pairwise correlations but does not assess receptive fields, evoked responses, synaptic changes, or the causal manipulation of any candidate circuit. The discussion nevertheless attributes findings to specific interneuron circuits, molecular pathways, and thalamocortical reorganization, none of which are tested in this study.

      The imaging method also constrains what can be claimed: widefield calcium signals are dominated by superficial-layer and excitatory output and cannot resolve the cell-type-specific mechanisms invoked in the discussion. Because the key findings lie in the low-frequency band where vascular contamination is greatest, the hemodynamic correction, particularly in the deprived state, where vascular tone itself may be altered, deserves more validation than it currently receives.

      Finally, the presentation relies heavily on group-level heatmaps in the main figures, with raw traces, spectrograms, and per-animal trajectories at the key inflection points (day 1, week 1, week 10) largely absent. This makes it difficult to judge whether the reported patterns are coherent across animals.

    1. eLife Assessment

      This is a valuable paper that compares various deep learning models, trained with different objective functions, on their ability to predict fMRI data collected during naturalistic video gameplay. The data and analysis provide solid within-distribution evidence that models trained with PPO and imitation learning outperform untrained models and standard convolutional networks. However, the evidence for brittleness in out-of-distribution encoding remains incomplete, as the claim that this stems from the networks' training rather than from alternative causes, such as overfitting of ridge regression parameters, is not yet fully supported.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses an encoding model approach to compare a range of different deep learning models in predicting functional MRI data, collected while participants played the game "Super Mario Bros" inside the scanner. The fMRI data is rich, within-subject data, with around 15 hours of gameplay for each of five participants who took part in the study. A range of models are compared, including deep RL models (PPO), behaviour cloning (imitation learning), supervised visual models (ResNet), and untrained but structurally equivalent models. The main metric of model comparison is brain prediction (i.e., cross-validated R^2, and within-subject generalisation to out-of-distribution gameplay), rather than focussing on which model features are being encoded.

      The core results are:

      (1) The deep RL and imitation learning models show a modest improvement in prediction accuracy relative to the untrained and visual models (around a 1-2% increase in R^2). Notably, this is against a background in which the untrained model - essentially random projections of the gameplay pixels - can explain around 6 or 7% of the variance in fMRI data (Figure 2). So, the improvement in model fit is a small (but significant) one, and a major driver of prediction scores appears to be low-level visual stimulation as opposed to gameplay prediction.

      (2) There is little variation across layers in prediction accuracy in the trained models. In the untrained model, prediction accuracy drops across layers. This suggests that the prediction accuracy in this untrained model results from its (early-layer) representations being closer to what is presented on screen - as the random weights move the untrained model's representation away from sensory features, it becomes less predictive of the brain. In a trained model, meaningful representations are maintained in deeper layers - and interestingly, there is no clear correspondence between layers of the model and layers of the visual pathway.

      (3) There is a noticeable improvement in brain prediction by both the deep RL and imitation models with model training. In other words, the 1-2% increase in R^2 mentioned in point (1) is a result of the training, rather than any other factor.

      (4) None of the models, including the untrained model, perform well in generalising to out-of-distribution data held out from the training/evaluation. This leads to the claim that the brain's encoding representations are 'brittle'.
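
      For readers less familiar with the comparison metric, the sketch below shows how R^2 behaves when predictions are scored on held-out data. It is a generic illustration using the standard definition, not the authors' evaluation code, and the toy values are made up. The point is that held-out R^2 is not bounded below by zero: a model that predicts worse than the mean of the held-out data receives a negative score.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot, on a given data split."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # error of the mean predictor
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0]              # toy held-out signal
    print(r_squared(y, y))                 # perfect prediction
    print(r_squared(y, [2.5] * 4))         # predicting the mean
    print(r_squared(y, [4.0, 3.0, 2.0, 1.0]))  # anti-correlated prediction
```

      A negative held-out score of this kind indicates predictions that are systematically misaligned with the data, rather than merely uninformative.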

      Strengths:

      (1) A major strength of the dataset is that it contains rich, extended naturalistic gameplay data within individual subjects. This mirrors some of the advantages seen in other naturalistic datasets (e.g., natural scenes dataset, storybook listening, video watching) - but there are very few examples of such data where the subject is controlling or generating the behaviour in the naturalistic task. This allows potentially new questions to be asked about how these representations are learned across time, within individual participants.

      (2) A further strength of the manuscript is the clarity with which the aims and hypotheses are articulated in the introduction, and evaluated/discussed throughout the paper. This provides a clear set of objective criteria against which to evaluate the performance of the resulting models; the paper is also written in a very clear and honest way, in that some of the a priori hypotheses are not supported - this makes for a more transparent report than one written in an a posteriori manner.

      (3) Finally, although the results in comparing different models are perhaps not as impressive as one might have hoped, the authors have been quite careful in making the models comparable in terms of their architecture and number of parameters, etc. This means that any variation in prediction is likely attributable to the different objective functions used to train the models, rather than other features of the model architecture.

      Weaknesses:

      (1) The work is currently framed as "training neural networks from scratch...leads to brittle brain encoding" - but I'm not sure that the results fully support this. First, the brittleness is still present in the untrained network (i.e., random projections of pixels), as shown in Figure 5b. This implies that the brittleness may not be a consequence of the network training, but of overfitting to the encoding (ridge regression) model of the fMRI data (as the authors acknowledge when presenting these results). I would instead encourage the authors to shift the emphasis slightly towards the (modest) improvement in prediction using the RL/imitation objectives, and/or the (similarly modest) improvement in prediction with training, rather than foregrounding the brittleness of the encoding.

      (2) While the analyses of how model prediction improves with training are nice, it is a shame that there is no consideration of how prediction improves (or otherwise) across the training of the participants. Do participants improve across the 15 hours of gameplay - or do they, for instance, become more predictable by the imitation learning model? Is this more true in the naïve participants than those with extensive past experience of Mario? And does this in any way lead to better alignment with model predictions across sessions? These all seemed like natural questions that could benefit from the unique longitudinal nature of this dataset, and it seemed a shame that they were not touched upon at all.

      (3) While there is little variation between the models in terms of predictive performance, it is currently a little unclear whether this is simply due to fitting a set of highly parameterised models to the data, or because the models are themselves fundamentally similar in their representations. One way to address the latter point might be to perform some kind of RSA or CKA (Kornblith et al, arXiv 2019; Williams et al, bioRxiv 2024) across the layer representations within-model, and between-models, to ask how similar (or different) the learned representations are between the different models used for fMRI prediction.
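
      To make the suggestion in point (3) concrete, a minimal sketch of linear CKA follows. It assumes each representation is given as a list of rows (one row per stimulus frame, one column per unit) and uses only the standard library for portability; in practice a NumPy implementation would be preferable for layer-sized matrices. The function names are illustrative and do not come from the manuscript.

```python
from math import sqrt


def _center_columns(rows):
    """Subtract the column mean from every entry of a row-major matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def _cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for row-major a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    xc, yc = _center_columns(x), _center_columns(y)
    numerator = _cross_frobenius_sq(yc, xc)
    denominator = sqrt(_cross_frobenius_sq(xc, xc) * _cross_frobenius_sq(yc, yc))
    return numerator / denominator


if __name__ == "__main__":
    layer_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [4.0, 4.0]]
    # Rotated and rescaled copy of layer_a: same representational geometry.
    layer_b = [[2.0 * b, -2.0 * a] for a, b in layer_a]
    print(linear_cka(layer_a, layer_b))
```

      Because linear CKA is invariant to orthogonal transformations and isotropic scaling, values near 1 across models would suggest that the different training objectives converge on similar representations, whereas low values would point to the encoding model as the source of the similar prediction scores.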

    3. Reviewer #2 (Public review):

      Summary:

      This paper aims to test whether training models to play video games from visual inputs through reinforcement learning leads to better matches to human visual encoding during gameplay, compared to models with the same architecture and training images but with different training objectives. The authors find a slight advantage for the RL model, but encoding performance and generalization overall are weak and variable.

      Strengths:

      This was a reasonable hypothesis to test, and the model comparisons adequately represent other possibilities for training a model of the given architecture. The ResNet proxy is a particularly interesting way to benefit from a larger model's pre-training while still using the same constrained architecture and training set.

      Weaknesses:

      I always prefer to see learning curves for models on the tasks they were trained on, just to contextualize their performance on the brain encoding results, but they are not shown here.

      The paper misses some of the relevant literature that has performed similar comparisons across learning objectives for visual encoding models, such as https://arxiv.org/abs/2112.02027 and https://pmc.ncbi.nlm.nih.gov/articles/PMC10569538/

      The authors end up advocating for the idea that large-scale pre-training is needed in order to build good visual encoders for matching human data. In many ways, this was already known (given that brain encoding scores scale with ImageNet performance, which requires at least a moderate amount of general-purpose image training to achieve). However, they also note that "the brain encoding performance of the ResNet model was not significantly different from that of the Untrained model." I would assume that an ImageNet-trained ResNet would be in the direction of the type of large-scale pre-trained model the authors advocate for (even when not trained for action generation), yet their results don't support this direction being the solution. Are their results about ResNet not surpassing an untrained model consistent with prior work, and if not, why not? How do they view this in light of their argument for the use of larger models?

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors have 5 human subjects learn to play Super Mario Bros while undergoing fMRI for 15 hrs each. They compare a reinforcement learning (RL) model (PPO), an imitation learning (IL) model, and a vision model (ResNet) in their ability to play the game, match human behavior, and, critically, explain human brain activity.

      The key findings can be summarized as follows:

      (1) RL, IL, and vision models explain similar amounts of variance in the BOLD signal (Fig 2a), with a significant but small trend of RL > IL > ResNet (Tab 1).

      (2) Untrained models with the same architecture explain a smaller but very similar amount of variance (Figure 2a, Table 1).

      (3) The brain maps across all models (and layers) are strikingly similar, with the strongest effects in visual, parietal, and motor regions (Figures 2b, 2d; Supplementary Material II).

      (4) Behavioral and neural performance are correlated across model checkpoints (but not levels), such that later checkpoints in training have better behavioral and neural encoding performance (Figures 3 & 4), although the neural effect plateaus pretty quickly.

      (5) Out-of-distribution performance is quite poor, both behaviorally (Figure 5a) and neurally (Figure 5b).

      I believe this work will be of interest to neuroscientists, cognitive scientists, and AI researchers alike. There has been a growing trend in neuroscience to adopt AI models as cognitive models of complex perception and action, while at the same time, AI researchers are increasingly looking at the brain for inspiration. The key finding of this paper -- that these models fail to generalize to out-of-distribution levels -- questions the core assumptions of this whole enterprise.

      Strengths:

      Unlike previous studies applying machine learning to naturalistic game-play, the authors take great care to make sure their models are evaluated on an equal footing, using equivalent or similar architectures/number of parameters and training data.

      While the number of subjects (5) is relatively small, the amount of data per subject (15 hours) is impressive, which is important for fitting the imitation learning & ResNet models and for obtaining reliable encoding performance for each individual subject. The authors employed a train/val/test split and held-out sets, the gold standard in the literature.

      Overall, the paper was well-written and easy to follow. The figures clearly illustrate the main findings.

      Weaknesses:

      (1) Missing statistical tests

      I think the main weakness of the paper is that many of the claims are qualitative in nature and lack appropriate statistical tests, for example:

      - "The conv3 layer has the highest brain encoding score";
      - "Robust association between task performance and brain encoding";
      - "Level patterns strongly predict brain encoding";
      - "Brain encoding performance was severely degraded";
      - "Effect of training on brain encoding was apparent".

      While these effects are indeed qualitatively visible in the figures, it is unclear which of these differences are significant (with the notable exception of Table 1). I believe the paper would benefit substantially if these effects were quantified and every claim were supported by the appropriate statistical tests. As an example, with the exception of Table 1 and the corresponding paragraph, I could not find any p-values in the results section.

      (2) Missing model performance and human-likeness

      Also absent from the results is an assessment of model performance on the task and similarity to human performance/behavior. From Figures 3 and 4, we can see that the game score of PPO is around 500-1000 - how does that compare to the humans? We can also see that the imitation scores for IL are around 0.4-0.7, but what does that mean? Such results would be crucial to assess if the models have indeed learned to play the games and/or imitate the humans, and therefore, whether they would be good candidates as cognitive models (before even looking at brain activity). At minimum, plotting the human versus model game scores (see e.g. Tomov et al. 2023 Neuron, Figure 2) would be helpful; or, if you'd like to dig deeper, showing that human actions are more valuable or more likely under those models (see e.g. Cross et al. 2022 Neuron, Figure 2). It might also be helpful to look at imitation scores for the RL model and game performance of the imitation model -- I suspect they will both be bad, but they can at least serve as informative baselines for their counterparts.

      (3) Possible undertraining

      Relatedly, one possible explanation for why the Untrained model does so well is that all the models may be effectively undertrained. For example, while there are no training curves in the paper, it seems from the spacing of the checkpoint game scores (x-axis on Figure 3c) that the RL model may not have converged yet (it would be helpful if those were somehow colored by training epoch). Showing training curves would be helpful (i.e., something similar to Figure 3a, except with performance on the y-axis).

      Additionally, it would be great to provide more details regarding the PPO training protocol. How many episodes? How many steps per episode? How many steps for all of the training? Similarly, for the imitation learning model: batch size, number of epochs, optimizer, scheduler, etc.

      (4) Mysterious poor encoding performance of Untrained and ResNet models on the held-out set

      Critically, and related to that, I'm a little confused about the Untrained model results on the held-out set (Figure 5b, top row on the right). Why should those be any different from the test set results with the Untrained model (Figure 2a, right, fourth row from the top)? It makes sense why the other models are worse on the held-out set -- they have never been trained on any frames from those levels. However, the untrained model has not been trained on *any* frames from *any* levels, including the test set and the held-out set.

      The same is true for the ResNet model, which is pre-trained on a completely separate data set and yet similarly shows worse performance on the held-out set compared to the test set.

      This cannot be explained by the ridge regression, which has no parameters or hyperparameters fitted on either the test set or the held-out set.
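
      To make the expectation concrete, here is a minimal sketch of an encoding analysis on synthetic data, assuming the ridge weights are fitted on training frames only and then scored on two other sets. The names `fit_ridge` and `encoding_score`, the penalty `alpha`, and all array sizes are hypothetical and are not taken from the paper. When the features come from a fixed network and both evaluation sets are drawn from the same distribution, the two scores should be close.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge weights: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def encoding_score(X, Y, W):
    """Mean Pearson correlation between predicted and measured responses."""
    pred = X @ W
    pred_c = pred - pred.mean(axis=0)
    Y_c = Y - Y.mean(axis=0)
    r = (pred_c * Y_c).sum(axis=0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(Y_c, axis=0)
    )
    return float(r.mean())

# Synthetic stand-in for fixed (untrained or pre-trained) network features.
rng = np.random.default_rng(0)
n_features, n_voxels = 20, 5
W_true = rng.normal(size=(n_features, n_voxels))

def simulate(n_samples):
    """Draw features and noisy responses from one shared distribution."""
    X = rng.normal(size=(n_samples, n_features))
    Y = X @ W_true + 0.5 * rng.normal(size=(n_samples, n_voxels))
    return X, Y

X_train, Y_train = simulate(400)
X_test, Y_test = simulate(100)   # stand-in for the test set
X_held, Y_held = simulate(100)   # stand-in for the held-out set

# Weights are fitted on training data only; alpha is a hypothetical value.
W = fit_ridge(X_train, Y_train, alpha=1.0)
score_test = encoding_score(X_test, Y_test, W)
score_held = encoding_score(X_held, Y_held, W)
```

      A large gap between `score_test` and `score_held` in such a setup would point to a difference in the input distribution or in the pipeline, not to the regression itself.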

      The big discrepancy in the untrained model & ResNet results between the test and the held-out set makes me think that there is something substantially different about the levels in that held-out set; that they are truly out of distribution compared to the other 20 levels (e.g., maybe they're the last 2 hardest levels and look completely different? e.g. ResNet proxy in Fig 5c shows worse performance than the mean, which is indicative of an anti-correlation). Alternatively, it may be some issue with the analysis pipeline. The poor generalization results are central to the claims of the paper, so I believe this should be clarified.

      (5) Brittleness conclusion rationale

      I'm not quite on board with the authors' rationale that "[poor model performance on the out-of-distribution levels] demonstrates that the models we tested are limited in scope and may not provide a valid inference of brain-like processing, as human behavior remains robust and generalizable across levels".

      For one, unlike the models, humans were actually trained on those levels, so it would not be surprising if they perform just as well on them as on the other levels (but do they? Again, it would be great to see some behavioral data from the humans and the models).

      Second, as the authors themselves show, task performance and human-likeness do not really correlate with neural encoding across levels (Fig 4a & b, respectively), so even if model performance remained "robust and generalizable" on the held-out levels, that will not necessarily translate to good neural encoding.

      Thirdly, and perhaps most importantly, unless the test set and held-out set were sampled exclusively from the practice phase when the subjects have mastered all the levels (that doesn't seem to be the case, but the authors should clarify), then the humans are continuously learning, which means that their own internal representations of the game are evolving. That's not the case for the models, which I assume are in "inference mode" when their representations are extracted for neural encoding. That is, their weights are frozen. So there's a fundamental mismatch between the mode in which humans are operating (continuously learning and executing) and the mode in which the models are operating (just executing). While this is true for all the levels, it may partially account for the discrepancy in the held-out set specifically.

    1. eLife Assessment

      This study adds important data on the transcriptional identity of the motor neurons innervating eye muscles in larval zebrafish, and shows how disruption to a specific gene, sim1a, impairs the movements of the eye. The evidence supporting the claims is convincing, with bulk and single-cell RNA sequencing as well as functional testing of the vestibulo-ocular reflex. This work will be of interest to developmental biologists and eye movement specialists.

    2. Reviewer #1 (Public review):

      This study adds important data identifying how ocular motor neurons are transcriptionally specified and identifies additional genes important in ocular motor neuron function. The evidence supporting the claims is convincing, with bulk and single-cell RNA sequencing as well as functional testing of the vestibulo-ocular reflex. This work will be of interest to developmental biologists and eye movement specialists.

      Gershowitz, Hamling, et al. investigate genes that specify specific cell populations within cranial motor nuclei III and IV, which control eye movements, by bulk and single-cell RNA sequencing, confirmatory in situ hybridization, and functional studies of vestibulo-ocular reflex in knock-out animals. They take advantage of the timing difference in the generation of dorsal versus ventral cells to selectively mark early-born (dorsal) vs late-born (ventral) cells using the Kaede photolabile protein. They used bulk RNASeq to identify differentially expressed genes between the two populations (which innervate different extraocular muscles). They next used single-cell RNASeq to further identify specific subpopulations of motor neurons and identify 3 main clusters, which broadly map to dorsal CNIII, CNIV, and ventral CNIII. They show that the differentially expressed genes identify subpopulations of neurons, rather than reflecting temporal changes related to cell age via a series of in situ hybridizations across ages. Finally, they show that knock-out of Sim1a, which is upregulated in dorsal nIII neurons, leads to decreased vestibulo-ocular reflex, despite a normal number of neurons in nIII. They tested the knock-out of two other differentially expressed genes, nav2a and onecut1, but found both normal cell number and normal vestibulo-ocular reflex.

      The conclusions of this paper are well supported by the data. As the authors acknowledge, additional experiments would add to the interpretation. Since the Sim1a mutants have normal cell numbers, the authors hypothesize that axon guidance may be disrupted, leading to the phenotype. This could be relatively easily assessed using the Isl1-GFP transgenic line and examining innervation patterns in the extraocular muscles. Additionally, testing horizontal eye movements and eye movements in response to visual, rather than vestibular, inputs would further refine the phenotypes and perhaps identify eye movement abnormalities in the mutant fish with normal VOR.

      More information on why these specific genes were prioritized for functional testing would be helpful, as it is unclear why these three genes were the top candidates.

      The authors should also include a discussion of other subtypes of oculomotor neurons, beyond which muscle they innervate. For example, there are oculomotor neurons that form single neuromuscular junctions on fast, singly-innervated fibers, and there is a separate pool of motor neurons that innervate the slow, multiply-innervated fibers. It would be interesting to note if there were any gene expression differences within the clusters that might represent this subdivision of neurons.

      This data is likely to be of great use to the field in further studies of cranial motor neuron biology.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of the work is to identify genes that are uniquely expressed in subsets of eye muscle-innervating motor neurons, as a way to identify candidate genes for strabismus, a congenital vision disorder in humans. The authors' previous work identified birth-order differences that correlate with the positions of neurons in the oculomotor (cranial nerve III) motor nucleus. Here, they use Kaede photoconversion to distinguish early- from late-born neurons and identified transcriptional differences between them by bulk RNA sequencing of FACS-sorted cells. Separately, they used single-cell RNA-Seq to sequence the transcriptomes of 89 extraocular motor neurons. They find signatures of early-born mIII, late-born mIII, and mIV neurons. While there is some overlap in gene expression, some of the differentially expressed genes are confirmed by HCR as being unique to one of these three populations of extraocular motor neurons.

      The authors test the functions of three differentially expressed genes in the vestibulo-ocular reflex by measuring the speed of rotation of the eye in response to the larval fish being tilted 15° from horizontal. One mutant, in the sim1a transcription factor, has markedly slowed responses. Although this is a global knock-out, the authors argue that this defect in the vestibulo-ocular reflex is due to a loss of sim1a function specifically in dorsal mIII neurons because sim1a is not expressed in the two upstream neurons in the vestibulo-ocular reflex circuit.

      Strengths:

      (1) This is the first time that transcriptional differences between and within extraocular muscle-innervating neurons have been described during development. In identifying differentially expressed genes that correspond with anatomical, functional, and temporal subdivisions of these neurons, they support the idea that gene expression programs established early in development underlie the functional differences amongst these neurons.

      (2) The combination of bulk RNA-Seq and single-cell RNA-Seq strengthens the identification of sim1a-expressing early-born mIII neuron subtype.

      (3) The work identifies candidate genes for strabismus.

      Weaknesses:

      (1) The authors show that sim1a is only expressed in mIII neurons and no other cells in the vestibulo-ocular reflex, as evidence that the phenotype in sim1a mutants is due to loss of its expression specifically in mIII neurons. However, as the authors note in the discussion, sim1a has other functions in zebrafish, including global calcium homeostasis via specification of the corpuscles of Stannius. The loss of this, or of some other sim1a function, could be indirectly responsible for the slow vestibulo-ocular response in sim1a mutants.

      (2) The authors perform the vestibulo-ocular response test in sim1a mutants at 7 dpf, which is within a day of when the mutants die, raising the concern that the slowed response is due to a dire systemic condition. The argument that nav2a mutants also die at 7 dpf but have a normal response is weak, since death does not always take a single course.

      (3) The evaluation of the sim1a mutant phenotype is limited to the vestibulo-ocular reflex. The authors do not explore whether the oculomotor neuron innervation of target extraocular muscles is affected in sim1a mutants.

    1. eLife Assessment

      This paper presents a valuable theoretical model of cell breakout from spheroids, a situation relevant to tissue invasion and metastasis; a helpful feature of the model is to include the extracellular matrix as a network of springs. The paper explains the interesting observation that fluid-like spheroids made of soft cells appear experimentally more able to remodel the extracellular matrix (ECM) while they generically display smaller mechanical stress, by invoking feedback loops between shape, strain, stress, and adhesion. While the theoretical evidence is solid, the model suffers from topological limitations inherent to the vertex model and leaves open questions regarding the means by which cells achieve cell-level stress amplification. The connection between the model's assumptions and known molecular mechanisms could be developed further.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, the authors couple a 3d vertex model to the extracellular matrix and include activity through contractile springs at the edge. They study, sequentially, the distribution of shear stresses in liquid and solid spheroids, the correlation between stress and cell shape, and the spatial distribution of stresses. The authors find that stresses are higher in solid spheroids (somewhat unsurprisingly), but that the stress distributions are wider in the fluid spheroids. Moreover, stress and shape are not correlated with each other in solids (that seems to be due to vertex model peculiarities), but they are for liquids. In contrast, for solids, the stresses are concentrated at the interface.

      The authors attribute a lot of the phenomenology to strain-stiffening properties of vertex models as being akin to a network model (correctly in my opinion). Then they strain individual cells and confirm this link, though I missed any explanation of how they did this. Would it have to be within a medium for computational consistency?

      Finally, they generate an extended vertex model, where they replace the single face linking cells with a double face and mechanoresponsive springs. This allows for stronger coupling of individual cell motion to eventual movement out of the spheroid.

      Strengths:

      Coupling a three-dimensional vertex model to the extracellular matrix, modelled as a crosslinked fiber model, is a computational tour-de-force. Adding activity through fluctuations at the interface is also of the correct symmetry (stresses), instead of the self-propulsion which has been used by other authors, and which is not compatible with Newton's 3rd law. This also allows for accurate back-and-forth mechanical coupling between the cells and the ECM.

      I would like to highlight that deriving vertex model stress tensors in full three dimensions is an open problem due to the complex topology. Any progress is valuable, and decomposing things into tetrahedra like here will allow for connections with, in particular, finite element approaches. Therefore, adding some of these results (eq. 13) to the main text would strengthen the paper in my opinion.

      Adding the nonlinear springs to the VM in the 3rd act is a good idea, and a first step to mechanical feedback. One might argue that at this point, removing the vertex model part would even be an option.

      Weaknesses:

      The paper is written in a very qualitative manner, with all of the model equations and analysis hidden in the supplementary information. I do not understand this choice, as it makes things fuzzy and hard to read. The conclusion is also very long and simply reiterates the previous points.

      At the same time, this paper is rather thin on new results and reads more like a handful of new simulations carried out using the method established in [10] (from largely the same authors). Moving some of the actual results to the main text would help, in particular, the 3d stress formulation and the definitions of different measures.

      Vertex models also have a very clear limitation: They cannot model the transition from a confluent to a non-confluent tissue, and individual cells or groups of cells leaving the spheroid. Even having a surface and having significant deformations of the surface are numerically dicey, so the current model is at the edge of what is feasible. The model as written can only do "invasion" by a single cell moving outward, and then another following it a bit (or not).

      I strongly suspect that further progress on 3d cell models will need particle-based models or models where cells are fully meshed surfaces (some of which are in development currently).

      However, none of these problems is mentioned anywhere in the text. The authors also do not review the increasingly broad zoology of other models.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript concerns the mechanisms by which cells in a spheroid embedded in the extracellular matrix can escape, either as single or multiple cells.

      Strengths:

      Overall, the manuscript is well written and easy to follow. The claims are mostly justified by the data. Some data can be better analyzed and presented to strengthen the conclusion.

      Weaknesses:

      (1) The description around Figure 2c is not exactly well supported by their results. While values close to 0 for sigma3 dot g3 for solid-like spheroids indicate little correlation between the direction of maximum stress and maximum elongation, this analysis alone does not imply that highly stressed cells are necessarily less globular. The dot product combines the magnitudes of the two vectors and the angle between them. For the distribution graph, it would be useful to have the cumulative frequency equal 1.

      (2) One of the central claims of the paper is that morphology alone is not a reliable indicator of mechanical state. Since the authors compute cellular stresses and cellular shape in their simulation (i.e., Figure 3a and b), can the authors directly plot these two quantities for individual cells in solid-like and fluid-like spheroids?

      (3) There is experimental evidence showing the solid stress inside a spheroid is higher than at the periphery (e.g., https://www.nature.com/articles/ncomms14056). How does this cellular stress relate to these experimental measurements, since they are opposite to what is simulated here (i.e., the authors find max shear stress is lowest in the center and increases towards the boundary, which is opposite to what is measured)?

      (4) It's worth pointing out that stress fibers aren't really prominent in cells in 3D spheroids. Nonetheless, cells moving on collagen fibers would have stress fibers and utilize contractile actomyosin bundles to generate traction forces.

      (5) Section 2D states that the kcc associated with the boundary cell is decreased 10-fold for every 5 percent strain decrease in the fiber target spring length. Can this result be shown? I have a hard time seeing where this came from.

      (6) The results of single-cell vs. two-cell breakouts shown in Figure 5 b and c are very qualitative and should be accompanied by some quantitative comparison.

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe a mathematical and computational approach used to compute stresses and cellular deformations in a multicellular spheroid embedded in a fiber network. This approach is then used to predict stress and cellular anisotropy distributions in "solid-like" and "fluid-like" spheroids. Simulations show that shear stresses in solid-like spheroids are large and concentrated at the boundary of the spheroid, yet cells do not align with the direction of the largest shear. Conversely, shear stresses in fluid-like spheroids are smaller and uniformly distributed in the spheroid. In this case, cellular elongation is more likely to be aligned with the direction of the largest shear stress. The model and simulations also predict a nonlinear stress-strain relationship that is indicative of strain stiffening. This strain-stiffening is more pronounced in fluid-like spheroids. In an extension of the preliminary polyhedral vertex model, in which cellular interfaces are shared, the authors incorporate mechanical cell-cell interactions via adhesion springs between neighboring vertices. Using this extension, they show that cell breakout is more likely to occur in fluid-like spheroids, where cells are more likely to elongate and stiffen, allowing for larger forces to be exerted on the surrounding fiber network. Furthermore, the authors state that anisotropic cell-cell adhesion is required for multicell streaming during breakout.

      Strengths:

      The modeling and computational approach used in this research is this work's biggest strength. Treating the embedded spheroid as a set of polyhedra, where each polyhedron represents a single cell, is a mechanically robust, yet still tractable way to model multicellular spheroids in three dimensions. Starting with expressions for constraining cell volume and surface area as well as a surface energy term, the authors derive an expression for an averaged stress tensor for each polyhedron. This allows the authors to approximate the stress in each polyhedral cell that is caused by cellular deformations during mechanical interactions with the extracellular fiber matrix. This is a clever and robust approach that is based on fundamental mechanical principles that allow one to make reasonable predictions about the mechanical state of the spheroid under a variety of conditions.

      Weaknesses:

      The weakness of the manuscript is the exposition. There are significant pieces of critical information missing from the manuscript that would make the presented work significantly more understandable and better support the authors' claims. Most importantly, many necessary details of the model are missing. I was able to get a better understanding of some of these details by reading the authors' earlier work (ref [10] in the submitted manuscript), and for this reason, I do feel that this work has value. However, several descriptions must be added for the paper to be more readily understandable. These include (1) a better explanation of what drives motion, in particular in the case where no external fiber network is present. (2) What physically distinguishes fluid-like spheroids from solid-like spheroids? Simply stating the value of the parameters s0 with no explanation is not sufficient. (3) An explanation of how histograms in Figure 2 are calculated is necessary. Are these histograms based on one simulation or several simulations? (4) The experimental results are briefly mentioned, but significantly more connection between these results and the numerical results of the cell breakout model is needed. (5) The description of the model that incorporates variable cell-cell attachments and cell breakout is very terse and needs more detail. Moreover, while the description of the results of this model is strong, the figure that illustrates cell breakout (Figure 5) is difficult to interpret. Addressing these and other issues will make the current manuscript, which presents an interesting model and result, much stronger and easier to read.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this article, the authors couple a 3d vertex model to the extracellular matrix and include activity through contractile springs at the edge. They study, sequentially, the distribution of shear stresses in liquid and solid spheroids, the correlation between stress and cell shape, and the spatial distribution of stresses. The authors find that stresses are higher in solid spheroids (somewhat unsurprisingly), but that the stress distributions are wider in the fluid spheroids. Moreover, stress and shape are not correlated with each other in solids (that seems to be due to vertex model peculiarities), but they are for liquids. In contrast, for solids, the stresses are concentrated at the interface. The authors attribute a lot of the phenomenology to strain-stiffening properties of vertex models as being akin to a network model (correctly in my opinion). Then they strain individual cells and confirm this link, though I missed any explanation of how they did this. Would it have to be within a medium for computational consistency?

      We thank the reviewer for this helpful comment. The current manuscript already describes this procedure in Sec. II.C, “Cell strain-stiffening with volume-preserving deformations,” where we state that individual cells are taken from the final spheroid configuration and then strained by imposing a prescribed volume-preserving deformation along their principal elongation axis. Figure 4 then compares the original and strained cells and shows the resulting increase in maximum shear stress.

      We agree, however, that this point was not explained clearly enough. In the revised manuscript, we will make explicit that this is a single-cell deformation test designed to isolate the intrinsic strain-stiffening response of the vertex-model cell. The cell does not need to remain embedded in a surrounding medium for this specific test, since the goal is not to simulate the full coupled cell–ECM dynamics, but rather to measure how the stress of an individual vertex-model cell changes under imposed strain.
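
      As an illustration of what such a test involves, a volume-preserving stretch along a cell's principal elongation axis can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the manuscript: the deformation is taken to be a uniaxial stretch by a factor `stretch` with a compensating contraction of `1/sqrt(stretch)` in the perpendicular plane, the elongation axis is taken from the gyration tensor of the vertices, and the function names and the example cell are hypothetical.

```python
import numpy as np

def principal_axis(vertices):
    """Unit eigenvector of the vertex gyration tensor with the largest eigenvalue."""
    centered = vertices - vertices.mean(axis=0)
    gyration = centered.T @ centered / len(vertices)
    eigvals, eigvecs = np.linalg.eigh(gyration)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def volume_preserving_stretch(vertices, axis, stretch):
    """Stretch by `stretch` along `axis` and shrink by 1/sqrt(stretch) across it.

    The deformation gradient F has det(F) = 1, so volume is conserved.
    """
    n = axis / np.linalg.norm(axis)
    P = np.outer(n, n)  # projector onto the stretch axis
    F = stretch * P + (np.eye(3) - P) / np.sqrt(stretch)
    centroid = vertices.mean(axis=0)
    return (vertices - centroid) @ F.T + centroid, F

# Hypothetical cell: a box slightly elongated along x.
cell = np.array(
    [[x, y, z] for x in (0.0, 1.5) for y in (0.0, 1.0) for z in (0.0, 1.0)]
)
axis = principal_axis(cell)
strained, F = volume_preserving_stretch(cell, axis, stretch=1.2)
```

      The stress of the strained cell would then be recomputed from the vertex-model energy and compared with the unstrained value.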

      Indeed, single cells can exhibit strain stiffening as presumably can a spheroid. However, given that we are studying strain stiffening in the context of single/few cell breakout, we also plan to measure the stress in the breakout cells in the extended vertex model to determine the extent of strain stiffening given the surrounding medium of fibers and cells.

      Finally, they generate an extended vertex model, where they replace the single face linking cells with a double face and mechanoresponsive springs. This allows for stronger coupling of individual cell motion to eventual movement out of the spheroid.

      Strengths:

      Coupling a three-dimensional vertex model to the extracellular matrix, modelled as a crosslinked fiber model, is a computational tour-de-force. Adding activity through fluctuations at the interface is also of the correct symmetry (stresses), instead of the self-propulsion which has been used by other authors, and which is not compatible with Newton's 3rd law. This also allows for accurate back-and-forth mechanical coupling between the cells and the ECM.

      I would like to highlight that deriving vertex model stress tensors in full three dimensions is an open problem due to the complex topology. Any progress is valuable, and decomposing things into tetrahedra like here will allow for connections with, in particular, finite element approaches. Therefore, adding some of these results (eq. 13) to the main text would strengthen the paper in my opinion.

      Adding the nonlinear springs to the VM in the 3rd act is a good idea, and a first step to mechanical feedback. One might argue that at this point, removing the vertex model part would even be an option.

      Weaknesses:

      The paper is written in a very qualitative manner, with all of the model equations and analysis hidden in the supplementary information. I do not understand this choice, as it makes things fuzzy and hard to read. The conclusion is also very long and simply reiterates the previous points.

      At the same time, this paper is rather thin on new results and reads more like a handful of new simulations carried out using the method established in [10] (from largely the same authors). Moving some of the actual results to the main text would help, in particular, the 3d stress formulation and the definitions of different measures.

      We thank the reviewer for this constructive criticism. We agree that the main text was too qualitative and that placing most of the equations and definitions in the Supplement made the manuscript harder to read. In the revised version, we will move the essential technical material into the main text, including the 3D cell stress formulation, the definitions of maximum shear stress and cell-shape anisotropy, and the stress–shape alignment measure. Longer derivations and implementation details will remain in the Supplement.

      We will also shorten and reorganize the Discussion/Conclusion to avoid reiterating previous points. Finally, we will revise the presentation to make the new contributions beyond Ref. [10] clearer: the 3D polyhedral-cell stress formulation, the stress-distribution and spatial-patterning analyses, the single-cell strain-stiffening test, and the extended adhesion-spring model used to distinguish single-cell from multi-cell breakout. These changes should make the paper less qualitative and make the main results more visible in the body of the manuscript.

      Vertex models also have a very clear limitation: They cannot model the transition from a confluent to a non-confluent tissue, and individual cells or groups of cells leaving the spheroid. Even having a surface and having significant deformations of the surface are numerically dicey, so the current model is at the edge of what is feasible. The model as written can only do "invasion" by a single cell moving outward, and then another following it a bit (or not).

      I strongly suspect that further progress on 3d cell models will need particle-based models or models where cells are fully meshed surfaces (some of which are in development currently).

      However, none of these problems is mentioned anywhere in the text. The authors also do not review the increasingly broad zoology of other models.

      We thank the reviewer for raising this important limitation of standard vertex models. We agree that a strictly confluent 3D vertex model is not designed to fully capture the transition from a confluent tissue to freely migrating detached cells, and we will make this limitation explicit in the revised Discussion. However, the standard 3D vertex model can still capture collective spheroid deformation, surface remodeling, and local protrusive deformations prior to complete breakout. Thus, it remains useful for studying the mechanical state of the spheroid and the onset of outward deformation before full cell detachment.

      At the same time, we clarify that this very limitation motivated the extended vertex model introduced in Sec. II.D and Supplement G. In this model, cells no longer share interfaces as in a standard confluent vertex model; instead, neighboring cells interact through explicit, tunable cell–cell adhesion springs. This allows us to represent, in a coarse-grained mechanical way, the separation of a boundary cell from the spheroid and the motion of a follower cell behind it. Thus, while the model does not describe full post-detachment migration, it partially addresses the confluent-to-nonconfluent transition at the level needed to study the mechanical onset of breakout.

      We will revise the manuscript to make this distinction clearer and state that our goal is to identify minimal mechanical ingredients for incipient breakout—strain stiffening, adhesion weakening, and adhesion anisotropy—rather than to provide a complete model of long-time invasion.

      We will also note that the current Introduction already discusses several existing modeling approaches, including cellular automaton simulations, a 2D Voronoi model, phenotype-switching/ECM-remodeling models, and the prior 3D vertex–fiber framework. However, we agree that this discussion should be broadened, and we will add a more explicit comparison with particle-based, phase-field, cellular Potts, and fully meshed deformable-surface models, which may be better suited for later-stage non-confluent migration.

      Reviewer #2 (Public review):

      Summary:

      The manuscript concerns the mechanisms by which cells in a spheroid embedded in the extracellular matrix can escape, either as single or multiple cells.

      Strengths:

      Overall, the manuscript is well written and easy to follow. The claims are mostly justified by the data. Some data can be better analyzed and presented to strengthen the conclusion.

      Weaknesses:

      (1) The description around Figure 2c is not exactly well supported by their results. While values close to 0 for sigma3 dot g3 for solid-like spheroids indicate little correlation between the direction of maximum stress and maximum elongation, this analysis alone does not imply that highly stressed cells are necessarily less globular. The dot product combines the magnitudes of the two vectors and the angle between them. For the distribution graph, it would be useful to have the cumulative frequency equal 1.

      We thank the reviewer for pointing this out. We agree that the interpretation of Fig. 2c should be stated more carefully. In our calculation, the vectors used in the dot product are normalized eigenvectors of the stress tensor and the gyration tensor. Thus, the plotted quantity measures only directional alignment between the principal stress direction and the cell elongation axis, not the magnitudes of stress or shape anisotropy. We will revise the text to make this explicit.
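
      The point about normalization can be illustrated with a short sketch. It assumes that the relevant directions are the unit eigenvectors belonging to the largest eigenvalue of each symmetric tensor, and that the absolute value of the dot product is taken because the sign of an eigenvector is arbitrary; both choices, and the function names, are assumptions made for illustration. With unit vectors, rescaling the stress leaves the alignment unchanged.

```python
import numpy as np

def principal_direction(sym_tensor):
    """Unit eigenvector for the largest eigenvalue of a symmetric 3x3 tensor."""
    eigvals, eigvecs = np.linalg.eigh(sym_tensor)  # columns have unit norm
    return eigvecs[:, -1]

def alignment(stress, gyration):
    """|cos(angle)| between principal stress and principal shape directions."""
    s = principal_direction(stress)
    g = principal_direction(gyration)
    return abs(float(s @ g))

# Hypothetical tensors: both have their largest eigenvalue along z.
stress = np.diag([1.0, 2.0, 5.0])
gyration_aligned = np.diag([0.1, 0.2, 0.9])
# Same shape, but elongated along x instead of z.
gyration_perpendicular = np.diag([0.9, 0.2, 0.1])

a_aligned = alignment(stress, gyration_aligned)
a_scaled = alignment(100.0 * stress, gyration_aligned)  # magnitude does not matter
a_perpendicular = alignment(stress, gyration_perpendicular)
```

      Because only directions enter, this quantity carries no information about how large the stress or the shape anisotropy is, which is why a separate magnitude-based analysis is needed.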

      We also agree that Fig. 2c alone does not support statements about whether highly stressed cells are more or less globular. It only quantifies alignment between stress and shape directions. To address this, we will add or refer to an additional analysis, such as the correlation between maximum shear stress and cell-shape anisotropy, or the shape-anisotropy distribution conditioned on high-stress cells.

      Finally, we agree that the distribution in Fig. 2c should be normalized more clearly. In the revised figure, we will plot the distribution as a probability density or cumulative distribution with total probability equal to one, and we will update the caption accordingly.
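
      The normalization described above can be sketched with NumPy as follows. The sample values are synthetic placeholders for per-cell alignment values, and the bin count is an arbitrary choice for illustration.

```python
import numpy as np

# Synthetic placeholder for per-cell alignment values in [0, 1].
rng = np.random.default_rng(1)
values = rng.uniform(0.0, 1.0, size=500)

# density=True rescales counts so that the histogram integrates to one.
density, edges = np.histogram(values, bins=20, range=(0.0, 1.0), density=True)
widths = np.diff(edges)
total_probability = float((density * widths).sum())

# Cumulative distribution evaluated at the right edge of each bin.
cdf = np.cumsum(density * widths)
```

      Plotting `density` against the bin centers gives a probability density, and plotting `cdf` against `edges[1:]` gives a cumulative distribution that ends at one.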

      (2) One of the central claims of the paper is that morphology alone is not a reliable indicator of mechanical state. Since the authors compute cellular stresses and cellular shape in their simulation (i.e., Figure 3a and b), can the authors directly plot these two quantities for individual cells in solid-like and fluid-like spheroids?

      We thank the reviewer for this helpful suggestion. We agree that a direct cell-by-cell comparison of cellular stress and cellular shape would strengthen the central claim that morphology alone is not a reliable indicator of mechanical state. In the revised manuscript, we plan to add scatter plots of maximum shear stress versus cell-shape anisotropy for individual cells in both solid-like and fluid-like spheroids.

      (3) There is experimental evidence showing the solid stress inside a spheroid is higher than at the periphery (e.g., https://www.nature.com/articles/ncomms14056). How does this cellular stress relate to these experimental measurements, since they are opposite to what is simulated here (i.e., the authors find that max shear stress is lowest in the center and increases towards the boundary, which is opposite to what is measured)?

      We thank the reviewer for raising this important point. We agree that the comparison with experimental stress measurements in compressed spheroids should be clarified.

      The main distinction is that the cited experiments measure local pressure, or isotropic compressive stress, from the volume change of embedded elastic beads. In contrast, Fig. 3 in our manuscript shows the cellular maximum shear stress, which reflects the deviatoric part of the cell stress tensor. These quantities do not necessarily have the same spatial profile: a region can be under high isotropic compression while having low shear stress. The loading conditions are also different. The experiments apply external osmotic/mechanical compression to the whole spheroid, whereas our simulations consider active cell–ECM coupling through contractile linker springs at the spheroid boundary. Thus, the elevated boundary shear stress in our model reflects local cell–ECM force transmission, not internal hydrostatic pressure. We will revise the manuscript to make this distinction explicit, cite this experimental work, and avoid implying that maximum shear stress is directly comparable to measured solid pressure. Where appropriate, we will also discuss the isotropic component of the simulated cell stress tensor as a more direct comparison to pressure-based measurements.
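
      The distinction between isotropic pressure and maximum shear stress can be illustrated with a short sketch. It is not taken from the manuscript: it assumes the common definitions p = -tr(sigma)/3 (compression positive) and tau_max = (sigma_1 - sigma_3)/2 for a symmetric 3x3 stress tensor, and the two example tensors are hypothetical.

```python
import math

def principal_stresses(s):
    """Eigenvalues (largest first) of a symmetric 3x3 tensor given as nested lists.

    Uses the closed-form trigonometric solution for symmetric matrices.
    """
    p1 = s[0][1] ** 2 + s[0][2] ** 2 + s[1][2] ** 2
    q = (s[0][0] + s[1][1] + s[2][2]) / 3.0
    if p1 == 0.0:
        # Already diagonal: the eigenvalues are the diagonal entries.
        return sorted((s[0][0], s[1][1], s[2][2]), reverse=True)
    p2 = sum((s[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(s[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against round-off
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]

def pressure_and_max_shear(s):
    """Return (isotropic pressure, maximum shear stress) under the assumed definitions."""
    e1, _, e3 = principal_stresses(s)
    pressure = -(s[0][0] + s[1][1] + s[2][2]) / 3.0  # compression positive (assumed)
    return pressure, 0.5 * (e1 - e3)

# Hypothetical tensors: pure hydrostatic compression, then pure shear.
hydrostatic = [[-2.0, 0.0, 0.0], [0.0, -2.0, 0.0], [0.0, 0.0, -2.0]]
pure_shear = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(pressure_and_max_shear(hydrostatic))
print(pressure_and_max_shear(pure_shear))
```

      The first tensor has high pressure and no shear, the second has shear and no pressure, which is why the two spatial profiles need not agree.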

      (4) It's worth pointing out that stress fibers aren't really prominent in cells in 3D spheroids. Nonetheless, cells moving on collagen fibers would have stress fibers and utilize contractile actomyosin bundles to generate traction forces.

      We thank the reviewer for this clarification. We did not intend to imply that prominent stress fibers are generally present in cells within the interior of 3D spheroids. The relevant statements in the manuscript were meant to refer to strained boundary cells or cells engaging collagen fibers during mesenchymal-like motion. We will revise the wording in Secs. II.C and II.D to make this distinction explicit and avoid suggesting that bulk spheroid cells generally contain prominent stress fibers.

      (5) In section 2D, it talks about the result that as the kcc associated with the boundary cell is decreased 10-fold for every 5 percent strain decrease in the fiber target spring length, can this result be shown? I have a hard time seeing where this came from.

      We thank the reviewer for this comment. The 10-fold decrease in kcc for every 5% decrease in the fiber target spring length was meant as a phenomenological adhesion-weakening protocol, not as a directly measured law. We agree that this was not made clear enough. In the revised manuscript, we will explicitly state this.
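
      One possible reading of this adhesion-weakening protocol is sketched below. The function name and parameters are hypothetical, and the smooth exponential interpolation between 5% steps is an assumption; the response only states a 10-fold decrease in kcc per 5% decrease in the fiber target spring length.

```python
def adhesion_stiffness(k_cc0, shortening_percent, fold=10.0, step_percent=5.0):
    """Cell-cell adhesion spring stiffness after the fiber target length has
    been shortened by `shortening_percent` percent.

    Assumed rule: k_cc drops by a factor `fold` for every `step_percent`
    percent of shortening, interpolated smoothly in between.
    """
    return k_cc0 * fold ** (-shortening_percent / step_percent)

# Hypothetical initial stiffness of 1.0 (arbitrary units).
for shortening in (0.0, 5.0, 10.0, 15.0):
    print(shortening, adhesion_stiffness(1.0, shortening))
```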

      (6) The results of single-cell vs. two-cell breakouts shown in Figure 5 b and c are very qualitative and should be accompanied by some quantitative comparison.

      We thank the reviewer for this helpful suggestion. We agree that the current presentation of Fig. 5b,c is too qualitative. In the revised manuscript, we plan to add a quantitative comparison between the single-cell and two-cell breakout cases. Specifically, we plan to track the displacement of the pulled boundary cell, the separation between this leader cell and its neighboring/follower cell, and the distance between the follower cell and the remaining spheroid as the fiber target length is decreased.

      Reviewer #3 (Public review):

      Summary:

      The authors describe a mathematical and computational approach used to compute stresses and cellular deformations in a multicellular spheroid embedded in a fiber network. This approach is then used to predict stress and cellular anisotropy distributions in "solid-like" and "fluid-like" spheroids. Simulations show that shear stresses in solid-like spheroids are large and concentrated at the boundary of the spheroid, yet cells do not align with the direction of the largest shear. Conversely, shear stresses in fluid-like spheroids are smaller and uniformly distributed in the spheroid. In this case, cellular elongation is more likely to be aligned with the direction of the largest shear stress. The model and simulations also predict a nonlinear stress-strain relationship that is indicative of strain stiffening. This strain-stiffening is more pronounced in fluid-like spheroids. In an extension of the preliminary polyhedral vertex model, in which cellular interfaces are shared, the authors incorporate mechanical cell-cell interactions via adhesion springs between neighboring vertices. Using this extension, they show that cell breakout is more likely to occur in fluid-like spheroids, where cells are more likely to elongate and stiffen, allowing for larger forces to be exerted on the surrounding fiber network. Furthermore, the authors state that anisotropic cell-cell adhesion is required for multicell streaming during breakout.

      Strengths:

      The modeling and computational approach used in this research is this work's biggest strength. Treating the embedded spheroid as a set of polyhedra, where each polyhedron represents a single cell, is a mechanically robust, yet still tractable way to model multicellular spheroids in three dimensions. Starting with expressions for constraining cell volume and surface area as well as a surface energy term, the authors derive an expression for an averaged stress tensor for each polyhedron. This allows the authors to approximate the stress in each polyhedral cell that is caused by cellular deformations during mechanical interactions with the extracellular fiber matrix. This is a clever and robust approach that is based on fundamental mechanical principles that allow one to make reasonable predictions about the mechanical state of the spheroid under a variety of conditions.

      Weaknesses:

      The weakness of the manuscript is the exposition. There are significant pieces of critical information missing from the manuscript that would make the presented work significantly more understandable and better support the authors' claims. Most importantly, many necessary details of the model are missing. I was able to get a better understanding of some of these details by reading the authors' earlier work (ref [10] in the submitted manuscript), and for this reason, I do feel that this work has value. However, several descriptions must be added for the paper to be more readily understandable.

      These include

      (1) A better explanation of what drives motion, in particular in the case where no external fiber network is present.

      We thank the reviewer for pointing this out. We agree that the source of motion should be described more clearly. In the embedded simulations, motion arises from overdamped dynamics driven by the forces from the total mechanical energy, including spheroid mechanics, fiber-network elasticity, and active contractile linker springs at the boundary. The shortening of the linker-spring target lengths provides the active cell–ECM pulling, while effective fluctuations promote cell-shape fluctuations and rearrangements.

      When no external fiber network is present, these linker-mediated cell–ECM forces are absent. The spheroid then evolves only through vertex-model mechanical relaxation, surface tension, cell rearrangements, and effective fluctuations. We will clarify that this no-network case is a control for the intrinsic spheroid stress state, not a simulation of ECM-driven invasion.

      (2) What physically distinguishes fluid-like spheroids from solid-like spheroids? Simply stating the value of the parameters s0 with no explanation is not sufficient.

      We thank the reviewer for pointing out that the physical distinction between solid-like and fluid-like spheroids was not sufficiently explained. We agree that simply stating the values of s_0 is not adequate.

      In this 3D vertex model, the target shape index s_0 controls the mechanical cost of cell rearrangements. Below the rigidity transition (s_0 < s_0^*), neighbor exchanges are associated with finite energy barriers, leading to slow structural relaxation and solid-like behavior. Above the transition (s_0 > s_0^*), these barriers become very small or vanish, allowing cells to readily move past one another and continuously reorganize their local neighborhood structure. The resulting tissue exhibits fluid-like behavior with efficient stress relaxation through cell rearrangements.

      This distinction was characterized in detail in Ref. [9], where the bulk 3D vertex model was shown to undergo a rigidity transition at approximately s_0^* = 5.39, based on the decay of the neighbor-overlap function and cell trajectories. The solid-like value used here lies below this transition, whereas the fluid-like value lies above it. We acknowledge that the present manuscript only briefly summarized this point, mainly in Supplementary Material A. In the revised manuscript, we will add a clearer explanation in the main text of how the target shape index controls the state of the spheroid and why the selected values correspond to solid-like and fluid-like regimes.
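
      For orientation, a small sketch of how a shape index relates to the quoted transition is given below. It assumes the usual dimensionless 3D definition s = S / V^(2/3) (surface area over volume to the two-thirds power), which is not spelled out in this response, and the two target values passed to `regime` are hypothetical rather than the values used in the manuscript.

```python
import math

S0_STAR = 5.39  # approximate rigidity transition quoted in the response

def shape_index(surface_area, volume):
    """Dimensionless 3D shape index s = S / V**(2/3) (assumed definition)."""
    return surface_area / volume ** (2.0 / 3.0)

def regime(s0, s0_star=S0_STAR):
    """Classify a target shape index relative to the transition value."""
    return "solid-like" if s0 < s0_star else "fluid-like"

# Reference shapes: a unit sphere (the most compact shape) and a unit cube.
sphere = shape_index(4.0 * math.pi, 4.0 * math.pi / 3.0)
cube = shape_index(6.0, 1.0)
print(round(sphere, 3), round(cube, 3))

# Hypothetical target values on either side of the transition.
print(regime(5.0), regime(5.8))
```

      Larger target values favor less compact cells, which is consistent with the statement that rearrangement barriers shrink above the transition.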

      (3) An explanation of how histograms in Figure 2 are calculated is necessary. Are these histograms based on one simulation or several simulations?

      We thank the reviewer for pointing out that this was not sufficiently clear. The histograms in Fig. 2 are obtained by pooling cell-level quantities from multiple independent simulations, not from a single realization. As listed in Table I, we use 30 independent realizations. We plan to state this explicitly in the revised figure caption and main text.

      (4) The experimental results are briefly mentioned, but significantly more connection between these results and the numerical results of the cell breakout model is needed.

      We agree. In the current manuscript, the experimental data are used mainly to motivate the single-cell and streaming-like breakout modes shown in Fig. 5. We plan to revise Sec. II.D and the Fig. 5 caption to make the connection more explicit: the MEF spheroid experiments show the invasion modes that motivate the model, while the extended vertex model tests minimal mechanical ingredients capable of producing analogous single-cell and follower-cell breakout.

      (5) The description of the model that incorporates variable cell-cell attachments and cell breakout is very terse and needs more detail. Moreover, while the description of the results of this model is strong, the figure that illustrates cell breakout (Figure 5) is difficult to interpret. Addressing these and other issues will make the current manuscript, which presents an interesting model and result, much stronger and easier to read.

      We thank the reviewer for this constructive assessment. We agree that the extended model with variable cell–cell attachments was described too tersely and that Fig. 5b,c was difficult to interpret in its current qualitative form.

      To make Fig. 5 more quantitative, we plan to add measurements comparing the single-cell and two-cell breakout cases. Specifically, we plan to track the displacement of the pulled boundary cell, the separation between this leader cell and its neighboring/follower cell, and the distance between the follower cell and the remaining spheroid as the fiber target length is decreased.

    1. eLife Assessment

      The authors combine experiments and mathematical modeling to determine how the infectivity of human cytomegalovirus scales with the viral concentration in the inoculum, i.e., considering the multiplicity of infection (MOI). They propose and test different model assumptions to explain a mechanism termed "apparent cooperativity" of virions based on an observed super-linear increase of the number of infected cells with increasing inocula. The authors present a solid study showing valuable findings for virologists and quantitative scientists working on the analysis and interpretation of viral infection dynamics for which quantitative knowledge of MOI is needed.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors conduct both experiments and modeling of human cytomegalovirus (HCMV) infection in vitro to study how the infectivity of virus (measured by cell infection) scales with the viral concentration in the inoculum. A naïve thought would be that this is linear in the sense that doubling the virus concentration (and thus the total virus) in the inoculum would lead to double the fraction of infected cells. However, the authors show convincingly that this is not the case for HCMV, using multiple strains, two different target cells, and repeated experiments. In fact, they find that for some regimens (inoculum concentration) infected cells increase faster than the concentration of the inoculum, which they term "apparent cooperativity". The authors then provided possible explanations for this phenomenon and construct mathematical models and simulations to implement these explanations. They show that these ideas do help explain the cooperativity, but can't be conclusive as to what is the correct explanation. In any case, this advances our knowledge of the system and it is very important when quantitative experiments involving MOI are performed.

      Strengths:

      Careful experiments using state-of-the-art methodologies and advancing multiple competing models to explain the data.

      Weaknesses:

      Minor weaknesses in explaining the implementation of the model. However, some specific assumptions, which to this reviewer were unclear, could have substantial impact on the results. For example, whether cell infection is independent or not. This is expanded below.

      In the revised version, the authors address almost all of these minor weaknesses, strengthening the paper and its reproducibility.

      Suggestions to clarify the study:

      In the revised version, the authors carefully consider these suggestions and provide further details, clarifications and even some new results. Regarding the question of how infection of a cell with one virus could lead to lower probability for a secondary infection, I think that it is possible that infected cells activate antiviral programs that lead, for example, to lower expression of surface receptors. This has been considered at least in hepatitis C virus infection. However, this is a minor point.

      Overall, I think the revised version provides a sound study with relevant conclusions, and I thank the authors for their thoughtful consideration of my previous comments.

    3. Reviewer #2 (Public review):

      In their article, Peterson et al. wanted to show to what extent the classical "single hit" model of virion infection, where always the same quantity of virion is required to infect a cell, does not match with empirical observations based on human cytomegalovirus in vitro infection model, and how this would have practical impacts in experimental protocols.

      Strengths:

      - The use of a very simple and robust experimental assay, where they infected cells with serially diluted virions and measured the proportion of infected cells with flow cytometry. This convincingly showed how the proportion of infected cells differed from a "single hit" model which they simulated using a simple mathematical model ("power-law model"), and better fitted a model where virions need to cooperate to infect cells.

      - The use of different cell types and virus strains, which allows to draw some generalizations.

      - The exploration of the mechanisms that could explain this apparent cooperation, using biologically plausible simulations.

      - The practical consequences that this phenomenon has for lab virologists as well as modelers.

      Weaknesses:

      - The impossibility to discriminate between biological mechanisms is an important limitation of this study and calls for developing experimental designs able to further understand this question.

      - The outcome of the virion clumping remains highly sensitive to the choice of the clumps size distribution, which is itself very complicated to estimate, especially at high dilution.

      - The impossibility to directly fit the mathematical models to the data limit them to a qualitative discussion.

      Overall, this work is very valuable as it raises the general question of how the estimate of infectivity can be biased if extrapolated from a single virus titer assay. The observation that HCMV virions often cooperate and that this cooperation varies between context seems robust. The putative biological explanations would require further exploration.

      This topic is very well known in the case of segmented viruses and the semi-infectious particles, leading to the idea of studying "sociovirology", but to my knowledge this is the first time that it was explored for a non-segmented virus, and in the context of MOI estimation.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Public Review:

      Reviewer #1 (Public review):

      Suggestions to clarify the study:

      In the revised version, the authors carefully consider these suggestions and provide further details, clarifications and even some new results. Regarding the question of how infection of a cell with one virus could lead to lower probability for a secondary infection, I think that it is possible that infected cells activate antiviral programs that lead, for example, to lower expression of surface receptors. This has been considered at least in hepatitis C virus infection. However, this is a minor point.

      Yes, the possibility that infection of a cell by a virion would reduce the chance of infection by another virion was allowed in our model. However, such a process will not result in apparent cooperativity (n>1) in our model, and thus is irrelevant to the issue of apparent cooperativity we identified.

      Reviewer #2 (Public review):

      In their article, Peterson et al. wanted to show to what extent the classical "single hit" model of virion infection, where always the same quantity of virion is required to infect a cell, does not match with empirical observations based on human cytomegalovirus in vitro infection model, and how this would have practical impacts in experimental protocols.

      Strengths:

      The use of a very simple and robust experimental assay, where they infected cells with serially diluted virions and measured the proportion of infected cells with flow cytometry. This convincingly showed how the proportion of infected cells differed from a "single hit" model which they simulated using a simple mathematical model ("power-law model"), and better fitted a model where virions need to cooperate to infect cells.

      The use of different cell types and virus strains, which allows to draw some generalizations.

      The exploration of the mechanisms that could explain this apparent cooperation, using biologically plausible simulations.

      The practical consequences that this phenomenon has for lab virologists as well as modelers.

      Thank you.

      Weaknesses:

      The impossibility to discriminate between biological mechanisms is an important limitation of this study and calls for developing experimental designs able to further understand this question.

      The outcome of the virion clumping remains highly sensitive to the choice of the clumps size distribution, which is itself very complicated to estimate, especially at high dilution.

      The impossibility to directly fit the mathematical models to the data limit them to a qualitative discussion.

      Overall, this work is very valuable as it raises the general question of how the estimate of infectivity can be biased if extrapolated from a single virus titer assay. The observation that HCMV virions often cooperate and that this cooperation varies between context seems robust. The putative biological explanations would require further exploration.

      This topic is very well known in the case of segmented viruses and the semi-infectious particles, leading to the idea of studying "sociovirology", but to my knowledge this is the first time that it was explored for a non-segmented virus, and in the context of MOI estimation.

      Thank you. We would note, however, that the inability to discriminate between alternative models is not a weakness per se. It shows that our work goes beyond the somewhat typical approach in mathematical modeling of offering a single explanation for the phenomenon in question (rather than focusing on discriminating between alternatives, which is often hard to do).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) I now understand better the graphical abstract. I think my eye was too much attracted by the increase in specific infectivity that you see for more than 1 genome/cell, which is not the point of your paper. I am wondering if you should not guide even more the reader, by pointing out that the fact that the initial decline in specific infectivity represents apparent cooperativity.

      Let’s hope that the readers are smart enough to understand what to focus their eyes on. At the end, this is a graphical abstract that is not supposed to have too much text explaining where to look.

      (2) For your one-inflated geometric distribution, I agree that the estimations would remain very hypothetical because you would have to make many assumptions; however, I think a hurdle model where you would fit P(clump size = 1) = f1 and P(clump size = i) following a one-truncated geometric distribution would be more appropriate because it would lead to a distribution closer to your PDF from figure S11C.

      The issue is that our data are not in clump sizes but in the diameter of the clump, D. This is why we opted for using a mixture of continuous distributions, not a mixture of discrete distributions. We are sharing the DLS data, so others are welcome to try fitting other types of distributions to the data.
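
      For readers unfamiliar with the hurdle model proposed in the comment above, a minimal sketch of its probability mass function follows. It is illustrative only and was not fitted by the authors; the parameter names `f1` and `q` and the example values are hypothetical, and "one-truncated geometric" is read here as a geometric distribution restricted to clump sizes of 2 or more.

```python
def hurdle_pmf(i, f1, q):
    """P(clump size = i) under an assumed hurdle model.

    Size 1 has probability f1. Sizes i >= 2 share the remaining mass
    (1 - f1) according to a geometric law with success probability q,
    truncated so that it starts at 2.
    """
    if i < 1:
        return 0.0
    if i == 1:
        return f1
    return (1.0 - f1) * q * (1.0 - q) ** (i - 2)

# Hypothetical parameters: 70% single virions, geometric tail with q = 0.5.
for size in range(1, 6):
    print(size, hurdle_pmf(size, f1=0.7, q=0.5))
```

      Because the data are clump diameters rather than virion counts, applying such a discrete model would first require a mapping from diameter to clump size, which is the difficulty raised in the response.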

      (3) For the DLS data, I understand your choice to include all the datapoints, however I find the interpretation confusing: if I understand correctly, you consider that f1, the fraction of the smaller distribution, represents clumps of one virion. However, its median size is 10 times smaller than a virion. So, the number of clumps with one virion would be overestimated. I think it would be helpful for the reader to clarify this aspect, either in the results around lines 503-512, or in the discussion. Could it be that at higher dilution, what is represented by this smaller distribution would almost only be debris because the virions are so rare?

      When fitting a mixture of two log-normal distributions, f_1 represents the proportion of clumps of larger size (as was described in the materials and methods). The actual estimated value of f_1 is not highly relevant in calculating change in PDF of the distribution only for D>=d (230nm) as shown in Suppl Fig S11C. But we now realize that this variable f_1 may be confused with a variable f_1 used to denote the fraction of clumps with virion size=1 (in Fig 5C). We now mention that in the caption of Supp Fig S10.
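
      The role of the mixture weight can be sketched with a two-component log-normal mixture. The code is an illustration under stated assumptions, not the authors' fitting procedure: all parameter values are hypothetical, `w_large` stands for the weight of the larger-size component, and only the 230 nm threshold comes from the response.

```python
import math

def lognormal_sf(d, mu, sigma):
    """P(D >= d) for a log-normal with log-scale mean mu and log-scale sd sigma."""
    return 0.5 * math.erfc((math.log(d) - mu) / (sigma * math.sqrt(2.0)))

def mixture_sf(d, w_large, small, large):
    """P(D >= d) for a mixture of two log-normals.

    `small` and `large` are (mu, sigma) pairs; `w_large` is the weight of
    the larger-size component.
    """
    return ((1.0 - w_large) * lognormal_sf(d, *small)
            + w_large * lognormal_sf(d, *large))

# Hypothetical components with medians of 25 nm and 300 nm.
small = (math.log(25.0), 0.4)
large = (math.log(300.0), 0.5)
print(mixture_sf(230.0, w_large=0.3, small=small, large=large))
```

      With components this well separated, almost all of the mass above 230 nm comes from the larger component, so the smaller component mostly rescales the tail without changing its shape.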

      (4) For the dashed diagonal lines of fig 2, what I don't understand is the choice of the intercept that seems a bit random. I was wondering if it would not be more helpful to make it so that the dashed line intersects the observation for 1 genome/cell, which could then be interpreted as a deviation from the "single hit" model extrapolated outside of 1 genome/cell?

      The diagonal lines in Fig 2 are exactly the same in ALL panels, as are the x/y axis ranges; the slope of the line (equal to 1) makes it possible to see visually when the regression (shown by thick black lines) deviates from slope=1, i.e., indicates apparent cooperativity. We will keep the lines as they are. Thank you for the suggestion, though.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors conduct both experiments and modeling of human cytomegalovirus (HCMV) infection in vitro to study how the infectivity of the virus (measured by cell infection) scales with the viral concentration in the inoculum. A naïve thought would be that this is linear in the sense that doubling the virus concentration (and thus the total virus) in the inoculum would lead to doubling the fraction of infected cells. However, the authors show convincingly that this is not the case for HCMV, using multiple strains, two different target cells, and repeated experiments. In fact, they find that for some regimens (inoculum concentration), infected cells increase faster than the concentration of the inoculum, which they term "apparent cooperativity". The authors then provided possible explanations for this phenomenon and constructed mathematical models and simulations to implement these explanations. They show that these ideas do help explain the cooperativity, but they can't be conclusive as to what the correct explanation is. In any case, this advances our knowledge of the system, and it is very important when quantitative experiments involving MOI are performed.

      Strengths:

      Careful experiments using state-of-the-art methodologies and advancing multiple competing models to explain the data.

      Weaknesses:

      There are minor weaknesses in explaining the implementation of the model. However, some specific assumptions, which to this reviewer were unclear, could have a substantial impact on the results. For example, whether cell infection is independent or not. This is expanded below.

      Suggestions to clarify the study:

      (1) Mathematically, it is clear what "increase linearly" or "increase faster than linearly" (e.g., line 94) means. However, it may be confusing for some readers to then look at plots such as in Figure 2, which appear linear (but on the log-log scale) and about which the authors also say (line 326) "data best matching the linear relationship on a log-log scale".

      This is a good point. We included a clarification to indicate that linear on the log-log scale relationship does not imply linear relationship on the linear-linear scale. We wrote:

      “Because most data did not exhibit a linear relationship between virion concentration and infection probability we fitted the models to subsets of data best matching a linear relationship on a log-log scale. Note that linear relationship on log-log scale may still be nonlinear (on linear-linear scale) when n ≠ 1.”
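
      The difference between "linear on a log-log scale" and "linear" can be made concrete with a short sketch. It assumes that the power-law model has the form p(g) = 1 - exp(-lambda * g^n), which is consistent with p(1) = 1 - exp(-lambda) mentioned elsewhere in this exchange but is not quoted here; the parameter values are hypothetical.

```python
import math

def infection_probability(g, lam, n):
    """Assumed power-law model: p(g) = 1 - exp(-lam * g**n)."""
    return -math.expm1(-lam * g ** n)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Hypothetical dilution series (genomes per cell) in the low-infection regime.
genomes = [0.01, 0.1, 1.0, 10.0]
for n in (1.0, 1.5):
    probs = [infection_probability(g, lam=1e-4, n=n) for g in genomes]
    fold = infection_probability(2.0, 1e-4, n) / infection_probability(1.0, 1e-4, n)
    print(n, round(loglog_slope(genomes, probs), 3), round(fold, 3))
```

      Both series fall on a straight line in log-log coordinates, but only for n = 1 does doubling the inoculum roughly double the infected fraction; for n > 1 the increase is larger, which is the apparent cooperativity discussed here.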

      (2) One of the main issues that is unclear to me is whether the authors assume that cell infection is independent of other cells. This could be a very important issue affecting their results, both when analyzing the experimental data and running the simulations. One possible outcome of infection could be the generation of innate mediators that could protect (alter the resistance) of nearby cells. I can imagine two opposite results of this: i) one possibility is that resistance would lead to lower infection frequencies and this would result in apparent sub-linear infection (contrary to the observations); or ii) inoculums with more virus lead to faster infection, which doesn't allow enough time for the "resistance" (innate effect) to spread (potentially leading to results similar to the observations, supra-linear infection).

      In our models we assumed cells to be independent of each other (see also responses to other similar points). Because we measure infection in individual cells, assuming cells are independent is a reasonable first approximation. However, the reviewer makes an excellent point that there may be some between-cell signaling happening in the culture that “alerts” or “conditions” cells to change their “resistance”. It is also possible that at higher genome/cell numbers, exposure of cells to virions or virion debris may change the state of cells in the culture, and more cells become “susceptible” to infection. This is a good point that we now list in Limitations subsection of Discussion; it is a good hypothesis to test in our future experiments. We write:

      “Accrued damage model is also consistent with the idea that at higher genome/cell values, the inoculum itself (including cell and/or virion debris) may impact overall susceptibility of all cells in the well, for example, making them more susceptible to infection. It may be expected, though, that exposing cells to debris would increase cell resistance to infection; this would result in n < 1 that we did not observe at small genomes/cell values.”

      (3) Another unclear aspect of cell infection is whether each cell only has one chance to be infected or multiple chances, i.e., do the authors run the simulation once over all the cells or more times?

      Each cell has only one chance to be infected. Algorithm 1 clearly states that; we will add an extra sentence in “Agent-based simulations” to indicate this point.

      (4) On the other hand, the authors address the complementary issue of the virus acting independently or not, with their clumping model (which includes nice experimental measurements). However, it was unclear to me what the assumption of the simulation is in this case. In the case of infection by a clump of virus or "viral compensation", when infection is successful (the cell becomes infected), how many viruses "disappear" and what happens to the rest? For example, one of the viruses of the clump is removed by infection, but the others are free to participate in another clump, or they also disappear. The only thing I found about this is the caption of Figure S10, and it seems to indicate that only the infected virus is removed. However, a typical assumption, I think, is that viruses aggregate to improve infection, but then the whole aggregate participates in infection of a single cell, and those viruses in the clump can't participate in other infections. Viral cooperativity with higher inocula in this case would be, perhaps, the result of larger numbers of clumps for higher inocula. This seems in agreement with Figure S8, but was a little unclear in the interpretation provided.

      This is a good point. We did not remove the clump if one of the virions in the clump manages to infect a cell, and indeed, this could be the reason why in some simulations we observe apparent cooperativity when modeling viral clumping. We have explored this in the revision and found that it does not really impact how infection rate scales with the genomes/cell (e.g., see Suppl Fig S8).

      (5) In algorithm 1, how does P_i, as defined, relate to equation 1?

      These are unrelated because eqn. (1) is a phenomenological model that links infection per cell to genomes per cell, whereas P_i in algorithm 1 is a “physics-inspired” potential barrier.

      (6) In line 228, and several other places (e.g., caption of Table S2), the authors refer to the probability of a single genome infecting a cell p(1)=exp(-lambda), but shouldn't it be p(1)=1-exp(-lambda) according to equation 1?

      Indeed, it was a typo, p(1)=1-exp(-lambda) per eqn 1. Thank you, it has been corrected in the revised paper.
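
      For concreteness, the corrected expression can be checked numerically. The sketch below assumes that eqn. (1) has the power-law form p(g) = 1 - exp(-lambda * g^n), with g in genomes/cell; the function name and the parameter values are illustrative and are not taken from the paper.

```python
import math

def infection_probability(genomes_per_cell: float, lam: float, n: float) -> float:
    """Assumed power-law form of eqn. (1): p(g) = 1 - exp(-lam * g**n)."""
    return 1.0 - math.exp(-lam * genomes_per_cell ** n)

# Illustrative (made-up) parameter values.
lam, n = 0.05, 1.4

# At g = 1 the exponent n drops out, so p(1) = 1 - exp(-lam).
p_one = infection_probability(1.0, lam, n)
print(p_one, 1.0 - math.exp(-lam))
```

      Because 1^n = 1 for any n, p(1) under this form depends on lambda only, so it can be read as the infection probability at one genome per cell regardless of the degree of apparent cooperativity.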

      (7) In line 304, the accrued damage hypothesis is defined, but it is stated as a triggering of an antiviral response; one would assume that exposure to a virion should increase the resistance to infection. Otherwise, the authors are saying that evolution has come up with intracellular viral resistance mechanisms that are detrimental to the cell. As I mentioned above, this could also be a mechanism for non-independent cell infection. For example, infected cells signal to neighboring cells to "become resistant" to infection. This would also provide a mechanism for saturation at high levels.

      We do not know how exposure of a cell to one virion would change its “antiviral state”, i.e., whether it becomes more or less resistant to the next infection. If a cell becomes more resistant, apparent cooperativity in infection of cells cannot arise, so this hypothesis cannot explain our observations with n>1. Whether this mechanism contributes to the saturation of the cell infection rate at a value below 1 when genomes/cell is large is unclear, but it is a possibility. We added this point to the Discussion in the revision (see our text above that includes this point).

      (8) In Figure 3, and likely other places, t-tests are used for comparisons, but with only an n=5 (experiments). Many would prefer a non-parametric test.

      We repeated the analyses in Fig 3 with the Mann-Whitney test and the results were the same, so we would like to keep the t-test results in the paper.
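
      With n=5 per group, the exact two-sided Mann-Whitney test has a smallest attainable p-value of 2/252 (about 0.008), which is worth keeping in mind when comparing it with a t-test. A minimal sketch of the exact test by enumeration, assuming two unpaired groups of five values; the data and the function name are hypothetical and do not come from Fig 3:

```python
from itertools import combinations

def mann_whitney_exact(x, y):
    """Exact two-sided Mann-Whitney test by enumerating all group assignments."""
    pooled = list(x) + list(y)
    n_x, n_total = len(x), len(pooled)

    def u_statistic(indices):
        chosen = [pooled[i] for i in indices]
        rest = [pooled[i] for i in range(n_total) if i not in indices]
        # Count pairs where the chosen value exceeds the other; ties count one half.
        return sum((a > b) + 0.5 * (a == b) for a in chosen for b in rest)

    observed = u_statistic(tuple(range(n_x)))
    center = n_x * len(y) / 2.0
    all_u = [u_statistic(idx) for idx in combinations(range(n_total), n_x)]
    as_extreme = sum(abs(u - center) >= abs(observed - center) for u in all_u)
    return observed, as_extreme / len(all_u)

# Hypothetical, completely separated groups (the most extreme case for n = 5 vs 5).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]
u_value, p_value = mann_whitney_exact(group_a, group_b)
print(u_value, p_value)
```

      Enumeration is feasible here because there are only 252 ways to split ten values into two groups of five.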

      Reviewer #1 (Recommendations for the authors):

      (1) The strains of HCMV used have a fluorescent reporter "in place of the US11 gene". Can you provide a brief comment on whether and how this gene deletion affects HCMV replication?

      US11 is a resident ER protein that is considered an "immune evasion factor". It promotes ERAD of MHC I and has no observable effect on replication of HCMV in cultured cells (Berger 2000 JVI, Wiertz 1996 Cell). We have now added this information to the Materials and methods section of the paper. We write:

      “All BAC clones were modified to express green fluorescent protein (GFP) or the monomeric red fluorescent protein mCherry (mCherry) with En passant recombineering by replacing US11 with the eGFP or mCherry gene, respectively. US11 is a resident ER protein that is considered an “immune evasion factor”. It promotes ERAD of MHC I and has no observable effect on replication of HCMV in cultured cells [27, 28]. Infectious HCMV was recovered by electroporation of BAC-DNA into MRC5 cells which were then co-cultured with either HFFCs (TB and TR) or HFF-tet cells (ME).”

      (2) I didn't understand what the section "Virus titer assays" refers to. When was this used? How or why is this different from the "Virus stock dilution and dose-response assay"? Also in this section, you refer to NHDF cells - can you provide more information about these? And how does a different type of cell affect the titer assay (here measured as infected cells), since this is one of the main points of your paper?

      Apologies for the confusion. In the Ryckman lab we routinely generate viral stock and titrate it using a specific cell type, Normal (or neonatal) Human Dermal Fibroblasts (NHDF). This way, the titer of the stock is consistent between experiments by different researchers in the lab. We then use standard 10-fold dilutions to define the number of infectious units per mL of the stock. We have renamed this subsection “Quantification of viral stock infectivity using standard 10-fold dilutions”. After the stock was quantified, we used it in our actual experiments with a very small dilution factor (df), which allowed us to detect deviations of the infection rate from the single-hit model.

      (3) In many places, "powerlaw" is written. This is usually written as two words, "power law".

      Because powerlaw comes together with “model”, we decided to use “power-law model”.

      (4) Line 75: "have" instead of "has"?

      (5) Line 84: "with" repeated.

      Corrected, thank you.

      (6) Line 116: This section "Cell lines" seems to describe three cell lines, "HFF cells and MRC5 cells" and then "EC" cells.

      HFF cells are fibroblasts used in our main experiments and MRC5 cells are another type of fibroblast. We used MRC5 cells in the first step of recovering infectious HCMV from BAC DNA (electroporation). We clarified this in Materials and methods. We write:

      “Cell lines. Human foreskin fibroblast cells (HFFCs or fibroblasts) and MRC5 cells (also fibroblasts) were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Sigma) supplemented with 5% heat-inactivated fetal bovine serum (FBS, Rocky Mountain Biologicals, Missoula, MT, USA) and 5%Fetalgro® (Rocky Mountain Biologicals, Missoula, MT, USA). We used MRC5 cells in the first step of recovering infection HCMV from BAC DNA (electroporation). For main experiments we used HFFCs as fibroblasts. Human retinal pigment epithelial cells (ECs or ARPE-19, American Type Culture Collection, Manassas, VA, USA) were cultured in a 1:1 mixture of DMEM and Ham’s F-12 medium (DMEM:F-12, Gibco) and supplemented with 10% FBS.”

      (7) Line 188: Because the virus is double-stranded, do you have to divide the qPCR result by 2 to get genomes?

      This is typically accounted for in our calculations of genome/cell.

      (8) Line 200: Typically, one would write "500g" and not "500xg".

      Corrected.

      (9) Line 248: It would be clearer to write "cell type C different from cell type C2".

      Here C and C_2 refer to the actual numbers of cells in the titration/growth experiments, so the comparison is between numbers, not cell types. We kept the relationship as it is.

      (10) Definition of cell class: what is n in p_n, the total number of cells, or are these divided into n classes of resistance?

      This part was incorrectly copied from an earlier version. Both cell resistance and virion infectivity were sampled from normal distributions with different means and variances (see Table 1). We corrected the text to reflect this.

      (11) Line 272 to 273: Something seems to be missing, as the change of line doesn't make sense.

      Thank you. Edited to improve readability. Now it reads:

      “Clumping hypothesis. In the basic model the number of virions a given cell is exposed to follows a Poisson distribution. However, it is well recognized that as virions are produced by infected cells, they may form clumps/aggregates; the number of virions per clump/aggregate may deviate from, for example, the Poisson distribution [33].”

      (12) Line 283: How lambda is chosen is not indicated here, only later (line 424), but at this point, one can confuse it with lambda in equation 1. Is it the same? It also doesn't seem to be indicated in your Table 1.

      The mean of the Poisson distribution in the clump simulations is not the same as lambda in eqn 1; we renamed it lambda_c, which is estimated by fitting a Poisson distribution to the clump size distribution estimated from DLS experiments. Because it depends on the virus stock dilution, it is not listed in Table 1. However, we did perform additional simulations assuming lambda_c=2 (Suppl Fig S10).
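
      How clump sizes might be drawn can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: it takes the number of virions per clump to be Poisson with mean lambda_c and redraws zeros so that every clump contains at least one virion. `sample_poisson` and `sample_clump_size` are hypothetical helpers.

```python
import math
import random

def sample_poisson(mean: float, rng: random.Random) -> int:
    """Draw from Poisson(mean) with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_clump_size(lambda_c: float, rng: random.Random) -> int:
    """Assumed clump model: Poisson(lambda_c) virions per clump, zeros redrawn."""
    while True:
        size = sample_poisson(lambda_c, rng)
        if size >= 1:
            return size

rng = random.Random(1)
sizes = [sample_clump_size(2.0, rng) for _ in range(20000)]
mean_size = sum(sizes) / len(sizes)

# Redrawing zeros shifts the mean to lambda_c / (1 - exp(-lambda_c)).
expected_mean = 2.0 / (1.0 - math.exp(-2.0))
print(mean_size, expected_mean)
```

      Under this assumption the mean clump size exceeds lambda_c, so lambda_c in such a model should not be read directly as the average number of virions per clump.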

      (13) Equation 6: I understand that you mostly used kappa=0, but in equation 6, would it be positive or negative (if not zero)?

      We probably expect kappa to be negative but we did not fully explore this extension of the model.

      (14) Line 350: Instead of "infection rates" would "infection frequencies" be better?

      We agree. Changed (also changed in the sentence above that line).

      (15) Line 366: I found this sentence a bit awkward.

      We edited it to the best of our ability to improve it.

      “Importantly, for most HCMV strain-target cell combinations we estimated n>1 (Figure 2 and Supplemental Table S2). With n>1 increase in virion concentration (i.e., higher genomes/cell values) results in a higher than linear increase in the probability of a cell to be infected (eqn. (1)) indicating cooperation between virions at infecting cells. We call this phenomenon “apparent cooperativity”.”
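
      Why n>1 implies a faster-than-linear increase can be seen from a small-argument expansion. The derivation below assumes that eqn. (1) has the form p(g) = 1 - exp(-lambda g^n), with g denoting genomes/cell:

```latex
\begin{align}
p(g) &= 1 - e^{-\lambda g^{n}} \approx \lambda g^{n}
      \quad \text{for } \lambda g^{n} \ll 1, \\
\log p(g) &\approx \log \lambda + n \log g, \\
\frac{p(2g)}{p(g)} &\approx 2^{n} > 2 \quad \text{when } n > 1 .
\end{align}
```

      On log-log axes the slope in the low-dose regime is therefore n, and doubling the dose more than doubles the infected fraction whenever n>1.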

      (16) Figure 2, panel L: I wonder if it would be better to include the panel with the name of the experiment, but no data. Currently, it takes a while to find what you are talking about in panel L (or at the very least, indicate the panel in the caption).

      Changed

      (17) Figure 2: When you say that experiments were done at least twice, are you referring to the GFP and mCherry versions of the experiment, or replicates within each of those fluorescent labels?

      Replicates with each of those labels.

      (18) Figure 3: What is the number on top of the black bars? I think it is the average of the paired fold change. Is this right? Why, in panel E, is it 1.32 when only one goes up?

      Yes, fold change. Indeed, 1.32 was a typo; the correct value is 0.70. Thank you for noting this.

      (19) Line 408: delete the word "there".

      Done. Thank you.

      (20) Line 412: Instead of "The", it should be "Then".

      Done. Thank you.

      Reviewer #2 (Public review):

      In their article, Peterson et al. wanted to show to what extent the classical "single hit" model of virion infection, where one virion is required to infect a cell, does not match empirical observations based on human cytomegalovirus in vitro infection model, and how this would have practical impacts in experimental protocols.

      They first used a very simple experimental assay, where they infected cells with serially diluted virions and measured the proportion of infected cells with flow cytometry. From this, they could elegantly show how the proportion of infected cells differed from a "single hit" model, which they simulated using a simple mathematical model ("powerlaw model"), and better fit a model where virions need to cooperate to infect cells. They then explore which mechanism could explain this apparent cooperation:

      (1) Stochasticity alone cannot explain the results, although I am unsure how generalizable the results are, because the mathematical model chosen cannot, by design, explain such observations only by stochasticity.

      Our null model simulations are not just about stochasticity; they also include variability in virion infectivity and cell resistance to infection. We agree that simulations cannot truly prove that such variability cannot result in apparent cooperativity; however, we also provide a mathematical proof that the increase in the frequency of infected cells should be linear with virion concentration at small genomes/cell values.
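
      The flavor of such a linearity argument can be sketched as follows. This is an illustrative derivation under independent-action assumptions, not a reproduction of the proof in the paper. The symbols are introduced here: K is the number of virions reaching a cell, assumed Poisson with mean c g for genomes/cell g; r is the resistance of the cell; and \bar q(r) is the per-virion infection probability for that cell, averaged over virion infectivity.

```latex
\begin{align}
K \mid g &\sim \mathrm{Poisson}(c\,g), \\
\Pr(\text{cell uninfected} \mid r, g)
  &= \mathbb{E}\!\left[(1-\bar q(r))^{K}\right] = e^{-c\,g\,\bar q(r)}, \\
P(g) &= 1 - \mathbb{E}_{r}\!\left[e^{-c\,g\,\bar q(r)}\right]
      = c\,g\,\mathbb{E}_{r}[\bar q(r)]
        - \tfrac{1}{2}(c\,g)^{2}\,\mathbb{E}_{r}\!\left[\bar q(r)^{2}\right] + O(g^{3}).
\end{align}
```

      Under these assumptions the leading term is linear in g and the first correction is negative, so heterogeneity in infectivity or resistance alone bends the curve downward rather than producing n>1.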

      (2) Virion clumping seemed not to be enough either to generally explain such a pattern. For that, they first use a mathematical model showing that the apparent cooperation would be small. However, I am unsure how extreme the scenario of simulated virion clumping is. They then used dynamic light scattering to measure the distribution of the sizes of clumps. From these estimates, they show that virion clumps cannot reproduce the observed virion cooperation in serial dilution assays. However, the authors remain imprecise about how the uncertainty of these clumps' size distribution would impact the results, as most clumps have a size smaller than a single virion, leaving therefore a limited number of clumps truly containing virions.

      As we stated in the paper, clumping may explain apparent cooperativity in simulations depending on how stock dilution impacts the distribution of virions/clump. This could be explored further; however, better experimental measurements of virions/clump would be highly informative (but we do not have the resources to do these experiments at present). Our point is that the degree of apparent cooperativity depends on the target cell used (n is smaller on epithelial cells than on fibroblasts), which is difficult to explain by clumping, a property of the virions. Following the comment by Reviewer 1, we have done more analyses of the clumping model to investigate the importance of clump removal after a successful infection for the detected degree of apparent cooperativity. We found that it was not critical to our conclusions (Suppl Fig S8).

      The two models remain unidentifiable from each other but could explain the apparent virion cooperativity: either due to an increase in susceptibility of the cell each time a virion tries to infect it, or due to viral compensation, where lesser fit viruses are able to infect cells in co-infection with a better fit virion. Unfortunately, the authors here do not attempt to fit their mathematical model to the experimental data but only show that theoretical models and experimental data generate similar patterns regarding virion apparent cooperation.

      In the revision we now provide examples of our earlier simulations that “match” experimental data with a relatively high degree of apparent cooperativity (Supp Fig S9).

      Finally, the authors show that this virion cooperation could make the relationship between the estimated multiplicity of infection and viruses/cell deviate from the 1:1 relationship. Consequently, the dilution of a virion stock would lead to an even stronger decrease in infectivity, as more diluted virions can cooperate less for infection.

      Overall, this work is very valuable as it raises the general question of how the estimate of infectivity can be biased if extrapolated from a single virus titer assay. The observation that HCMV virions often cooperate and that this cooperation varies between contexts seems robust. The putative biological explanations would require further exploration.

      This topic is very well known in the case of segmented viruses and the semi-infectious particles, leading to the idea of studying "sociovirology", but to my knowledge, this is the first time that it was explored for a nonsegmented virus, and in the context of MOI estimation.

      Thank you.

      Reviewer #2 (Recommendations for the authors):

      Major comments:

      Two aspects of the work would benefit from further thought:

      (1) The simulation of virion clumps: in both cases (Poisson distribution or one-inflated geometric distribution), the proportion of clumps containing more than one virion will be small. For the Poisson distribution, as you fit the powerlaw model on the range of genomes/cell < ~ 3 genomes/cell (Figure 4B). I wonder to what extent this explains the sudden rise in infections/cells you observe above that limit. It would be interesting to plot the (cumulative) distribution of the clump sizes at different dilution levels to have a better idea.

      The reviewer has a good eye. Indeed, the relationship between infection frequency and genomes/cell is linear up to a point, and we believe the inflection point reflects the genomes/cell values at which clumps contain more than 1 virion. Here are the results of simulations with the distribution of virions/clump plotted:

      Similarly, for the one-inflated geometric distribution, the proportion of clumps of size 1 is the sum of two events: f1, plus 1-f1 times the probability that the geometric distribution is zero, if I follow the methods on lines 287-294. I wonder if this is appropriate regarding the estimates made with the DLS. In particular, Figure 5C shows that the proportion of clumps of size 1 is more than ~ half of all the clumps, and does not seem to be the same distribution as the estimates made on Figure S9C. Maybe a hurdle model would be more appropriate?

      This is a fair point. In our analyses we found that modeling the clump size distribution is tricky and requires various assumptions. The issue with the DLS data is that we do not really know the distribution of intact virions per clump, so how to relate the size of a clump to the number of virions it contains is wide open; we explored several possibilities and found that the answer (whether clumping results in apparent cooperativity) depends on the assumptions about how clumps are modelled (e.g., compare Fig 4B and Suppl. Fig S11). A hurdle model is not appropriate for clumps because, by our definition, a clump must contain at least 1 virion. Our key observation, however, is that the degree of apparent cooperativity depends on the target cell type and thus should be independent of virion clumping (unless there is viral cooperativity in the clumps). Overall, we decided that exploring more clumping models would take extra effort, and it is unclear whether it would bring any benefit to our conclusions.

      The analysis of the clump size distribution using dynamic light scattering, in Figure S8. If I interpret correctly, events with size < 230 nm should be excluded as they do not represent clumps of virions but rather media impurities or cell debris. Therefore, I don't understand the choice of fitting the whole set with a combination of two normal distributions, as even the larger normal distribution covers clumps < 230 nm. If the f1 indicated here is the one used in the methods line 287-294, this is then wrong because it does not represent the fraction of clumps of size 1, but rather debris.

      We used two normal (on log-scale) distributions when quantifying the clump distribution data (Supp Fig S10) to avoid sub-selecting the data; in this way, the two distributions fit the whole dataset with excellent quality. An alternative approach would be to sub-select data with size >=230nm and fit a normal (or similar) distribution to the clumps; such an approach may generate biases and/or unreliable estimates at high dilutions due to the small number of large clumps (e.g., see Supp Fig S10S-X). In our simulations of clump distribution and infection (Fig 5) we attempted to reproduce the estimated clump size distribution (Suppl Fig S11C) only approximately. Again, because in our measurements we don’t really know the number of virions per clump, we believe that efforts to model the clump size distribution exactly are not going to give full answers.

      (2) Figure 4 and results lines 419-465: Why didn't you try to fit the different models to the data, instead of qualitatively comparing the estimate of n in the simulations with arbitrary parameters to the one for empirical data? Your models match the expectation of virion cooperation by design, so they are not more convincing for a virologist than logical non-quantitative reasoning. They would be of stronger evidence in my opinion if you could show how well they fit the data. You could then directly compare the different models' fits using goodness-of-fit metrics and decide whether one is better than another or if they all explain equally well the observations.

      We have 11 different relationships between infection rate and genomes/cell; finding parameter combinations that would match all the data with at least 2 alternative models seems excessive at present, but it is a good direction once we obtain extra funding to continue this work. It is also difficult to search extensively for the parameter values that would result in a perfect fit of the stochastic simulations to data, since methods for fitting agent-based models to data are not fully developed. However, following this suggestion we now show results of simulations for the two alternative models (accrued damage and viral compensation) that we believe match the experimental data to some degree (see new Suppl Fig S9).

      Minor comments:

      (1) Graphical abstract: This requires more context as it is too rough here to help me understand the general idea of the paper. Plus, why does specific infectivity first decrease with genome/cell?

      We added a few elements to the graphical abstract, including the strain and target cell used. The decrease in specific infectivity at lower genomes/cell is due to apparent cooperativity.
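
      The link between apparent cooperativity and falling specific infectivity can be made concrete with a short calculation. The sketch assumes the power-law form p(g) = 1 - exp(-lambda * g^n) for eqn. (1) and defines the apparent MOI as the single-hit (Poisson) estimate -ln(1 - p); the function names and parameter values are illustrative assumptions.

```python
import math

def apparent_moi(genomes_per_cell: float, lam: float, n: float) -> float:
    """Single-hit (Poisson) MOI implied by the infected fraction: -ln(1 - p)."""
    infected_fraction = -math.expm1(-lam * genomes_per_cell ** n)
    return -math.log1p(-infected_fraction)

def specific_infectivity(genomes_per_cell: float, lam: float, n: float) -> float:
    """Apparent infectious units per genome at a given dose."""
    return apparent_moi(genomes_per_cell, lam, n) / genomes_per_cell

lam, n = 0.1, 1.5  # made-up values with n > 1
for g in (10.0, 1.0, 0.1, 0.01):
    print(g, apparent_moi(g, lam, n), specific_infectivity(g, lam, n))
```

      Under these assumptions the apparent MOI equals lambda * g^n, so a 10-fold dilution lowers it by a factor of 10^n rather than 10, and the infectivity per genome scales as g^(n-1).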

      (2) Equation (7): It would be beneficial for the reader if the reasoning behind the likelihood computation were further described.

      This is a relatively standard approach to model/estimate parameters of a binary outcome, e.g., see Wikipedia: https://en.wikipedia.org/wiki/Logistic_regression
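
      As an illustration of what a likelihood for a binary outcome looks like in this setting, the sketch below writes a binomial log-likelihood for counts of infected cells. It is not the paper's eqn. (7): it assumes the power-law form p(g) = 1 - exp(-lambda * g^n) for the infection probability, and the observations are synthetic.

```python
import math

def infection_probability(g: float, lam: float, n: float) -> float:
    """Assumed power-law dose-response: p(g) = 1 - exp(-lam * g**n)."""
    return -math.expm1(-lam * g ** n)

def log_likelihood(lam: float, n: float, observations) -> float:
    """Binomial log-likelihood (the constant binomial coefficient is omitted)."""
    total = 0.0
    for g, infected, counted in observations:
        p = infection_probability(g, lam, n)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += infected * math.log(p) + (counted - infected) * math.log(1.0 - p)
    return total

# Synthetic observations: (genomes/cell, infected cells, cells counted).
true_lam, true_n, cells_counted = 0.05, 1.5, 10_000
observations = [
    (g, round(cells_counted * infection_probability(g, true_lam, true_n)), cells_counted)
    for g in (0.01, 0.1, 1.0, 10.0)
]

print(log_likelihood(true_lam, true_n, observations))  # generating parameters
print(log_likelihood(true_lam, 1.0, observations))     # single-hit alternative (n = 1)
```

      Each cell contributes log p if infected and log(1 - p) if not; parameter estimates are the values of lambda and n that maximize this sum.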

      (3) Line 352-357: could the drop in infectivity also be enhanced/explained by increased cell mortality? Did you gate on cell viability during FCM?

      The infection rate was measured in live cells only, so increased cell mortality may be an explanation.

      (4) Figure 2: I don't understand the dashed diagonal lines: what do they represent exactly? Especially, wouldn't the single-hit model depend on p(1), in which case it should vary by cell x virus?

      As the caption to Figure 2 states, the diagonal dashed lines show slope = 1 (i.e., the single-hit model), so one can compare how far the data and/or model fit line deviate from 1. The note for p(1) in panel A illustrates how p(1) is calculated; it varies by strain-cell combination, as indicated in Suppl. Tab S2.

      (5) Fig3G: Is it not surprising to find a positive relationship between p(1) and n? I would have intuitively expected that the stricter the environment is, the more cooperation you observe. But maybe these viruses did not evolve in this context, and therefore, this relationship is different from what you expect from an evolutionary optimum.

      Well, we simply don’t know. The relationship simply suggests that there is a connection between the infectivity of a single virion and the degree of apparent cooperativity. We are not certain in which context these viruses have evolved.

      (6) Flow cytometry assay: could it be possible that cells infected by more virions generate more fluorescent proteins and are therefore less likely to be false negatives? Maybe you could compare the fluorescence intensity distribution among infected cells in the context of low MOI vs high MOI?

      This is an interesting point. From presented flow cytometry plots (e.g., Suppl Fig S3), the MFI for infected cells does not seem to depend on the dilution (or genome/cell).

      (7) Figure S9B: I did not understand this figure. Are the axes labels correct? How is it possible to have less than 1 virion/well?

      The y axis shows a scaled number calculated by integrating the estimated clump size distribution; we assume 1 “scaled” virion/well at the highest virion/cell values. With scaling, yes, it is possible to have less than 1 virion/well.

      Reviewer #3 (Public review):

      Summary:

      The authors dilute fluorescent HCMV stocks in small steps (df ≈ 1.3-1.5) across 23 points, quantify infections by flow cytometry at 3 dpi, and fit a power-law model to estimate a cooperativity parameter n (n > 1 indicates apparent cooperativity). They compare fibroblasts vs epithelial cells and multiple strains/reporters, and explore alternative mechanisms (clumping, accrued damage, viral compensation) via analytical modeling and stochastic simulations. They discuss implications for titer/MOI estimation and suggest a method for detecting "apparent cooperativity," noting that for viruses showing this behavior, MOI estimation may be biased.

      Strengths:

      (1) High-resolution titration & rigor: The small-step dilution design (23 serial dilutions; tailored df) improves dose-response resolution beyond conventional 10× series.

      (2) Clear quantitative signal: Multiple strain-cell pairs show n > 1, with appropriate model fitting and visualization of the linear regime on log-log axes.

      (3) Mechanistic exploration: Side-by-side modeling of clumping vs accrued damage vs compensation frames testable hypotheses for cooperativity.

      Thank you.

      Weaknesses:

      (1) Secondary infection control: The authors argue that 3 dpi largely avoids progeny-mediated secondary infection; this claim should be strengthened (e.g., entry inhibitors/control infections) or add sensitivity checks showing results are robust to a small secondary-infection contribution.

      This is an important point. We do believe that the current knowledge about HCMV virion production time (it takes 3-4 days to make virions according to multiple papers; see Fig 7 in Vonka and Benyesh-Melnick JB 1966; Fig 3B in Stanton et al JCI 2010; and Fig 1A in Li et al. PNAS 2015) is sufficient to justify our experimental design, but we do agree that an additional control that blocks new infections would be useful. We had previously performed experiments with an HCMV TB-gL-KO that cannot make infectious virions (but the stock virions can be made from complemented target cells). We will investigate whether our titration experiments with this virus strain have sufficient resolution to detect apparent cooperativity. However, at present we do not have the resources to perform new experiments.

      (2) Discriminating mechanisms: At present, simulations cannot distinguish between accrued damage and viral compensation. The authors should propose or add a decisive experiment (e.g., dual-color coinfection to quantify true coinfection rates versus "priming" without coinfection; timed sequential inocula) and outline expected signatures for each mechanism.

      Excellent suggestion. Because infection of a cell is a result of the joint viral infectivity and cell resistance, it may be hard to discriminate between these alternatives unless we specify them as particular molecular mechanisms. But we tried our best and listed potential future experiments in the revised version of the paper. Specifically, we write:

      “Second, while we have proposed alternative mechanisms that may result in apparent cooperativity, at present we could not discriminate between these alternatives, in part, because the models lacked specifics – e.g., if virions interacting with a cell reduce its resistance to infection, what does it mean exactly [12]? If virions in a collection augment their infectivity (which may be expected for segmented viruses), how does that viral compensation actually work? Designing experiments that would discriminate between these alternatives would require focusing on a specific mechanism. For example, it may be that that the initiation of gene expression is difficult but is more efficient when there are more virions bringing in more tegument transactivators like pp72/ppUL35 [59]. Alternatively, it may be that there is a bona fide resistance mechanism at play here (e.g. “interferon”) that is antagonized by a viral tegument protein (like TRS1/IRS1 that acts against PKR and 2’5’OAS) [60]. Accrued damage model is also consistent with the idea that at higher genome/cell values, the inoculum itself (including cell and/or virion debris) may impact overall susceptibility of all cells in the well, for example, making them more susceptible to infection. It may be expected, though, that exposing cells to debris would increase cell resistance to infection; this would result in n < 1 that we did not observe at small genomes/cell values. Addressing these hypotheses is an area of future research that will require funding.”

      (3) Decline at high genomes/cell: Several datasets show a downturn at high input. Hypotheses should be provided (cytotoxicity, receptor depletion, and measurement ceiling) and any supportive controls.

      Another good point. We do not have a good explanation, but we do not believe this is because of saturation of available target cells. It seemed to only happen (or was most pronounced) with the ME stocks, which are typically lower in titer and so the higher MOI were nearly undiluted stock. It may be the effect of the conditioned medium. Or perhaps there are non-infectious particles like dense bodies (enveloped particles that lack a capsid and genome) and non-infectious, enveloped particles (NIEPs) that compete for receptors or otherwise damage cells and these don’t get diluted out at the higher doses. We included the point about cell death in Discussion of the revised version of the paper. Specifically, we write:

      “We also do not have a clear explanation of why infection frequency declines at high genomes/cell values for some strain-cell combinations (e.g., Figure 2A, C, D, I, J). Because we measured cell infection in live cells, increase in cell death at higher genomes/cell values may result in the decrease in the number of viable cells.”

      (4) Include experimental data: In Figure 6, please include the experimentally measured titers (IU/mL), if available.

      This is a model-simulated scenario, and as such, there are no measured titers.

      (5) MOI guidance: The practical guidance is important; please add a short "best-practice box" (how to determine titer at multiple genomes/cell and cell densities; when single-hit assumptions fail) for end-users.

      Good suggestion. We now include a best-practice box, based on guidelines developed in the Ryckman lab over the years, in the revised version of the paper. This is how it reads:

      “Match viral titration methods to the experiment as far as possible. This includes using the same dilution of the viral stock, the cell type, duration of inoculation, and readout of infection.

      When possible, determine the degree of apparent cooperativity (“n”-value, eqn. (1)) for each virus strain/cell type pair being studied.

      If n= 1 (no cooperativity), it is reasonable to calculate experimental MOI based on stock infectivity value determined from a convenient stock dilution.

      If n > 1 or unknown, then stock infectivity should be determined at a dilution resulting in an MOI as close as possible to the desired experimental MOI. Alternatively, the inoculum size can be empirically determined to yield the desired number of infected cells. In these ways different virus/cell type pairs can be compared more fairly.

      Box 1: Recommendations on titrating viral stocks and on performing experiments when comparing different viral strains.”
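
      The recommendation for n > 1 can be illustrated numerically. The sketch below assumes the power-law form p(g) = 1 - exp(-lambda * g^n) for eqn. (1) and compares the dose needed for a target infected fraction with a linear extrapolation from infectivity measured at a different dose; all names and parameter values are hypothetical.

```python
import math

def genomes_per_cell_for_target(target_fraction: float, lam: float, n: float) -> float:
    """Dose (genomes/cell) giving the target infected fraction under the assumed model."""
    target_moi = -math.log1p(-target_fraction)
    return (target_moi / lam) ** (1.0 / n)

def naive_genomes_per_cell(target_fraction: float, lam: float, n: float,
                           reference_genomes_per_cell: float) -> float:
    """Linear (single-hit) extrapolation from infectivity measured at a reference dose."""
    target_moi = -math.log1p(-target_fraction)
    infectivity_at_reference = lam * reference_genomes_per_cell ** (n - 1.0)
    return target_moi / infectivity_at_reference

lam, n = 0.1, 1.5                    # made-up values with n > 1
target, reference_dose = 0.05, 10.0  # want 5% infected; stock titrated at 10 genomes/cell

exact_dose = genomes_per_cell_for_target(target, lam, n)
naive_dose = naive_genomes_per_cell(target, lam, n, reference_dose)
print(exact_dose, naive_dose)
```

      In this example the linear extrapolation from a high-dose titration underestimates the dose required at low MOI, which is the reason for titrating near the intended experimental MOI when n > 1 or n is unknown.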

      Reviewer #3 (Recommendations for the authors):

      FROM PUBLIC REVIEWS (2) Discriminating mechanisms: At present, simulations cannot distinguish between accrued damage and viral compensation. The authors should propose or add a decisive experiment (e.g., dual-color coinfection to quantify true coinfection rates versus "priming" without coinfection; timed sequential inocula) and outline expected signatures for each mechanism.

      This is a good point but to propose a good experiment we need to narrow down the “generic” mechanism to specific processes/genes. We put forward some ideas but clearly more work is needed here:

      “Second, while we have proposed alternative mechanisms that may result in apparent cooperativity, at present we could not discriminate between these alternatives, in part, because the models lacked specifics – e.g., if virions interacting with a cell reduce its resistance to infection, what does it mean exactly [12]? If virions in a collection augment their infectivity (which may be expected for segmented viruses), how does that viral compensation actually work? Designing experiments that would discriminate between these alternatives would require focusing on a specific mechanism. For example, it may be that that the initiation of gene expression is just difficult but is more efficient when there are more virions bringing in more tegument transactivators like pp72/ppUL35 [59]. Alternatively, it may be that there is a bona fide resistance mechanism at play here (e.g. “interferon”) that is antagonized by a viral tegument protein (like TRS1/IRS1 that acts against PKR and 2’5’OAS) [60]. Accrued damage model is also consistent with the idea that at higher genome/cell, the inoculum itself (including cell and/or virion debris) may impact overall susceptibility of all cells in culture, for example, making them more susceptible to infection. It may be expected, though, that exposing cells to debris would increase cell resistance to infection; this would result in n < 1 that we did not observe at small genomes/cell values. Addressing these hypotheses is an area of future research that will require funding.”

      (1) Methods transparency: Include raw spreadsheets or tables of dilution factors and per-well genome estimates used for Figure 1A; this will help reproducibility of the df = 1.3-1.5 pipeline.

      Provided as supplemental xlsx file.

      (2) Epithelial vs fibroblast contrast: Since n is lower on epithelial cells, expand on cell-intrinsic barriers that could dampen apparent cooperativity, and if this argues against simple clumping.

      Indeed, this is the point that we raised in the Discussion. Since ECs show lower n than fibroblasts, this observation argues against clumps. Going forward, the contrast between cell types will be an approach to understanding the mechanism. One difference is the entry pathway: entry into ECs involves endocytosis and endosome acidification, whereas entry into fibroblasts does not. There are clearly different receptors involved as well, although they are not well characterized. One recent report that might be relevant is Ohman 2024 PNAS, which shows that the gH/gL/UL128-131 complex (aka "pentamer") is not just dispensable for entry into fibroblasts, but inhibitory. They suggest that the pentamer might bind to a receptor on fibroblasts that activates a pathway that acts against viral IE expression. It could be that in this situation, more virions are really helpful to overcome that block, whatever it is. We have now updated this point in the Discussion.

      (3) Visualization: In Figure 2, consider showing confidence bands for the fitted slope (n) within the colored fit window and reporting n ± SE in the panels.

      Because we used custom scripts to fit models to data, showing bands of model predictions was a bit complex and would interfere with data points. But we now show 95% CIs for the estimated value of n (these are listed in Suppl. Tab S2).
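      To make the requested quantity concrete, the following is a minimal sketch of how a slope n with a standard error and an approximate 95% CI could be obtained. It is not the authors' custom fitting script. It assumes, for illustration only, that infections/cell follows a power law in genomes/cell within the fit window, so that n is the least-squares slope on log-log axes. The function name, the input values, and the normal-approximation factor 1.96 (crude for a handful of points) are all assumptions.

```python
import math

def fit_loglog_slope(genomes_per_cell, infections_per_cell):
    """Least-squares slope n of log10(infections/cell) vs log10(genomes/cell).

    Returns (n, standard error of n, approximate 95% CI).
    Hypothetical helper, not taken from the manuscript.
    """
    x = [math.log10(g) for g in genomes_per_cell]
    y = [math.log10(i) for i in infections_per_cell]
    k = len(x)
    mx = sum(x) / k
    my = sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    n = sxy / sxx                      # slope on log-log axes
    intercept = my - n * mx
    # Residual sum of squares around the fitted line
    rss = sum((yi - (intercept + n * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (k - 2) / sxx)
    # Normal approximation; a t quantile would be wider for small k
    ci = (n - 1.96 * se, n + 1.96 * se)
    return n, se, ci

# Made-up illustrative values, not data from the study
genomes = [0.01, 0.03, 0.1, 0.3, 1.0]
infections = [0.0002, 0.0011, 0.0060, 0.0340, 0.2100]
n, se, ci = fit_loglog_slope(genomes, infections)
print(f"n = {n:.2f} +/- {se:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

      Reporting the interval alongside each panel, as the reviewer suggests, lets a reader judge at a glance whether n is distinguishable from 1.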

      (4) Symbols: Define all symbols (e.g., V₀, n) on first use in the main text, not only in Methods.

      Done.

      (5) Plot axes check: Explain non-uniform axis labeling ("genomes/cell," "infections/cell").

      This comment was unclear – which labels were not “uniform”? Genomes/cell indicates the expected number of genomes (or virions) to which a cell is exposed on average; infections/cell indicates the probability that a cell actually becomes infected.

      (6) Confidence interval for estimated parameters: Figure 3 A-C, please report estimated parameter intervals.

      These are listed in Suppl. Tab S2. Putting CIs for all estimates would clutter the figure, making it hard to tell which CIs belong to which estimate. But we put the CIs for the estimated parameter n in Figure 2.

    1. eLife Assessment

      This is a valuable study of changes in host genome histone methylation and transcription changes associated with Chlamydia infection. The data presented are solid but further analysis would strengthen the authors' overall conclusions.

    2. Reviewer #1 (Public Review):

      This study by Charendoff et al provides interesting observations related to global histone hypermethylation in host cells, during Chlamydia trachomatis infections. The core observation they report is that the host histones are highly hypermethylated during infection, and this appears to be an amplifying effect due to continuous inhibition of demethylases, in part due to a metabolic shift in the host where succinate amounts (which inhibit demethylases) increase. The authors claim this is specifically due to the bacteria, since antibiotic treatment prevents histone hypermethylation (but leaves you wondering about cause/consequence correlations).

      The core observation of hyper methylation is very interesting, and well documented. There are a number of points to consider though in order to fully substantiate the findings, and close out loose ends. My comments are broad - and built around the interpretations (vs the data presented).

      (1) Related to observations coming from Fig 1C etc, and connecting to Fig 3 - the hyper methylation appears to be across different protein arg/lys residues - and is not histone specific. So, is it just a consequence of high SAM pools and flux in infected cells? i.e. the bacterial infection increases SAM pools in cells, and provides an increase in substrate pools for the methyltransferases, leading to protein hyper methylation. The approach used here only measures steady-state SAM amounts (and not SAM flux or utilisation). For example, reduced SAM amounts in nuclei could be due to increased utilisation of SAM. The experiments done with the demethylase do not actually answer this question - if you decrease demethylase activity, you will get an increase in net methylation. The authors see an increase in net methylation in the infected cells - this would suggest that in addition (or perhaps primarily) to reduced demethylase activity, there could be much higher SAM utilisation/flux. Again, the over expression of JMJ proteins does not resolve this problem.

      (2) Adding to this - what happens to SAM pools in the cells treated with the inhibitors? This actually may not look like the slightly reduced SAM pool observed in infected cell nuclei. Also, what is the SAM/SAH ratio (a very useful indicator of methylation activity)?

      (3) There is a correlation/implication issue here in Fig 2 - cells with C. trachomatis infection show hyper methylation. But these are the only cells with high C. trachomatis. So it is a bit disingenuous to say that histone hyper methylation correlates with bacterial proliferation. The cells without bacteria don't have hyper methylation - and that does not have anything to do with the bacterial proliferation.

      (4) The claim that demethylase activity is down in infected cells again comes primarily from the increased succinate (2-fold) amounts in infected nuclei - and then correlated with experiments where succinate, (permeable) a-KG are supplemented in excess. While I personally like the hypothesis that the hypermethylation might be a result of an imbalance in cofactors (succinate vs a-KG) in infected cells, the data presented is very premature to make that conclusion. Again, steady state measurements of only succinate cannot provide a clear answer to that question. For example, is there a clear allocation/flux difference (between a-KG, and leading out to glutamate/glutamine, vs flux through the TCA and increased succinate accumulation)? Is there a bottleneck/build-up of succinate in cells that might lead to the increase in nuclei? This also opens another direction of possible regulation - increased histone succinylation. When you see a large increase in succinate in the nucleus, before looking at demethylase activity, the obvious question is whether succinate itself increases histone succinylation (through HATs).

      (5) What might the authors hypothesise about why this hyper methylation happens? It appears in some ways that hyper methylation happens - potentially due to a metabolic bottleneck that the bacteria trigger (and there is a build-up of SAM and/or succinate, and altered flux out of a-KG). The methylation is just a visible outcome - but may not be central to pathogenesis or viability.

    3. Reviewer #2 (Public Review):

      Strengths:

      (1) Because the study compares genuinely infected cells with uninfected cells within the same infected cell population, it enables a clearer and more rigorous comparison.

      (2) By using multiple Chlamydia species and cells from multiple host species (human and mouse), and obtaining consistent findings across these systems, the study demonstrates the generality of bacterium-induced epigenomic alterations.

      (3) The study shows that the epigenomic changes are caused by reduced activity of JMJC domain-containing lysine demethylases, demonstrating through multiple complementary approaches - including the use of a demethylase inhibitor, overexpression of target-specific demethylases, and analysis from the perspective of cofactors required for JMJC domain-containing demethylases - that decreased lysine demethylase activity constitutes the molecular mechanism underlying the increased H3 methylation levels induced by Chlamydia infection.

      (4) By performing ChIP-seq analyses of H3K4me3 and H3K9me3, the study clearly delineates, on a genome-wide scale, how infection leads to increased levels of these epigenomic marks.

      Weaknesses:

      (1) Reduction of cofactors such as Fe2+ or a-KG decreases the activity of JMJC-domain-containing lysine demethylases (thereby directly affecting histone H3 lysine methylation). However, these cofactors are also involved in the activities of other epigenetic regulators, such as TET enzymes that contribute to DNA demethylation and SIRT family proteins that mediate histone deacetylation. Therefore, it cannot be excluded that modulation of these factors indirectly leads to the changes in H3 lysine methylation dynamics targeted in this study.

      (2) Related to point 1, although overexpression of JMJC-type demethylases has been shown to reduce the Chlamydia infection-induced increase in H3 lysine methylation, it is well known that overproduction of these enzymes, while target-specific, also leads to a genome-wide reduction of lysine methylation. Thus, a decrease in lysine methylation upon expression of these demethylases does not necessarily demonstrate that the infection-induced increase in H3 lysine methylation is caused by impaired JMJC-type demethylase activity.

    4. Reviewer #3 (Public Review):

      In this manuscript, the authors explore a molecular basis for hypermethylation of histones in epithelial cells infected with the obligate intracellular bacterial pathogen Chlamydia trachomatis. This is of particular interest given that Chlamydia is known to drastically alter host cell gene transcription, and histone hypermethylation would suggest a new way by which Chlamydia interferes with gene expression of its host. Histone methylation was previously implicated in the introduction of dsDNA breaks in infected cells, and the chlamydial effector NUE was reported to methylate histones, but the role of this modification in dictating host cell gene transcription has been unexplored. The authors use a suite of tools to approach this question, including various -omics techniques, genetic approaches, and biochemical assays. Overall, the manuscript provides many interesting pieces of data, though some of them are difficult to reconcile, which may reflect methodological hurdles that are not fully addressed in the current version of the manuscript. My major concerns regard the rationale/interpretation for various mechanistic experiments and that the heterogeneity of the histone hypermethylation phenotype is not addressed which I believe may explain some apparent inconsistencies in the results.

      Using an immunofluorescent approach, the authors show that a subpopulation of the nuclei in Chlamydia-infected cells (~10-20%) exhibit high amounts of methylated histone species. This occurs during the late stages of infection, near the time when Chlamydia would lyse the host cell and positively correlates with bacterial burden. Accordingly, halting chlamydial growth blocks the onset of histone hypermethylation. Exogenously supplying cofactors for histone demethylases, the low activity of which is implicated in the histone hypermethylation phenotype, reduces histone hypermethylation. In general, these data are compelling and raise interesting questions about the role of histone methylation in governing chlamydial egress from infected cells. Interestingly, these behaviors seem to arise independently of NUE, the secreted chlamydial histone methyltransferase, supporting the notion that a metabolic reprogramming may underlie the hypermethylation phenomenon.

      As noted above, the authors propose that hypermethylation arises due to decreased demethylase activity in infected cells. However, the data do not conclusively support this interpretation. For example, the approaches used to probe demethylase activity rely on (i) a direct biochemical measure of demethylase activity, (ii) pharmacological inhibition of demethylase, and (iii) heterologous expression of a specific demethylase. With the exception of (i), these approaches would be expected to alter histone methylation regardless of the source. That is, inhibition of demethylases should increase histone methylation regardless of whether the source of methylation is increased methylase or decreased demethylase activity. Similarly, overexpression of a demethylase would be expected to reduce cognate histone methylation arising either from increased methylase or decreased demethylase activity.

      Moreover, the authors report that the effect of the demethylase inhibitor on histone hypermethylation is significantly potentiated by infection, suggesting that infected cells have greater methylase activity than uninfected cells, because the latter barely respond to the presence of demethylase inhibitor. In other words, a dramatic increase in histone methylation in the presence of demethylase inhibitor is most parsimoniously explained by increased methylation (no longer being removed by demethylase), not decreased demethylation (which would be analogous to treatment with demethylase inhibitor). The authors do not directly assay methylase activity. These concerns extend to the rationale used to justify experiments with infected mice, which the authors treat with the demethylase inhibitor.

      The authors perform experiments to characterize the consequence of hypermethylation genome-wide. Because the authors do not enrich for those cells which exhibit histone hypermethylation, the results reflect the mixed population, and therefore presumably dilute out important signal related to the phenomena under investigation. For example, the proteomic analysis of post-translational modifications identifies only one methylated histone species, whereas the immunofluorescent approach shows consistent effects across five different methylated histone species. Moreover, the chromatin immunoprecipitation analysis indicates that there is unexpectedly a lower density of methylated histones at regions which are also enriched in uninfected cells. The authors argue that this suggests increased methylation is happening "outside" of these histone-dense regions, but direct evidence in support of this claim is lacking.

      In sum, this paper provides compelling evidence in support of the notion that histones are hypermethylated at various residues late in chlamydial infection, that this process is modulated by known cofactors of demethylases, and is the result of high levels of bacterial replication in the cell. That histone hypermethylation governs host gene transcription during chlamydial infection suggests a relatively novel mechanism by which Chlamydia subverts the host cell to establish a replicative niche or egress to infect a new cell. The information obtained regarding the methylation status of host proteins and host gene transcription controlled by a metabolic cofactor during infection will be a useful resource for other researchers. However, in the current version of the manuscript, the mechanistic basis for these behaviors is relatively unclear.

    5. Author response:

      Reviewer #1 (Public Review):

      This study by Charendoff et al provides interesting observations related to global histone hypermethylation in host cells, during Chlamydia trachomatis infections. The core observation they report is that the host histones are highly hypermethylated during infection, and this appears to be an amplifying effect due to continuous inhibition of demethylases, in part due to a metabolic shift in the host where succinate amounts (which inhibit demethylases) increase. The authors claim this is specifically due to the bacteria, since antibiotic treatment prevents histone hypermethylation (but leaves you wondering about cause/consequence correlations).

      The core observation of hyper methylation is very interesting, and well documented. There are a number of points to consider though in order to fully substantiate the findings, and close out loose ends. My comments are broad - and built around the interpretations (vs the data presented).

      (1) Related to observations coming from Fig 1C etc, and connecting to Fig 3 - the hyper methylation appears to be across different protein arg/lys residues - and is not histone specific. So, is it just a consequence of high SAM pools and flux in infected cells? i.e. the bacterial infection increases SAM pools in cells, and provides an increase in substrate pools for the methyltransferases, leading to protein hyper methylation. The approach used here only measures steady-state SAM amounts (and not SAM flux or utilisation).

      For example, reduced SAM amounts in nuclei could be due to increased utilisation of SAM. The experiments done with the demethylase do not actually answer this question - if you decrease demethylase activity, you will get an increase in net methylation. The authors see an increase in net methylation in the infected cells - this would suggest that in addition (or perhaps primarily) to reduced demethylase activity, there could be much higher SAM utilisation/flux. Again, the over expression of JMJ proteins does not resolve this problem.

      This is an important point. Indeed, one limitation of the initial version of the paper was that we had measured SAM concentration at only one time point (40 hpi) and on the whole population. During revision we used a ratiometric sensor to measure SAM concentration in cells (PMID 34937909). We observed cell-to-cell heterogeneity in SAM levels in HeLa cells, as previously reported in other cell lines. Chlamydia inclusions develop asynchronously, which allows us to observe, at 40 hpi, a continuum of early (low bacterial load) to late (high bacterial load) stages of infection. We observed no correlation between bacterial load and SAM level, and SAM levels were globally similar when comparing infected and non-infected cells. This experiment strongly supports the hypothesis that protein hypermethylation is not due to an increase in SAM during infection. The data were added in the New Fig. 3. Note that the former Fig. 3 is now split into New Fig. 3 and New Fig. 4.

      (2) Adding to this - what happens to SAM pools in the cells treated with the inhibitors? This actually may not look like the slightly reduced SAM pool observed in infected cell nuclei. Also, what is the SAM/SAH ratio (a very useful indicator of methylation activity)?

      Based on the high cell-to-cell heterogeneity of SAM levels observed with the ratiometric probe, we reasoned that measuring the SAM/SAH ratio without single-cell resolution would not bring crucial information. Also, the discrepancy between the data displayed in new Fig. 3A (nuclear extracts) and 3C (live cell imaging) indicates that SAM might be less stable in cellular extracts from infected cells compared to non-infected ones, which would complicate the interpretation of the data. Therefore, we did not implement LC-MS/MS on nuclear extracts to measure the SAM/SAH ratio.

      (3) There is a correlation/implication issue here in Fig 2 - cells with C. trachomatis infection show hyper methylation. But these are the only cells with high C. trachomatis. So it is a bit disingenuous to say that histone hyper methylation correlates with bacterial proliferation. The cells without bacteria don't have hyper methylation - and that does not have anything to do with the bacterial proliferation.

      In Fig. 2B, we compared the methylation signal within the population of infected cells only (excluding the uninfected cells). We edited the text to clarify this point: “We observed that, within the population of infected cells, the sum intensity of the mCherry signal was higher in cells that displayed hypermethylation of H3K9me3 than in cells with low levels of H3K9me3, indicating that histone hypermethylation correlated with bacterial load (Fig. 2B).”

      (4) The claim that demethylase activity is down in infected cells again comes primarily from the increased succinate (2-fold) amounts in infected nuclei - and then correlated with experiments where succinate, (permeable) a-KG are supplemented in excess. While I personally like the hypothesis that the hypermethylation might be a result of an imbalance in cofactors (succinate vs a-KG) in infected cells, the data presented is very premature to make that conclusion. Again, steady state measurements of only succinate cannot provide a clear answer to that question. For example, is there a clear allocation/flux difference (between a-KG, and leading out to glutamate/glutamine, vs flux through the TCA and increased succinate accumulation)? Is there a bottleneck/build-up of succinate in cells that might lead to the increase in nuclei? This also opens another direction of possible regulation - increased histone succinylation. When you see a large increase in succinate in the nucleus, before looking at demethylase activity, the obvious question is whether succinate itself increases histone succinylation (through HATs).

      Our work confirms the accumulation of succinate in cells infected by C. trachomatis, previously reported in Rother et al 2018. The reason for this accumulation remains to be investigated in detail. We have previously shown that OxPhos is relatively stable in infected cells (PMID 35931114), indicating that the flux through the TCA of the eukaryotic host proceeds normally. As mentioned in our discussion, the TCA of the bacteria is disrupted with several enzymes missing, although not in the step immediately downstream of succinate/fumarate production. Still, synthesis of succinate and fumarate (fumarate accumulation was observed in the Rother 2018 study) by bacterial enzymes might contribute to their accumulation in infected cells. The approach we chose to measure methylation at the proteome level is not suitable to look for histone succinylation, because of the diversity of post-translational modifications on histones, which occur in combinations. However, following on this reviewer’s comment, we reanalysed the proteomic data to compare protein succinylation levels in infected and non-infected samples. We detected 41 succinylated peptides in the infected samples, against 23 in the uninfected samples. For many of these, we did not have quantitative data in all conditions, and only one protein, transportin 1 (TNPO1), reached statistical significance, with a 4-fold increase in succinylation in infected samples. Thus, while essentially qualitative, this analysis fully supports the hypothesis that succinate accumulates in infected cells. These data were added to Table S1 and to the results section.

      (5) What might the authors hypothesise about why this hyper methylation happens? It appears in some ways that hyper methylation happens - potentially due to a metabolic bottleneck that the bacteria trigger (and there is a build-up of SAM and/or succinate, and altered flux out of a-KG). The methylation is just a visible outcome - but may not be central to pathogenesis or viability.

      We discussed this question in the penultimate paragraph of the discussion, offering some partial answers to the question: “Does it benefit the host or the bacteria?” In our study, we showed that protein hypermethylation affected the transcriptional response of the host. We did not investigate whether the activity of some of the host proteins engaged in the response to infection was affected. It might be the case, considering that methylation is a common PTM regulating protein activity. Still, we agree with this reviewer that hypermethylation might not be central to pathogenesis or viability. Addressing this question would require a complex model in which protein methylation levels could be controlled experimentally.

      Reviewer #2 (Public Review):

      Strengths:

      (1) Because the study compares genuinely infected cells with uninfected cells within the same infected cell population, it enables a clearer and more rigorous comparison.

      (2) By using multiple Chlamydia species and cells from multiple host species (human and mouse), and obtaining consistent findings across these systems, the study demonstrates the generality of bacterium-induced epigenomic alterations.

      (3) The study shows that the epigenomic changes are caused by reduced activity of JMJC domain-containing lysine demethylases, demonstrating through multiple complementary approaches - including the use of a demethylase inhibitor, overexpression of target-specific demethylases, and analysis from the perspective of cofactors required for JMJC domain-containing demethylases - that decreased lysine demethylase activity constitutes the molecular mechanism underlying the increased H3 methylation levels induced by Chlamydia infection.

      (4) By performing ChIP-seq analyses of H3K4me3 and H3K9me3, the study clearly delineates, on a genome-wide scale, how infection leads to increased levels of these epigenomic marks.

      Weaknesses:

      (1) Reduction of cofactors such as Fe2+ or a-KG decreases the activity of JMJC-domain-containing lysine demethylases (thereby directly affecting histone H3 lysine methylation). However, these cofactors are also involved in the activities of other epigenetic regulators, such as TET enzymes that contribute to DNA demethylation and SIRT family proteins that mediate histone deacetylation. Therefore, it cannot be excluded that modulation of these factors indirectly leads to the changes in H3 lysine methylation dynamics targeted in this study.

      Indeed, reduction of the concentration of Fe2+ and aKG is expected to have other consequences in addition to the inhibition of the JMJC-domain-containing lysine demethylases on which we focus in this study. As a matter of fact, we reported a decrease in the methylation level of host DNA in infected cells, and in the discussion we offered some elements that might explain the discrepancy between DNA and histone methylation status (e.g., infected cells display enhanced expression of GADD45, which recruits TET enzymes and thus facilitates DNA demethylation). This example illustrates the complexity of the host/pathogen interplay, which affects many parameters simultaneously. Indeed, we cannot rule out that modulation of enzymatic activities other than those of JMJC-domain-containing lysine demethylases contributes significantly to the hypermethylation phenotype.

      (2) Related to point 1, although overexpression of JMJC-type demethylases has been shown to reduce the Chlamydia infection-induced increase in H3 lysine methylation, it is well known that overproduction of these enzymes, while target-specific, also leads to a genome-wide reduction of lysine methylation. Thus, a decrease in lysine methylation upon expression of these demethylases does not necessarily demonstrate that the infection-induced increase in H3 lysine methylation is caused by impaired JMJC-type demethylase activity.

      We fully agree. We included this experiment to show that increasing the expression of one demethylase only restored demethylation of its cognate target. This supports the hypothesis that if the hypermethylation is due to poor demethylase activity, it is likely that several demethylases show impaired activity (as opposed to a scenario in which failure of activity of a single demethylase would indirectly affect all other methylation marks).

      Reviewer #3 (Public Review):

      In this manuscript, the authors explore a molecular basis for hypermethylation of histones in epithelial cells infected with the obligate intracellular bacterial pathogen Chlamydia trachomatis. This is of particular interest given that Chlamydia is known to drastically alter host cell gene transcription, and histone hypermethylation would suggest a new way by which Chlamydia interferes with gene expression of its host. Histone methylation was previously implicated in the introduction of dsDNA breaks in infected cells, and the chlamydial effector NUE was reported to methylate histones, but the role of this modification in dictating host cell gene transcription has been unexplored. The authors use a suite of tools to approach this question, including various -omics techniques, genetic approaches, and biochemical assays. Overall, the manuscript provides many interesting pieces of data, though some of them are difficult to reconcile, which may reflect methodological hurdles that are not fully addressed in the current version of the manuscript. My major concerns regard the rationale/interpretation for various mechanistic experiments and that the heterogeneity of the histone hypermethylation phenotype is not addressed which I believe may explain some apparent inconsistencies in the results.

      We thank this reviewer for the insightful comments. We addressed these two major concerns during revision and provide details in our responses below.

      Using an immunofluorescent approach, the authors show that a subpopulation of the nuclei in Chlamydia-infected cells (~10-20%) exhibit high amounts of methylated histone species. This occurs during the late stages of infection, near the time when Chlamydia would lyse the host cell and positively correlates with bacterial burden.

      Accordingly, halting chlamydial growth blocks the onset of histone hypermethylation. Exogenously supplying cofactors for histone demethylases, the low activity of which is implicated in the histone hypermethylation phenotype, reduces histone hypermethylation. In general, these data are compelling and raise interesting questions about the role of histone methylation in governing chlamydial egress from infected cells. Interestingly, these behaviors seem to arise independently of NUE, the secreted chlamydial histone methyltransferase, supporting the notion that a metabolic reprogramming may underlie the hypermethylation phenomenon.

      As noted above, the authors propose that hypermethylation arises due to decreased demethylase activity in infected cells. However, the data do not conclusively support this interpretation. For example, the approaches used to probe demethylase activity rely on (i) a direct biochemical measure of demethylase activity, (ii) pharmacological inhibition of demethylase, and (iii) heterologous expression of a specific demethylase. With the exception of (i), these approaches would be expected to alter histone methylation regardless of the source. That is, inhibition of demethylases should increase histone methylation regardless of whether the source of methylation is increased methylase or decreased demethylase activity. Similarly, overexpression of a demethylase would be expected to reduce cognate histone methylation arising either from increased methylase or decreased demethylase activity.

      We agree with the reviewer’s comments. The experiment using pharmacological inhibitors (ii) shows that infected cells are sensitized to these inhibitors but does not provide direct mechanistic insight. The experiment using heterologous expression of demethylases (iii) was included to show that increasing the expression of one demethylase only restored demethylation of its cognate target. This supports the hypothesis that several demethylases show impaired activity (as opposed to a scenario in which failure of activity of a single demethylase would indirectly affect all other methylation marks).

      The most direct evidence for impaired demethylase activity comes from the direct measure of demethylation of H3K4me3 in nuclear extract (i). It is strengthened by indirect evidence that metabolite concentrations hinder demethylase activities late in infection: (1) supplying iron and DMKG diminishes hypermethylation of histone lysine residues; (2) levels of succinate (a competitor of aKG) are two-fold higher in nuclei isolated from infected cells. This latter finding was confirmed during revision, as we identified more succinylated proteins in infected samples compared to non-infected ones.

      We also considered the possibility that infected cells displayed increased histone methyltransferase (HMT) activity. This would be compatible with decreased KDM activity and could contribute to the histone hypermethylation. Unfortunately, this hypothesis cannot be tested directly (as we did for H3K4me3 demethylation activity). Indeed, SAM is notoriously labile, and in vitro HMT assays require adding exogenous SAM to cell extracts to detect any HMT activity, which would not allow us to test activity based on endogenous SAM levels.

      Instead, we used a ratiometric sensor to measure SAM concentration in cells (PMID 34937909). Chlamydia inclusions develop asynchronously, which allows one to observe, at 40 hpi, a continuum from early (low bacterial load) to late (high bacterial load) stages of infection. There was no correlation between bacterial load and SAM level, and this level was globally similar when comparing infected and non-infected cells. This experiment supports our hypothesis that protein hypermethylation is not due to an increase in SAM during infection.

      This experiment was also very interesting because it revealed a high cell-to-cell heterogeneity in SAM levels in HeLa cells. Thus, in some cells, SAM might be limiting, which could explain why only a fraction of cells display histone hypermethylation.

      Still, we cannot fully rule out the possibility that SAM availability increases late in the infectious cycle in some cells and that the additional SAM is immediately consumed through protein methylation, resulting in no net [SAM] increase. The discussion was expanded to take these comments into consideration.

      Altogether, we think that the evidence for decreased KDM activities in infected cells late in infection is strong. Our data do not rule out the possibility that additional mechanisms may contribute.

      Moreover, the authors report that the effect of the demethylase inhibitor on histone hypermethylation is significantly potentiated by infection, suggesting that infected cells have greater methylase activity than uninfected cells, because the latter barely respond to the presence of demethylase inhibitor. In other words, a dramatic increase in histone methylation in the presence of demethylase inhibitor is most parsimoniously explained by increased methylation (no longer being removed by demethylase), not decreased demethylation (which would be analogous to treatment with demethylase inhibitor). The authors do not directly assay methylase activity. These concerns extend to the rationale used to justify experiments with infected mice, which the authors treat with the demethylase inhibitor.

      The observation that the same concentration of JIB-04 increases histone methylation in infected cells but not in non-infected cells is consistent with the data showing that aKG or iron supply diminishes histone hypermethylation in infected cells. Indeed, the inhibitor is taken up similarly by infected and uninfected cells, but its potency depends partly on the levels of iron, aKG and succinate in the cellular milieu. The same concentration of inhibitor may therefore inhibit demethylase activity in cells with higher succinate and/or low aKG and low iron but fail to inhibit it in cells with higher iron or aKG or lower succinate. In other words, high iron, high aKG or low succinate will “buffer” JIB-04 and make it less potent, since JIB-04 acts partly by competing with iron (competitively) and with aKG (mixed competitive inhibition) (PMID 23792809). The same phenomenon is expected for SD70 and TACH101, which share aspects of the mode of action of JIB-04, namely partial competition for aKG and/or iron in the catalytic site.
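
      The buffering argument can be illustrated with textbook competitive inhibition. The sketch below is only an illustration under simplifying assumptions: it treats the inhibitor as purely competitive with a single cofactor (the response describes JIB-04 as mixed competitive for aKG), it ignores succinate, and every constant is a hypothetical value in arbitrary units rather than a measured parameter.

```python
def fractional_inhibition(inhibitor, substrate, ki=1.0, km=1.0):
    """Fraction of enzyme activity lost under purely competitive inhibition.

    Follows from v = Vmax*S / (Km*(1 + I/Ki) + S) compared with the
    uninhibited rate Vmax*S / (Km + S). ki and km are hypothetical.
    """
    i = inhibitor / ki
    s = substrate / km
    return i / (1.0 + i + s)


# Same inhibitor dose at two hypothetical cofactor levels (arbitrary units).
low_cofactor = fractional_inhibition(inhibitor=1.0, substrate=0.5)
high_cofactor = fractional_inhibition(inhibitor=1.0, substrate=5.0)
```

      In this toy model the same dose removes a larger share of activity when the competing cofactor is scarce, which is the direction of the effect described above.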

      The authors perform experiments to characterize the consequence of hypermethylation genome-wide. Because the authors do not enrich for those cells which exhibit histone hypermethylation, the results reflect the mixed population, and therefore presumably dilute out important signal related to the phenomena under investigation. For example, the proteomic analysis of post-translational modifications identifies only one methylated histone species, whereas the immunofluorescent approach shows consistent effects across five different methylated histone species. Moreover, the chromatin immunoprecipitation analysis indicates that there is unexpectedly a lower density of methylated histones at regions which are also enriched in uninfected cells. The authors argue that this suggests increased methylation is happening "outside" of these histone-dense regions, but direct evidence in support of this claim is lacking.

      The caveat of bulk analyses as opposed to single-cell resolution is indeed important to consider when analysing the ChIP-seq data, and we emphasized this point in the revised manuscript. We could have sorted the cells with high bacterial burden; this would probably have given stronger differences between the two samples. Still, the change in distribution of H3K4me3 in infected samples was very clear and statistically significant. A change in H3K9me3 distribution would be more difficult to catch, as the mark is more widespread.

      In sum, this paper provides compelling evidence in support of the notion that histones are hypermethylated at various residues late in chlamydial infection, that this process is modulated by known cofactors of demethylases, and is the result of high levels of bacterial replication in the cell. That histone hypermethylation governs host gene transcription during chlamydial infection suggests a relatively novel mechanism by which Chlamydia subverts the host cell to establish a replicative niche or egress to infect a new cell. The information obtained regarding the methylation status of host proteins and host gene transcription controlled by a metabolic cofactor during infection will be a useful resource for other researchers. However, in the current version of the manuscript, the mechanistic basis for these behaviors is relatively unclear.

      We thank this reviewer for constructive feedback. We believe that the mechanistic conclusions of our report have been strengthened during revision with additional experiments and text clarification.

    1. eLife Assessment

      This valuable study advances our understanding of confidence in reinforcement learning by considering value confidence and decision confidence within a common Bayesian computational framework. The evidence is solid, supported by converging analyses across multiple datasets, though the direct interaction between the two forms of confidence and the model identifiability requires further clarification. The work will be of primary interest to researchers in reinforcement learning, decision-making, and metacognition.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses an important question in reinforcement learning and metacognition by distinguishing value confidence from decision confidence and testing how each is computationally represented. The findings are significant because they suggest that value confidence is well captured by Bayesian uncertainty, whereas decision confidence reflects a hybrid computation combining probability correct with broader value certainty. The evidence is promising, supported by multiple datasets and model comparisons.

      Strengths:

      (1) A major strength of the study is that the authors test their hypotheses across multiple datasets, including previously published datasets and newly collected data. This broad empirical approach increases the generality of the findings.

      (2) The Bayesian model of value confidence has a clear theoretical basis. The proposed hybrid model of decision confidence is also intuitive. It appears to capture important aspects of the decision confidence data.

      (3) The paper provides a useful framework for linking how certainty about value estimates guides the subsequent choice and the corresponding decision confidence.

      Weaknesses:

      (1) The conceptual link between value confidence and decision confidence is not yet fully established. The manuscript argues that overall value certainty contributes to decision confidence, but this conclusion is based largely on the latent variable that the model infers from the decision confidence experiment alone. A more direct test would require measuring value confidence and decision confidence within the same participants and task, and analysing how these two types of confidence interact.

      (2) The individual-difference analyses in Figure 5 are methodologically challenging. The predictors used in these analyses are derived from model fits to the behavioural data and are then correlated with behaviour in the same task. This creates a risk that correlations arise by construction, so the analyses do not guarantee that the correlations are cognitively meaningful.

      (3) The model recovery results suggest that some candidate models are not clearly distinguishable.

      (4) The manuscript would benefit from clearer explanations of why specific models capture particular behavioural patterns.

      (5) The claim that value confidence modulates the exploration-exploitation trade-off should be interpreted carefully, because the model uses global uncertainty across both options, not option-specific value confidence.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors propose a common value-estimation framework based on Bayesian inference and show that it can account for both participants' confidence in their value estimates ("value confidence") and for their confidence in their final choices ("decision confidence").

      Strengths:

      The study extends several established findings in the confidence and reinforcement-learning literature. In particular, the authors not only examine decision confidence but also directly model value confidence, and they replicate the idea that decision confidence reflects a combination of multiple computations, previously described for categorical decisions (Navajas et al., 2017), in the context of continuous value-based decisions. I therefore consider the work a useful contribution to the field.

      Weaknesses:

      However, I believe that the scope of the conclusions is overstated relative to the results that are actually presented.

      (1) Interaction between value confidence and decision confidence

      The abstract and introduction frame the study as addressing a major gap in the literature, namely, the lack of direct investigation of the interaction between value confidence and decision confidence. Yet the manuscript never directly tests the interaction between these two quantities. Instead, the authors show that the reported decision confidence depends not only on the probability of being correct, but also on the precision of the decision variable DV, which is related to the precision of the value estimates underlying value confidence. While this is related to the proposed research question, it is not a direct analysis of the interaction between value confidence and decision confidence themselves.

      (2) Unified computational framework

      Similarly, the claim that the study provides a "unified computational framework" appears somewhat overstated. The proposed models build on standard and well-established Bayesian frameworks and extend them specifically to account for decision confidence. While this demonstrates that both forms of confidence can be expressed within a common Bayesian formalism, the manuscript does not establish a direct computational interaction or shared mechanism between them beyond their dependence on the same underlying uncertainty estimates.

      (3) "Phenotypes" interpretation

      The interpretation of the observed individual differences as distinct "behavioural phenotypes" also appears overstated. The reported analyses primarily show continuous variability across participants in the relative weighting of different components contributing to confidence reports, rather than evidence for qualitatively distinct categories or computational subtypes of decision-makers.

      (4) Decision confidence terminology

      I also found some conceptual ambiguity in the terminology used throughout the manuscript. Early in the paper, decision confidence is defined normatively as the subjective probability of having made the correct choice, corresponding to P(DV>0). Later, however, the authors show that participants' confidence reports are better explained by a combination of this probability and the precision of the decision-variable distribution. Despite this distinction, the manuscript continues referring to the reported quantity simply as "decision confidence." Clarifying the distinction between the theoretical construct and the empirical reports (for example, by referring to "reported decision confidence") would improve conceptual clarity.
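
      The distinction in point (4) can be made concrete with a small sketch. It assumes independent Gaussian beliefs about the two option values, so that the normative quantity is P(DV>0) for DV = value(chosen) - value(other). The linear mixing rule and the weight `w` in `reported_confidence` are hypothetical stand-ins; the reviews do not state the functional form used in the manuscript.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def prob_correct(mu_chosen, var_chosen, mu_other, var_other):
    """P(DV > 0) for DV = value(chosen) - value(other), independent Gaussians."""
    dv_mean = mu_chosen - mu_other
    dv_sd = sqrt(var_chosen + var_other)
    return normal_cdf(dv_mean / dv_sd)


def reported_confidence(p_correct, overall_certainty, w=0.7):
    """Hypothetical hybrid report: weighted mix of P(DV > 0) and a certainty term in [0, 1]."""
    return w * p_correct + (1.0 - w) * overall_certainty
```

      With `w = 1` the report reduces to the normative construct; any other weight gives a "reported decision confidence" that can differ from P(DV>0) even when the decision variable is unchanged.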

    4. Reviewer #3 (Public review):

      Summary:

      Comay, Solovey, and Barttfeld aim to provide a unified computational account of confidence in reinforcement learning by distinguishing value confidence (the certainty associated with latent value estimates) from decision confidence (the confidence that a particular choice is correct). Across new experiments and reanalyses of previously published datasets, they argue that value confidence is best described by Bayesian posterior precision, that this form of confidence adaptively reduces decision noise as learning progresses, and that decision confidence is better captured by a hybrid model combining Bayesian probability correct with a more global estimate of value certainty. They further propose that individual differences in the relative weighting of these components define "confidence phenotypes" that predict task performance, exploration-exploitation behavior, and metacognitive accuracy.

      Strengths:

      A major strength of the study is that it addresses an important conceptual distinction that is often blurred in the confidence literature. The paper usefully separates uncertainty about latent environmental states from confidence in an action derived from those latent beliefs. This distinction is especially important in reinforcement learning, where uncertainty is not merely a retrospective judgment about accuracy but can directly shape future sampling, learning, and action selection. The manuscript is therefore well positioned to bridge work on Bayesian confidence in perceptual decision-making with work on uncertainty-guided learning and exploration.

      A second strength is the authors' use of multiple datasets and model comparisons. The claim that value confidence tracks Bayesian uncertainty is supported across tasks in which participants explicitly report confidence in value estimates, including datasets where reward variance is manipulated. The latter manipulation is particularly useful because it helps distinguish a Bayesian uncertainty account from simpler models based only on the number of observations. The finding that value confidence modulates the softmax slope and thereby promotes more exploitative choices as uncertainty decreases is also theoretically coherent and supported across several datasets, including a preregistered replication.
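
      A minimal sketch of the two ideas in this paragraph, under stated assumptions: value confidence is taken to be the precision of a Gaussian posterior over an option's mean reward with known reward variance, and the softmax slope is assumed to scale linearly with that confidence. The prior variance, the base slope and the linear scaling are hypothetical choices, not the manuscript's fitted model.

```python
from math import exp


def posterior_precision(n_obs, reward_var, prior_var=100.0):
    """Precision of a Gaussian posterior over a mean reward with known reward variance."""
    return 1.0 / prior_var + n_obs / reward_var


def choice_prob(value_a, value_b, confidence, base_slope=1.0):
    """Two-option softmax whose slope grows with value confidence (assumed linear)."""
    slope = base_slope * confidence
    return 1.0 / (1.0 + exp(-slope * (value_a - value_b)))


# Same number of observations, different reward variance: a count-only
# model would assign equal confidence, the Bayesian account does not.
low_noise = posterior_precision(n_obs=10, reward_var=1.0)
high_noise = posterior_precision(n_obs=10, reward_var=4.0)
```

      Feeding the higher precision into `choice_prob` gives a steeper choice function, that is, more exploitative choices for the same value difference.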

      The manuscript's most interesting and potentially impactful contribution is the hybrid model of decision confidence. The authors show that a model based only on Bayesian probability correct captures confidence on correct trials better than on incorrect trials, whereas adding an "overall value confidence" term improves the fit. This is a useful result because it suggests that confidence reports in reinforcement learning may not be a pure readout of decision-level discriminability, but instead may combine decision-specific evidence with more global latent-state uncertainty. This could help explain why human confidence often deviates from ideal Bayesian predictions, especially on error trials.

      Weaknesses:

      However, the interpretation of the hybrid model remains the main weakness of the paper. The second term, overall value confidence, is not equivalent to the precision of the decision variable. It can dissociate from decision difficulty: two options can be far apart but individually uncertain, or nearly identical but individually well estimated. The authors appear to recognize this issue and have reframed the term as "overall value confidence" rather than decision-variable precision. This is a useful clarification, but the conceptual role of the term still requires sharper treatment. In its current form, it is sometimes described as part of a unified confidence computation, but it may be more accurately understood as a biasing or contextual signal that modulates reported confidence without necessarily improving decision calibration.

      A related concern is model identifiability. In many reinforcement-learning tasks, probability correct and overall value confidence both change systematically over the course of learning. As a result, the hybrid model may gain predictive power partly because it captures generic time-on-task or learning-progress effects, rather than because participants explicitly combine two separable uncertainty signals. The manuscript would be stronger if it more clearly demonstrated that the two latent variables are distinguishable in the behavioral data, for example, through model recovery, parameter recovery, cross-validated prediction, and analyses of the correlation between latent regressors across task conditions and individuals.

      The link between the decision rule and confidence model also deserves more scrutiny. The authors use value confidence to modulate decision noise in the choice model, and then use a related global value-confidence term in the confidence-report model. This creates an appealing unified architecture, but it also raises the possibility that the same latent variable is doing multiple kinds of explanatory work. The paper would benefit from a clearer separation between uncertainty as a driver of choices, uncertainty as a determinant of confidence reports, and uncertainty as an inferred latent variable extracted from the same behavioral data.

      From a computational neuroscience perspective, the manuscript would also benefit from a more explicit discussion of how these confidence quantities might be represented neurally. The current model treats value confidence, probability correct, and overall value confidence as scalar latent variables available to the observer. Yet uncertainty-related computations may be represented nonlinearly in neural population activity rather than as explicit scalar readouts. Work on nonlinear neural decoding and population codes has shown that task-relevant variables can be carried by nonlinear statistics of neural activity, especially when nuisance variables obscure mean tuning, and that behavioral choices can reveal whether such nonlinear information is efficiently decoded. This literature provides a useful framework for connecting the present behavioral model to possible neural implementations of value and decision confidence.

      Overall, the authors largely achieve their goal of demonstrating that value confidence and decision confidence are computationally dissociable in reinforcement learning. The evidence for Bayesian value confidence is strong, and the evidence that confidence-guided exploitation improves the account of choice behavior is convincing. The evidence for the hybrid account of decision confidence is promising but would be strengthened by additional analyses clarifying model identifiability, the interpretation of the overall value-confidence term, and the conditions under which the model makes distinct predictions from simpler time-, value-, or evidence-based alternatives. The paper is likely to be useful for researchers interested in computational models of confidence, metacognition, and adaptive behavior under uncertainty.

    1. eLife Assessment

      This important study identifies a non-canonical essential role for acyl carrier protein in maintaining apicoplast metabolism and blood-stage survival in Plasmodium falciparum. The main conclusions are largely supported by strong genetic and biochemical evidence, although some claims regarding the dispensability of fatty acid synthesis pathways remain incomplete. The work provides novel mechanistic insight into ACP-mediated stabilization of pyruvate kinase II and will be of broad interest to the malaria and apicoplast biology communities.

    2. Reviewer #1 (Public review):

      This study provides evidence that the apicoplast-localized isoform of acyl-carrier protein (ACP) has acquired important non-enzymatic functions in the malaria parasite. Previous studies have shown that the apicoplast-located FASII-dependent pathway of fatty acid synthesis is not essential in Plasmodium blood stages. In contrast, genome-wide knockout studies suggested that ACP, a key protein in this pathway, is essential in these stages, indicating that it may have additional non-canonical functions. In this study, the authors confirm that ACP is essential in Pf blood stages (using both apicoplast IPP rescue and conditional knockdown); show that this essential function requires modification with 4-phosphopantetheine; and use proximity biotinylation and complementary immunoprecipitation pull-down approaches to provide compelling evidence that ACP binds to and stabilizes the apicoplast-located isoform of pyruvate kinase II. Notably, these interactions appear to differ from those associated with the binding of mitochondrial isoforms of ACP to proteins involved in Fe-S biosynthesis. Loss of ACP was shown to lead to a decrease in PKII levels and apicoplast DNA/RNA synthesis, consistent with loss of NTP synthesis in this organelle. The data are clear and very well described, and the findings represent a significant advance in our understanding of metabolic regulatory mechanisms in apicomplexan apicoplast studies.

      Strengths:

      The study uses a variety of complementary genetic approaches to demonstrate the essentiality of ACP and the enzyme involved in its activation with 4-PP in Pf blood stages, demonstrating that the ascribed non-enzymatic function is mediated by holo-ACP. Similarly, a number of complementary biochemical approaches, including proximity biotinylation, immunoprecipitation, and co-expression of PfACP and PK-II in a heterologous bacterial expression system, are used to confirm the physiological significance of the PfACP and PK-II interaction. The study also reports additional findings, such as the independence of P. falciparum blood stages from exogenous (media) fatty acids, indicating that intracellular stages can salvage all of their requirements from the red blood cell.

      Weaknesses:

      Overall, this is a very strong study. While questions remain around the function of other apicoplast ACP-interacting proteins detected in this study, I don't have any suggestions for significant improvements.

    3. Reviewer #2 (Public review):

      This study focuses on revealing the essential divergent function of the Acyl Carrier protein (ACP) in the deadliest human malaria parasite, Plasmodium falciparum. More precisely, using inducible KO, cellular and biochemical approaches, the authors determined that instead of a canonical role for ACP allowing the de novo synthesis of fatty acids in the apicoplast (essential relict plastid) of the parasite, the enzyme couples with pyruvate kinase II to generate nucleoside triphosphate to maintain parasite survival during blood stages. The study is novel, well-designed, providing interesting new data on Plasmodium and apicomplexa biology. The results convincingly support the major claim of the study. However, it is currently incomplete to support some claims on the essentiality of some apicoplast pathways.

      In this study, Geher et al. focused on deciphering the role of the Acyl Carrier Protein (ACP) present in the relict non-photosynthetic plastid, i.e. the apicoplast of the most lethal human malaria parasite, Plasmodium falciparum. More particularly, they determined an essential function of ACP independent of its usual/typical function as the central protein for the normal function of the apicoplast Type II fatty acid synthesis (FASII) pathway. Rather, the protein seems to associate with the apicoplast Pyruvate Kinase II, together generating an essential nucleoside triphosphate (NTPs) source to fuel the apicoplast and parasite survival instead.

      By generating a TetR-DOZY-based inducible KD line for ACP, they confirmed that the protein is indeed essential to maintain apicoplast integrity and parasite survival during asexual blood stages, as previously predicted and experimentally shown. They showed that ACP requires a biochemical modification, typically activating the protein for its function in the FASII pathway, i.e. binding of the 4-PP group by holoACP synthase. Then, they showed that the other enzymes of the FASII pathway are likely dispensable during the blood stage, as they were able to generate a KO line of the first enzyme of the pathway, FabD (which was predicted to be essential in P. falciparum). Based on a cell culture approach in a controlled culture medium, they further claimed that, unlike current evidence-based hypotheses, the FASII pathway (and thus a potentially FASII-linked ACP) has no role/activity during blood stages. Using a proximity biotinylation approach, they determined that ACP associates with the apicoplast pyruvate Kinase II (PKII), previously shown to generate NTPs in the apicoplast for energy and DNA/RNA maintenance (Xia et al. 2019), and not to fuel the FASII pathway as its main function in blood stages. Finally, they showed that the disruption of ACP induces the reduction of the presence/content in PKII in the parasite, as well as the drastic reduction of the apicoplast DNA and RNA content. Together, they concluded that the main function of ACP is indeed the NTP formation via its association with PKII, rather than its canonical role for the generation of fatty acids in the apicoplast.

      This study is novel and focuses on a topic of particular interest in malaria biology, and also for most apicomplexa-related diseases and, beyond that, for plastid-bearing organisms, given this unusual role for ACP. The study is well thought out, with proper biochemical approaches that convincingly point to this association of ACP with PKII for NTP synthesis as a major function during P. falciparum blood stages. However, there are currently some important experimental issues/flaws and missing experiments that led to wrong interpretations and thus do not support some important claims of the study, notably for the role of FASII and the interaction between ACP and PKII.

      Therefore, at this point, the study is only partial and would require major additions and/or important text edits/revisions before being considered for acceptance.

      Major points:

      From the graph of P. falciparum growth, we can see that in the lipid-rich condition, where both FabH KO and ACP KO can survive, the addition of mevalonate was essential for the growth of ACP KO. Along with the other evidence (PKII association, DNA levels...), we therefore agree that PfACP is involved in the mevalonate pathway. The authors claim that the FASII pathway is inactive/not essential in the P. falciparum blood stage. However, the authors have not shown any evidence on whether or not ACP is involved in the FASII pathway during the asexual blood stage. As currently designed, the experiments presented cannot conclude on that point for several reasons. Indeed, it was previously shown (i) that the proteins of the FASII pathway are all expressed in blood stages and are significantly upregulated in patients under "nutrient starvation" (Daily et al. Nature 2007), (ii) that growing parasites under similar low lipid conditions in vitro induces an activation/upregulation of FASII, which can be measured by stable isotope precursor labelling and lipidomics (Botté et al. 2013), and (iii) that growing the PfFabI KO line under deprived lipid conditions leads to parasite death (Amiar et al. 2020), indicating that the FASII pathway can become critical, if not essential, depending on the host nutritional content. This correlates with patients' data and with the metabolic adaptation reported for the same reasons in the related parasite Toxoplasma gondii (Amiar et al. 2020, Krishnan et al. 2020, Liang et al. 2020, Primo et al. 2021, Charital et al. 2024, Dass et al. 2024, Bitew et al. 2025).

      Here, the authors expect to show that FabH (and thus the FASII pathway) is not essential using an experiment that is designed not in low lipid conditions but in lipid-rich conditions. The high lipid conditions of culture in this study are ensured by daily feedings with a high fatty acid supplement (30-90 µM palmitic acid and 30-60 µM oleic acid). These fatty acid concentrations were used previously by Mitamura et al. (2005) and Mi-ichi et al. (2007) to replace undefined supplements such as serum or Albumax, providing similar growth in a completely controlled culture medium.

      This means the concentrations above do not represent limited fatty acid concentrations, especially not with daily feeding (representing an excess supplied amount of lipids, unlike regular 48 h feedings), which allowed the authors to easily reach very high, non-physiological parasitaemia of more than 20%. Amiar et al. previously showed essentiality of FabI in P. falciparum in limited fatty acid culture at lower concentrations (<30 µM 16:0, <45 µM 18:1) than the Mi-ichi et al. controlled medium, with regular 48 h culture feeding. Therefore, with the current experimental settings, the FabH KO is placed in high lipid conditions, thus preventing any conclusion on its essentiality under low lipid conditions.

      Furthermore, it is too uncertain to conclude that ACP is only essential for the mevalonate pathway. This would be a similar discussion to that of Yeh et al. 2011 and Swift et al., where induced apicoplast knockout caused parasites to require IPP to survive, but there were always remnant apicoplast vesicles and thus the putative presence of an active FASII in the parasite, where de novo fatty acid synthesis could be maintained. Amiar et al. (2020) and Krishnan et al. (2020) showed that disruption of FASII and absence of de novo FA synthesis in T. gondii could be compensated by the exogenous supplementation of myristic acid, C14:0. Here, high fatty acid supplementation using commercially available fatty acids may include unexpected fatty acid species, such as myristic acid in palmitic acid or oleic acid, since commercially available fatty acids are guaranteed only to >99% purity, not 100%. If P. falciparum requires only a very low amount of myristic acid to survive, a contamination as small as 1 nM may be sufficient to maintain its survival. Thus, ACP and FabH might be very important to generate de novo fatty acids within parasites, but this was not shown by the authors.

      Therefore, the manuscript currently contains incorrect conclusions on the potential essentiality/use of FASII, against current experimental evidence.

    4. Author response:

      We thank the editor and reviewers for the positive comments and critical feedback on our manuscript. We are currently preparing revisions to address the critiques provided by reviewer 2, which focused primarily on growth experiments performed with ∆ACP and ∆FabD P. falciparum parasites in minimal lipid conditions. We note that the major conclusions of our manuscript regarding an essential, FASII-independent function for ACP in apicoplast biogenesis do not require or rely on these experiments in minimal lipid conditions.

      Nevertheless, we believe that these observations have value and agree that they contrast with similar experiments reported in the Amiar et al. 2020 study referenced by the reviewer. We note that this prior study (and others cited by the reviewer) primarily focused on the related apicomplexan parasite, Toxoplasma gondii. We fully agree that available evidence in these and other papers supports a key, fitness-conferring role for FASII activity in growth of T. gondii parasites, including possible expanded functions for ACP that may differ from P. falciparum. We will revise our manuscript to clarify that our results only apply to P. falciparum. We note that our minimal lipid growth experiments with P. falciparum utilized culture conditions and concentrations that appear identical to those reported in the Amiar et al. 2020 study. Nevertheless, we agree with the reviewer that additional experiments will be required to fully test and understand FASII functions in asexual blood-stage malaria parasites, including possible functions in low-lipid conditions. We plan to revise our manuscript to clarify this and other points, and we will include expanded responses to the reviewer critiques.

    1. eLife Assessment

      This important work uses a sophisticated combination of neuromodulator imaging, optogenetics, and two-photon calcium imaging to examine how locus coeruleus-mediated norepinephrine signaling influences distinct hippocampal cell types. The evidence is solid and provides novel insights into cell type-specific responses to norepinephrine release. However, the conclusions would be strengthened by a more thorough analysis of the differences between locomotion-associated activity and optogenetic stimulation of the locus coeruleus.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Duss et al. use several complementary and state-of-the-art strategies to characterize the effects of norepinephrine release from LC axons on post-synaptic cell types in the hippocampus. While a large body of research supports an important role for NE signaling in hippocampal function, the precise mechanism by which NE promotes these effects remains poorly elucidated, in large part because adrenergic receptor subtypes can be expressed in a variety of cell types and promote a variety of responses. Towards assessing this, the authors first establish an optogenetic strategy in which the delivered stimuli mimic endogenous activation of LC in 'moderate' and 'high' acute stress events, using NE sensors to titrate stimulation patterns to similar levels of NE release. They then conduct a series of 2P imaging experiments in mice and compare response properties of various cell types in the hippocampus (excitatory and inhibitory neurons, and astrocytes) when the animal is 'naturally' or optogenetically aroused (via activation of the LC). The results are surprising. Whereas natural arousal causes activation of astrocytes, pyramidal cells, and interneurons, optogenetic activation of the LC does almost the opposite, with only astrocytes responding positively. Another important finding from the study is that astrocytes seem to be the most responsive cell type in the hippocampus to NE release, suggesting they could be key components for downstream functional effects of NE release in this brain region.

      Strengths:

      (1) The study was methodically done with respect to the characterization of how optogenetic parameters related to levels of NE release. Also, the analysis of their calcium imaging of various cell types in the hippocampus was very comprehensive.

      (2) Related, their discovery that cell types in the hippocampus respond differently to NE release, while not a completely unexpected finding, is something that has not been addressed experimentally in such a direct way before (to my knowledge).

      (3) Their finding that optogenetic stimulation of the LC produces opposing results to when these cells are naturally activated has wide implications for the LC field and potentially beyond.

      Weaknesses:

      I was surprised that no efforts were made to further assess what might be causing this discrepancy in hippocampal responses to optogenetic vs. natural activation of the LC. Some experiments that I felt were missing:

      (1) The authors go to great lengths to measure NE release in a variety of arousing conditions (tail lift, foot shock, 5Hz LC opto, 20Hz LC opto), but then in their 2P imaging, they're comparing the opto results to a 'natural' arousal state defined as when the mice were in motion. Maybe I missed it, but I wasn't sure that they ever checked the level of hippocampal NE release in this running state, similar to what they did in the other arousal conditions. Thus, it wasn't clear to me how comparable this state was to the optogenetic stimulation.

      (2) The authors do a nice experiment to show that increases in the hippocampal NE sensors are dependent on LC activity via optogenetic inhibition of the LC (Figure 1, Supplement 3). It seems like a missed opportunity to include a similar strategy in their 2P testing, to assess whether the differing responses of pyramidal cells, interneurons, and astrocytes are truly due to NE release. I could imagine it might be difficult to precisely time LC inhibition with periods of movement, but I imagine that mice would still run even if the LC is inhibited.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript aims to determine the extent to which LC-mediated NA release in the CA1 region of the hippocampus (at both population and cellular levels) contributes to physiological arousal responses associated with innate behaviors (stress, locomotion). The manuscript is divided into two parts in which the authors compare time-locked responses in astrocytes, interneurons (pan-targeting), and pyramidal (CaMKIIa-driven targeting) cells.

      In the first part of the manuscript, the authors perform bulk recordings of either NA release or calcium activity locked onto either 'natural arousal' events (tail lift, foot shock, forced swim) or direct optogenetic activation of LC somas. A first aim is to identify an optogenetic stimulation frequency that would mimic NE release in the target area by low- and high-intensity stressors. In the second aim, they compared evoked responses across cell types and concluded that stressors and direct LC activation trigger similar responses in astrocytes but not in interneurons or pyramidal cells.

      In the second - and most extended - part of the manuscript, the authors performed 2-photon cellular recordings of these different cell populations and compared responses evoked by the onset of locomotion vs. direct activation of the LC. Doing so, they observed a great degree of heterogeneity across these two conditions and across cell types. They conclude that NA effects on the hippocampus are primarily mediated by astrocytes and that LC-NA neuromodulation alone does not recapitulate the full breadth of 'natural arousal' modulations. They conclude that other neuromodulators likely contribute to how the hippocampus responds to high arousal levels.

      Strengths:

      Overall, the manuscript is well written and the figures are particularly clear.

      Optogenetics is a very successful technique in contemporary neuroscience, yet one important identified limitation is that it operates largely in a non-physiological regime, driving spike rates in regions rarely visited under normal physiological operations. This has raised valid concerns about the physiological relevance of findings obtained from studies using this technique. Here, the authors aimed at calibrating optogenetic manipulations of the LC so as to match the physiological release of NA observed in specific behavioral contexts. This is a valuable endeavor that could bring the field towards more reproducible and broadly valid findings.

      Another important open question is how different cell types coordinate to support global network activity and adaptive behavior. By recording distinct cell populations from the same region (CA1) and in response to the same category of endogenous versus exogenous events (locomotion or LC activation), it becomes possible to unravel important and specific operation modes, here also linked to a specific category of neuromodulation signaling.

      Weaknesses:

      This manuscript was difficult to review. There is clearly a lot of work and effort that went into it, and the multiple techniques seem well implemented, often with appropriate controls. Yet the general framing and the links between experiments and interpretations unfortunately look questionable in my opinion. Below, I unpack what I think are the four main weaknesses.

      (1) Incomplete calibration of optogenetic manipulations to physiological regimes

      While mapping optogenetic stimulation protocols to physiological variations is valuable, the proposed approach suffers from major limitations. First, the only parameter that is calibrated is the peak of NE release (as estimated from GRAB-NE fluorescence). Thus, it excludes other important aspects of the response, including trial-to-trial variability and the temporal dynamics of the response. Furthermore, stressor and LC activation conditions are simply non-comparable in terms of the duration of the stimulation (e.g., 3 min swim test versus 10s optogenetic stimulation), likely involving neuromodulation at different timescales (phasic vs. tonic). Albeit not explicitly mentioned, the number of trials and inter-trial interval between successive stimulations are also likely unmatched. On another note, the identification of the best stimulation frequency seems based on a grid of predefined values, while a more precise, continuous assessment could have easily been used. Finally, even though phasic NE release is known to depend on baseline tonic NE levels (especially with a sensor that reports a sublinear function of NE concentration), this dimension is ignored.

      (2) Weak links between imposed stressors and spontaneous locomotion

      The general approach is surprising: authors calibrated the optogenetic stimulation protocol on a range of stress-related behaviors and applied this to locomotion behavior. Indeed, while the first part of the manuscript uses different stressors in freely moving contexts to 'naturally' elevate arousal, the second part uses spontaneous locomotion bouts in a head-fixed situation as proxies for heightened 'natural' arousal. These two parts are very difficult to relate, and it is entirely unclear how NE regimes observed in the first context generalize to the second. Yet, on several occasions, the authors directly relate the first (fiber photometry, Fig.1) and second (2-photon, Fig. 2-6) parts of the manuscript. For instance, they conclude in favor of a "weak alignment between astrocytic responses to arousal and to LC stimulation on a cellular basis, despite the similarity of the bulk response." It remains unclear why closer preparations weren't used in the two parts, such as time-locked change in GRAB-NE2m fluorescence according to either locomotion onset or in a fear conditioning assay, both using fiber photometry in a head-fixed setting.

      (3) LC optogenetics and spontaneous locomotion differ by more than the origin of the arousal drive

      By directly comparing spontaneous locomotion and LC activation, the authors imply that the only difference between these two conditions is the origin of arousal: endogenous vs. exogenous, respectively. Furthermore, they interpret LC activation as triggering a pure NA effect while locomotion would reflect the conglomerate modulation from multiple neuromodulatory systems. On the one hand, LC activation likely results in the recruitment of other arousal centers (the raphe serotonin system, for instance, see 10.1101/2025.03.26.644382). On the other hand, differences between these conditions span well beyond specific arousal centers (see the massive motor-related activity in cortical dynamics: 10.1038/s41593-019-0502-4). Another, more methodological concern is the larger instability of the field of view during locomotion by comparison to optogenetic activation. While I am sure the authors corrected for movement-related translation in x and y directions, there might still be residual motion artefacts in the z direction that could account for some of the differences between the two conditions.

      (4) Loose equivalence between locomotion and natural arousal

      On many occasions, the authors draw a direct equivalence between spontaneous locomotion and 'natural arousal'. Arousal is a multifaceted concept that relates to far more behavioral readouts and network states than just locomotion. For instance, imagine a mouse freezing in response to a threat: locomotion would be absent, but the animal would still be highly aroused. It is fine to leave aside a particular readout and focus on others (especially in the case of arousal, which has many aspects). However, in that case, a single readout cannot be equated with 'natural arousal' as a whole. Instead, terms like 'locomotion' or 'locomotion-linked arousal' should be preferred. Indeed, in the particular case of locomotion, what is being read out is the upper part of the arousal continuum, whereas pupil size or whisker pad movements can provide a more complete readout, including the lower and intermediate parts of that same continuum. While it is not necessary to include other arousal readouts (once claims are appropriately modified), the motivation for leaving out available readouts (lines 187-201) feels like a post-hoc rationalization.

      In sum, these 4 points call in my opinion for a profound change in how results are presented and interpreted. If agreed, a solution could be to leave aside the first part of the manuscript, to provide a more accurate picture of the differences between optogenetic activation and spontaneous locomotion, and to better flag the limitations of the approach (a part that I believe is entirely missing in the current version).

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors focused on the CA1 region of the hippocampus to compare Ca2+ dynamics in astrocytes, pyramidal neurons, and interneurons in response to optogenetic stimulation of locus coeruleus-triggered noradrenaline (NA) release, or movement (natural arousal)-triggered NA release. The most striking finding is that all studied cell types responded differently to LC stimulation compared to natural arousal. The description of these findings is important as a resource for further mechanistic studies on how multiple neuromodulator systems may interact or for predicting the consequences of the selective impairment of the noradrenergic system.

      Strengths:

      The technical design and conduct of the experiments, analysis including statistics, as well as the presentation of the results, are timely and very solid.

      Weaknesses:

      The identity and localization of NA receptors responsible for effects on neurons are less clear, and therefore, the difference between LC stimulation and natural arousal is less surprising. However, the presented data are consistent with the established finding that astrocytes directly sense NA mainly through α1 adrenergic receptors, yet in this study, astrocytes that responded strongest to LC stimulation did not respond strongest to natural arousal, and vice versa for other astrocytes.

      The authors seem to favor diversity of astrocyte responsiveness as an explanation, but also mention differences in LC activation pattern and distance of individual astrocytes to NAergic nerve terminals. Therefore, this warrants a careful consideration of a critical aspect of the experimental design. The authors delivered Ca2+/NA sensors as well as the optogenetic tools via AAV. While Figure 1 Supplement 3 suggests that most LC neurons were transduced, AAV transduction will almost certainly lead to a diversity in copy numbers per cell. On the receptor side, this can lead to an artificial diversity in Ca2+ response detection sensitivity among individual cells, but more importantly, for the LC, this could account for a different pattern of activation by optogenetic stimulation compared to activation by natural arousal. Such a problem would remain unnoticed with the currently presented matching of optogenetic and natural arousal stimulations of LC using population NA sensor signals (Figure 1, fiber photometry).

      Major suggestion:

      A critical experiment to test for this caveat would be to ideally express the NA sensor in astrocytes (due to their space-filling process arborizations and direct response to NA; but expression in neurons, as present, would work as well) and study the spatial pattern of NA release using two-photon microscopy, comparing multiple days and LC stimulation by optogenetics versus natural arousal. In case these experiments revealed nonuniform NA signal patterns, stable over days, but different when caused by optogenetic stimulation versus natural arousal, it would possibly shift the interpretation of the astrocyte response patterns towards depending mainly on NA release rather than diversity in NA responsiveness. Such a finding would be consistent with studies that compared arousal-mediated Ca2+ dynamics in NAergic terminals and Bergmann glia in the cerebellum (PMID: 36790089). On the other hand, in case these added experiments revealed similar NA release patterns in response to optogenetic stimulation versus natural arousal, then the presented findings would convincingly represent a biological phenomenon.

      Minor suggestion:

      Using "movement" as a proxy for arousal is very appropriate. To avoid the misunderstanding that different phenomena have been studied, it may be useful to acknowledge that early studies of noradrenergic signaling to astrocytes have found that speed of locomotion does not correlate well with astrocyte Ca2+ responses, and electromyographic signals have been used as a "proxy for movement" (PMID: 24945771).

    5. Author response:

      We thank the reviewers for their positive and constructive feedback and for the careful reading of our manuscript.

      We plan to address the reviewers’ comments and, specifically, to more thoroughly compare movement-associated activity with optogenetic stimulation of the locus coeruleus (LC), with new experiments, clarifications, and additional analyses.

      (1) We plan to perform new experiments using two-photon imaging of noradrenaline (NA) sensors in head-fixed mice during both optogenetic LC stimulation and spontaneous movement. This will, if successful, allow us to directly compare the spatial and temporal structure of NA release across conditions, and to quantify NA amplitude during locomotion versus LC stimulation.

      (2) We will analyze existing NA fiber photometry data for movement-related NA release and compare it to release evoked by LC stimulation.

      (3) In general, we plan to more prominently highlight the limitations of our study that were brought up by the reviewers. In particular, we will expand our discussion of other neuromodulatory systems and their interactions with the LC-NA system, and will tone down conclusions of our study if they cannot be supported by the additional planned experiments and analyses.

      Finally, a reviewer suggested an additional experiment: inhibiting the LC while performing two-photon imaging in head-fixed animals. These experiments have, due to their technical complexity, a low likelihood of success. In addition, recent work from the lab of Emily Macé has already performed LC inhibition during functional recordings (doi: 10.64898/2026.03.06.710089). This work supports our interpretation that the contribution of LC-evoked NA release does not dominate movement-related signals. We will discuss these recent findings in the revised version of our manuscript.

      Together, we believe that these planned experiments, analyses, and revisions will address all main concerns raised by the reviewers.

    1. eLife Assessment

      This important technical development for neural circuit tracing in larval zebrafish consists in an enhanced rabies virus for improved retrograde transneuronal tracing, supporting a new method for combined structural and functional brain mapping which is demonstrated with compelling evidence. The work will interest zebrafish neurobiologists for the identification of neuronal connectivity patterns while simultaneously monitoring circuit activity.

    2. Reviewer #2 (Public review):

      The study by Chen, Deng et al. aims to develop an efficient viral transneuronal tracing method that enables retrograde tracing in larval zebrafish. The authors utilize pseudotyped rabies virus that can be targeted to specific cell types using the EnvA-TvA system.

      Pseudotyped rabies virus has been used extensively in rodent models and, in recent years, has begun to be developed for use in adult zebrafish. However, compared to rodents, the efficiency of spread in adult zebrafish is very low (~one upstream neuron labeled per starter cell). Additionally, there is limited evidence of retrograde tracing with pseudotyped rabies in the larval stage, which is when most functional neural imaging studies are conducted in the field. In this study, the authors systematically optimized several parameters for rabies tracing, including rabies virus strains, glycoprotein types, temperatures, expression construct designs, and the elimination of glial labeling. The optimal configurations developed by the authors are up to 5-10-fold more efficient than more commonly used configurations.

      The results are compelling and support the conclusions.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) Presentation of Figures in the Response Letter

      I would like to note that the figures included in the response letter would benefit from improved organization. For example, Author response image 1 lacks clarity regarding the experimental conditions. From the response letter, my understanding is that a "Labeling rate index", Rg−Rn, was calculated to represent the difference in the rate of increase in labeling between neurons and glia across two time intervals based on experiments shown in Figure 2-figure supplement 1C and G. It seems that a mean convergence index was calculated for each experimental condition at each time point for glia and neurons, and then the differences in mean convergence index increase between time intervals were calculated for glia and neurons. The legend needs more detail to enhance clarity.

      Yes, the “labeling rate index” (Rg−Rn) corresponds exactly to the reviewer’s understanding. Specifically, it quantifies the difference between neurons and glia in the increase of the mean convergence index across two defined time intervals, calculated separately for each experimental condition based on the experiments shown in Figure 2–figure supplement 1C and G.

      To improve clarity, we have substantially revised the figure legend to explicitly describe (i) the definition of labeling rate, (ii) how the mean convergence index was computed for neurons and glia at each time point, (iii) how changes across time intervals were derived, and (iv) how to calculate the labeling rate index. In addition, we have moved this analysis to Figure 2-figure supplement 2 and cited it in Line 191.
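      As an illustration only, the calculation described above can be sketched in a few lines of Python. The response does not state how the "rate" is normalized, so this sketch assumes it is the change in mean convergence index per day within each interval; the function names, the time points, and all numeric values are hypothetical and are not taken from the manuscript.

```python
def labeling_rate(mean_ci, days):
    """Change in mean convergence index per day for each consecutive interval.

    Assumption: 'rate' means (CI at later time - CI at earlier time) / elapsed days.
    """
    if len(mean_ci) != len(days):
        raise ValueError("need one mean convergence index per time point")
    return [
        (mean_ci[i + 1] - mean_ci[i]) / (days[i + 1] - days[i])
        for i in range(len(days) - 1)
    ]


def labeling_rate_index(mean_ci_glia, mean_ci_neurons, days):
    """Rg - Rn for each interval: glial labeling rate minus neuronal labeling rate."""
    rg = labeling_rate(mean_ci_glia, days)
    rn = labeling_rate(mean_ci_neurons, days)
    return [g - n for g, n in zip(rg, rn)]


# Hypothetical example: three time points give two intervals.
days = [4, 7, 10]                 # days post injection (made-up sampling)
mean_ci_glia = [1.0, 1.6, 2.2]    # made-up mean convergence indices
mean_ci_neurons = [2.0, 3.5, 4.1]

index = labeling_rate_index(mean_ci_glia, mean_ci_neurons, days)
print(index)  # one Rg - Rn value per interval
```

      Under this convention, a negative value means neuronal labeling grew faster than glial labeling in that interval, and a value near zero means the two grew at similar rates.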

      Furthermore, the manuscript should clearly distinguish between figures generated from re-analysis of existing data and those based on newly conducted experiments. This distinction should be explicitly stated in the figure legends and/or main text.

      I recommend that all response figures containing data integral to the authors' rebuttal be properly integrated into the manuscript's existing supplementary figure set, rather than remaining isolated in the response document. This would enhance clarity and ensure that key supporting data are fully accessible to readers. For instance, Author response image 1 can be integrated with Figure 2-figure supplement.

      We appreciate the reviewers’ valuable suggestions. We have revised the figure legends and/or corresponding main text to clearly distinguish figures derived from re-analysis of existing data from those based on newly conducted experiments. In addition, all response figures containing data integral to our rebuttal have now been integrated into the current manuscript’s supplementary figure set.

      Specifically, Author response images 1 and 3 have been incorporated into Figure 2–figure supplement 2 and Figure 2–figure supplement 3, respectively; Author response image 2 has been incorporated into Figure 1–figure supplement 2. Author response image 4 has been incorporated into Figure 1,2–figure supplement 1. These changes improve clarity and ensure that all supporting data are readily accessible to readers.

      (2) Glial Cell Labeling and Specificity of Trans-Synaptic Spread

      The authors provided a comprehensive and well-reasoned response to the concern regarding the labeling of radial glial cells. The inclusion of a dedicated section in the revised Discussion and response figures (possibly to be integrated with supplementary figures), strengthens the manuscript.

      The authors have made an interesting observation in Author response image 2 that glial labeling was frequently observed near the soma and dendrites of starter cells, suggesting that transneuronally labeled glial cells may be synaptically associated with the starter neurons. Also, astroglial starter cells lead to infection of nearby TVA-negative astroglia, suggesting astroglia-to-astroglia transmission.

      I find the response scientifically satisfactory and appreciate the authors' transparency in addressing the limitations of their approach.

      We thank the reviewer for the positive and thoughtful evaluation. As suggested, we have integrated the revised Discussion and the corresponding response figures into the main text and the supplementary figure set, ensuring that these observations and their interpretation are clearly presented and readily accessible to readers.

      (3) Temperature Effects and Larval Viability

      The authors' justification for raising larvae at 36 °C to improve labeling efficiency is reasonable. The supporting data indicating minimal impact on larval viability within the experimental timeframe are convincing. Referencing prior behavioral studies and including survival data under controlled conditions adds credibility to their claims. I find this issue satisfactorily addressed.

      We thank the reviewer for this positive and constructive evaluation.

      (4) Viral Toxicity and Dosage Considerations, Secondary Starter Cells

      The authors present a well-reasoned explanation that viral cytotoxicity is primarily driven by replication and not by viral titer or injection volume. However, the inclusion of experimental data directly testing the effects of higher titer or volume on starter cell viability would have strengthened this point, particularly since such tests are relatively straightforward to perform.

      We agree with the reviewer that directly testing the effects of viral titer and injection volume on starter cell viability would further strengthen this point. In practice, we have already used the highest CVS virus titer that could be reliably generated in our system. Therefore, we tested injection volumes of up to 20 nl and observed no detectable effect on starter cell survival, whereas higher injection volumes resulted in deformation of the larval brain, precluding their use.

      Although not shown as a separate figure, these data informed our interpretation of viral toxicity, which is now described more clearly in the revised Discussion. We hope that this explanation and the clarified discussion adequately address the reviewer’s concern.

      Regarding the potential contribution of secondary starter cells, the authors provide a convincing rationale for why such effects are unlikely under their sparse labeling conditions. However, in cases where TVA and G are broadly expressed, such as under the vglut2a promoter as shown in Author response image 2, it would be valuable to directly evaluate this possibility experimentally. While the authors' interpretation is reasonable, empirical validation would further strengthen their conclusions.

      We appreciate the reviewer’s interest in experimentally evaluating the potential contribution of secondary starter cells under conditions of broad TVA and G expression. In response, we performed additional viral tracing experiments in which TVA and G were driven by the excitatory neuronal marker vglut2a to achieve broad helper expression.

      As shown in a representative case (Author response image 1), newly appearing tdTomato⁺ neurons were observed at the later time point (6 vs. 3 dpi, circles), many of which were spatially separated from the EGFP⁺/tdTomato⁺ starter neurons identified at the early time point (3 dpi, dashed circles). Notably, a subset of these newly labeled tdTomato⁺ neurons colocalized with EGFP (6 vs. 3 dpi, dashed cyan circles). These new EGFP⁺/tdTomato⁺ neurons may represent secondary starter cells or delayed infection of initially targeted starters. Interpretation of tdTomato⁺-only neurons (6 dpi, gray circles) is further complicated by variability in projection distance and synaptic strength, as short-range second-order (or multi-level) inputs and long-range first-order inputs may be labeled within similar time windows. In addition, in the presence of multiple primary or secondary starter neurons, unambiguous assignment of labeled inputs to specific starters remains challenging, even with high-temporal-resolution imaging.

      Owing to these constraints, empirical identification of secondary (or multi-level) connections is not readily achievable with the current tracing strategy. A potential solution would be to combine pan-neuronal helper expression with spatiotemporally controlled activation, for example, through a transgenic line enabling light-inducible helper expression (e.g., G protein). Such an approach would enable delayed and cell-specific initiation of secondary (or multi-level) starters, thereby temporally separating long-range first-order inputs from multi-step circuit propagation and permitting input tracing of targeted cells, ultimately improving the spatiotemporal resolution of circuit mapping.

      We have incorporated a dedicated section in the revised Discussion to clarify the applicable scenarios, limitations, and future directions of this viral tracing strategy in zebrafish.

      Author response image 1.

      Recombinant RV-based viral tracing under broad helper expression conditions.

      Time-lapse (3 and 6 dpi) confocal images of the larval hindbrain showing recombinant RV-based viral tracing under broad helper expression (TVA and G, green) via vglut2a promoter-driven UGNT, following posterior hindbrain infection with CVSdG-tdTomato[EnvA] (magenta). Dashed circles, areas enriched with EGFP⁺/tdTomato⁺ neurons; gray circles, areas enriched with tdTomato⁺-only neurons; dashed white lines, hindbrain boundaries. C, caudal; R, rostral. Scale bars, 20 μm.

      Reviewer #2 (Public review):

      The study by Chen, Deng et al. aims to develop a viral transneuronal tracing method that allows efficient retrograde tracing in the larval zebrafish. The authors utilize pseudotyped rabies virus that can be targeted to specific cell types using the EnvA-TvA system. Pseudotyped rabies virus has been used extensively in rodent models and, in recent years, has begun to be developed for use in adult zebrafish. However, compared to rodents, the efficiency of spread in adult zebrafish is very low (~one upstream neuron labeled per starter cell). Additionally, there is limited evidence of retrograde tracing with pseudotyped rabies in the larval stage, which is the stage when most functional neural imaging studies are done in the field. In this study, the authors systematically optimized several parameters of rabies tracing, including different rabies virus strains, glycoprotein types, temperatures, expression construct designs, and elimination of glial labeling. The optimal configurations developed by the authors are up to 5-10-fold more efficient than more typically used configurations.

      The results are convincing and support the conclusions. There are some additional changes that are recommended:

      (1) The new data included in the response to reviewer's letter are important to support the main conclusions and should be included in the manuscript.

      We agree with the reviewer that the new data provided in the response are important for supporting the main conclusions. Accordingly, we have now incorporated all four figures from the response into the supplementary figure set of the revised manuscript and added the corresponding descriptions and discussion to the main text where appropriate.

      (2) Lines 357-362: This section should include all of the response letter figures and associated details. Additionally, Author response image 3 is at odds with Fig 2-supplement 1G. In Author response image 3, ~75% of glial cells labeled at 4 dpi lose their fluorescence by 10 dpi. However, Figure 2-supplement 1G shows that overall glial labeling increases ~2 fold from 4 dpi to 10 dpi. This would suggest that the de novo labeling rate for glia is much higher than the net labeling rate calculated from the convergence index. The authors should clarify these findings.

      We agree with the reviewer that the original section at Lines 357-362 should cite the relevant figures and include the associated details. We have now relocated this content to the Results section and incorporated the corresponding figures and descriptions.

      In addition, we fully agree with the reviewer’s interpretation regarding the apparent discrepancy between the high loss rate of early-labeled glial cells (previously Author response image 3, now Figure 2—figure supplement 3) and the net increase in total glial labeling (Figure 2—figure supplement 1G). This pattern indicates that the net convergence index underestimates the true rate of de novo glial infection, as early labeled glial cells progressively lose detectable fluorescence while overall glial labeling continues to increase, implying ongoing de novo infection events outpace this loss. We have clarified this point in the Results section.
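
      The arithmetic behind this point can be made explicit. The following is only a back-of-the-envelope sketch: it takes the two approximate figures quoted by the reviewer (~75% loss of early-labeled glia, ~2-fold net increase) at face value, and it assumes that both refer to the same cell population. The symbols $N_4$, $N_{10}$, and $N_{\text{new}}$ are introduced here for illustration and do not appear in the manuscript.

```latex
% N_4: glia labeled at 4 dpi; N_{10}: glia labeled at 10 dpi
% N_new: glia first labeled between 4 and 10 dpi (de novo)
\begin{align*}
  N_{10} &= \underbrace{(1 - 0.75)\,N_4}_{\text{early-labeled cells still visible}} + N_{\text{new}}
          \approx 2\,N_4 \\
  N_{\text{new}} &\approx 2\,N_4 - 0.25\,N_4 = 1.75\,N_4 \\
  \text{net gain} &= N_{10} - N_4 \approx N_4
\end{align*}
```

      Under these assumptions, de novo labeling between 4 and 10 dpi would be roughly 1.75 times the net gain, which is the sense in which a net measure underestimates ongoing infection.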

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The new data included in the response to reviewer letter are important to support the main conclusions and should be included in the manuscript.

      This recommendation echoes the point raised in Reviewer #2’s Public Comment #1. As detailed in our response there, all new data originally included in the response letter have now been fully integrated into the manuscript’s supplementary figure set, with corresponding descriptions added to the main text.

      Lines 357-362: This section should include all of the Author response images and associated details. Additionally, Author response image 3 is at odds with Fig 2-supplement 1G. In Author response image 3, ~75% of glial cells labeled at 4 dpi lose their fluorescence by 10 dpi. However, Figure 2-supplement 1G shows that overall glial labeling increases ~2 fold from 4 dpi to 10 dpi. This would suggest that the de novo labeling rate for glia is much higher than the net labeling rate calculated from the convergence index. The authors should clarify these findings.

      This recommendation echoes the concern raised in Reviewer #2’s Public Comment #2 regarding the apparent discrepancy between glial cell loss and the net increase in glial labeling. Please refer to our response to that comment for a detailed explanation. Briefly, we clarify that the continued increase in overall glial labeling despite substantial loss of early-labeled glia indicates a high rate of ongoing de novo infection that is not captured by net convergence index measurements alone. The relevant figure and associated details, including this clarification, have now been incorporated into the revised main text.

      Data and description for response letter Figure 4 should be quantified and added to the manuscript.

      Across nine infected larvae examined, initial infection was consistently restricted to TVA-positive astroglia, typically involving a single starter glial cell per larva. No viral spread was observed in three larvae injected with SADdG-mCherry[EnvA], whereas astroglia-to-astroglia transmission was detected in three of six larvae injected with CVSdG-tdTomato[EnvA]. Importantly, no neuronal labeling was observed in any of the experiments. These quantitative data and descriptions, originally presented as Author response image 4, have now been incorporated into the main text as Figure 1,2–figure supplement 1.

    1. eLife Assessment

      In this valuable study, de Vries and colleagues aim to determine how the perception of biological motion is organized at the neural level, specifically testing whether this process rests on hierarchical predictive processing by extending a methodological framework that the authors previously published. The evidence is solid for the empirical claim that neural representations of body motion systematically lead the stimulus in time, with simulations validating the regression approach and consistent effects on both peak magnitude and peak latency. Support for the stronger theoretical interpretation that these signatures specifically reflect active hierarchical predictive inference requires further substantiation, since the design and analysis do not distinguish such inference from cached associative retrieval or from nonlinear temporal integration of slowly varying features.

    2. Reviewer #1 (Public review):

      Summary

      The authors apply dynamic representational similarity analysis (dRSA), a method introduced in de Vries and Wurm 2023, to source-reconstructed MEG data from 40 participants who viewed ballet dancing sequences under three conditions: normal viewing, up-down inversion, and temporal piecewise scrambling. In normal viewing, they replicate their previous finding of a hierarchical pattern of leading-edge neural representations, with view-invariant body motion represented earliest in time (around 500 ms before the corresponding stimulus state), followed by view-dependent body motion (around 200 ms) and pixelwise motion (around 150 ms). Inversion selectively attenuates the leading-edge representation of view-invariant body motion while enhancing view-dependent body motion. Scrambling abolishes all leading-edge motion representations and instead increases post-stimulus representations of body posture. The authors interpret these findings as evidence that biological motion perception relies on a hierarchy of priors operating within a predictive-processing framework, with inversion specifically disrupting holistic priors and scrambling disrupting kinematics priors.
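
      For readers unfamiliar with the method, the core idea of a lagged representational similarity analysis can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: it uses plain Pearson correlation between Euclidean-distance RDMs instead of principal component regression, and all names (`rdm_vectors`, `drsa_lag_profile`, the synthetic `feats`, `mixing`, and `lead`) are hypothetical. Lag is defined here as neural time minus stimulus time, so a negative peak lag means the neural geometry matches a later stimulus state.

```python
import numpy as np

def rdm_vectors(data):
    """data: (n_seq, n_time, n_dim) -> (n_time, n_pairs) distances between sequences."""
    i, j = np.triu_indices(data.shape[0], k=1)
    return np.linalg.norm(data[i] - data[j], axis=-1).T

def drsa_lag_profile(neural, model, max_lag):
    """Correlate neural and model RDMs for every pair of time points,
    then average along diagonals of constant lag (t_neural - t_model)."""
    zn, zm = rdm_vectors(neural), rdm_vectors(model)
    zn = (zn - zn.mean(1, keepdims=True)) / zn.std(1, keepdims=True)
    zm = (zm - zm.mean(1, keepdims=True)) / zm.std(1, keepdims=True)
    d = zn @ zm.T / zn.shape[1]  # d[t_neural, t_model] = Pearson r
    lags = np.arange(-max_lag, max_lag + 1)
    # np.diagonal(d, offset=k) returns d[i, i + k], i.e. lag = -k
    profile = np.array([np.diagonal(d, offset=-lag).mean() for lag in lags])
    return lags, profile

# Synthetic check: the "neural" data carry the stimulus state `lead` samples ahead.
rng = np.random.default_rng(0)
n_seq, n_time, n_feat, n_chan, lead = 14, 200, 5, 40, 5
noise = rng.standard_normal((n_seq, n_time + lead, n_feat))
kernel = np.hanning(9)  # short smoothing kernel -> temporally autocorrelated features
feats = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), 1, noise)
model = feats[:, :n_time]
mixing = rng.standard_normal((n_feat, n_chan))
neural = feats[:, lead:] @ mixing + 0.1 * rng.standard_normal((n_seq, n_time, n_chan))

lags, profile = drsa_lag_profile(neural, model, max_lag=20)
peak_lag = int(lags[np.argmax(profile)])
print("peak lag (samples):", peak_lag)
```

      In this toy setting the profile should peak near a lag of minus `lead` samples; with real data, the autocorrelation of the stimulus models is what motivates the regression step the authors use.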

      Strengths

      The empirical work is careful and technically ambitious. The dRSA framework introduced in the 2023 paper is a useful methodological contribution to the study of dynamic neural representations, and the present manuscript extends it in well-motivated directions. The dataset is substantial: 40 participants, source-reconstructed MEG, three within-subject conditions. The replication of the 2023 normal-condition findings in an independent 40-subject sample is solid, which is increasingly rare and welcome in the field. The inversion and scrambling manipulations are well-motivated, and the conditions are matched on stimulus identity. Principal component regression is used appropriately to handle the genuine challenge of correlated and autocorrelated stimulus features, and the authors validate this choice through simulations. Eye position is included as a covariate and successfully regressed out, addressing a common confound in MEG decoding work. Behavioral catch trials demonstrate that participants attended to the stimuli across conditions. Both frequentist and Bayesian statistics are reported with appropriate corrections for multiple comparisons. The inversion result, in particular, is striking, and the asymmetry between view-invariant and view-dependent representations is informative.

      Weaknesses

      The central interpretive step in the manuscript treats a negative-lag dRSA peak as direct evidence for active hierarchical predictive inference. The data are equally consistent with at least three other accounts that the manuscript does not engage with, and the conclusion is therefore stronger than the data support.

      First, the leading-edge dRSA signature is a natural consequence of nonlinear temporal integration of autocorrelated stimulus features. A long line of work from the Winawer and Grill-Spector labs (Zhou et al. 2018, Zhou et al. 2019, Stigliani et al. 2017, Kim et al. 2024) has established that the human visual cortex implements compressive temporal summation with delayed divisive normalization and that temporal integration windows progressively increase from early to higher visual areas. A nonlinear-summation response to an autocorrelated feature encodes deviations from the recent baseline. For smooth trajectories, this is essentially a local derivative, and the derivative inherits the trajectory's leading edge as a free consequence - no predictive machinery required. The integration-window hierarchy that Kim et al. (2024) recovered from voxelwise spatiotemporal pRFs maps onto the 150 / 200 / 500 ms hierarchy reported here almost one-for-one. That alignment is unlikely to be coincidental and deserves explicit treatment.
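
      The argument can be illustrated with a toy simulation. This is a deliberately reduced sketch, not a model of compressive temporal summation or delayed normalization: the "response" is simply the current feature value minus the mean of the preceding window, and the stimulus, window length, and smoothing width are arbitrary assumptions. Lag is again neural time minus stimulus time.

```python
import numpy as np

rng = np.random.default_rng(1)
n, window, max_lag = 20000, 20, 60

# Smooth, autocorrelated "stimulus feature": Gaussian-filtered white noise.
t = np.arange(-60, 61)
kernel = np.exp(-t**2 / (2 * 15.0**2))
stim = np.convolve(rng.standard_normal(n), kernel / kernel.sum(), mode="same")

# Toy response: deviation of the current value from the recent baseline.
# running_mean[i] is the mean of stim[i : i + window].
running_mean = np.convolve(stim, np.ones(window) / window, mode="valid")
resp = stim[window:] - running_mean[:-1]  # value at t minus mean of the window before t
stim_al = stim[window:]                   # stimulus aligned to resp

def lagged_corr(resp, stim, lag):
    """Correlation between resp[t] and stim[t - lag] (lag = t_resp - t_stim)."""
    if lag <= 0:
        shift = -lag
        a, b = resp[:len(resp) - shift], stim[shift:]
    else:
        a, b = resp[lag:], stim[:len(stim) - lag]
    return np.corrcoef(a, b)[0, 1]

lags = np.arange(-max_lag, max_lag + 1)
profile = np.array([lagged_corr(resp, stim_al, int(k)) for k in lags])
best_lag = int(lags[np.argmax(profile)])
print("best lag (samples):", best_lag)
```

      The peak falls at a negative lag, meaning the response correlates best with a future stimulus state even though it is computed from past and present samples only. Whether this mechanism accounts for the magnitude and hierarchy of the reported lags is an empirical question that the sketch does not address.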

      Second, the experimental design places participants firmly in the regime where Dayan's successor representation (SR) predicts that the brain holds a precompiled associative cache of trajectory structure. Each unique sequence is presented approximately 47 times across the experiment. An SR in Dayan's original formulation is a precompiled lookup table, not an online inference engine - querying it during familiar trajectories produces leading-edge representations through passive associative retrieval, mechanistically distinct from active prediction despite producing similar signatures. The senior author's own lab has demonstrated SR-like representations in V1 (Ekman, Kusch, de Lange 2023 eLife), but this paper is not cited or engaged with in the present manuscript despite its direct relevance.
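
      For concreteness, the successor representation of a fixed, fully learned sequence can be written down directly. This is a minimal sketch assuming a deterministic chain of states (each frame followed by the next) and an arbitrary discount factor; `n_states` and `gamma` are illustrative values, not estimates from the data.

```python
import numpy as np

n_states, gamma = 8, 0.9

# Deterministic sequence: state k is always followed by state k + 1;
# the last state has no successor.
T = np.eye(n_states, k=1)

# Successor representation: M = sum_t gamma^t T^t = (I - gamma T)^(-1)
M = np.linalg.inv(np.eye(n_states) - gamma * T)

# Row s holds the discounted expected future occupancy of every state, given s.
print(np.round(M[0], 3))
```

      Row `s` of `M` places weight `gamma ** k` on the state `k` steps ahead and none on earlier states, so a read-out of the current row already contains future states without any online inference step.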

      Third, the canonical computational model of biological motion perception (Giese and Poggio 2003 Nat Rev Neurosci) is a fully feedforward template-matching architecture that predates the predictive-coding framing of biological motion. It accommodates the inversion effect (templates tuned to upright statistics), the hierarchy of timescales (graded leaky integrator time constants), and the scrambling effect (broken sequence-neuron activation) without invoking generative models or prediction errors. The manuscript cites Giese-tradition work for the inversion-effect literature but does not engage with the model itself, even though it is the field standard.

      The inversion result, while empirically striking, has a simpler interpretation than the one offered. Inversion makes viewpoint-invariant body computation fail because the underlying machinery is tuned to upright body statistics. A weaker representation produces a weaker dRSA signature at every lag, including the leading edge - no appeal to priors in the active-inference sense is required. The view-dependent enhancement under inversion fits this reading naturally: when viewpoint abstraction fails, processing falls back to viewpoint-specific representations that remain extractable. The manuscript implicitly acknowledges this when it states that "predictions were channeled to the level at which prediction was still possible," but does not notice that this concession softens the strong predictive-coding inference.

      The scrambling result is internally awkward on the predictive-coding framing. The paper acknowledges that pixelwise motion prediction should, in principle, survive 200-500 ms scrambled segments (typical latency around 150 ms) but reports that it does not. The proposed save - that segments are "too short to start up prediction" - undercuts the framework, since by the same logic, most of normal viewing would also be pre-prediction. A cleaner reading is that scrambling destroys the temporal autocorrelation of stimulus features, which is the prerequisite both for nonlinear-summation neural responses to produce leading-edge representations and for SR-style associative retrieval to operate.

      A further concern is that the experimental design and analysis pipeline are structurally biased toward producing the cleanest possible predictive signature. The 14 stimuli are repeated extensively, and trials are averaged across repetitions before dRSA is computed, filtering out exactly the variability that would distinguish online prediction from amortized retrieval. The 2023 paper reports a control comparing the first and last thirds of the experiment, but this test is in the post-saturation regime for any plausible associative-learning rate and does not actually adjudicate the question. A first-exposure or first-run analysis would be diagnostic. Finally, the behavioral task changed between the 2023 paper and the present manuscript. The earlier paradigm asked participants to recognize the current motion ("arms moving up?"), while the present paradigm asks participants to judge whether an occluded video continues correctly. The latter explicitly demands prediction. This change transforms the experimental context from naturalistic viewing into one that actively incentivizes predictive engagement, potentially inflating the very signatures the paper interprets as spontaneous prediction.

      The 2023 Nature Communications paper actually navigated these interpretive questions more carefully than the present manuscript does, explicitly stating that the approach "does not provide conclusive evidence for predictive processing/coding theory but leaves the door open for related theories such as adaptive resonance or Bayesian inference without predictive coding." The current manuscript would benefit from restoring that epistemic discipline. The data and methods are valuable; the interpretive frame is overstated relative to what the evidence supports.

      Impact and utility

      The dataset and dRSA framework are useful contributions to the study of neural representation of dynamic stimuli, and the inversion and scrambling conditions open productive lines of inquiry. The interpretive over-commitment to predictive processing risks limiting the paper's reach into adjacent literatures - temporal integration, successor representations, template-matching biological motion models, encoding-model approaches - where the findings could land productively. With a more pluralistic interpretive frame, this work would speak to a substantially broader audience and connect more naturally with existing mechanistic accounts of dynamic visual processing.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, de Vries and colleagues apply successful probabilistic inference and predictive coding frameworks to the question of biological motion perception. In contrast to most studies of predictive processing in humans, which rely on the presentation of discrete events, they instead aimed to track continuous predictions in the context of more naturalistic inputs such as biological motion. In these settings, the authors have previously demonstrated an inverted temporal hierarchy of prediction whereby high-level movement features (e.g., view-invariant body motion) are predicted earlier than lower-level ones (e.g., pixelwise motion). The specific question they set out to address in this manuscript is whether these predictions derive from prior beliefs about the biological and physical organization of biological movements versus the local extrapolation of motion from past observations.

      The authors used anatomical MRI-driven source reconstruction of MEG activity recorded from human participants watching either normal, vertically-mirrored, or temporally scrambled movies. They then aimed to correlate activity in preselected ROIs with summary representations of these movies based on different visual features at 3 different hierarchical levels using RSA. Doing so, they could confirm that predictive processes could be identified prior to the change in the stimulus and organized anatomically along the visual cortical hierarchy. Critically, they report that mirrored movies selectively disrupted the highest processing level while the lowest level remained largely unaffected. Interestingly, the predictions at the intermediate level were boosted in mirrored movies, suggesting a possible channeling of predictions at this level when highest-level predictions are unavailable. Finally, disrupting all predictive aspects with the scrambled movies entirely abolished predictions at all levels, with signals mainly reflecting reactive bottom-up processing of inputs.

      In sum, biological motion perception relies on a tight coordination of multi-level predictions based on both motion-related holistic and kinematics priors.

      Strengths:

      Overall, this is a very strong manuscript, with the text being clearly written. I liked the fact that the authors not only compared responses to normal videos against the same videos flipped upside-down, but also to temporal piecewise scrambling of that same video, allowing them to identify the respective roles of holistic motion priors vs. temporal predictions. Of course, more work is needed to tease apart what key quantities are represented in these holistic priors. For now, the authors argue that they likely combine prior beliefs about the biological organization of bodies, such as the likely angle of joint movements, and about the physics of reality, such as gravity. Further work teasing apart these aspects would be interesting to read!

      All analyses seem well executed and, while some aspects of the presentation of results could be slightly improved (see below), the manuscript is very clear and the conclusions are supported by the data. Finally, I liked the words of caution the authors added to the discussion. For instance, while they largely used negative vs. positive latency as a proxy for top-down vs. bottom-up processing respectively throughout the manuscript, they also accurately acknowledge that predictive computations could also modulate processes at positive lags, through, for instance, latency modulation.

      Weaknesses:

      The main aspect of the work I was left to struggle with is this idea that priors can be read out directly from large patterns of activity rates as measured with MEG. While some past experimental work does support this view, theoretical proposals also suggest that one benefit of predictive coding lies in its computational and energy-efficient properties, whereby only novel, unpredicted aspects are encoded in the rate of neural activity. Some other research lines, for instance, focusing on silent working memory, also report the brain's ability to store important computations in ways that are not reflected in costly increases in overall activity. The authors do not really unpack why they expect to see predictions to be encoded in such a way in the first place. They also do not discuss what that implies in terms of neural organization and whether other aspects of neural activity (e.g., oscillations, synaptic weights) could subtend predictive processing in this context. At the end of the day, this activity change is clearly there in the data, so that's totally fine to interpret that; it just would be helpful to unpack what such an implementation of prior beliefs would imply in terms of neural organization.

      The other weakness I see is the limited consideration of behavior throughout the paper. Behavior is indeed mostly treated as a negative control, ensuring that differences between conditions at the neural level do not follow from different behavioral strategies or other peripheral factors. Critically, task design nicely incorporates two types of tasks: one that is related to motion (occlusion of movement) and one that's independent of it (color change of fixation cross). Yet, these conditions are not directly compared at the neural level. It would be useful to see whether the neural signatures of prediction are largely independent from the ongoing task or whether behavior gates the types of priors and prediction processes that are applied to incoming sensory inputs. Moreover, the text says that "neither in accuracy nor in reaction time was there a significant difference between conditions", yet significance stars in Figure 1d seem to suggest there is a difference in the fixation cross task. What am I missing? If there is indeed a difference in overall performance, can the results (esp. the reduced dRSA correlation strength in normal < inverted < scrambled movie) be interpreted in terms of a multi-tasking cognitive cost?

      I also have some other minor questions and comments:

      (1) In this task situation, prediction does not only come in the continuous domain but also relies on a mental simulation model, in particular in the occlusion task. However, corresponding literature, notably the work by Shepard & Metzler (1971) on mental rotation (as well as follow-ups), is not mentioned here, I believe. Could the authors perhaps mention this if they think that's relevant (if not, feel free to ignore).

      (2) I'm concerned that the novelty of dynamic RSA as explained at lines 56-64 might appear slightly exaggerated. After all, isn't it just a generalization of matrix correlation in model and brain time domains? (Again, feel free to ignore if I misunderstood.)

      (3) How do the authors explain that high-level motion prediction is still significantly larger than zero (correct?) in the inverted movie condition? Shouldn't it be entirely abolished?

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigate whether the brain's predictive representation of observed biological motion depends on holistic priors about body structure or on kinematic priors about motion continuity. The manuscript applies dynamic representational similarity analysis to MEG data from a large number of participants viewing ballet sequences under three conditions: normal, upside-down inverted, and temporally scrambled into short epochs.

      Strengths:

      The study reports that inversion selectively attenuates predictions of view-invariant body motion and enhances predictions of view-dependent body motion, while leaving low-level pixel-wise motion prediction unaffected. Further, scrambling eliminates predictive motion representations at every level and instead produces stronger post-stimulus representations of body posture, with view-invariant posture also delayed. The pattern across the two manipulations is internally consistent, holds across both peak magnitude and peak latency measures, and is also supported by a neural-to-neural dynamic representational similarity analysis (dRSA) between normal and inverted conditions. The principal component regression pipeline is validated through simulations showing that it recovers the model of interest while suppressing covarying models. In particular, the inversion result provides strong evidence that high-level predictions of biological motion depend on holistic priors while predictions at lower levels do not, and the finding that disruption at the top of the hierarchy does not propagate down is informative for predictive processing accounts that assume a more cascading architecture.

      Weaknesses:

      The interpretation of the scrambling result is the main caveat of the manuscript. The claim that low-level motion prediction depends on kinematic continuity rests on the absence of pixelwise motion prediction in the scrambled condition, but the 200- to 500-ms segments may not be sufficient for prediction to develop, as the authors also point out. Without a parametric manipulation of segment length, it is difficult to distinguish a genuine dependence on kinematic priors from a floor effect. The interpretation of increased post-stimulus posture representations as prediction errors is also somewhat indirect, since a positive latency does not rule out potential top-down modulation.

    1. eLife Assessment

      This study presents a fundamental methodological advance that enables measurements of single-channel gating behavior of CRAC channels whose unitary currents are too small to be resolved electrically. By combining a channel-tethered calcium-sensitive dye (JF646-BAPTA) with voltage-clamp TIRF imaging, the authors discovered new kinetic behaviors of CRAC channels and further identified a dye-blinking artifact with implications that are of importance for optical single-channel studies. Although the work is convincing and the findings have biological relevance, some quantitative aspects of the study can be strengthened by additional analysis.

    2. Reviewer #1 (Public review):

      Summary:

      Dhillon and Lewis present an optical approach to record single CRAC channel activity, overcoming the long-standing barrier imposed by the channel's extremely small unitary conductance. By fusing HaloTag to Orai1, labeling with JF646-BAPTA, and combining TIRF microscopy with whole-cell voltage clamp (Patch-TIRF), the authors achieve genuine single-channel resolution. A central contribution is the recognition that JF646-BAPTA undergoes reversible photophysical blinking that can be readily mistaken for gating events. The authors exploit the multi-dye labeling of hexameric Orai1, combined with voltage-clamped definition of open and closed fluorescence levels, to distinguish true gating transitions from blinks. The result is the first kinetic characterization of single CRAC channel openings activated by STIM1, reporting multiple open and closed states with durations from about 0.1 s to tens of seconds, predominantly high open probabilities (≥ 0.7), and an unexpected population of "silent" channels that co-localize with STIM1 but show no detectable activity over the observation window.

      Strengths:

      The work is technically rigorous, and the controls are appropriate. The integration of patch-clamp voltage control with TIRF imaging is a thoughtful methodological choice that defines the open- and closed-channel fluorescence reference levels with precision, providing a quantitative framework that the field has lacked. The use of the non-conducting Orai1-E106A mutant as a specificity control (Figure 4C) is exactly the right experiment, and the demonstration that JF646-BAPTA signals require Ca²⁺ flux through Orai1 itself anchors the entire approach. The identification and characterization of JF646-BAPTA blinking (Figures 2 and 3) is a significant contribution in its own right. The authors show clearly that the dye exhibits long-lived dark states and that transitions to zero fluorescence, rather than to a finite calcium-free baseline, are diagnostic of blinking rather than channel closure. This caveat has immediate implications for the interpretation of recent work using the same dye on other calcium-permeable channels, and will recalibrate the broader field of HaloTag-based single-channel optical recording. The kinetic analysis itself reveals something that was previously inaccessible: seconds-long open times, multi-state gating behavior, and a population of channels that co-localize with STIM1 yet remain electrically silent. These findings are physiologically meaningful and would not have been detectable by macroscopic electrophysiology. Overall, an outstanding study.

      Weaknesses:

      The manuscript would benefit from a small number of additional analyses of the existing data and modest refinements to the presentation. The discrete-channel interpretation of the intensity histogram in Figure 6C, the open probability distribution in Figure 8C, and the assignment of the "silent" channel population are all interesting and likely correct, but each rests on assumptions that the authors are well positioned to test directly using data already in hand. Brief additional discussion of the dynamic range of JF646-BAPTA in situ and of how the temporal resolution of the recordings shapes the inferred kinetic model would also help readers calibrate the findings.

      None of these points challenges the central claims of the paper, and none requires new experiments.

    3. Reviewer #2 (Public review):

      Summary:

      Dhillon and Lewis use the enhanced brightness of the new calcium indicator dye JF646-BAPTA attached to Orai1-bound HaloTag to identify single CRAC channel events detected as [Ca2+]i fluctuations rather than currents. This enables them to detect Orai1 single-channel kinetics of permeation, overcoming the currently unmeasurable single-channel CRAC conductances (~ 20-40 fS). TIRF microscopy narrows the z-section and improves calcium event localization.

      JF646-BAPTA reversibly blinks between fluorescent and non-fluorescent states, complicating single-channel detection. Blinking occurs both in permeabilized cells with saturating Ca2+ and in intact cells at physiological [Ca2+]i. Using voltage clamp and TIRF imaging, CRAC gating events were distinguished from blinking by analyzing fluorescence responses to voltage changes.

      Hyperpolarization (-100 mV) increases fluorescence, indicating channel opening. Responses blocked by La3+ confirm specificity for Orai1, while minimum fluorescence at +30 mV corresponds to closed channels. Dynamic range and response kinetics help differentiate genuine gating from blinking artifacts. Long channel openings (seconds to tens of seconds) are observed, with most open times around 1.2 seconds. Longer openings (tens of seconds) are present but difficult to sample. Silent channels constitute 11% of puncta.

      The paper carefully examines a new method to sample CRAC kinetics, which should enable further mechanistic studies of STIM control of ORAI and modulation by other signaling components such as calcineurin. Development of bright nonblinking dyes or dyes whose blink rates are directly correlated with a calcium-binding site will enhance this route of investigation.

      Comments:

      This is an excellent methodological study, rigorous and thorough. I wondered whether La3+ alone could alter JF646-BAPTA blinking, but the authors show that JF646-BAPTA exhibits reversible transitions to a non-fluorescent state (blinking) under both Ca2+-saturated and physiological conditions, independent of channel activity or the presence of La3+.

      Strengths:

      A novel method providing additional tools to study store-depletion-induced Ca2+ currents mediated by STIM-Orai family members.

      Weaknesses:

      Limited by blinking dyes, the only ones currently sensitive enough to measure the calcium fluxes through single channels.

    4. Reviewer #3 (Public review):

      Summary:

      Previous work from the Cahalan lab used fluorescent Genetically Encoded Ca2+ Indicators (GECI), like GCaMP6f, tethered to the N- or C- terminus of Orai1 to monitor CRAC channel optical signals (Dynes et al., PNAS 2016 PMID: 26712003; J Gen Physiol 2020 PMID: 32589186; PNAS 2023 PMID: 37729200). In this study from the Lewis lab, the HaloTag system enables C-terminal labeling of Orai1 with a reactive JF646-BAPTA loaded into cells. The article raises two key issues with the Ca2+ indicator probe that may limit potential applications: probe loading conditions and blinking.

      Making Sense of Probe Probe-lems:

      This is a three-component system: the hexameric Orai1 channel, the Halo tag, and the Ca2+ indicator (four components if you count the GFP- or mCherry-tagged STIM1 in the endoplasmic reticulum membrane that activates the plasma membrane Orai1 channel). The Orai1 channel, tagged with the Halo protein, appears to function normally, judging from the characteristic inwardly rectifying Ca2+ current first observed in T lymphocytes (Lewis and Cahalan, Cell Regulation 1989 PMID: 2519622). One problem is to find a condition for indicator dye loading that results in complete and uniform labeling with the covalently linked JF646 indicator. JF646-BAPTA is a far-red fluorescent indicator related to BAPTA, with a Kd of ~150 nM. The esterified form can be loaded into cells, as is routinely done for Ca2+ indicators like fura-2 or fluo-4. Ideally, to monitor local Ca2+ in the cytosolic nanodomain of the Orai1 channel, the indicator should react with each and every Halo tag of the hexameric channel. The authors assessed published methods by varying the exposure time to the JF646-BAPTA-esterified probe. The authors then used green JF552 labeling following red JF646-BAPTA loading to assess the completeness of labeling. Even overnight incubation of Halo-tagged cells was not sufficient. The addition of Pluronic treatment for 1 hr improved labeling, and a standard condition was adopted. Under this condition, no additional labeling with the green JF552 was seen, implying complete labeling with JF646-BAPTA. However, even with complete labeling, several additional effects might reduce the effective signal-to-noise, which is lower in these studies than expected from in vitro measurements - for example, if the JF646-BAPTA molecules are incompletely de-esterified, or if there is quenching between the closely spaced probes attached to the channel hexamer.

      A second, more serious problem analyzed by this article is that the JF646-BAPTA probe blinks on and off spontaneously, making it problematic to monitor true single-channel events in which the channel open state is assessed by the fluorescent probe. The authors distinguish blinking from channel-gating events by carefully noting the residual level of fluorescence in the absence of Ca2+ influx. Blinking events occur in bursts that reduce fluorescence transiently to zero, whereas the closed channel labeled with JF646-BAPTA retains a low level of fluorescence (~20%). To circumvent the blinking issue, the authors use whole-cell patch recording, in conjunction with optical recording (Patch-TIRF). This allows channel-gating events to be identified by step-wise changes in fluorescence due to Ca2+ entry upon hyperpolarization to -100 mV, above a baseline level of fluorescence at +30 mV, which the authors presume represents the closed channel level of fluorescence. Irreversible photobleaching is an additional issue, limiting the recording times to less than 1 minute.

      Visualizing Orai1 Single-Channels:

      With the blinking problem circumvented, at least in part, the authors uncovered a wide variety of single-channel events. Cells with low expression levels of Orai1 revealed 0-3 active Orai1 channels per STIM1 punctum. The range of gating behavior at the single-channel level is one of the revelations in this study. A substantial fraction (11%) of puncta contained "silent" channels that did not open (detected by the non-zero level of baseline fluorescence for closed channels). At the other extreme, some channels remained open for tens of seconds. On average, channels that opened and closed stochastically exhibited a bi-exponential distribution of bright states (open channels), with a major component of fast events (92 ms) and a minor component of slower ones (1190 ms), as well as a single-exponential distribution of dark states (closed channels), and open probabilities >0.7. Channel open/closed times and the high open probability of active Orai1 channels seen here reinforce previous work based on analysis of CRAC current fluctuations in whole-cell recording, and optical single-channel recording using a different genetically encoded Ca2+ indicator, G-GECO1, tethered to Orai1 (Prakriya and Lewis, J Gen Physiol 2006 PMID: 16940559; Dynes et al., PNAS 2016 PMID: 26712003).

      Expression levels for single-channel optical recording must be low; accordingly, puncta contained only 0-3 active channels. However, under conditions of high STIM1 and Orai1 expression, conventionally used to investigate channel function, as in Figure 1, cells with large currents express many thousands of active channels. The number of active channels per cell can be calculated by dividing the peak current (~-100 pA) by the voltage (-100 mV); this corresponds to a whole-cell conductance (G) of ~1 nS (conductance is measured in Siemens). The single channel conductance (gamma, too low to detect electrically) is estimated by noise analysis to be 20-40 fS. Thus, the number of active channels is given by G / gamma corresponding to a range of > 25,000 - 50,000 open channels per cell. Under similar conditions of high STIM1/Orai1 co-expression in HEK cells, individual Orai1 channels were visualized at high density in puncta by freeze-fracture electron microscopy (Perni et al., PNAS 2015 PMID: 26351694), revealing puncta packed with Orai1 particles corresponding to hundreds to >1000 channels per punctum. Measuring the center-to-center distances between particles in puncta revealed two peaks in a distribution of inter-particle lengths: 9 nm (consistent with the approximate width of the Orai1 channel hexamer) and 15 nm (possibly due to two adjacent Orai1 channels held together by intervening STIM1 dimers).
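      The channel-count arithmetic above can be reproduced in a few lines. This is only a back-of-envelope sketch using the numbers quoted in the paragraph; like the text, it treats the holding potential as the driving force (the reversal potential is ignored), and the function name `n_open_channels` is ours, chosen for illustration.

```python
# Back-of-envelope estimate of open CRAC channels per cell,
# using only the figures quoted in the paragraph above.
peak_current_pA = -100.0   # whole-cell peak current (~ -100 pA)
voltage_mV = -100.0        # holding potential, used here as the driving force

# pA / mV = nS, so the whole-cell conductance comes out directly in nS.
G_nS = peak_current_pA / voltage_mV


def n_open_channels(G_nS, gamma_fS):
    """Open channels = whole-cell conductance / single-channel conductance."""
    return G_nS * 1e6 / gamma_fS   # 1 nS = 1e6 fS


# Single-channel conductance from noise analysis: 20-40 fS.
n_low = n_open_channels(G_nS, 40.0)
n_high = n_open_channels(G_nS, 20.0)
print(f"G = {G_nS} nS, open channels: {n_low:.0f} to {n_high:.0f}")
```

      Because this counts channels that are open at the peak, the number of active channels would be somewhat larger once the open probability (>0.7) is taken into account.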

      Strengths:

      The authors do an excellent job of analyzing and discussing probe artifacts that can confound measurements at the single-channel level. On the technical side, we thank the authors for including a photon 'budget' for their imaging experiments by including: the conversion factor from camera intensity units (c.u.) to photoelectrons, cell background fluorescence levels, and nominally Ca2+ free single channel fluorescence levels. One parameter missing from the list is the size of the region of interest used for channel recording. We expect the intensity measurements provided in the channel traces to correspond to mean ROI intensity levels. Upon knowing the ROI size in pixels, the magnitude of fluorescent signals could then be calculated in photons. Taken together, these values will aid comparisons to previous work and help guide subsequent researchers doing their own optical recording.

      The most important finding of this study is the ability to analyze single-channel properties of active Orai1 channels using the HaloTag approach. By direct measurement, the authors confirm previous work that there are at least two open states and that the CRAC channel open probability is greater than 0.7.

      Like any good study, this work suggests opportunities for further work. At the chemistry level, one focus should be the development of new probes that don't blink and have lower affinity for Ca2+ to circumvent unwanted responses to global Ca2+ signaling. Far-red probes like JF646-BAPTA have the advantage of reduced scattering for in vivo imaging applications. At the level of channel molecular function, the results pave the way for unraveling mechanisms of channel gating, such as the requirement for STIM1 binding to activate sub-states of Orai1, and how the channel undergoes Ca2+-dependent inactivation. At the cellular physiology level, localized Ca2+ probes should help to clarify mechanisms that couple to changes in gene expression and reveal Ca2+ signaling in subcellular structures, including dendritic spines. As a nice proof of principle, Halo-tagging enabled Ca2+ signals to be measured in primary cilia (Deo et al., J Am Chem Soc 2019 PMID: 31430138). Future users of HaloTag and GECI Ca2+ indicators will need to confront the issues (probe-lems) at the single-channel level that are carefully raised and analyzed in this article.

      Weaknesses:

      The major confounding issue identified here is probe blinking. The authors find a way to circumvent the issue, but not to prevent it. Is it triggered by high laser light intensity? Do the six JF646-BAPTA molecules tagging a single Orai1 channel exhibit quenching or correlated blinking?

      Which type of probe is better for understanding more about the CRAC channel function? It is difficult to evaluate the pros and cons of the HaloTag and GECI approaches without a side-by-side comparison under identical conditions (except for the probe, obviously). With respect to Ca2+ affinities, higher Kd values (lower affinity) are probably better. JF646-BAPTA has a relatively low Kd value (150 nM) compared to Orai1-GCaMP6f (620 nM in situ), which may account for the saturation of optical signals at potentials more negative than -75 mV in this study. In contrast, saturation did not occur at negative potentials with Orai1-GCaMP6f in the study by Dynes et al., 2020. Lower affinity also makes the probe more resistant to unwanted signals from global increases in Ca2+. With respect to response kinetics, the finding that JF646-BAPTA has faster Ca2+ binding and unbinding kinetics than GECIs in Deo et al., 2019, occurred before publication of the jGCaMP8 series indicators in Y. Zhang et al., Nature 2023. Kinetic measurement of Orai1-jGCaMP8f fusions was reported in Dynes et al., PNAS 2023, and these measurements were performed using the same patch-TIRF approach as the present manuscript. While photoinactivation of jGCaMP8f fused to Orai1 interfered with kinetic measurements, Orai1-jGCaMP8f V203Y (a mutant with greatly reduced photoinactivation) exhibited a tau_on of 10 ms and tau_off of 15 ms, roughly twice as fast as the values reported for Orai1-HaloTag-JF646-BAPTA in the present manuscript. The manuscript text comparing HaloTag kinetics with GECI should be revised accordingly.
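      To illustrate why a lower Kd saturates sooner, the sketch below compares equilibrium occupancy for the two Kd values quoted above. It assumes simple 1:1 binding (Hill coefficient of 1), which is a simplification: GCaMP-type sensors typically bind Ca2+ cooperatively. The local Ca2+ concentrations in the loop are hypothetical values picked for illustration, not measurements from either study.

```python
def fraction_bound(ca_nM, kd_nM):
    """Equilibrium occupancy for 1:1 binding: [Ca] / ([Ca] + Kd)."""
    return ca_nM / (ca_nM + kd_nM)


KD_JF646_BAPTA = 150.0     # nM, value quoted in the review
KD_ORAI1_GCAMP6F = 620.0   # nM (in situ), value quoted in the review

# Hypothetical local [Ca2+] values in nM, for illustration only.
for ca in (100.0, 1000.0, 10000.0):
    low_kd = fraction_bound(ca, KD_JF646_BAPTA)
    high_kd = fraction_bound(ca, KD_ORAI1_GCAMP6F)
    print(f"[Ca2+] = {ca:7.0f} nM: Kd 150 nM -> {low_kd:.2f}, Kd 620 nM -> {high_kd:.2f}")
```

      Under this assumption the lower-Kd probe sits closer to full occupancy at every concentration, leaving less dynamic range for further increases in Ca2+ influx.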

      The authors suggest that single-channel events reported previously for Piezo1 channels (Bertaccini et al., Nat Comm 2025 PMID: 40593468) may be due to probe blinking. However, that study included two critical controls that demonstrate that signals reflect bona fide channel activity rather than blinking artifacts. Notably: (1) treatment with channel activator Yoda1 increased bright-state occupancy (Figure 3C - 3G), and (2) increasing channel open probability by administering a mechanical stimulus increased bright-state occupancy (Supplementary Figure 13).

    1. eLife Assessment

      This study presents a valuable finding that perception of a material's properties and hardness during brief touches can be altered using only vibrotactile feedback. The user studies show that vibration energy can influence judgements of material hardness, but the evidence is incomplete to support the broader claim made by the authors that spectral energy is the dominant feature governing hardness perception.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript deals with the ability to identify material hardness from the vibrations induced by single light taps on that surface. Psychophysical tests of human perception under varying conditions of modified fingertip compliance and/or externally imposed vibrations demonstrated that total spectral energy was the main determinant of perceived hardness and that perception of increased hardness can be induced by adding external vibration at the time of contact.

      Strengths:

      The experiments are well-reported and the data potentially useful, but much narrower than is implied by the (provisional) title and abstract. Their potential application to tactile perception in virtual reality seems promising, but the largely unexplored need for synchronization with physical contact and modulation with velocity and force of that contact seems likely to complicate proposed applications to prosthetics and telerobots.

      Weaknesses:

      (1) The authors have confused discriminability with perception. The sense of touch is derived from several different types of mechanoreceptors and processed into several dimensions of haptic perception. The fact that subjects can rank surface material hardness correctly when requested to focus on that alone does not mean that they rely on total spectral energy normally or that total spectral energy is normally perceived as surface material hardness, as opposed to other aspects of materials, such as their surface texture. They have not considered the effects of more complex features of most surfaces, such as curvature or lamination, or of exploratory movement strategies other than light taps.

      (2) Discussion section. Lines 262-264 are overstated. Dynamic spectral energy can be used to modify perceived hardness when exploratory movements are limited to taps that are unlikely to generate any other useful cues, such as skin deformation or proprioception. The authors have not explored what happens if there actually are conflicting cues in non-vibratory modalities. There are many different examples from sensory psychophysics of percepts that arise from taking the mean of conflicting cues (e.g. stereophonic sound localization) and others that arise from a dominant modality (e.g. self-motion perception from visual flow fields, vestibular signals and proprioception).

      The authors have ignored the substantial literature on artificial tactile sensors and their ability to identify texture, hardness and other haptic properties of materials. These have emphasized the importance of the many types and parameters of exploratory movements, which were loosely specified and not quantified in these studies.

      See:

      Li, Q., Kroemer, O., Su, Z., Veiga, F. F., Kaboli, M., & Ritter, H. J. (2020). A Review of Tactile Information: Perception and Action Through Touch. IEEE Transactions on Robotics, 36(6), 1619-1634. doi:10.1109/tro.2020.3003230.

      Fishel, J. A., & Loeb, G. E. (2012). Bayesian exploration for intelligent identification of textures. Frontiers in Neurorobotics, 6(4). doi:10.3389/fnbot.2012.00004

      Fishel, J. A., & Loeb, G. E. (2012). Sensing Tactile Microvibrations with the BioTac - Comparison with Human Sensitivity. Paper presented at the IEEE/RAS-EMBS International Conference on Biomedical Robotics and Biomechatronics, Rome.

      (3) Introduction (lines 23-31) and Discussion (lines 296-298). The notion that tactile receptors are "frequency tuned" is something of a straw man. Different receptor types are preferentially sensitive to different broad spectral bands, but it has long been known that they can be driven by larger stimuli outside those bands and that humans have very limited ability to discriminate actual frequency of tactile vibration (as opposed to auditory pitch), particularly for frequencies greater than the maximal one-to-one firing rate of neurons (~200-300 Hz). Conversely, fine onset timing of spikes in tactile afferents appears to be available from brief contact taps to identify features other than hardness; see:

      Johansson, R. S., & Flanagan, J. R. (2009). Coding and use of tactile signals from the fingertips in object manipulation tasks. Nature Reviews Neuroscience, 10, 345-359.

      Pruszynski, J. A., Flanagan, J. R., & Johansson, R. S. (2018). Fast and accurate edge orientation processing during object manipulation. eLife, 7, e31200.

      (4) Methods section. The Lofelt L5 actuator used to apply vibrations to the fingernail is rather large for use on multiple fingers of a haptic display. Do the authors know of any more compact technology with the requisite power and frequency response? One of the most useful contributions of this paper is to suggest that those details matter relatively little, which opens up more compact technologies such as piezoelectric actuators.

      (5) Methods section. It is good that headphones were used to block and mask audible tapping sounds, which are known to be capable of generating tactile illusions (Jousmäki, Veikko, and Riitta Hari. "Parchment-skin illusion: sound-biased touch." Current biology 8.6 (1998): R190-R191). But that suggests that hardness might be signalled by precisely timed acoustic stimuli, which would be much easier to deliver than fingertip vibration.

    3. Reviewer #2 (Public review):

      This paper aimed to demonstrate that total spectral energy alone is sufficient to drive hardness perception and material identification. Through five user studies, they tested materials ranging in stiffness and with covered fingers to support their claim. Using a spectral energy compensation framework, they concluded that total spectral energy alone, regardless of frequency content, was sufficient to support material hardness percepts. However, it should be noted that all experiments used a tapping procedure, which is not the standard exploratory procedure when judging material hardness. A tapping method also selectively enhances vibratory feedback while limiting others. This fundamentally limits the scope of their work, and assessing their claim on generalizability would require further experimentation.

      Some additional clarification and extension on the experiments are also suggested:

      (1) According to Lederman and Klatzky (1987), pressure, and not tapping, is the exploratory procedure humans use to judge hardness. And during tapping instead (as used in all experiments), it is expected that the dominant cue available to the user comes from vibrations, as other mechanical cues, such as skin stretch, are limited. These vibrations could serve as a proxy for hardness, as claimed by the authors, but it is unclear if the participants are basing their evaluations on perceived hardness or vibration intensity. A more fundamental question that needs to be answered to support the paper's claim is whether a single tap is sufficient for conveying a material's hardness. To better support their claim, I recommend that the authors include an experiment using participants' bare fingers with materials of the same modulus but different damping coefficients. These materials would produce different vibration signals when tapped, but are equivalent in hardness.

      (2) The setup text for experiment 4 does not match the results. Results suggest that a finger covered with a bubble and touching a soft material was used (i.e. dual compliance), but the setup describes otherwise. The authors should clarify this and confirm that this is different from experiment 2.

      (3) As silicone, foam, and rubber can have very similar or different hardness depending on the specific material used, please report the hardness of each material tested (Shore or Young's modulus) to better understand the range of stiffness tested.

      (4) In the "materials grouping and selection" section, it states that a pilot study suggested hard materials tended to be perceptually similar while softer materials were easily distinguishable. However, this contradicts the results in experiment 1. The authors should expand on the details of the pilot study and address the inconsistency between its findings and experiment 1.

      (5) The methods section suggests that individual recordings for each material were performed before the experiment. Please clarify if this is correct, or if a single signal for each texture was used across all participants. Additionally, was the participants' tap pressure controlled during either the recordings or the experiments? If not, how do the authors account for the difference in intensity that would be generated due to different tapping pressures across participants and trials?

    1. eLife Assessment

      This important study developed a novel theory to account for various aspects of dopamine signals, particularly dopamine ramps. The authors propose that dopamine reward prediction error (RPE) signals are generated by a dual-process learning system in which values inferred by a model-based system enter the RPE asymmetrically into the update target but not the prediction. The results are well-presented and convincing, and make a contribution that is of importance to the field. This work will be of interest to those studying dopamine specifically or brain learning computations and systems more broadly.

    2. Reviewer #1 (Public review):

      Summary:

      This study develops a novel theory to account for various aspects of dopamine signals, particularly dopamine ramps. The authors propose that dopamine reward prediction error (RPE) signals are generated by a dual-process learning system in which values inferred by a model-based system enter the RPE asymmetrically into the update target but not the prediction (equation 6). The work offers specific, mechanistic explanations of Krausz et al. (2023), Guru et al. (2020), and Kim et al. (2020) by maintaining an RPE interpretation, and presents an alternative to the state-uncertainty account in Mikhael et al. (2022) that doesn't require the asymmetric uncertainty assumption Mikhael needs, using Campbell et al. (2025) in a thoughtful way. The asymmetric-RPE idea is clean and well presented. Overall, this study makes an important contribution to the field.

      Strengths:

      The theory is relatively simple and intuitive. It addresses a long-standing controversy or mystery in the field of dopamine.

      Weaknesses:

      (1) The biggest outstanding question is what V_TD does - letting V_MB drive everything would seem to produce much of the same outcomes in the settings discussed here. The discussion suggests that in situations where there is little contribution of the model-based system, the backpropagating bump is a feature (e.g. Amo et al.). It would be interesting to see if this is a true outcome of the model, potentially by varying the arbitration parameter k. This is an interesting alternative account from eligibility trace explanations of the lack of backpropagating bump in some experimental settings.

      (2) The model-based accounts are quite simplistic, and this should probably be acknowledged - it does help delineate their contribution, but in the model, only the goal-reward value is updated; everything else is a known computation. Perhaps engage more deeply with Sagiv et al?

      (3) The application of Campbell et al. (2025) to push back on Mikhael (lines 253-259) is interesting: if striatum to VTA implements TD via synaptic delays such that V(s_t) is a delayed copy of V(s_{t+1}), then state uncertainty is necessarily shared between the two terms in the RPE, defeating Mikhael's required asymmetry.

      But the same circuit logic creates tension for the dual-process model. It seems they are proposing that the frontal cortex projects V_MB into VTA dopamine neurons (as proposed in 3.1 and the Discussion) and adds to the prediction error derived from the biphasic filtering of value. But the biphasic idea (and data of Campbell et al.) implies that the V(t+1) and -V(t) come from the same source and are proportional. Adding the V_MB term is akin to adding a positive bias, breaking the optimality of the TD error for predicting value and predicting over-learning of cached value. It is worth considering whether V_MB passes through a similar filter - I am not sure if it is fatal if V_MB contributes somewhat to the negative term of the update error.

      (4) There are a few places where the predicate of the conclusion needs more care. The "normative" framing throughout 3.2 and the Discussion is normative only conditional on the architecture already including a separate cached system that needs to converge to the true value function, and on the model-based system being learnt much faster - see comments about the learning rate parameter later.

      (5) Kim et al. is cited heavily as a data source for Figure 4, but is never engaged with as a theoretical alternative, even though Kim et al. explicitly argued that an appropriate state representation makes standard TD compatible with ramps and the teleport responses. That is, Kim et al. is already a TD account of these phenomena, and doesn't require a second learning system. The introduction and Mikhael discussion treat the field as if the choice were between "dopamine = value" (Hamid, Howe, Mohebi) and dopamine = RPE-with-special-conditions (Mikhael, Kato-Morita), but Kim et al.'s framework is also dopamine = RPE. Two specific places this matters: (i) Figure 4 currently demonstrates that the dual-process model reproduces the Kim teleport results, but Kim et al.'s framework also reproduces them - the figure doesn't distinguish the two, and I am not sure the figure gives this message cleanly. (ii) Kim et al. report that ramps develop with training over days; the manuscript should address whether the dual-process model has an alternative explanation for this, especially given the contrast with the Guru result (ramps diminishing with training over a longer timescale).

      (6) The arbitration parameter k is fixed at 0.5 throughout, and the paper acknowledges this is for simplicity, but a supplementary panel sweeping k ∈ {0, 0.2, 0.5, 0.8, 1.0} on the key figures (Figure 1B convergence, Figure 2D ramp dynamics, Figure 3D Krausz updating) would be informative. At k = 0, the model reduces to standard TD; at k = 1, it's effectively V_MB-driven. I think these would be easy to add and help clarify the work this assumption is doing.
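      A sketch of what such a sweep could look like is given below. It is not the manuscript's equation 6; it encodes one reading of the description in this review, in which the RPE target mixes cached and model-based values while the prediction stays purely cached: delta_t = r_t + gamma * [(1 - k) * V_TD(s_{t+1}) + k * V_MB(s_{t+1})] - V_TD(s_t). The linear track, the exactly known V_MB, and the values of `n_states`, `gamma` and `reward` are all assumptions made for illustration.

```python
import numpy as np


def first_trial_rpe(k, n_states=10, gamma=0.9, reward=1.0):
    """RPE at each state on the first traversal of a linear track.

    Assumed (hypothetical) target: r + gamma * ((1 - k) * V_TD' + k * V_MB'),
    with prediction V_TD only. V_TD starts at zero (untrained cache);
    V_MB is taken to be the exact discounted value of the goal reward.
    """
    # Cached values for states 0..n_states-1 plus a terminal state (value 0).
    v_td = np.zeros(n_states + 1)
    # Model-based values: reward discounted by the number of steps to the goal.
    steps_to_goal = np.arange(n_states - 1, -1, -1)
    v_mb = np.append(reward * gamma ** steps_to_goal, 0.0)
    # Reward is delivered on leaving the final state.
    r = np.zeros(n_states)
    r[-1] = reward
    target = r + gamma * ((1 - k) * v_td[1:] + k * v_mb[1:])
    return target - v_td[:-1]


for k in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(k, np.round(first_trial_rpe(k), 3))
```

      Under these assumptions, k = 0 gives the standard TD pattern (no pre-reward signal before learning), while any k > 0 gives a pre-reward RPE that grows with proximity to the goal and scales with k.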

      (7) Learning-rate asymmetry needs justification. The story relies on α_MB >> α_TD throughout (α_MB = 0.50, α_TD = 0.01 - a 50× ratio). With α_MB = 0.5, a single rewarded trial moves R[goal] halfway to the new value, which would predict strong dependence of dopamine ramp amplitude on the previous trial's outcome. This is testable in existing data (Krausz et al. should have enough trials to fit the exponential decay constant for trial-history dependence; Guru's swap-session data likewise), and the paper would be strengthened by explicitly deriving and checking that prediction.
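      The trial-history prediction can be made concrete with a short sketch. It assumes the goal-reward estimate follows a standard delta rule, R <- R + alpha * (outcome - R), which is what "moves R[goal] halfway to the new value" implies for alpha = 0.5; the function names are ours and the exact update in the manuscript may differ.

```python
def update(R, outcome, alpha):
    """Delta-rule update of the goal-reward estimate (assumed form)."""
    return R + alpha * (outcome - R)


def history_weight(alpha, n_back):
    """Weight of the outcome n_back trials ago (1 = most recent trial)."""
    return alpha * (1 - alpha) ** (n_back - 1)


ALPHA_MB = 0.50   # value quoted in the review
ALPHA_TD = 0.01   # value quoted in the review

# One rewarded trial starting from R = 0 moves the estimate halfway to 1.
print(update(0.0, 1.0, ALPHA_MB))

# Exponentially decaying influence of past outcomes on the current estimate.
for n_back in (1, 2, 3, 4):
    print(n_back, history_weight(ALPHA_MB, n_back), history_weight(ALPHA_TD, n_back))
```

      With alpha = 0.5 the previous trial alone carries half of the total weight, so under this assumption ramp amplitude should track the most recent outcome closely; this is the exponential decay constant that could be fitted to the existing data.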

      (8) α_MB is dropped to 0.10 specifically for the Krausz simulation without justification in the text. Why? Either the value should be the same as elsewhere, or the paper should explain why Krausz's task requires slower MB learning. It would be good to check the robustness of the Krausz simulation - the test phase is a single set of three trials (t-2 = omission, t-1 = reward, then t = 50% rewarded) after training on a single set of 500 simulated trials (I believe only one random seed is used - given the high alpha, varying this set of simulated trials seems important). Also, do they get the other result in Krausz (t-2 = reward, t-1 = omission, t = 50% rewarded)?

      (9) It might be possible to fit the alpha to the Guru and Krausz simulations - this might be informative to show the range over which it varies.

      (10) The Kato and Morita account is cited in the introduction but never really discussed again - it would be good to engage with this a bit more in the discussion. The rejection of the value-based accounts seems to rely primarily on Kim et al., where the value and TDRPE accounts differ, but this could be directly acknowledged, rather than absorbing credit for this into their model.

    3. Reviewer #2 (Public review):

      Summary:

      This paper offers a novel theoretical account of dopamine ramps. The key idea is that the reward prediction error (putatively signaled by dopamine) uses a partially model-based estimate for future value (the prediction target). Because the model-based value estimate emerges more rapidly than the model-free estimate, it inflates the RPE, and this inflation increases with reward proximity - hence ramps. The authors show that this account can explain many aspects of existing data on dopamine ramps across several different studies.

      Strengths:

      Overall, I liked this paper. The idea is interesting and plausible. The paper is well-written and clearly argued. The modeling has been done rigorously.

      Weaknesses:

      My major comments are: (1) it's not always clear which phenomena are uniquely well-explained by this new account vs. earlier accounts; and (2) the limitations of the account are not entirely transparent.

      (1) The paper models some of the studies reported by Kim et al (2020). As was already shown in that paper, a standard TD error could explain the results (although a major limitation of that treatment was that it did not model the recursive effect of RPEs on learning, as discussed in the Mikhael paper). It's not clear if there's additional explanatory value provided by this new account, though, of course, it's good to know that those results are captured by the new account. Likewise, Mikhael et al (2022) already offered an account of their data (somewhat more complex than the standard TD model). Again, it's not clear if there's additional explanatory value provided by the new account (and again, it's nice to see that the model can capture these results). Finally, I found myself wondering whether the Guru et al (2020) result couldn't be explained by a more standard TD model (assuming the value function is sufficiently convex). I don't think it's essential that the new account provides additional explanatory value in every case, but I think it's important to convey to readers what's new and what's not, as well as what aspects of the data require particular kinds of mechanisms to explain. It would be really helpful to see the predictions of alternative TD models in order to make this clearer.

      (2) The Mikhael model was motivated by the puzzle that ramping is observed in navigation tasks (with sensory cues) but typically not in classical conditioning tasks lacking sensory cues. The correction term, derived from normative considerations, explained this discrepancy. It's not clear to me if/how the new account can explain the discrepancy.

    4. Reviewer #3 (Public review):

      Summary:

      This work presents a new hypothesis for why dopamine signals have sometimes been observed to "ramp up" in spatial tasks as rodents approach a location associated with reward. In essence, the hypothesis is that value estimates (i.e., predictions about future rewards) from a model-based system, which may be able to more quickly form such estimates via an inference-like process, can be used to speed up the (relatively slow) learning of such estimates by a model-free system. This is suggested to occur by including the model-based estimate as part of the target towards which model-free estimates are updated in the course of temporal-difference (TD) learning. The early discrepancy between these estimates can be expected to give rise to systematic TD errors - putatively represented in dopaminergic activity - that give rise to dopamine ramps, which are expected to diminish over time as the estimates of both systems converge. The authors show that a model that implements this idea makes predictions about dopamine activity that are a good qualitative match to data from a number of recent experimental studies.

      Strengths:

      The work suggests a normative account for a phenomenon that has persistently troubled the canonical theory of dopamine function. The account is appealing in its elegance and simplicity, and the authors present compelling evidence that it can capture the empirical observations of key recent papers. Another strength of the account is that it readily suggests avenues for future theory development and experimental test, including what the 'best' target estimate should be at any given time, how rapidly one might expect ramps to develop or diminish, and the neural implementation of the proposed algorithm. This is likely to stimulate further theoretical and experimental work in the field.

      Weaknesses:

      One aspect of dopamine "ramps" that was troubling from a theoretical standpoint was their apparent persistence over time. Given the authors' prediction that these would disappear over time in a stable environment and the supporting evidence they cite (from Guru et al., 2020), the reader might be left confused about the state of evidence about whether dopamine ramps persist or not. Perhaps relatedly, the issue of how the activity of dopamine cells and dopamine release are related is not discussed, which may be relevant given that early studies (e.g., Howe et al., 2013) used voltammetry to measure extracellular dopamine concentrations.

    1. eLife Assessment

      This important study advances methods for improved analyses of wide-field optical imaging of mice expressing the genetically encoded calcium indicator GCaMP6f in different neocortical layers through registering to layer-specific cortical atlases and deconvolution to account for depth-dependent light scattering. However, the key underlying assumption of the work, that widefield signals originate in somata, and not in their superficial axonal and dendritic compartments, remains untested. Similarly, other signal sources like intrinsic optical signals and hemodynamic occlusion are incompletely considered. This study is likely to be of interest to neuroscientists carrying out wide-field optical imaging of the mouse neocortex.

    2. Reviewer #1 (Public review):

      Summary:

      The authors develop alignment methods for layer-specific widefield calcium imaging in the mouse cortex. Under the assumption that the majority of the widefield signal originates at the level of the cell bodies, different cortical layers will appear at different locations in a top-down view as a function of the curvature of the mouse cortex. The authors develop software tools to correct for this, as well as depth-dependent source blurring. Finally, they apply these tools to investigate functional connectivity differences of different neuron types and find only subtle differences.

      Strengths:

      The work is technically strong, the experiments well executed, and the presentation clear.

      Weaknesses:

      One concern I have is that the central assumption underlying the rationale for the depth correction, namely that the source of the majority of the widefield signal is the cell body, may be incorrect. Layer 5 neurons have a dense axo-dendritic plexus very close to the surface of the cortex. Given the attenuation length of visible light in tissue, as well as our own measurements (https://elifesciences.org/articles/71476#fig6s1), I suspect that the majority of the widefield calcium signal originates in the superficial axo-dendritic plexus. The authors acknowledge this possibility, but there are a few simple measurements they could make to address this more directly. If indeed, as I suspect, the majority of the calcium signal originates in the first 50 um of tissue (even when imaging layer 5 neurons), the curvature correction is counterproductive, of course. The authors could test the effect of adding brain slices of varying thicknesses on top of e.g., a layer 2/3 widefield recording. If the authors are correct, and most of the signal is from cell bodies, this should, at most, attenuate the layer 2/3 recording to the level of a layer 5 recording. Anecdotally, while doing the measurements for the figure referenced above, we have done this experiment with a 100 um thick slice, and no quantifiable calcium responses remained.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Lorenzo and colleagues presents wide-field cortical imaging data obtained from experiments conducted with three triple-transgenic mouse lines that specifically express the calcium sensor GCaMP6f in neurons of layers 2/3, 5, and 6 of the neocortex, respectively.

      It first includes a methodological contribution aimed at optimizing the analysis of the acquired signals, taking into account both the geometry of the neocortex and photon scattering in the cortical tissue, which affect fluorescence signals differentially depending upon their cortical depth of origin.

      In particular, they built upon the work previously published in eLife by Waters in 2024, which, based on a simulation of photon scattering using a Monte Carlo random-walk model, provided an estimate of the tissue volumes contributing to the fluorescence signals measured from the surface in several mouse lines expressing GCaMP in a layer-specific manner.

      The authors here additionally performed empirical measurements of the point spread function at different cortical depths to determine spatial kernels to be used to deconvolve wide-field imaging data acquired from their three layer-specific GCaMP6f-expressing mouse lines. They assess the added value of this deconvolution approach based on recordings of the cortical responses evoked by whisker stimulation in the barrel cortex, using lightly anesthetized, layer 2/3 and layer 5 GCaMP6f-expressing mice.
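
      To make the deconvolution idea concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes a Gaussian kernel as a stand-in for an empirically measured point spread function and uses Richardson-Lucy iterations, one common deconvolution scheme; the review does not state which algorithm the manuscript uses. All function names and parameter values are hypothetical.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel (assumed stand-in for a measured PSF)."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_same(signal, kernel):
    """Zero-padded convolution returning an output as long as the input.

    The kernel length must be odd so that it has a well-defined center.
    """
    radius = len(kernel) // 2
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + radius - k  # index of the input sample weighted by kernel[k]
            if 0 <= j < n:
                acc += w * signal[j]
        out[i] = acc
    return out

def richardson_lucy(observed, kernel, iterations=50, eps=1e-12):
    """Iteratively estimate the unblurred signal from a blurred observation."""
    # Start from a flat, strictly positive estimate.
    start = max(sum(observed) / len(observed), eps)
    estimate = [start] * len(observed)
    mirrored = kernel[::-1]
    for _ in range(iterations):
        blurred = convolve_same(estimate, kernel)
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, mirrored)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate

# Hypothetical example: a point source blurred by a depth-dependent kernel.
true_signal = [0.0] * 41
true_signal[20] = 1.0
psf = gaussian_kernel(sigma=3.0, radius=9)
observed = convolve_same(true_signal, psf)
restored = richardson_lucy(observed, psf, iterations=200)
```

      In this sketch a wider kernel would play the role of a deeper layer, so the same routine could in principle be run with a different kernel per mouse line. Real wide-field data are two-dimensional and noisy, which usually calls for regularization or early stopping.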

      Altogether, these proposed methods aim to optimize the registration of recorded signals to a common reference frame, allowing comparison of cortical spatiotemporal dynamics recorded from distinct layer-specific GCaMP-expressing mice.

      The manuscript further contains a more neurophysiological contribution, directly utilizing the proposed methods to perform a comparative layer-specific functional connectivity analysis from data collected with the 3 different mouse lines, while the mice were head-fixed below the macroscope.

      Strengths:

      Wide-field 1-photon functional optical imaging, which allows recording cortical spatiotemporal dynamics over a large portion of the dorsal neocortex in mice, has become a tool of choice to study how activity over a wide range of cortical areas is orchestrated in various behavioral contexts. The ever-increasing availability of transgenic mice exhibiting pan-cortical calcium- or voltage-dependent sensors within specific neuronal populations is generating a growing interest in these approaches among the neuroscientific community.

      Nowadays, it is possible to image specifically the activity of excitatory neurons whose cell bodies are located in given cortical layers. However, interpreting fluorescence signals recorded from the surface while originating from deep layers proves difficult due to photon scattering, which reduces image definition, as previously established by Waters et al. (2024).

      The ability to correct for this blurring effect and to place the recorded signals within a common frame of reference is therefore essential not only for comparing activity across layers but also for integrating findings across studies, thereby advancing our collective understanding of neocortical physiology.

      In this sense, this work by Lorenzo and colleagues is definitely both timely and valuable.

      Overall, the manuscript is clearly structured and well-written, and the figures are of excellent graphic quality.

      The proposed approach to correct the blurring of the fluorescent signals, which increases with depth, by means of empirical measurements of point spread functions and deconvolution, seems pertinent and efficient.

      Finally, the authors have collected evoked and spontaneous dynamics of calcium signals from 3 different layer-specific GCaMP mice, which in itself represents a substantial experimental effort, not least because of the need to generate the animals. Out of these data, they provide a unique comparative analysis of layer-specific functional connectivity.

      Weaknesses:

      To fully benefit a large community, some aspects of the proposed methodological advances need to be more detailed in the manuscript and potentially refined. For instance, it is very difficult to evaluate, given the tiny confocal images provided in Figure 1, the potential contribution of GCaMP signal from apical dendrites of layer V neurons in Rbp4-GCaMP6f mice. It is also difficult for the reader to assess the added value of the layer-specific reference maps, given that functional image registration relies on nonlinear transformations and limited detail is provided regarding the procedure used to realign the functional data with these maps (lines 465-467). It is not really clear how the illustrated "composite maps" and the "five functional spots" used for the registration are computed. In addition, one could question the choice of the large time windows used to generate these composite maps/functional landmarks. Since the early component of the evoked responses is more likely to reflect the location of the initial thalamocortical inputs, restricting the analysis to the early phase of the responses might improve the accuracy of primary cortical area identification. This concern regarding the time window used to define specific cortical representation areas may also be relevant to Figure 4, which illustrates the results of the proposed deconvolution approach used to correct for photon scattering (although the time windows used for these analyses are not specified).

      With regard to Figure 4, the reader might wonder why the results are not illustrated similarly for the layer 6 mice. It would therefore be useful to clearly indicate whether these data are not shown because they were not collected, or because it proved impossible to identify single whisker representations, despite the proposed deconvolution procedure.

      Regarding the analysis of layer specificity in terms of functional connectivity, the authors extensively use the term "resting-state" to describe the behavioral context of data collection, given that the animals were not engaged in a goal-directed task. However, because the mice were experiencing head fixation beneath a functional epifluorescence macroscope for only the second time, it is questionable whether this state can truly be classified as "resting." As indicated by the global quantification of body movements, the animals most likely alternated between quiet wakefulness and more active phases.

      To allow the reader to accurately interpret the reported functional connectivity differences, the authors should at least provide a quantification of the time animals spent in the quiet versus active states, and assess whether these proportions were comparable between the different mouse lines. Another way to address this issue would be to perform functional connectivity analyses after splitting the data according to these two states based on body movement quantification, although it is difficult to assess the feasibility of this approach without knowing the temporal distribution of these states within the dataset.
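
      The state-split analysis suggested above could be sketched as follows. This is an illustrative outline only: the movement threshold, the per-frame movement score, and the use of Pearson correlation between two regional traces are assumptions made for the example, not details taken from the manuscript.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    if n < 3:
        return None
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return None
    return cov / math.sqrt(var_x * var_y)

def split_by_state(movement, threshold):
    """Return frame indices for quiet (<= threshold) and active (> threshold) states."""
    quiet = [i for i, m in enumerate(movement) if m <= threshold]
    active = [i for i, m in enumerate(movement) if m > threshold]
    return quiet, active

def state_connectivity(trace_a, trace_b, movement, threshold):
    """Fraction of quiet frames plus the region-to-region correlation per state."""
    quiet, active = split_by_state(movement, threshold)
    result = {"fraction_quiet": len(quiet) / len(movement)}
    for name, frames in (("quiet", quiet), ("active", active)):
        a = [trace_a[i] for i in frames]
        b = [trace_b[i] for i in frames]
        result[name] = pearson(a, b)
    return result

# Hypothetical toy data: two regions that co-vary when quiet and oppose when active.
movement = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
region_a = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
region_b = [2.0, 4.0, 6.0, 8.0, 4.0, 3.0, 2.0, 1.0]
summary = state_connectivity(region_a, region_b, movement, threshold=0.5)
```

      Reporting `fraction_quiet` per animal would also address the request to compare the proportion of time spent in each state across mouse lines.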

      This seems particularly important since differences in neural cross-regional correlation patterns have been linked to arousal levels, with a comparable optical imaging approach, by Shahsavarani and colleagues (Cell Reports, 2023), who compared initial and prolonged resting periods. In addition, the authors report here that layer differences in functional connectivity are more pronounced in regions associated with the default mode network, whose activity is likely to differ between quiet and active wakefulness.

      Finally, given the richness of the dataset, it would be very interesting to assess how the proposed deconvolution approach affects PCA-ICA-based functional parcellation of spontaneous cortical activity (Reidl et al., NeuroImage, 2007; Makino et al., Neuron, 2017) and whether it enables cross-layer comparisons of independent cortical modules. Such supplementary analyses would substantially increase the impact of this work.

    4. Reviewer #3 (Public review):

      This paper provides valuable technical and theoretical validation of layer-specific wide-field imaging. Here, the authors use specific transgenic lines that provide layer-specific cell body expression (and some superficial dendrites). They then use deconvolution approaches and potentially more accurate atlases based on depth-dependent features to register and resolve what are layer-specific functional GCaMP signals.

      In general, the work is extremely well done, and I have little specific criticism. I think the authors should be commended for their creative solutions, including using a light source at different depths to measure apparent scattering and blurring, which allowed them to incorporate the deconvolution approach.

      Throughout the manuscript, they refer to the signals as layer-specific and, for the most part, conclude that functional connectivity is similar across layers, with some noted exceptions. This is an outstanding resource for the community.

      Major Comment:

      I think they should add some caveats that the lines they employ do contain dendrites located in more superficial cortical layers. Could they make some estimates of the signal contribution from, say, the superficial dendrites of layer 6 neurons versus the deep somata? This clarification should be included in the abstract; maybe they could call these apparent somatic signals? Another way of doing this would be to use a soma-targeted deep indicator, but this is probably beyond the scope of the paper.

      Alternatively, how much of the layer 5 signal would be expected to be recovered?

    1. eLife Assessment

      This study characterizes the heterogeneity and developmental origins of macrophages in the thymus and offers tantalizing evidence of their potential involvement in the first step of T cell selection. The macrophage characterisation is interesting, although the evidence for the specific involvement of macrophages in beta-selection is incomplete, as alternative explanations have not been ruled out. These results provide an important advance that furthers our understanding of thymus biology, especially in view of the contribution of heterogeneous thymic macrophage subpopulations.

    2. Reviewer #1 (Public review):

      Summary:

      The current manuscript characterizes in detail the macrophages in the thymus. The authors identify two distinct populations of thymic macrophages and describe their surface marker expression and transcriptional signatures. They also explore their ontogeny and the kinetics of their settling and persistence in the thymus, and find that the TIMD4+ macrophages are derived from embryonic progenitors and self-maintain in the thymus, while the TIMD4- macrophages are derived from monocytes. Most importantly, the authors test the functional importance of thymic macrophages for T cell development using an in vitro depletion system, from which they conclude that macrophages are important for one of the earliest selection steps in T cell development, beta selection.

      Strengths:

      The authors use state-of-the-art techniques, such as multiple genetically modified mice, multi-color flow cytometry, single-cell RNA sequencing, genetic fate mapping, and fetal thymic organ culture (FTOC) combined with depletion. Their work is in good agreement with prior published studies on the subject, such as Tacke et al. (PMID: 26091486) and Zhou et al. (PMID: 36449334). In addition to reproducing prior knowledge, the authors uncover novel and unexpected facets of thymic macrophage biology, such as their SpiC independence and the fact that TIMD4- thymic macrophages depend on CCR2 (Tacke et al. have shown that the overall thymic macrophage compartment is normal in CCR2-/- mice). Most surprisingly, the authors claim that thymic macrophages control an early checkpoint in T cell development, the beta selection. This has not been reported before, as beta selection is usually considered a cell-autonomous process in thymocytes that does not require input from other cells.

      Weaknesses:

      The thymic macrophage depletion experiments are not well controlled, and the authors' interpretation of the results is a stretch. First, the treatment depletes other cell types, most notably dendritic cells (DCs), which have well-known roles in thymic selection (though not specifically in beta selection). The authors' reasoning that macrophages are abundant in the cortex, where beta selection occurs, while DCs are enriched in the medulla, seems questionable, as the embryonic thymus typically lacks a medulla (or has only a very small one). A second salient point is that the authors haven't ruled out direct toxicity of the dimerizer drug AP20187 on thymocytes (specifically DN cells) in MAFIA mice.

      Altogether, this is a solid manuscript that largely confirms the previously established ontogeny and heterogeneity of thymic macrophages. However, the participation of thymic macrophages in beta selection needs stronger evidence.

    3. Reviewer #2 (Public review):

      This manuscript from the Zuniga-Pflucker laboratory reports that thymic macrophages are heterogeneous in their flow cytometric and transcriptomic profiles, comprising two major populations characterized by TIMD4 and CX3CR1 expression. These macrophage populations are both parenchymal in the thymus but differ in developmental ontogeny, Flt3 expression history, and CCR2 dependency. The manuscript further reports the interesting finding that depletion of thymic macrophages impairs thymocyte development at the DN3 beta-selection checkpoint. These results provide an important advance for further understanding of thymus biology, especially in view of the contribution of heterogeneous thymic macrophage subpopulations.

      However, Zhou et al. previously reported essentially similar heterogeneity in thymic macrophages. It was demonstrated that TIMD4+ macrophages and CX3CR1+ macrophages have distinct origins and are different in developmental characteristics (27). The authors should better clarify what was previously demonstrated and what is newly described in this study. Zhou, et al. also demonstrated that TIMD4+ macrophages are localized in the cortex whereas CX3CR1+ macrophages distribute in the medullary region. Whether or not these previous findings are reproduced and supported in the present study is important in view of the new finding that thymic macrophages are important for beta-selection, which is presumed to occur in the thymic cortex. The authors may be able to suggest more strongly that TIMD4+ macrophages regulate beta-selection in the thymic cortex through phagocytic efferocytosis. (Indeed, the Figure 1 legend states that frozen thymic sections were used for immunofluorescent staining to identify the localization of thymic macrophages, without showing the results.)

    1. eLife Assessment

      On the basis of convincing computational, biophysical, and cell-based evidence, this study reports the important finding that the dynamin inhibitor Dyngo-4a broadly affects lipid packing and plasma membrane dynamics, independently of its action on dynamin. The evidence, obtained by a wide range of methods including a newly developed assay visualizing internalized caveolae, provides solid support for the authors' main claim on the role of lipid packing in caveolae internalization. This work will be of significant interest to cell biologists, biophysicists, and chemists interested in membrane remodeling and drug-membrane interactions.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The authors use Dyngo-4a, a known dynamin inhibitor, to test its influence on caveolar assembly and surface mobility. They investigate whether it incorporates into membranes using a Quartz-Crystal Microbalance, and how it is organized in membranes using simulations. Finally, they use lipid-packing-sensitive dyes to investigate lipid packing in the presence of Dyngo-4a, AFM to measure membrane stiffness, and fluorescence microscopy to measure membrane undulation. They also use a measure they call "caveola duration time" to claim that something happens to caveolae after Dyngo-4a addition; this parameter does indeed increase in response to Dyngo-4a and returns to baseline after addition of cholesterol.

      Overall, the authors claim: 1) Dyngo-4a inserts into the membrane and this 2) results in "a dramatic dynamin-independent inhibition of caveola scission". 3) Dyngo-4a was inserted and positioned at the level of cholesterol in the bilayer and 4) Dyngo-4a-treatment resulted in decreased lipid packing in the outer leaflet of the plasma membrane 5) but Dyngo-4a did not affect caveola morphology, caveolae-associated proteins, or the overall membrane stiffness 6) acute addition of cholesterol counteracts the block in caveola scission caused by Dyngo-4a.

      Overall, in this reviewer's opinion, after the additional experiments in the review process, all claims are now well-supported by the presented data from electron and live cell microscopy, QCM-D and AFM.

      Significance:

      A number of small molecule inhibitors of the GTPase dynamin exist and are commonly used tools in the investigation of endocytosis. This goes so far that the use of some of these inhibitors alone is considered in some publications as sufficient to declare a process to be dynamin-dependent. However, this is not always correct, as there are considerable off-target effects, including the inhibition of caveolar internalization by a dynamin-independent mechanism. This is important because, for example, the influence of small molecule dynamin inhibitors on chemotherapy resistance is currently being investigated (see for example Tremblay et al., Nature Communications, 2020).

      The investigation of the true effects of small molecules discovered and used as specific inhibitors, including their off-target effects, is extremely important, and this reviewer applauds the effort. It is important that inhibitors are not used alone, but that other means of targeting a mechanism are exploited as well in functional studies. The audience here thus includes, besides membrane biophysicists interested in the immediate effect of the small molecule Dyngo-4a, cell biologists and everyone using dynamin inhibitors to investigate cellular function.

      Comments on revised version.

      Overall, in this reviewer's opinion, after the additional experiments in the review process, all claims are now well-supported by the presented data from electron and live cell microscopy, QCM-D and AFM.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors probe the mechanisms by which Dyngo-4a, a dynamin inhibitor used to block endocytosis, impacts caveolae dynamics. They provide compelling evidence that Dyngo-4a inhibits caveolae dynamics and endocytosis (as well as several other aspects of plasma membrane dynamics) by a dynamin-independent mechanism. They also provide strong computational and experimental data showing that Dyngo-4a inserts into membranes and decreases lipid packing in the outer leaflet of the plasma membrane. Finally, they demonstrate that the addition of excess cholesterol to cells reverses the effects of Dyngo-4a on caveolae dynamics, presumably by reversing lipid packing defects. Based on these findings they conclude that lipid packing regulates caveolae dynamics and endocytosis in a cholesterol-dependent manner.

      This work should be of value to cell biologists interested in plasma membrane remodeling and membrane trafficking, biophysicists that study small molecule/membrane interactions and membrane remodeling processes, and chemists interested in designing drugs to target membrane trafficking machinery and pathways.

      Strengths and weaknesses:

      This work addresses the important topic of how a widely used endocytic inhibitor actually works. In the process of addressing this question, the authors uncover unexpected connections between how lipids are packed in cell membranes and membrane dynamics. The methods are appropriate and many of the claims made in this work are well supported by data.

      The authors have also been responsive to comments raised during review by including additional experimental evidence that Dyngo-4a inhibits caveolae endocytosis as well as documenting the effects of Dyngo-4a on caveolae morphology.

      The work also raises some interesting questions for the future. As one example, the authors note that in addition to inhibiting caveolar dynamics, Dyngo-4a inhibits generalized plasma membrane mobility, transferrin uptake, and fusion of fusogenic liposomes to the plasma membrane. More work will be required to determine whether these events are mediated by a common, lipid packing-dependent mechanism.

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment:

      This study reports the important finding that the dynamin inhibitor Dyngo-4a broadly affects lipid packing and plasma membrane dynamics, independently of its action on dynamin. While solid computational, biophysical, and cell-based evidence supports this conclusion, there is incomplete support for the authors' main claim on the role of lipid packing in caveolae internalization, as the causal relationship remains unclear and direct analyses are lacking. With stronger evidence, this work would be of significant interest to cell biologists, biophysicists, and chemists interested in membrane remodeling and drug-membrane interactions.

      We are thankful for the very positive feedback and enthusiasm for our work and sincerely thank all the reviewers for their time, their constructive criticism and valuable comments. Based on this, we have revised our manuscript as detailed below in the point-by-point response where the responses to reviewers’ comments are indicated in blue font. Text edits in the revised manuscript are indicated in red font.

      We agree that providing sufficient evidence for inhibition of caveolae endocytosis by Dyngo-4a is critical and have therefore worked hard on identifying suitable assays that enable conclusive experiments as described below. We have now added a new figure with data that we think firmly supports our statement that caveolae internalization is restricted by Dyngo-4a. Additionally, EM images and quantifications of caveola morphology with or without treatment have been added within the same figure. Taken together, we believe that we have provided strong data to support this main claim and challenged this hypothesis as far as current methodology allows. Therefore, we hope that the revised manuscript warrants a new eLife assessment and we would like this to be the version of record for the publication in eLife.

      Point-by-point response to reviewers' comments

      Reviewer #1 (Public review):

      The authors use Dyngo-4a, a known dynamin inhibitor, to test its influence on caveolar assembly and surface mobility. They investigate whether it incorporates into membranes using a Quartz-Crystal Microbalance, and how it is organized in membranes using simulations. Finally, they use lipid-packing-sensitive dyes to investigate lipid packing in the presence of Dyngo-4a, AFM to measure membrane stiffness, and fluorescence microscopy to measure membrane undulation. They also use a measure they call "caveola duration time" to claim that something happens to caveolae after Dyngo-4a addition; this parameter does indeed increase in response to Dyngo-4a and returns to baseline after addition of cholesterol.

      Overall, the authors claim: 1) Dyngo-4a inserts into the membrane and this 2) results in "a dramatic dynamin-independent inhibition of caveola scission". 3) Dyngo-4a was inserted and positioned at the level of cholesterol in the bilayer and 4) Dyngo-4a-treatment resulted in decreased lipid packing in the outer leaflet of the plasma membrane 5) but Dyngo-4a did not affect caveola morphology, caveolae-associated proteins, or the overall membrane stiffness 6) acute addition of cholesterol counteracts the block in caveola scission caused by Dyngo-4a. 

      Overall, in this reviewer's opinion, claims 1, 3, 4, 5 are well-supported by the presented data from electron and live cell microscopy, QCM-D and AFM.

      We thank the reviewer for these positive and encouraging words and believe that the new experiments added to the manuscript have provided strong evidence that caveola internalization is greatly inhibited by Dyngo-4a (see below).

      However, there is no convincing assay for caveolar endocytosis presented besides the "caveola duration", which, although unclearly described, seems to be the imaging time that elapses until a caveola is no longer picked up by the tracking software in TIRF microscopy. Since the main claim of the paper is a mechanism of caveolar endocytosis being blocked by Dyngo-4a, a true caveolar internalization assay is required to make this claim. This means either the intracellular detection of caveolar cargo that is not connected to the surface, or the quantification of caveolar movement from TIRF into epifluorescence detection in the fluorescence microscope. Otherwise, the authors could remove the claim and just claim that caveolar mobility is influenced.

      We thank the reviewer and agree that this is a very important point to verify. Therefore, we have worked hard to quantify the endocytosis of caveolae in thin sections of MEF cells using transmission electron microscopy. By incubating cells with externally added HRP for two minutes followed by washing, vesicles internalized during this period can be contrasted and distinguished from surface-associated vesicles. Sections were quantified by counting both surface-associated and internalized caveolae and CCVs (see figure below). Surface-associated caveolae and CCVs can be distinguished based on size and shape and, for CCVs, the presence of a coat, but the number of vesicles per image is very low because a cross section has to go right through the vesicle. Furthermore, although internalized caveolae and CCVs can be differentiated by size, it is much harder to separate these from other vesicles, tubules and tubular endosomes positive for HRP. We detect an approximately 50% reduction in internalized caveolae and CCVs (i.e., containing the internalized marker) in Dyngo-4a-treated cells, which confirms that internalization is impaired following Dyngo-4a treatment. Yet, CCV endocytosis was simultaneously confirmed by Tfn uptake assay to be reduced to a greater extent, approximately 95%. We believe that this discrepancy in numbers is due to the low frequency of counted vesicles per section and the difficulties in distinguishing different internalized vesicles and endosomal tubules, which make a robust quantification of endocytic events difficult. It is also important to note that the EM assay relies on structural criteria to identify only the budded CCVs and caveolae containing the internalized marker, in transit to the early endosome. Other labeled structures are excluded. In contrast, uptake of Tfn into endosomes would also be measured by the light microscopy assay. Therefore, we have chosen not to include these data in the revised manuscript.

      Author response image 1

      Instead, we have developed a new assay in which we can quantify internalization in whole cells and clearly separate internalized caveolae from those that are surface associated or have fused with endosomal structures. For this we use the HeLa FlpIn Cav1-GFP cells, which are induced to express Cav1-GFP at endogenous levels to label caveolae. The cells are incubated for five minutes with fluorescent CTxB, which is known to be internalized by caveolae (but also via other mechanisms). To be able to separate internalized caveolae from early endosomes, cells were fixed and labelled with antibodies against the marker EEA1. Cells were analyzed by fluorescence microscopy and confocal z-stacks of entire cells were recorded. The data were analyzed by software to identify only the caveolae that were positive for CTxB but negative for EEA1. The quantification showed a very clear reduction in the number of internalized caveolae in Dyngo-4a-treated cells in comparison to control cells. These data have been included in the manuscript as an important new Figure 2 together with TEM data where we quantify the morphology of surface-associated caveolae with or without Dyngo-4a treatment. We have also extensively edited the text in the results section to describe these new data and to convey that Dyngo-4a indeed affects internalization. We are very happy to have established means to address this important point by extending the current methodology and tools. Together with the TIRF data and FRAP data we believe that we have provided strong data for this claim and challenged our hypothesis as far as current methodology allows.
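
      The classification step described above (caveolae positive for CTxB but negative for EEA1) can be illustrated with a small sketch. This is not the software used by the authors; the `Spot` structure, the per-spot intensity values, and both thresholds are hypothetical and serve only to show the logic of the selection.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    """One detected Cav1-GFP spot with channel intensities (hypothetical units)."""
    ctxb: float  # CTxB signal measured at the spot
    eea1: float  # EEA1 (early endosome marker) signal measured at the spot

def is_internalized_caveola(spot, ctxb_threshold, eea1_threshold):
    """CTxB-positive and EEA1-negative: cargo taken up, not yet in an early endosome."""
    return spot.ctxb >= ctxb_threshold and spot.eea1 < eea1_threshold

def count_internalized(spots, ctxb_threshold, eea1_threshold):
    """Number of spots in one cell that meet the CTxB+/EEA1- criterion."""
    return sum(
        1 for s in spots if is_internalized_caveola(s, ctxb_threshold, eea1_threshold)
    )

# Hypothetical spots from one control cell and one treated cell.
control_cell = [Spot(0.9, 0.1), Spot(0.8, 0.7), Spot(0.7, 0.2), Spot(0.1, 0.1)]
treated_cell = [Spot(0.2, 0.1), Spot(0.9, 0.8), Spot(0.1, 0.0), Spot(0.6, 0.1)]
control_count = count_internalized(control_cell, ctxb_threshold=0.5, eea1_threshold=0.5)
treated_count = count_internalized(treated_cell, ctxb_threshold=0.5, eea1_threshold=0.5)
```

      In practice the thresholds would need to be set from control stainings, and spot detection in confocal z-stacks is a separate step that this sketch does not attempt.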

      Significance: 

      A number of small molecule inhibitors of the GTPase dynamin exist and are commonly used tools in the investigation of endocytosis. This goes so far that the use of some of these inhibitors alone is considered in some publications as sufficient to declare a process to be dynamin-dependent. However, this is not correct, as there are considerable off-target effects, including the inhibition of caveolar internalization by a dynamin-independent mechanism. This is important because, for example, the influence of small molecule dynamin inhibitors on chemotherapy resistance is currently being investigated (see for example Tremblay et al., Nature Communications, 2020). The investigation of the true effects of small molecules discovered and used as specific inhibitors, including their off-target effects, is extremely important, and this reviewer applauds the effort. It is important that inhibitors are not used alone, but that other means of targeting a mechanism are exploited as well in functional studies. The audience here thus includes, besides membrane biophysicists interested in the immediate effect of the small molecule Dyngo-4a, cell biologists and everyone using dynamin inhibitors to investigate cellular function.

      Thank you for the comments. We very much appreciate the interest and enthusiasm of the reviewer for our work. This has inspired and supported us to perform additional work for the revision of our manuscript.

      Reviewer #2 (Public review): 

      In this manuscript, the authors probe the mechanisms by which Dyngo-4a, a dynamin inhibitor used to block endocytosis, disrupts caveolae dynamics. They provide compelling evidence that Dyngo-4a inhibits caveolae dynamics and endocytosis (as well as several other aspects of plasma membrane dynamics) by a dynamin-independent mechanism. They also provide strong computational and experimental data showing that Dyngo-4a inserts into membranes and decreases lipid packing in the outer leaflet of the plasma membrane. Finally, they demonstrate that the addition of excess cholesterol to cells reverses the effects of Dyngo-4a on caveolae dynamics, presumably by reversing lipid packing defects. Based on these findings they conclude that lipid packing regulates caveolae dynamics and endocytosis in a cholesterol-dependent manner. 

      This work should be of value to cell biologists interested in plasma membrane remodeling and membrane trafficking, biophysicists that study small molecule/membrane interactions and membrane remodeling processes, and chemists interested in designing drugs to target membrane trafficking machinery and pathways. 

      This work addresses the important topic of how a widely used endocytic inhibitor actually works. In the process of addressing this question, the authors uncover unexpected connections between how lipids are packed in cell membranes and membrane dynamics. The methods are appropriate and many of the claims made in this work are well supported by data.

      We very much appreciate the thorough review, the very positive feedback and the constructive critique, and thank the reviewer for the time spent on our manuscript.

      Weaknesses: 

      I appreciate that the manuscript has already gone through one round of revisions and that many of the concerns from the previous reviewers appear to have been addressed. However, as an interested reader, I would like to offer several additional comments for the authors to consider. 

      (1) It is not clear based on the data presented whether the effects of Dyngo-4a on lipid packing give rise to defects in caveolae dynamics or if these effects are merely correlated. To show this more definitively, one might expect additional experimental approaches to be used to perturb lipid packing. I appreciate this is probably beyond the scope of the current study. However, it seems important for the manuscript to be clear about how far this interpretation can be pushed in the absence of additional independent lines of evidence.

      We are very proud of the direct experimental support for the effect on lipid packing that we obtained by incorporating extra cholesterol into the membrane, which supports that these effects are not merely correlated. Unfortunately, specifically perturbing lipid packing in other ways and conclusively interpreting such data is not straightforward. We agree that data and conclusions should be further challenged but we believe that this goes beyond the scope of this manuscript.

      (2) On a related note, it is not obvious how changes in lipid packing in the outer leaflet could impact caveolae dynamics. It would be helpful to include a cartoon illustrating how this might work.

      Thank you for pointing out this important aspect. We have elaborated on this within the discussion and referred to our recently published perspective article in Nature Cell Biology ('A lipid-centric view of endocytosis by caveolae' Parton, Kozlov and Lundmark DOI: 10.1038/s41556-026-01945-5) where this topic is extensively discussed. In short, insertion of the 8S disc in the inner leaflet of the PM replaces approximately 250 lipids and spans the entire thickness of the leaflet. The insertion of the flat, hydrophobic phase of the 8S disc, that faces the outer leaflet, results in a differential contact energy favoring the uneven packing of lipids and preferred accumulation of cholesterol in the PM of mammalian cells. Increased cholesterol content in the PM leads to more tilt and splay and hence curvature generation and, if not constrained by EHD2, scission. Thus, the distinct lipid packing of cholesterol and sphingomyelin opposite the Cav1 complex is key to drive curvature generation and internalization of caveolae.

      We agree that a schematic figure could be nice to illustrate how packing affects caveolae internalization. However, we realized that conveying the concept comprehensibly would require an extensive figure with lengthy discussion in the text. Therefore, we have chosen not to include this here, but refer to the figures in Parton et al. Nature Cell Biology DOI: 10.1038/s41556-026-01945-5

      (3) The authors note that Dyngo-4a inhibits several dynamic processes including generalized plasma membrane mobility (Fig 4A&B), transferrin uptake (Fig S4C), and fusion of fusogenic liposomes (Fig S4G). This clearly indicates there is a major disruption of the plasma membrane going on here that is not limited to caveolae. They go on to show that the addition of cholesterol reverses the effects of Dyngo-4a on caveolae dynamics. However, they do not discuss whether adding back cholesterol has similar effects on plasma membrane mobility and transferrin uptake. This information could help to further pinpoint whether the mechanisms of action are shared, and if the role of cholesterol is more general in controlling these events or is instead specific to caveolae. 

      Yes, this is correct, and we agree that this important finding leads to many follow-up questions on the mechanism of action of Dyngo-4a on cellular processes. Yet, dissecting the mechanism for all these processes goes well beyond the scope of, and our resources for, this manuscript.

      (4) In Fig 4C, the morphology of the neck region of the Dyngo-4a treated caveolae structure appears to be "pinched" compared to the control. I appreciate that more EM studies are underway. It would be useful to specifically compare the morphology of the caveolae as part of those studies.

      Thanks, this is a relevant and interesting question. In the revised manuscript, we have therefore performed and included extra quantitative EM data addressing the morphology of caveolae. Based on this we conclude that there is no statistically significant difference in the height, width or neck diameter of caveolae in Dyngo-4a-treated cells in comparison to control cells. When analyzing the ratios of height, width and neck diameter for each caveola, there is a trend toward increased neck diameter in Dyngo-4a-treated cells. These data have been included in the new figure 2 A-B and discussed in the text.

      (5) In Line 91, a statement is made that 8S complex formation requires cholesterol. This is debatable, as they appear to form in E. coli in the absence of cholesterol (reference 14).

      Thank you, we have clarified that this statement is referring to mammalian cells.

      Some minor spelling errors include: 

      Line 66 generrating

      Line 182 signigicantly 

      Line 197 treatmend 

      Line 347 succefully 

      These errors have been corrected.

    1. eLife Assessment

      This study presents analyses of single neuron activity in the subthalamic nucleus (STN) of monkeys performing a decision-making task that manipulates both perceptual evidence and reward. The study shows convincing evidence of distinct subpopulations of neurons in STN that differ in their representations of key quantities related to decision formation. These findings reveal important functional heterogeneity within the STN that helps provide new insights into its contributions to decision processing.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      This manuscript offers a careful and technically impressive dissection of how subpopulations within the subthalamic nucleus (STN) support reward-biased perceptual decision-making. The authors recorded STN neurons in monkeys performing an asymmetric-reward visual motion discrimination task, then combined single-unit analyses, regression modeling, and drift-diffusion model (DDM) fitting to identify functionally distinct neuronal clusters. Each subpopulation shows unique relationships to computational decision variables - evidence accumulation rate, decision bound, and non-decision time - as well as to post-decision evaluative signals including choice accuracy and reward expectation. The revised manuscript substantially strengthens the original submission by improving both the objectivity of neuron selection and the robustness of the clustering solution.

      Strengths:

      The asymmetric-reward paradigm cleanly separates perceptual and motivational contributions to STN activity, allowing the authors to characterize how neurons blend these distinct sources of information. The dataset is extensive and well-controlled, and the behavioral and neural analyses are tightly integrated. Relating cluster-specific activity to DDM parameters provides an interpretable computational link between population signals and behavior. The clustering solution is now validated across two algorithms, two monkeys, and subsets of trials - establishing that the three-cluster structure is robust. The new Figure 9 offers a conceptually useful, if necessarily speculative, synthesis connecting the identified subpopulations to distinct basal-ganglia pathways (hyperdirect versus indirect). The new Figure 8 documenting the anatomical intermingling of subpopulations is also important, as it directly informs the interpretation of prior and future STN stimulation studies.

      Weaknesses:

      The inferred relationships between neural clusters and DDM parameters remain correlational - the authors now appropriately flag this throughout, and the causal inference gap is acknowledged in the Discussion with concrete proposals for future targeted perturbation strategies. While a generative multi-cluster model would further strengthen mechanistic interpretation, the conceptual framework in Figure 9 provides a reasonable intermediate step given the scope of the study and the absence of simultaneous population recordings, which preclude direct inter-cluster covariation analyses. These remaining limitations are inherent to the experimental design rather than analytical oversights.

      Comments on the previous version:

      The authors have responded thoroughly and constructively to all of my concerns. The revised clustering pipeline - incorporating finer temporal resolution, objective neuron selection, outlier removal, a second clustering algorithm, cross-monkey validation (Rand indices of 0.94 and 1.0 for the two monkeys), and trial-subset stability analysis - substantially increases confidence in the three-cluster solution. The correlational nature of the DDM-activity relationships is now clearly stated, and the Discussion appropriately contextualizes the causal inference gap while suggesting feasible future directions. The new Figure 9 provides the conceptual synthesis I had hoped for, within the realistic scope of the present study. I am satisfied with the authors' responses and have no further requests.
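
      For readers unfamiliar with the agreement measure quoted above, the following is a minimal sketch of the plain (unadjusted) Rand index: the fraction of item pairs that two clusterings treat the same way. Whether the authors used the plain or an adjusted variant is not stated here, and the label lists below are hypothetical, not data from the study.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    if total == 0:
        raise ValueError("need at least two items")
    return agree / total


# Hypothetical cluster labels for six neurons from three clustering runs.
run_1 = [1, 1, 2, 2, 3, 3]
run_2 = [2, 2, 1, 1, 3, 3]  # same partition, different label names
run_3 = [1, 1, 2, 3, 3, 3]  # one neuron moved to another cluster

print(rand_index(run_1, run_2))
print(rand_index(run_1, run_3))
```

      The index depends only on the partition, not on the label names, which is why a relabelled but otherwise identical clustering scores 1.0.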

    3. Reviewer #2 (Public review):

      This study uses monkey single-unit recordings to examine the role of the STN in combining noisy sensory information with reward bias during decision-making between saccade directions. Using multiple linear regressions and clustering approaches, the authors overall show that a highly heterogeneous activity in the STN reflects almost all aspects of the task, including choice direction, stimulus coherence, reward context and expectation, choice evaluation, and their interactions. The authors report in particular how three classes of neurons map to different decision processes evaluated via the fitting of a drift-diffusion model. Overall, the study provides evidence for functionally diverse and anatomically intermingled populations of STN neurons, supporting multiple roles in perceptual and reward-based decision-making.

      This study follows up on work conducted in previous years by the same team and complements it. Extracellular recordings in monkeys trained to perform a complex decision-making task remain a remarkable achievement, particularly in brain structures that are difficult to target, such as the subthalamic nucleus. The authors conducted numerous analyses of STN activity, using sophisticated statistical approaches and functional computational modeling.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      One criticism that I would still make in the revised version of the paper concerns the description of the behavior of the two monkeys which is still minimal, while acknowledging differences in their choice and RT performance that reflect "individual differences in sensitivity to motion stimulus and a common heuristic-based satisficing strategy". This sentence is not clear to me. Moreover, the potential consequences of these differences on neuronal activity are only considered in the cluster analysis done for each of the two animals separately and for which it turns out there is no notable difference.

      We have revised the text to emphasize the key, common feature of their behavior and refer readers interested in variability across sessions and individuals to our previous study: “Both monkeys showed consistent biases toward the large-reward choice (Figure 1B, C). Details of their performance, including variations across sessions and individuals, have been reported in a previous study (Fan et al., 2018).”

      Given that both monkeys’ choices and RT showed clear and consistent coherence and reward dependencies, and that the clustering analyses were consistent across the two monkeys, we believe that the analyses presented here are appropriate. Future work is needed to examine if and how STN contributes to more nuanced aspects of behavioral variability.

      Compared to the first version of the paper, the cluster analysis in this revised version yields three distinct populations instead of the previous four. While the authors suggest that these subpopulations play important roles in encoding different aspects of decision-making, the identification of three rather than four subpopulations seems to me an important update that warrants discussion.

      The clustering results are slightly different because, following suggestions from the first round of reviews, we now use more principled approaches for selecting neurons and computing the clusters. The primary difference is that Clusters 1 and 3 in the original manuscript have mostly been merged into one cluster (new Cluster 3). We updated the text to note that our use of three clusters depends on our choice of clustering cutoff and continue to emphasize that the clusters are consistent across monkeys and clustering techniques: In Results: “Inspection of the dendrogram (hierarchical cluster tree) suggested that our STN samples can be reasonably grouped into three clusters, although other groupings are possible using different clustering cutoffs (Figure 5-S1).” In Discussion: “Furthermore, our clustering analysis aimed to identify common activity profiles in the STN population, while leaving behind many neurons that either did not show consistent task-related modulation or had less common activity profiles (e.g., those that were far from others in the vector space and those with too infrequent occurrence to form detectable clusters). More work is needed to continue to refine our understanding of the specific computational contributions of the STN to decision formation.”

      Finally, I think it would have been interesting to identify the level of collinearity in the model proposed by the authors (equation 7). Indeed, one can expect significant collinearity between some of the proposed explanatory factors of neuronal activity, such as choice and coherence level, for example.

      The reviewer is correct that choice and coherence are correlated with the formulation of Eq. 7. However, such collinearity does not seem to bias the regression results (Author response image 1). We have performed simulations with different modulation strengths and noise levels (A and C) and observed generally good recoverability of the ground-truth regression coefficients (red: unity-slope lines), despite the strong correlation between choice and coherence for one choice (B).
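
      The kind of recoverability check described above can be sketched as follows. This is not the authors' simulation and does not reproduce Eq. 7: it assumes a simplified linear model with only an intercept, a choice regressor (±1) and a signed-coherence regressor, and all parameter values, the psychometric slope and the noise level are hypothetical. Choice is drawn from a logistic function of coherence so that the two regressors are strongly correlated, and ordinary least squares is then used to test whether the ground-truth coefficients are recovered.

```python
import math
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def simulate(n_trials=5000, beta=(10.0, 2.0, 5.0), noise_sd=1.0, seed=0):
    """Simulate firing rates and return (fitted coefficients, regressor correlation)."""
    rng = random.Random(seed)
    levels = [-0.5, -0.25, -0.1, 0.0, 0.1, 0.25, 0.5]  # signed coherence
    rows, rates, choices, cohs = [], [], [], []
    for _ in range(n_trials):
        coh = rng.choice(levels)
        # Hypothetical psychometric function linking coherence to choice.
        p_right = 1.0 / (1.0 + math.exp(-8.0 * coh))
        choice = 1.0 if rng.random() < p_right else -1.0
        rate = beta[0] + beta[1] * choice + beta[2] * coh + rng.gauss(0.0, noise_sd)
        rows.append([1.0, choice, coh])
        rates.append(rate)
        choices.append(choice)
        cohs.append(coh)
    return ols(rows, rates), pearson(choices, cohs)


true_beta = (10.0, 2.0, 5.0)
fitted, r_choice_coh = simulate(beta=true_beta)
print("choice-coherence correlation:", round(r_choice_coh, 2))
print("true coefficients:  ", true_beta)
print("fitted coefficients:", [round(b, 2) for b in fitted])
```

      In this simplified setting the correlated regressors widen the standard errors of the estimates but do not bias them, which is the general behavior of least squares under collinearity that falls short of perfect linear dependence.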

      Author response image 1.

      Similarly, for the analysis relating neuron activity to decision evaluation signals (p 16), firing rates calculated using sliding averages with 1-ms steps are compared, but the method does not specify controls for multiple comparisons or for non-independent data.

      We have made multiple comparison corrections using the Benjamini and Hochberg procedure and updated the relevant text in Methods, Results, and Abstract accordingly.
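
      As an illustration of the correction named above, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It is not the authors' code, the p-values are hypothetical, and the function name is illustrative. The procedure controls the false discovery rate under independence or certain forms of positive dependence among tests; whether that holds for overlapping sliding windows is for the authors to judge.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking the tests rejected at FDR level alpha.

    Step-up rule: sort the p-values, find the largest rank k (1-based) such
    that p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per time bin.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_vals))

# The step-up property: the two smallest p-values exceed their own
# thresholds, but are rejected because the third one passes its threshold.
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9]))
```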

    1. eLife Assessment

      This study presents a valuable RNA velocity method which predicts the transcription rate linearly based on the expression of RNA levels of transcription factors with addition of comprehensive analyses. The evidence supporting the claims of the authors is solid, although inclusion of a full simulation would have strengthened the study. The work will be of interest to scientists working in the field of RNA biology and precision medicine.

    2. Reviewer #1 (Public review):

      Summary:

      In the paper, the authors propose a new RNA velocity method, TSvelo, which predicts the transcription rate linearly based on the expression of RNA levels of transcription factors. This framework is an extension of the authors' recent work TFvelo by including unspliced reads and designing a coherent neuralODE framework. Improved performance was demonstrated in six diverse datasets.

      Strengths:

      Overall, this method introduces innovative solutions to link cell differentiation and gene regulation, with a balance between model complexity (neuralODE) and interpretability (raw gene space).

      Comments on revised version:

      The authors have added comprehensive analyses in this revision, and all of my concerns have been very well addressed. Here, I just want to re-emphasize the original points 1 and 3.

      (1) The analysis and clarification are very helpful - thanks! I found Fig. R1 and R2 very insightful, as DoRothEA-only returns much worse performance. Please consider adding these two figures to the supplementary figures and possibly highlighting your setting for edge pruning (down-weights); therefore, the model is more likely to be affected by false negatives than false positives in the TF-target prior.

      (3) Please consider adding some discussion on the challenges in capturing cell cycle transitions.

    3. Reviewer #3 (Public review):

      Despite the abundance of RNA velocity tools, there are still major limitations, and there is strong skepticism about the results these methods lead to. In this paper, the authors try to address some limitations of current RNA velocity approaches by proposing a unified framework to jointly infer transcriptional and splicing dynamics. The method is then benchmarked on 6 real datasets against the most popular RNA velocity tools.

      Comments on revised version:

      The Authors addressed all my comments suitably. I'd like to thank them for the time they spent addressing them: the revised paper is much more convincing.

      I have 2 very minor follow-up concerns:

      (1) I appreciated the simulation study; however, no null simulation is present. We know RNA velocity tools are inclined to provide false positives: trajectories even when the data doesn't have any. It'd be helpful to add null simulations where the data has no trajectories and see if methods erroneously identify any.

      (2) Several of the novel analyses are only reported in the Supplementary material and only referenced in the main text (e.g., "A validation of TSvelo on simulated data is provided in Fig. S1 and Fig. S2 in the Supplementary Information."). This is a pity!

      If allowed, I'd add some comments about the new analyses (simulations, computational benchmarks, etc...) also in the main text.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the paper, the authors propose a new RNA velocity method, TSvelo, which predicts the transcription rate linearly based on the expression of RNA levels of transcription factors. This framework is an extension of the authors' recent work TFvelo by including unspliced reads and designing a coherent neuralODE framework. Improved performance was demonstrated in six diverse datasets.

      Strengths:

      Overall, this method introduces innovative solutions to link cell differentiation and gene regulation, with a balance between model complexity (neuralODE) and interpretability (raw gene space).

      We thank the reviewer for the positive evaluation of our work and for recognizing the novelty of the proposed framework. We appreciate the reviewer’s summary highlighting that TSvelo extends our previous method TFvelo by incorporating unspliced reads and introducing a coherent neuralODE framework to model transcription dynamics.

      We are encouraged that the reviewer recognizes the potential of our approach to link cell differentiation with gene regulatory mechanisms, while maintaining a balance between model expressiveness and interpretability in the gene expression space. In the revised manuscript, we have further clarified several methodological details and strengthened the presentation to better highlight these aspects.

      Weaknesses:

      While it seems to provide convincing results, there are multiple technical concerns for the authors to clarify and double-check.

      (1) The authors should clarify and discuss the TF-target map: here, the TF-target genes map is predefined by the TF binding's ChIP-seq data. This annotation is largely incomplete and mostly compiled from a set of bulk tissues. Therefore, for a certain population, the TF-target relation may change. This requires clarification and discussion, possibly exploring how to address this in the model. In addition, a regulon database could be added, e.g., DoRothEA?

      We thank the reviewer for this important comment. The TF–target maps used in TSvelo (e.g., derived from ChIP-seq-based resources such as ENCODE) reflect aggregated TF binding evidence collected across diverse bulk cell types and experimental conditions. As such, they are inherently incomplete and do not capture fully context-specific regulatory activity in a given primary tissue. In TSvelo, we therefore do not treat these annotations as fixed or cell-type-specific ground truth regulatory relationships. Instead, they are used as a permissive prior that encodes a broad set of potential regulatory interactions.

      Within the TSvelo framework, the contribution of each TF–target interaction is learned from data through weight estimation, allowing the model to down-weight or effectively ignore prior edges that are inconsistent with the observed single-cell expression dynamics. This design enables TSvelo to remain robust even when the prior TF–target map is noisy, incomplete, or derived from heterogeneous bulk contexts.

      Following the reviewer’s suggestion, we additionally incorporated the DoRothEA regulon database as an alternative prior with confidence-level filtering. We further performed ablation studies on the pancreas dataset and the gastrulation erythroid dataset using different TF–target resources, including ChEA, ENCODE, and their combinations with DoRothEA.

      The results on the pancreas dataset and the gastrulation erythroid dataset are shown in Figure S13 and Figure S14, respectively, and lead to the same conclusion. We observed highly consistent results across most TF–target prior combinations, including ChEA, ENCODE, ChEA+ENCODE, ChEA+DoRothEA, ENCODE+DoRothEA, and ChEA+ENCODE+DoRothEA. Using the pancreas dataset as an example, the mean velocity consistency ranged from 0.985 to 0.995, the mean in-cluster coherence ranged from 0.983 to 0.992, and the mean cross-boundary direction correctness ranged from 0.719 to 0.740 across these settings. These consistently high and tightly bounded metrics indicate that TSvelo is largely insensitive to the specific choice of TF–target prior.

      The only configuration showing reduced stability was the use of DoRothEA alone, particularly in terms of cross-boundary direction correctness. This is likely due to its comparatively limited coverage of TF–target interactions. For instance, in the pancreas dataset, only 81 out of 2000 highly variable genes (HVGs) could be associated with TFs based on DoRothEA, corresponding to 102 TF–target links in total, which may restrict downstream regulatory modeling. In contrast, ChEA covered 1793 genes with 13,976 TF–target links, and ENCODE covered 1854 genes with 33,076 links. These results further suggest that integrating multiple TF–target resources could improve performance, likely due to increased coverage and complementary regulatory information.
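
      The coverage gap can be made concrete with a short calculation using only the counts quoted above (2000 HVGs; covered genes and TF–target links per resource). The "links per covered gene" figure is a derived quantity introduced here for illustration, not a metric reported by the authors.

```python
N_HVG = 2000  # highly variable genes in the pancreas dataset, as quoted above

# resource -> (HVGs with at least one TF link, total TF-target links)
PRIORS = {
    "DoRothEA": (81, 102),
    "ChEA": (1793, 13976),
    "ENCODE": (1854, 33076),
}


def coverage_summary(priors, n_genes):
    """Return {resource: (fraction of genes covered, links per covered gene)}."""
    summary = {}
    for name, (covered, links) in priors.items():
        summary[name] = (covered / n_genes, links / covered)
    return summary


summary = coverage_summary(PRIORS, N_HVG)
for name, (fraction, links_per_gene) in summary.items():
    print(f"{name:9s} covers {fraction:6.1%} of HVGs, "
          f"{links_per_gene:5.2f} links per covered gene")
```

      On these numbers DoRothEA alone covers about 4% of the HVGs, against roughly 90% for ChEA and 93% for ENCODE, and it also supplies far fewer candidate regulators per covered gene.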

      We further acknowledge that regulatory interactions are inherently context-dependent, and that no static TF–target resource can fully capture tissue-specific regulatory programs. In the revised Discussion, we explicitly clarify this limitation and highlight that incorporating context-specific regulatory data (e.g., single-cell chromatin accessibility or perturbation-based regulatory maps) represents an important direction for future improvement.

      (2) The authors should clarify how example genes are selected. This is particularly unclear in Figure 2d.

      We thank the reviewer for raising this point. The example genes shown in Fig. 2d were selected to illustrate representative scenarios where our method provides advantages, particularly cases in which the unspliced–spliced 2D phase portrait exhibits mixed or overlapping patterns that are difficult to model using conventional RNA velocity approaches. These examples are therefore intended to demonstrate the types of transcriptional dynamics that TSvelo is designed to better capture.

      To avoid the impression of selective presentation, we note that our conclusions are based on systematic evaluation across all genes and datasets. Additional visualizations for a broader set of genes on this dataset are provided in Fig. S3. We have clarified the example gene selection criteria in the revised manuscript.

      (3) The authors should clarify confidence in the statement in lines 179-180, that ANXA4 should initially decrease. This is particularly concerning, as TSvelo didn't capture the cell cycle transitions well during the initial part.

      We thank the reviewer for raising this point. The statement that ANXA4 initially decreases is based on the observed expression pattern in the dataset rather than on cell-cycle–related dynamics inferred by the model. Specifically, ANXA4 shows higher expression in Ductal cells compared to Ngn3 EP cells, and Ductal represents an earlier stage in the developmental trajectory. Therefore, along the Ductal to Ngn3 EP transition, ANXA4 naturally exhibits an initial decrease in expression. We have clarified this point in the revised manuscript.

      (4) A support reference should be added for the statement in line 260 that "neuron migrations are inside-out manner". There is no reference supporting this, and this statement is critical for the model assessment.

      We thank the reviewer for this suggestion. This pattern has been reported in previous studies [1,2], which have been added into the revised manuscript.

      To improve clarity, we have also revised the statement in the manuscript as follows:

      “During cortical development, neurons follow an inside-out layering pattern in which earlier-born neurons populate the deep cortical layers, whereas later-born neurons migrate past them to occupy more superficial layers.”

      (1) Nadarajah, B., Parnavelas, J. Modes of neuronal migration in the developing cerebral cortex. Nat Rev Neurosci 3, 423–432 (2002).

      (2) Li, C., Virgilio, M.C., Collins, K.L. et al. Multi-omic single-cell velocity models epigenome–transcriptome interactions and improves cell fate prediction. Nat Biotechnol 41, 387–398 (2023).

      (5) The comparison to scMultiomics data is particularly interesting, as MultiVelo uses ATAC data to predict the transcription rate. It would be very insightful to add a direct comparison of the estimated transcription rate between using ATAC and directly using TFs' RNA expressions.

      We thank the reviewer for suggesting this highly interesting comparison between ATAC-derived regulatory activity and TF RNA-based proxies for transcription rate estimation.

      We have conducted the requested analysis by computing the gene-wise chromatin accessibility rate used in MultiVelo and the transcription rate learned by TSvelo, and evaluating their correlation across genes. As shown in Figure S15, the two estimates exhibit almost no global correlation across genes, indicating that they capture substantially different aspects of regulatory information.

      This discrepancy is not unexpected and reflects the fundamental differences between these modalities. scATAC-seq measures chromatin accessibility, which provides a proxy for cis-regulatory potential of genomic regions. However, ATAC signals are inherently sparse and often exhibit a near-binary structure, limiting their ability to directly capture fine-grained temporal regulatory dynamics. In contrast, TF RNA expression reflects downstream transcriptional output, which is shaped by multiple regulatory layers, including post-transcriptional regulation, protein activity, temporal delays, and indirect regulation through intermediate transcriptional or signaling pathways. As a result, these two modalities are expected to capture complementary but not directly comparable aspects of gene regulation.

      Overall, this result suggests that ATAC-based and TF RNA-based signals capture distinct aspects of gene regulation. This further implies that integrating both modalities may be beneficial for future models that aim to more comprehensively characterize transcriptional regulation. We have added this discussion to the supplementary information.

      (6) In Figure 6g, it should be clarified how the lineage was determined. Did the authors use the LARRY barcodes, predicted cell fate, or any other methods? Here, the best way is probably using the LARRY barcodes for individual clones.

      We thank the reviewer for this suggestion. The lineage assignment used in Fig. 6g is described in the Methods section (“Lineage segmentation and pseudotime initialization”). Briefly, lineages are inferred from the transcriptomic structure of the data by performing Leiden clustering followed by PAGA-based connectivity analysis. Starting from an initial Leiden cluster, the filtered PAGA graph defines the shortest paths to other clusters, which are considered as the detected lineages, and diffusion pseudotime (DPT) is then used to initialize pseudotime along each lineage. Thus, in this analysis lineages are determined from the expression-derived trajectory structure. We have clarified this point in the revised manuscript and refer readers to the Methods section.
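
      The path-finding step in this description can be sketched as follows. This is a simplified illustration, not the TSvelo implementation: it assumes the PAGA output is available as a symmetric cluster-by-cluster connectivity matrix, that "filtering" means dropping edges below a threshold, that path length is counted in number of edges, and that one lineage is reported per terminal cluster. The matrix, threshold and function name are hypothetical.

```python
from collections import deque


def lineages_from_root(connectivity, root, threshold):
    """Shortest paths (fewest edges) from the root cluster to each terminal cluster.

    connectivity: symmetric matrix of cluster-to-cluster connectivities.
    Edges weaker than `threshold` are discarded before the search.
    """
    n = len(connectivity)
    neighbors = {
        i: [j for j in range(n) if j != i and connectivity[i][j] >= threshold]
        for i in range(n)
    }
    # Breadth-first search records, for each reached cluster, its predecessor
    # on a shortest path from the root.
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    # Terminal clusters are reached clusters that are no other cluster's predecessor.
    internal = {p for p in parent.values() if p is not None}
    paths = []
    for node in sorted(parent):
        if node != root and node not in internal:
            path = []
            cur = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            paths.append(path[::-1])
    return paths


# Hypothetical connectivities among five clusters; 0 is the starting cluster.
conn = [
    [0.00, 0.90, 0.00, 0.00, 0.05],
    [0.90, 0.00, 0.80, 0.70, 0.00],
    [0.00, 0.80, 0.00, 0.10, 0.00],
    [0.00, 0.70, 0.10, 0.00, 0.60],
    [0.05, 0.00, 0.00, 0.60, 0.00],
]
lineages = lineages_from_root(conn, root=0, threshold=0.3)
print(lineages)
```

      Here the weak 0-4 and 2-3 connections are filtered out, so the graph branches at cluster 1 into two lineages. Pseudotime initialization along each lineage (DPT in the authors' description) would follow as a separate step.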

      Reviewer #2 (Public review):

      Summary:

      Li et al. propose TSvelo, a computational framework for RNA velocity inference that models transcriptional regulation and gene-specific splicing using a neural ODE approach. The method is intended to improve trajectory reconstruction and capture dynamic gene expression changes in scRNA-seq data. However, the manuscript in its current form falls short in several critical areas, including rigorous validation, quantitative benchmarking, clarity of definitions, proper use of prior knowledge, and interpretive caution. Many of the authors' claims are not fully supported by the evidence.

      We thank the reviewer for the careful evaluation of our manuscript and for the constructive comments. We appreciate the concerns regarding validation, benchmarking, methodological clarity, and interpretation. In the revised manuscript, we have carefully addressed these points by adding additional analyses, clarifying methodological details, and moderating several claims to ensure they are fully supported by the data. Detailed responses to each comment are provided below.

      Major comments:

      (1) Modeling comments

      (a) Lines 512-513: How does the U-to-S delay validate the accuracy of pseudotime? Using only a single gene as an example is not sufficient for "validation."

      We thank the reviewer for this important clarification. In the revised manuscript, we have rephrased this part to clarify that Fig. 1a serves only as an illustrative example showing the U-to-S delay for a single gene. Accordingly, we have corrected our statement to indicate that the U-to-S delay is used to infer trajectory orientation, rather than to validate the accuracy of pseudotime.

      In addition, we have expanded the description to explain that U-to-S delay signals are aggregated across all genes to provide a more robust and comprehensive assessment for this purpose. Additional analysis is provided in our response to the next comment.

      (b) Lines 512-518: The authors propose a strategy for selecting the initial state, but do not benchmark how accurate this selection procedure is, nor do they provide sufficient rationale. While some genes may indeed exhibit U-to-S delay during lineage differentiation, why does the highest U-to-S delay score indicate the correct initiation states? Please provide mathematical justification and demonstrate accuracy beyond using a single gene example. Maybe a simulation with ground truth could help here, too.

      We thank the reviewer for this insightful comment. In the revised manuscript, we have clarified both the intuition and justification of this approach. Briefly, along a correctly oriented trajectory, unspliced (U) expression is expected to precede spliced (S) expression due to transcriptional dynamics. Ideally, this U-to-S delay would be observable at the level of individual genes. However, due to the high noise inherent in scRNA-seq data, such delays are often not consistently detectable on a per-gene basis. To address this, we aggregate U-to-S delay signals across all genes and determine the lineage orientation by maximizing a global delay score. Under this criterion, the cluster from which all outgoing lineages exhibit the highest aggregated U-to-S delay is inferred to correspond to the initial state.

      We emphasize that this approach relies on genome-wide aggregation rather than any single gene. Moreover, the same strategy is applied uniformly across all six datasets using identical parameter settings, demonstrating its robustness and stability. To further address the reviewer’s concern, we additionally present the U-to-S delay scores for each Leiden cluster when treated as the initial state across all datasets (Author response image 1). The results on all datasets suggest that the highest U-to-S delay scores can be used to detect the initial cluster.

      Author response image 1.

      The U-to-S delay scores for each Leiden cluster when treated as the initial state across all datasets.

      Following your suggestions, we have also added a simulation study. We generated synthetic single-cell RNA velocity datasets using a mechanistic transcriptional dynamics model with one or multiple developmental branches. The system included 200 genes, among which 30 were designated as transcription factors (TFs).

      For each branch, we independently sampled a TF–target regulatory matrix W ∈ ℝ<sup>30×200</sup> from a standard normal distribution to simulate distinct GRN structures. Gene expression dynamics were modeled using a coupled ordinary differential equation (ODE) system describing unspliced and spliced RNA abundances:

      where u and s denote unspliced and spliced RNA levels, respectively. The transcription rate α was computed as a nonlinear function of TF expression, defined as a weighted sum of spliced TF abundance, followed by clipping to ensure bounded activation.

      Each branch is initialized from the same randomly sampled initial condition drawn from a gamma distribution, allowing controlled divergence of trajectories driven solely by branch-specific regulatory programs.

      To simulate observed sequencing counts, we introduced technical noise by scaling latent expression levels with cell-specific library sizes drawn from a log-normal distribution. The resulting expression counts were generated using a negative binomial sampling model:

      where θ controls overdispersion, with smaller values corresponding to higher noise levels. The final datasets consist of paired unspliced (U) and spliced (S) count matrices with realistic transcriptional stochasticity and branching gene regulatory dynamics. For each branch, cells were further divided into three developmental stages for downstream analysis.

      We evaluated TSvelo on multiple simulated datasets with varying numbers of branches and noise levels. In these datasets, two or three branches start from the same root cell group (Branch 1: stage 0 - stage 1 - stage 2. Branch 2: stage 0 - stage 3 - stage 4. Branch 3: stage 0 - stage 5 - stage 6). The results of initial state identification based on the unspliced-to-spliced (U-to-S) delay, along with the corresponding 2D velocity stream visualizations, are presented in Supplementary Figure S1. These results demonstrate that the U-to-S delay–based initialization is robust and consistently identifies cells corresponding to the earliest developmental stage (“stage 0”) across different simulation settings. All additional results have been included in the Supplementary Information.
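
      The simulation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the linear splicing ODE (du/dt = α − βu, ds/dt = βu − γs), the rate constants beta and gamma, the clipping bound alpha_max, the Euler step size, the gamma and log-normal parameters, and all function names are hypothetical choices made only so the example runs. The negative binomial step is written as a gamma-Poisson mixture with mean μ and variance μ + μ²/θ, so a smaller θ gives noisier counts.

```python
import numpy as np

def simulate_branch(W, tf_idx, u0, s0, n_steps=300, dt=0.05,
                    beta=1.0, gamma=0.5, alpha_max=5.0):
    """Euler integration of an assumed splicing ODE; one row per simulated cell."""
    u, s = u0.copy(), s0.copy()
    U, S = [], []
    for _ in range(n_steps):
        # Transcription rate: weighted sum of spliced TF abundance, then clipped.
        alpha = np.clip(s[tf_idx] @ W, 0.0, alpha_max)
        du = alpha - beta * u          # assumed form: du/dt = alpha - beta * u
        ds = beta * u - gamma * s      # assumed form: ds/dt = beta * u - gamma * s
        u = u + dt * du
        s = s + dt * ds
        U.append(u.copy())
        S.append(s.copy())
    return np.array(U), np.array(S)

def observe(latent, rng, theta=10.0, lib_sigma=0.3):
    """Scale by a log-normal library size, then draw negative binomial counts."""
    lib = rng.lognormal(mean=0.0, sigma=lib_sigma, size=(latent.shape[0], 1))
    mu = np.maximum(latent * lib, 1e-8)
    # Gamma-Poisson mixture: mean mu, variance mu + mu**2 / theta.
    lam = rng.gamma(shape=theta, scale=mu / theta)
    return rng.poisson(lam)

rng = np.random.default_rng(0)
n_genes, n_tfs = 200, 30
tf_idx = np.arange(n_tfs)                      # hypothetical: first 30 genes are TFs
u0 = rng.gamma(2.0, 1.0, size=n_genes)         # shared initial condition
s0 = rng.gamma(2.0, 1.0, size=n_genes)

branches = []
for _ in range(2):                             # two branches from the same root
    W = rng.standard_normal((n_tfs, n_genes))  # branch-specific TF-target matrix
    U_lat, S_lat = simulate_branch(W, tf_idx, u0, s0)
    branches.append((observe(U_lat, rng), observe(S_lat, rng)))
```

      Because every branch starts from the same (u0, s0), any divergence between branches in this sketch comes only from the branch-specific matrix W, which mirrors the design described in the text.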

      (c) Equation (8): The formulation looks to be incorrect. If $$W \in \mathbb{R}^{G\times G}$$ and $$W' - \Gamma' \in \mathbb{R}^{K\times K}$$, how can they be aligned within the same row? Please clarify.

      We thank the reviewer for pointing this out. This was a typographical error in the manuscript. In the third line of Equation (8), the term should be W’ instead of W. We have corrected this in the revised manuscript to ensure dimensional consistency.

      (d) The use of prior knowledge graphs from ENCODE or ChEA to constrain regulation raises concerns. Much of the regulatory information in these databases comes from cell lines. How can such cell-line-based regulation be reliably applied to primary tissues, as is done throughout the manuscript? Additional experiments are needed to test the robustness of TSvelo with respect to prior knowledge.

      We thank the reviewer for this important comment. In TSvelo, TF–target networks from resources such as ENCODE and ChEA are incorporated as priors that guide the model toward biologically plausible regulatory structures. Importantly, the contribution of each TF–target interaction is learned from the data, allowing the model to down-weight or override potentially inaccurate or context-mismatched regulatory links. By aggregating signals across a large number of genes, the model further reduces sensitivity to noise and incompleteness in any single prior network.

      To evaluate robustness with respect to prior knowledge, we incorporated the DoRothEA regulon resource as an alternative TF–target prior with confidence-level filtering. We further performed ablation studies on the pancreas dataset and the gastrulation erythroid dataset using different TF–target resources, including ChEA, ENCODE, and their combinations with DoRothEA.

      The results on the pancreas dataset and the gastrulation erythroid dataset are shown in Figure S13 and Figure S14, respectively, and lead to the same conclusion. We observed highly consistent results across most TF–target prior combinations, including ChEA, ENCODE, ChEA+ENCODE, ChEA+DoRothEA, ENCODE+DoRothEA, and ChEA+ENCODE+DoRothEA. Using the pancreas dataset as an example, the mean velocity consistency ranged from 0.985 to 0.995, the mean in-cluster coherence ranged from 0.983 to 0.992, and the mean cross-boundary direction correctness ranged from 0.719 to 0.740 across all settings. These consistently high and tightly bounded metrics indicate that TSvelo is largely insensitive to the specific choice of TF–target prior. Notably, these results further suggest that even when the underlying regulatory resources differ in origin (e.g., cell-line-derived vs. curated or aggregated datasets), the inferred dynamics remain stable.

      The only configuration showing reduced stability was the use of DoRothEA alone, particularly for cross-boundary direction correctness. This is likely due to its comparatively limited coverage of TF–target interactions. For instance, in the pancreas dataset, only 81 out of 2000 highly variable genes (HVGs) could be associated with TFs based on DoRothEA, corresponding to 102 TF–target links in total, which may limit downstream regulatory modeling. In contrast, ChEA covered 1793 genes with 13,976 TF–target links, and ENCODE covered 1854 genes with 33,076 links. These results further suggest that integrating multiple TF–target resources can improve performance, likely due to increased coverage and complementary regulatory information.

      We agree that regulatory interactions derived from resources such as ENCODE and ChEA may not fully generalize to primary tissues due to their context-dependent nature. In the revised Discussion, we explicitly clarify this limitation, particularly their inability to capture tissue-specific regulatory programs. We further highlight that incorporating context-specific regulatory data, such as single-cell chromatin accessibility or perturbation-based regulatory maps, represents an important direction for future improvement.

      (e) Lines 579-580: How is the grid search performed? More methodological details are required. If an existing method was used, please provide a citation.

      The grid search over the time step evaluates the loss in equation (10) for every candidate value of t<sub>step</sub> in the set {0,1,2,...,999} and selects the value with the lowest loss. This strategy was originally adopted in scVelo for optimizing the time step parameter. We have now added the corresponding citation to scVelo in the revised manuscript.
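
      As a rough illustration of such an exhaustive search, the sketch below assigns each cell the grid index whose modelled (u, s) point is closest to the observation. It assumes one time step is chosen per cell and uses a plain squared error as a stand-in for the loss in equation (10), which is not reproduced here; the function name and the toy trajectory are hypothetical.

```python
import numpy as np

def assign_time_by_grid_search(u_obs, s_obs, u_traj, s_traj):
    """Return, for each cell, the candidate step with the smallest loss.

    u_obs, s_obs   : shape (n_cells,), observed abundances for one gene
    u_traj, s_traj : shape (n_grid,), modelled abundances at each candidate step
    """
    # Squared-error loss for every (cell, candidate step) pair: (n_cells, n_grid).
    loss = ((u_obs[:, None] - u_traj[None, :]) ** 2
            + (s_obs[:, None] - s_traj[None, :]) ** 2)
    return loss.argmin(axis=1)

# Toy trajectory evaluated on 1000 candidate steps (indices 0..999).
t_grid = np.linspace(0.0, 5.0, 1000)
u_traj = 1.0 - np.exp(-t_grid)
s_traj = (1.0 - np.exp(-0.5 * t_grid)) ** 2

# Noise-free "cells" taken from known steps should be mapped back to them.
true_steps = np.array([10, 500, 900])
steps = assign_time_by_grid_search(u_traj[true_steps], s_traj[true_steps],
                                   u_traj, s_traj)
```

      Evaluating all 1000 candidates is cheap here because the loss matrix is computed in one vectorised operation; no gradient information is needed.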

      (2) Application on pancreatic endocrine datasets

      (a) Lines 140-141: What is the definition of the final pseudotime-fitted time t or velocity pseudotime?

      There is no distinction between “final pseudotime”, “fitted time t” and “velocity pseudotime”. All of them refer to the same quantity in our framework. To eliminate any potential ambiguity, we have standardized the terminology by replacing “final pseudotime” with “pseudotime”.

      (b) Lines 143-144: The use of the velocity consistency metric to benchmark methods in multi-lineage datasets is incorrect. In multi-lineage differentiation systems, cells (e.g., those in fate priming stages) may inherently show inconsistency in their velocity. Thus, it is difficult to distinguish inconsistency caused by estimation error from that arising from biological signals. Velocity consistency metrics are only appropriate in systems with unidirectional trajectories (e.g., cell cycling). The abnormally high consistency values here raise concerns about whether the estimated velocities meaningfully capture lineage differences.

      We thank the reviewer for raising this important point regarding the use of the velocity consistency metric in multi-lineage systems. Velocity consistency was initially introduced by scVelo [1] and implemented as scvelo.tl.velocity_confidence() in its package. Velocity consistency provides one of the few widely adopted quantitative criteria for benchmarking RNA velocities [2]. We agree that it is especially suitable for single-lineage processes. For datasets with clear multi-lineage differentiation (Fig. 5 and Fig. 6), we do not use this metric, precisely to avoid the issue highlighted by the reviewer.

      However, the pancreatic endocrine dataset (Fig. 2) exhibits minimal branching, making velocity consistency more appropriate. As noted in the veloVI study, RNA velocities are expected to change smoothly over the phenotypic manifold [3]. Higher consistency indicates that neighboring cells show compatible velocity directions, reflecting the stability and coherence of the inferred velocity field. Additionally, multiple previous studies used velocity consistency to evaluate model performance on this pancreas dataset [2,3,4], providing a standard point of comparison.

      To better address your concerns, we have replaced the corresponding panel in Fig. 2 of the main text with an evaluation of cell-type separability in both the traditional 2D (unspliced–spliced) phase portrait and the learned 3D (α–unspliced–spliced) phase portrait by TSvelo (Author response image 4 in our response to your subsequent question). We appreciate your suggestions, as the comparison more clearly highlights the novelty and contribution of TSvelo and helps explain its improved performance. Now, the velocity consistency panel has been moved to the Supplementary Information. In addition, we have added a clearer explanation of the cross-boundary correctness metric in the revised manuscript.

      (1) Bergen, V., Lange, M., Peidli, S., Wolf, F. A., & Theis, F. J. (2020). Generalizing RNA velocity to transient cell states through dynamical modeling. Nature Biotechnology, 38(12), 1408-1414.

      (2) Luo, Y., Ren, J., Yang, Q. ... & Li, Q. (2026). Benchmarking RNA velocity methods across 17 independent studies, Cell Reports Methods, 101367.

      (3) Gayoso, A., Weiler, P., Lotfollahi, M., Klein, D., Hong, J., Streets, A., ... & Yosef, N. (2024). Deep generative modeling of transcriptional dynamics for RNA velocity analysis in single cells. Nature Methods, 21(1), 50-59.

      (4) Li, J., Pan, X., Yuan, Y., & Shen, H. B. (2024). TFvelo: gene regulation inspired RNA velocity estimation. Nature Communications, 15(1), 1387.

      (c) The improvement of TSvelo over other methods in terms of cross-boundary direction correctness looks marginal; a statistical test would help to assess its significance.

      We thank the reviewer for this insightful comment. In the revised manuscript, we have added statistical tests for evaluated metrics, including velocity consistency, cross-boundary direction correctness, and in-cluster coherence.

      As shown in Author response image 2, TSvelo significantly outperforms all baseline methods in terms of velocity consistency across both datasets. For in-cluster coherence, TSvelo achieves significantly better performance on the gastrulation (erythroid) dataset, while on the pancreas dataset it performs comparably to the best-performing baselines (UniTVelo and TFvelo) and significantly outperforms several competing methods, including CellDancer, Dynamo, and scVelo.

      For cross-boundary direction correctness, TSvelo shows consistent improvements in mean performance on the pancreas dataset (Author response image 3), and significantly outperforms Dynamo and scVelo on the gastrulation dataset. Although not all pairwise comparisons on cross-boundary direction correctness reach statistical significance, this is likely influenced by the limited number of independent samples (n = 7 and n = 4 for the two datasets, respectively), which reduces statistical power for detecting differences. Importantly, TSvelo still achieves the best average performance among all methods, indicating a consistent overall trend in favor of TSvelo.

      We have added these results into the revised manuscript.
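
      To make the sample-size point concrete, the following sketch computes an exact one-sided Mann–Whitney p-value by enumerating every relabelling of the pooled values. The function names and the toy scores are hypothetical and are not taken from the manuscript; in practice a library routine such as scipy.stats.mannwhitneyu would normally be used. With two groups of four values, even perfect separation cannot give a p-value below 1/70 (about 0.014), which illustrates why small n limits the attainable significance.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U statistic for x versus y: pairs with x > y, ties counted as 0.5."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)

def exact_one_sided_p(x, y):
    """Exact P(U >= U_observed) under random relabelling of the pooled values."""
    pooled = list(x) + list(y)
    n = len(x)
    u_obs = mann_whitney_u(x, y)
    indices = range(len(pooled))
    count = total = 0
    for chosen in combinations(indices, n):
        chosen_set = set(chosen)
        xs = [pooled[i] for i in chosen]
        ys = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        count += mann_whitney_u(xs, ys) >= u_obs
    return count / total

# Toy scores (made up): method A beats method B on all four samples.
scores_a = [0.74, 0.75, 0.76, 0.77]
scores_b = [0.70, 0.71, 0.72, 0.73]
p_value = exact_one_sided_p(scores_a, scores_b)
```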

      Author response image 2.

      The quantitative comparison between TSvelo and baseline approaches on the pancreas dataset (panel a) and the gastrulation erythroid dataset (panel b). In each plot, methods are ranked in descending order of their mean values. Numbers at the bottom indicate the sample size for each metric. Significance is determined using a one-sided Mann–Whitney U test. *****, ***, ** and * represent p < 0.00001, 0.0001 ≤ p < 0.001, 0.001 ≤ p < 0.01, and 0.01 ≤ p < 0.05, respectively.

      Author response image 3.

      The comparison of mean cross-boundary direction correctness on the pancreas dataset.

      (d) Lines 177-178: Based on the figure, TSvelo does not appear to clearly distinguish cell types. A quantitative metric, such as Adjusted Rand Index (ARI), should be provided.

      We thank the reviewer for this helpful suggestion. To quantitatively assess whether TSvelo can distinguish cell types, we evaluated the separability of cell-type labels in both the 2D (unspliced–spliced) phase portrait adopted by previous RNA velocity approaches, and the 3D (α–unspliced–spliced, α denotes the transcriptional rate) phase portrait introduced by TSvelo.

      Specifically, we evaluated how well the embedding preserves cell-type information using k-nearest neighbors (kNN) classification accuracy with 5-fold cross-validation. Given an embedding matrix in 2D or 3D space ($$X \in \mathbb{R}^{n\times d}$$, where n is the number of cells and d is 2 or 3) and corresponding cell-type labels ($$y_i \in \{1, \dots, C\}$$), we partition the data into five folds. For each fold k, a kNN classifier with K = 5, denoted as f<sup>(k)</sup>, is trained on the training subset and evaluated on the held-out test subset. The classification accuracy for the k-th fold is defined as

      $$\mathrm{Acc}_k = \frac{1}{n_k} \sum_{i \in \mathcal{T}_k} \mathbf{1}\left(f^{(k)}(x_i) = y_i\right)$$

      where $$\mathcal{T}_k$$ is the test set of fold k, n<sub>k</sub> is the number of samples in that test set, and 1(·) is the indicator function. The final score is obtained by averaging across all folds:

      $$\mathrm{Acc} = \frac{1}{5} \sum_{k=1}^{5} \mathrm{Acc}_k$$

      This metric directly assesses whether cells of the same type are positioned close to each other in the embedding space, and is widely used to quantify representation quality.
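
      A minimal sketch of this kind of evaluation is given below, written with NumPy only so that it is self-contained. It is not the authors' implementation: the fold assignment by random permutation, the Euclidean distance, the majority-vote tie-breaking, and the toy data are all assumptions. scikit-learn's KNeighborsClassifier combined with cross_val_score would provide an equivalent and more standard route.

```python
import numpy as np

def knn_cv_accuracy(X, y, n_neighbors=5, n_folds=5, seed=0):
    """Mean kNN classification accuracy over n_folds cross-validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    accuracies = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Pairwise squared Euclidean distances, shape (n_test, n_train).
        d = ((X[test][:, None, :] - X[train][None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :n_neighbors]
        votes = y[train][nearest]                 # labels of the K nearest cells
        pred = np.array([np.bincount(v).argmax() for v in votes])
        accuracies.append(np.mean(pred == y[test]))
    return float(np.mean(accuracies))

# Toy 3D embedding with two well-separated "cell types" (labels must be ints >= 0).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(10.0, 1.0, (50, 3))])
y = np.repeat([0, 1], 50)
score = knn_cv_accuracy(X, y)
```

      Running the same function on the 2D (u, s) coordinates and on the 3D (α, u, s) coordinates of the same cells would give the two accuracies being compared.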

      Using this evaluation, we observed that the 3D phase portrait consistently achieves significantly higher accuracy than the 2D phase portrait (Author response image 4). The improvement is highly statistically significant (one-sided Mann–Whitney U test, p-value = 4.37 × 10<sup>-10</sup>), demonstrating that the 3D representation provides substantially better separation of cell types.

      We have added these quantitative results to the revised manuscript to complement the visual evidence and to clarify that TSvelo effectively distinguishes cell types in the learned representation.

      Author response image 4.

      The evaluation of the separability of cell-type labels in both the 2D (unspliced–spliced) phase portrait and the 3D (α–unspliced–spliced) phase portrait for the pancreas dataset.

      (e) Lines 179-183: The claim that traditional methods cannot capture dynamics in the unspliced-spliced phase portrait is vague. What specific aspect is not captured-the fitted values or something else? Evidence is lacking. Please provide a detailed explanation and quantitative metrics to support this claim.

      We thank the reviewer for this important comment. We have revised the text to more clearly illustrate this point using representative example genes as follows: “For instance, ANXA4 shows higher expression in Ductal cells compared to Ngn3 low EP cells, which means its expression pattern exhibits an initial decrease followed by an increase. Such dynamics are not easily captured in the conventional unspliced–spliced phase portrait used by previous approaches, as many baseline methods implicitly assume an increasing–then–decreasing expression pattern. By comparison, TSvelo can still fit such an expression pattern by using additional information from the 3D phase portrait.”

      In addition, we also clarify that the 2D u–s representation has limited capacity to separate heterogeneous dynamic cell states, which can affect downstream velocity field estimation. In the conventional 2D u–s phase portrait, cells from different dynamic regimes may overlap in the same region of the embedding space. This overlap reduces the identifiability of underlying transcriptional states and makes the inferred local dynamics more ambiguous. In contrast, TSvelo introduces an additional latent variable α, forming a 3D (α, u, s) phase portrait, which helps disentangle these mixed trajectories and yields a more structured and separable representation of cell dynamics. We have provided quantitative evidence in the previous response (Author response image 4). Briefly, the proposed 3D representation achieves consistently higher kNN classification accuracy (5-fold cross-validation, k=5) for cell state identification compared to the 2D u–s embedding.

      (3) Application to gastrulation erythroid datasets

      (a) Lines 191-194: The observation that velocity genes are enriched for erythropoiesis-related pathways is trivial, since the analysis is restricted to highly variable genes (HVGs) from an erythropoiesis dataset. This enrichment is expected and therefore not informative.

      We thank the reviewer for this comment and agree that such enrichment is expected given the use of HVGs from an erythropoiesis dataset. This analysis was included only as a preliminary sanity check to support the plausibility of the inferred velocity genes, rather than as a main result. We have accordingly simplified the description and clarified that this analysis serves only as a preliminary check in the revised manuscript.

      (b) Lines 227-228: It remains unclear how TSvelo "accurately captures the dynamics." What is the definition of dynamics in this context? Figure 3g shows unspliced/spliced vs. fitted time plots and phase portraits, but without a quantitative definition or measure, the claim of superiority cannot be supported. Visualization of a single gene is insufficient; a systematic and quantitative analysis is needed.

      We thank the reviewer for this important comment. We have revised the text to more clearly illustrate this point using representative example genes as follows: “For HSP90AB1, which exhibits a counter-clockwise pattern in the unspliced–spliced phase portrait, in contrast to the clockwise dynamics typically assumed by most baseline approaches, it is difficult for previous methods to capture this behavior, whereas TSvelo can still faithfully model such patterns. For genes such as RPS26, which have critical roles in the development from blood progenitors to erythroid cells<sup>40</sup>, the unspliced–spliced data is so noisy that cells of different types overlap in the phase portrait. TSvelo can still capture the gene dynamics and reveal differences in transcription rates across cell types.”

      In addition, we explicitly emphasize the role of the 3D (α, u, s) phase portrait, which provides a more structured and separable representation of transcriptional states compared to the conventional 2D u–s space. This improved representation is the key factor underlying the advantages of TSvelo in modeling transcriptional processes. In the conventional 2D u–s phase portrait, cells from different transcriptional states may overlap, leading to reduced separability. In contrast, introducing the latent variable α expands the representation to a 3D space, which helps disentangle these mixed states and yields a clearer phase structure. Similar to our previous response in Author response image 4, we provide quantitative evidence on this gastrulation erythroid dataset in Figure S7, showing that the 3D representation achieves consistently higher kNN classification accuracy for cell state separation compared to the 2D u–s embedding (one-sided Mann–Whitney U test, p-value = 0.002).

      (4) Application to the mouse brain and other datasets

      (a) Lines 280-281: The authors cannot claim that velocity streams are smoother in TSvelo than in MultiVelo based solely on 2D visualization. Similarly, claiming that one model predicts the correct differentiation trajectory from a 2D projection is over-interpretation, as has been discussed in prior literature (see PMID: 37885016).

      We thank the reviewer for this important comment. Consistent with other RNA velocity studies, TSvelo employs the 2D UMAP stream plot for visualizing the results. We agree that conclusions based solely on 2D visualizations may lead to over-interpretation. Our intention was to provide an intuitive visualization rather than a rigorous quantitative comparison. Accordingly, we have revised the text to avoid making definitive claims about smoothness or correctness of differentiation trajectories based solely on 2D projections.

      (b) Lines 304-306: Beyond transcriptional signal estimation, how is regulation inferred solely from scRNA-seq data validated, especially compared with scATAC-seq data? Are there cases where transcriptome-based regulatory inference is supported by epigenomic evidence, thereby demonstrating TSvelo's GRN inference accuracy?

      We thank the reviewer for this important question regarding the validation of regulatory inference derived from scRNA-seq data and its comparison to scATAC-seq-based evidence.

      We would like to first clarify the scope of TSvelo. Similar to existing RNA velocity methods, the primary goal of TSvelo is to model transcriptional dynamics and accurately infer cell state transitions and cell fate trajectories. In this context, gene regulatory information is not inferred de novo from data, but incorporated as prior knowledge from curated TF–target databases to guide and constrain the dynamics modeling process, as described in our Introduction.

      We have conducted the requested analysis by computing the gene-wise chromatin accessibility rate used in MultiVelo and the learned transcription rate from TSvelo, and evaluated their correlation across genes. As shown in Figure S15, the two estimates exhibit almost no global correlation across genes, indicating that they capture substantially different aspects of regulatory information.

      This discrepancy is not unexpected and reflects the fundamental differences between these modalities. scATAC-seq measures chromatin accessibility, which provides a proxy for cis-regulatory potential of genomic regions. In contrast, TF RNA expression reflects downstream transcriptional output, which is shaped by multiple regulatory layers, including post-transcriptional regulation, protein activity, temporal delays, and indirect regulation through intermediate transcriptional or signaling pathways. As a result, these two modalities are expected to capture complementary but not directly comparable aspects of gene regulation.

      We acknowledge that scATAC-seq provides valuable complementary information on chromatin accessibility and regulatory potential, and will consider incorporating matched multi-omics data in future work. In the revised manuscript, we further clarify that TSvelo is an RNA velocity method that incorporates prior knowledge from curated TF–target databases, and we have added a discussion on the potential use of scATAC-seq data for future extension of our framework.

      (c) The claim that TSvelo can model multi-lineage datasets hinges on its use of PAGA for lineage segmentation, followed by independent modeling of dynamics within each subset. However, the procedure for merging results across subsets remains unclear.

      We thank the reviewer for pointing out that the merging step was not sufficiently described. After modeling dynamics independently within each lineage-specific subset, TSvelo integrates the results via a weighted aggregation procedure at the cell level.

      For each cell and each inferred quantity (e.g., velocity or other dynamic variables), we collect the estimates obtained from different lineage-specific models and combine them using a weighted average. The weights are defined by the size of each lineage, reflecting its statistical support. We have clarified details about this merging procedure in the Methods section.

      This aggregation reconciles multiple lineage-specific estimates for the same cell into a single value and mitigates discontinuities that could arise from directly combining independent lineage analyses. The resulting values define a unified set of dynamics for each cell across lineages.
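
      The merging step can be illustrated with the short sketch below. It assumes that "size of each lineage" means the number of cells assigned to that lineage and that weights are normalised per cell over the lineages containing it; the array layout and the function name are hypothetical and do not come from the TSvelo code.

```python
import numpy as np

def merge_lineage_estimates(estimates, membership):
    """Combine per-lineage estimates into one value per cell and gene.

    estimates  : float array (n_lineages, n_cells, n_genes)
    membership : bool array (n_lineages, n_cells), True if the cell is in the lineage
    """
    membership = np.asarray(membership, dtype=bool)
    sizes = membership.sum(axis=1).astype(float)       # cells per lineage
    weights = membership * sizes[:, None]              # zero outside the lineage
    totals = weights.sum(axis=0)
    if np.any(totals == 0):
        raise ValueError("every cell must belong to at least one lineage")
    weights = weights / totals                         # normalise per cell
    # Ignore estimates for cells that are not part of a lineage.
    masked = np.where(membership[:, :, None], estimates, 0.0)
    return np.einsum("lc,lcg->cg", weights, masked)

# Toy case: lineage 0 holds cells {0, 1, 2}, lineage 1 holds cells {2, 3}.
membership = np.array([[1, 1, 1, 0],
                       [0, 0, 1, 1]], dtype=bool)
estimates = np.stack([np.full((4, 2), 1.0), np.full((4, 2), 6.0)])
merged = merge_lineage_estimates(estimates, membership)
```

      In the toy case the shared cell 2 receives weights 3/5 and 2/5, so its merged value is 0.6 × 1.0 + 0.4 × 6.0 = 3.0, while cells belonging to a single lineage keep that lineage's estimate unchanged.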

      Reviewer #3 (Public review):

      Despite the abundance of RNA velocity tools, there are still major limitations, and there is strong skepticism about the results these methods lead to. In this paper, the authors try to address some limitations of current RNA velocity approaches by proposing a unified framework to jointly infer transcriptional and splicing dynamics. The method is then benchmarked on 6 real datasets against the most popular RNA velocity tools.

      While the approach has the potential to be of interest for the field, and may present improvements compared to existing approaches, there are some major limitations that should be addressed, particularly concerning the benchmark (see major comment 1).

      Major comments:

      (1) My main criticism concerns the benchmarking: real data lack a ground truth, and are absolutely not ideal for comparing methods, because one can only speculate what results appear to be more plausible.

      A solid and extensive simulation study, which covers various scenarios and possibly distinct data-generating models, is needed for comparing approaches. The authors should check, for example, the simulation studies in the BayVel approach (Section 4, BayVel: A Bayesian Framework for RNA Velocity Estimation in Single-Cell Transcriptomics). Clearly, all methods should be included in the simulation.

      Following your recommendation, we have added a simulation analysis to compare TSvelo with existing RNA velocity approaches. We generated synthetic single-cell RNA velocity datasets using a mechanistic transcriptional dynamics model with one or multiple developmental branches. The system included 200 genes, among which 30 were designated as transcription factors (TFs).

      For each branch, we independently sampled a TF–target regulatory matrix W ∈ ℝ<sup>30×200</sup> from a standard normal distribution to simulate distinct GRN structures. Gene expression dynamics were modeled using a coupled ordinary differential equation (ODE) system describing unspliced and spliced RNA abundances:

      where u and s denote unspliced and spliced RNA levels, respectively. The transcription rate α was computed as a nonlinear function of TF expression, defined as a weighted sum of spliced TF abundance, followed by clipping to ensure bounded activation.

      Each branch is initialized from the same randomly sampled initial condition drawn from a gamma distribution, allowing controlled divergence of trajectories driven solely by branch-specific regulatory programs.

      To simulate observed sequencing counts, we introduced technical noise by scaling latent expression levels with cell-specific library sizes drawn from a log-normal distribution. The resulting expression counts were generated using a negative binomial sampling model:

      where θ controls over dispersion, with smaller values corresponding to higher noise levels. The final datasets consist of paired unspliced (U) and spliced (S) count matrices with realistic transcriptional stochasticity and branching gene regulatory dynamics. For each branch, cells were further divided into three developmental stages for downstream analysis.

      We evaluated TSvelo and the splicing-based RNA velocity approaches on multiple simulated datasets with varying numbers of branches and noise levels. In these datasets, one, two, or three branches start from the same cell group (Branch 1: stage 0 - stage 1 - stage 2. Branch 2: stage 0 - stage 3 - stage 4. Branch 3: stage 0 - stage 5 - stage 6). We primarily assessed performance using the cross-boundary direction correctness (CBDir) metric, which directly evaluates inferred trajectories against ground-truth cell stage annotations and has been widely adopted in RNA velocity studies such as VeloAE and UniTVelo. CBDir assesses the accuracy of transitions from a source cluster to a target cluster by examining the boundary cells, and requires ground-truth annotations. We directly ran the function unitvelo.evaluate() provided in UniTVelo to obtain CBDir, which is calculated as follows:

      where C<sub>A</sub> denotes the set of cells in the target cluster A, and N(c) represents the neighboring cells of a given cell c. v<sub>c</sub> and x<sub>c</sub> denote the low-dimensional velocity and state vectors of cell c, respectively, and x<sub>c’</sub> denotes the state vector of its neighboring cell c’.
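
      The definition above can be illustrated with a short sketch. This is not the unitvelo.evaluate() implementation; it assumes CBDir is the mean cosine similarity between a boundary cell's velocity v<sub>c</sub> and the displacement x<sub>c’</sub> − x<sub>c</sub> to each of its neighbors in the target cluster. The function names cosine and cbdir and the neighbor-list input are hypothetical.

```python
import numpy as np


def cosine(a, b, eps=1e-12):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def cbdir(x, v, labels, neighbors, source, target):
    """Illustrative cross-boundary direction correctness for one source -> target transition.

    x, v      : (n_cells, n_dims) low-dimensional state and velocity vectors
    labels    : cluster label per cell (ground-truth annotation)
    neighbors : neighbors[c] lists the indices of the neighbors of cell c
    """
    x, v, labels = np.asarray(x, float), np.asarray(v, float), np.asarray(labels)
    scores = []
    for c in np.flatnonzero(labels == source):
        # Boundary cells are source cells with at least one neighbor in the target cluster.
        across = [n for n in neighbors[c] if labels[n] == target]
        if not across:
            continue
        scores.append(np.mean([cosine(v[c], x[n] - x[c]) for n in across]))
    # NaN signals that no boundary cell exists for this transition.
    return float(np.mean(scores)) if scores else float("nan")
```

      A score near 1 means velocities at the boundary point toward the target cluster, and a score near −1 means they point away from it.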

      As shown in Figure S2, TSvelo consistently achieves the highest accuracy across all simulation settings, particularly in scenarios with complex branching structures, which pose significant challenges for baseline methods.

      (2) Related to the above: since a ground truth is missing, the real data analyses need to be interpreted with caution. I recommend avoiding strong statements, such as "successfully captures the correct gene dynamics", or "accurately infer", in favour of milder statements supported by the data, such as "... aligns with the biological processes described" (as in page 12), or "results are compatible with current biological knowledge", etc...

      We thank the reviewer for this helpful comment. We agree that analyses on real datasets should be interpreted with appropriate caution because definitive ground truth is typically unavailable. Following the reviewer’s suggestion, we have revised the wording throughout the manuscript to avoid overly strong claims. For example, statements such as “successfully captures the correct gene dynamics” and “accurately infer” have been replaced with more cautious descriptions such as “consistent with known biological processes”.

      (3) Many methods perform RNA velocity analyses. While there is a brief description, I think it'd be useful to have a schematic summary (e.g., via a Table) of the main conceptual, mathematical, and computational characteristics of each approach.

      We thank the reviewer for this insightful suggestion. We agree that a structured summary of existing RNA velocity methods would improve clarity and accessibility. We have added a new summary table (Table S1) that systematically compares representative RNA velocity approaches in the supplementary information.

      (4) Related to the above: I struggled to identify the main conceptual novelty of TSvelo, compared to existing approaches. I recommend explaining this aspect more extensively.

      We thank the reviewer for this insightful comment. We agree that the conceptual novelty of TSvelo can be more clearly articulated.

      In the revised manuscript, we have expanded the discussion at the beginning of the Results section to explicitly highlight the key distinctions between TSvelo and existing approaches. Specifically, we now clarify that most existing RNA velocity methods predominantly focus on splicing dynamics and typically operate in a gene-wise manner, without capturing coordinated dynamics across genes. In contrast, TSvelo models the full cascade of transcriptional regulation, transcription, and splicing within a unified framework, and estimates RNA velocity jointly across all genes, thereby capturing their coordinated dynamics at the system level.

      (5) A computational benchmark is missing; I'd appreciate seeing the runtime and memory cost of all methods in a couple of datasets.

      We thank the reviewer for this helpful suggestion regarding computational benchmarking. In the revised manuscript, we have added a systematic comparison of runtime and GPU memory usage across TSvelo and baseline methods using simulated datasets of increasing scale (600, 1200, and 1800 cells) on our NVIDIA GeForce RTX 3090 device with 24 GB memory.

      Table S2 shows differences in computational efficiency and resource requirements among methods. Specifically, classical methods such as scVelo and Dynamo exhibit very fast runtimes (10–24 seconds) and do not rely on GPU acceleration, reflecting their relatively lightweight modeling strategies. In contrast, deep learning–based approaches, including UniTVelo, cellDancer, and TSvelo, have higher computational costs due to their increased model complexity.

      TSvelo exhibits a stable GPU memory footprint (~1.26 GB) across different dataset sizes, indicating that its memory usage is primarily determined by model architecture rather than the number of cells. This level of memory consumption is well within the capacity of modern GPUs and does not pose practical limitations. In terms of runtime, TSvelo scales approximately linearly with dataset size. The higher computational cost of TSvelo is mainly due to its EM-style optimization procedure, where each M-step also involves multiple optimization updates to infer gene regulatory effects in a global model. This design enables TSvelo to explicitly incorporate regulatory priors and jointly model gene interactions, which is not supported by these baseline methods.

      To further improve runtime efficiency, TSvelo allows flexible control of the number of EM iterations. As shown in Figure S16 and Table S3, we evaluated performance under different iteration settings on the simulation dataset. The EM framework of TSvelo employs an early stopping strategy, which stops the optimization if the loss has not been further reduced in the last 3 iterations. Results show that convergence is typically achieved within 3 iterations for this dataset, and increasing the maximum number of iterations beyond this does not further change the results. Notably, even a single iteration already yields competitive performance, likely benefiting from the strong initialization based on unspliced-to-spliced temporal delay.
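
      The early stopping rule described above can be sketched as follows. This is a generic patience-based loop, not TSvelo's actual code; the function name run_em, the step callback, and the max_iters and tol parameters are hypothetical.

```python
def run_em(step, max_iters=10, patience=3, tol=0.0):
    """Run EM iterations until the loss stops improving.

    step(i) is assumed to perform one full E-step and M-step and return the loss.
    Stops once the loss has not improved for `patience` consecutive iterations.
    """
    best = float("inf")
    stale = 0  # iterations since the last improvement
    history = []
    for i in range(max_iters):
        loss = step(i)
        history.append(loss)
        if loss < best - tol:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history
```

      Setting max_iters=1 in this sketch corresponds to the single-iteration setting mentioned above, which relies entirely on the initialization.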

      Overall, these results highlight a trade-off between computational efficiency and modeling expressiveness. While TSvelo is more computationally demanding than classical approaches, it provides a more flexible framework for incorporating regulatory information and capturing complex gene interactions, which we believe justifies the additional computational cost in scenarios requiring accurate dynamical inference.

      (6) I think BayVel (mentioned above) should be added to the list of competing methods (both in the text and in the benchmarks). The package can be found here: https://github.com/elenasabbioni/BayVel_pkgJulia.

      We thank the reviewer for suggesting BayVel and for providing the repository link. We carefully reviewed the available resources, including both the BayVel_pkgJulia and the BayVel_notebooks, and we appreciate the authors’ efforts in making their code and data publicly available.

      We note that the BayVel repositories primarily provide scripts and data for reproducing the figures and results reported in their manuscript. However, at present, the available resources do not yet provide a complete guideline or standardized pipeline for applying BayVel to new datasets. To ensure a fair and reproducible comparison, we therefore use the BayVel results officially provided by the authors. We are grateful that the BayVel results on the pancreas dataset are released on the BayVel_notebooks page: https://github.com/elenasabbioni/BayVel_notebooks/tree/main/real%20data/Pancreas/moments/output.

      Based on these results, we conducted comparisons across all methods on the pancreas dataset, with quantitative evaluations shown in Author response image 5. In each plot, methods are ranked in descending order of their mean values. Numbers at the bottom indicate the sample size for each metric. Statistical significance is assessed using a one-sided Mann–Whitney U test, where *****, ****, ***, **, and * denote p < 0.00001, 0.00001 ≤ p < 0.0001, 0.0001 ≤ p < 0.001, 0.001 ≤ p < 0.01, and 0.01 ≤ p < 0.05, respectively.

      BayVel has now been included in the Introduction, and corresponding comparisons have been added in the revised manuscript.

      Author response image 5.

      The quantitative comparison between TSvelo and baseline approaches on the pancreas dataset. In each plot, methods are ranked in descending order of their mean values. Numbers at the bottom indicate the sample size for each metric. Significance is determined using a one-sided Mann–Whitney U test. *****, ****, ***, **, and * represent p < 0.00001, 0.00001 ≤ p < 0.0001, 0.0001 ≤ p < 0.001, 0.001 ≤ p < 0.01, and 0.01 ≤ p < 0.05, respectively.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please carefully proofread the text. Some typos:

      (1) Line 110: differentia -> differential.

      (2) Line 280: ".," to be corrected.

      (3) Line 566: optimize -> optimizes.

      We thank the reviewer for carefully proofreading the manuscript and for pointing out these typographical errors. We have corrected the identified typos in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Regarding Major Comment 1 in the Public Review, I contacted BayVel authors, who told me that they'll upload all their scripts here within a few days: https://github.com/elenasabbioni/BayVel_notebooks

      Thank you very much for reaching out to the BayVel authors. We sincerely appreciate the BayVel authors’ efforts to make their scripts and results publicly available through BayVel_notebooks. We believe this is a valuable contribution that will greatly benefit the community.

      We have followed the repository and have now included BayVel in the revised manuscript, with corresponding comparisons added to both the main text and the benchmarking results.

      (2) Page 9 mentions "consistency", "coherence", and "correctness". Instead of these qualitative (and potentially subjective) evaluations, I'd appreciate using quantitative metrics or visual descriptions when differences are visually clear.

      We thank the reviewer for this insightful comment. The terms “velocity consistency,” “in-cluster coherence,” and “cross-boundary correctness” used in our manuscript are not intended as subjective descriptions. They correspond to commonly used evaluation criteria in this field and have been adopted as quantitative metrics in previous studies, such as VeloAE[1] and UniTVelo[2]. We have incorporated the following updated definition into the Methods section.

      (1) Velocity consistency (VCon). We used the scvelo.velocity_confidence() function from scVelo to evaluate velocity consistency, interpreting the results as a measure of how consistent velocities are within neighboring cells. Velocity consistency is especially suitable for evaluating RNA velocity modeling on a single lineage. For each cell c, the velocity consistency is calculated as follows:

      where N(c) represents the neighboring cells of a given cell c, and v<sub>c</sub> and v<sub>c’</sub> denote the low-dimensional velocity vectors of cell c and its neighboring cell c’, respectively.
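
      As an illustration of this per-cell quantity, the sketch below averages the agreement between v<sub>c</sub> and each v<sub>c’</sub> over N(c). It assumes cosine similarity as the agreement measure and that every cell has at least one neighbor; scvelo.velocity_confidence() may differ in detail, and the function name velocity_consistency is hypothetical.

```python
import numpy as np


def velocity_consistency(v, neighbors):
    """Per-cell mean cosine similarity between a cell's velocity and its neighbors' velocities.

    v         : (n_cells, n_dims) low-dimensional velocity vectors
    neighbors : neighbors[c] lists the indices of the neighbors of cell c (non-empty)
    """
    v = np.asarray(v, dtype=float)
    # Normalize once so that dot products between rows are cosine similarities.
    unit = v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-12)
    return np.array([np.mean(unit[nbrs] @ unit[c]) for c, nbrs in enumerate(neighbors)])
```

      Averaging the returned per-cell values over all cells gives a single dataset-level score under these assumptions.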

      (2) Cross-boundary direction correctness (CBDir). Cross-boundary direction correctness assesses the accuracy of transitions from a source cluster to a target cluster by examining the boundary cells, and requires ground truth annotations. We directly run the function unitvelo.evaluate() provided in UniTVelo to obtain the Cross-boundary direction correctness. In detail, the CBDir is calculated as follows:

      where C<sub>A</sub> denotes the set of cells in the target cluster A, and N(c) represents the neighboring cells of a given cell c. v<sub>c</sub> and x<sub>c</sub> denote the low-dimensional velocity and state vectors of cell c, respectively, and x<sub>c’</sub> denotes the state vector of its neighboring cell c’.

      (3) Within-cluster velocity coherence (ICCoh). Within-cluster velocity coherence measures the coherence of velocities within a single cluster using a cosine similarity score between cell velocities. We applied the function unitvelo.evaluate() provided by UniTVelo to directly compute the within-cluster velocity coherence. Using the same notation as defined above, ICCoh is calculated as follows:

      (1) Qiao, C. & Huang, Y. Representation learning of RNA velocity reveals robust cell transitions. Proceedings of the National Academy of Sciences 118, e2105859118 (2021).

      (2) Gao, M., Qiao, C. & Huang, Y. UniTVelo: temporally unified RNA velocity reinforces single-cell trajectory inference. Nature Communications 13, 6586 (2022).

      (3) At page 3, some objects are not defined after formula (3):

      ReLU function, and w_gi

      Additionally, parenthesis of ReLU function should be bigger.

      We thank the reviewer for pointing this out. In the revised manuscript, we have explicitly defined the ReLU activation function and clarified that w<sub>gi</sub> represents the regulatory weight of TF i on the target gene g. In addition, we have adjusted the formatting of Eq. (3) by enlarging the parentheses in the ReLU function to improve readability.

    1. eLife Assessment

      The authors present a solid study in the unique conditions of weightlessness providing evidence that movements carried out in 0g are underactuated. They further provide a thorough discussion based on computational modelling to address the question as to whether the CNS underestimates mass when programming movements in weightlessness. In all cases, the persistence of the observed effects in weightlessness has important implications for theories of motor adaptation.

    2. Reviewer #1 (Public review):

      The authors have conducted substantial additional analyses to address the reviewers' comments. However, several key points still require attention. I was unable to see the correspondence between the model predictions and the data in the added quantitative analysis. In the rebuttal letter, the delta peak speed time displays values in the range of [20, 30] ms, whereas the data were negative for the 45° direction. Should the reader directly compare panel B of Figure 6 with Figure 1E? The correspondence between the model and the data should be made more apparent in Figure 6. Furthermore, the rebuttal states that a quantitative prediction was not expected, yet it subsequently argues that there was a quantitative match. Overall, this response remains unclear.

      A follow-up question concerns the argument about strategic slowing. The authors argue that this explanation can be rejected because the timing of peak speed should be delayed, contrary to the data. However, there appears to be a sign difference between the model and the data for the 45° direction, which means that it was delayed in this case. Did I understand correctly? In that regard, I believe that the hypothesis of strategic slowing cannot yet be firmly rejected and the discussion should more clearly indicate that this argument is based on some, but not all, directions. I agree with the authors on the importance of the mass underestimation hypothesis, and I am not particularly committed to the strategic slowing explanation, but I do not see a strong argument against it. If the conclusion relies on the sign of the delta peak speed, then the authors' claims are not valid across all directions, and greater caution in the interpretation and discussion is warranted. Regarding the peak acceleration time, I would be hesitant to draw firm conclusions based on differences smaller than 10 ms (Figures R3 and 6D).

      The authors state in the rebuttal that the two hypotheses are competing. This is not accurate, as they are not mutually exclusive and could even vary as a function of movement direction. The abstract also claims that the data "refutes" strategic slowing, which I believe is too strong. The main issue is that, based on the authors' revised manuscript, the lack of quantitative agreement between the model and the data for the mass underestimation hypothesis is considered acceptable because a precise quantitative match is not expected, and the predictions overall agree for some (though not all) directions and phases (excluding post-in). That is reasonable, but by the same logic, the small differences between the model prediction and the strategic slowing hypothesis should not be taken as firm evidence against it, as the authors seem to suggest. In practice, I recommend a more transparent and cautious interpretation to avoid giving readers the false impression that the evidence is decisive. The mass underestimation hypothesis is clearly supported, but the remaining aspects are less clear, and several features of the data remain unexplained.

      Comments on revised version.

      The authors have reworked the sections of the text where the narrative was too strong or binary wrt alternative interpretations. The result is well balanced. No further recommendation.

    3. Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited and the manuscript is well written.

      Weaknesses:

      I nevertheless am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      I raised the following points in my original review, but I find that the authors have judiciously addressed these points through their various revisions.

      I believe that the article constitutes a valuable contribution and that the results and conclusions are certainly worthy of consideration by the human motor control community.

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treating the arm as a second-order low pass filter (Eq. 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limb's damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for un-adapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J. O. H. N., Lackner, J. R., & DiZio, P. A. U. L. (1993). Gravitoinertial force level influences arm movement control. Journal of neurophysiology, 69(2), 504-511.

      The authors attempt to differentiate their study from previous studies where limb neuromechanical impedance was shown to be modified in weightlessness by emphasizing that in the current study the movements were rapid and the initial movement is "feedforward". But this incorrectly implies that the limb's mechanical response to the motor command is determined only by active feedback mechanisms. In fact:

      (a) All commands to the muscle pass through the motor neurons. These neurons receive descending activations related not only to the volitional movement, but also to the dynamic state of the body and the influence of other sensory inputs, including the vestibular system. A decrease in descending influences from the vestibular organs will lower the background sensitivity to all other neural influences on the motor neuron. Thus, the motor neuron may be less sensitive to the other volitional and reflexive synaptic inputs that it may receive.

      (b) Muscle tone plays a significant role in determining the force and the time course of the muscle contraction. In a weightless environment, where tonic muscle activity is likely to be reduced, there is the distinct possibility that muscles will react more slowly and with lower amplitude to an otherwise equivalent descending motor command, particularly in the initial moments before spinal reflexes come into play. These, and other neuronal mechanisms could lead to the "under-actuation" effect observed in the current study, without necessarily being reflective of an underestimation of mass per se.

      (2) The subject's body in weightlessness is much more sensitive to reaction forces in interactions with the environment in the absence of the anchoring effect of gravity pushing the body into the floor and in the absence of anticipatory postural adjustments that typically accompany upper-limb motions in Earth gravity in order to maintain an upright posture. The authors dismiss this possibility because the taikonauts were asked to stabilize their bodies with the contralateral hand. But the authors present no evidence that this was sufficient to maintain the shoulder and trunk at a strictly constant position, as is supposed by the simplified biomechanical model used in their optimal control framework. Indeed, a small backward motion of the shoulder would result in a smaller acceleration of the fingertip and a smaller extent of the initial ballistic motion of the hand with respect to the measurement device (the tablet), consistent with the observations reported in the study. Note that stability of the base might explain why 45º movements were apparently less affected in weightlessness, according to many of the reported analyses, including those related to corrective movements (Fig. 5 B, C, F; Fig. 6D), than the other two directions. If the trunk is being stabilized by the left arm, the same reaction forces on the trunk due to the acceleration of the hand will result in less effective torque on the trunk, given that the reaction forces act with a much smaller moment arm with respect to the left shoulder (the hand movement axis passes approximately through the left shoulder for the 45º target) compared to either the forward or rightward motions of the hand.

      (3) The above is exacerbated by potential changes in the frictional forces between the fingertip and the tablet. The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact can be expected to be quite different than on the ground. While these forces may be low on Earth, the fact is that we do not know what forces the taikonauts used on orbit. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. Indeed, given the increased instability of the body and the increased uncertainty of movement direction of the hand, taikonauts may have been induced to apply greater forces against the tablet in order to maintain contact in weightlessness, which would in turn slow the motion of the finger on the tablet and increase the reaction forces acting on the trunk. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors.

      I feel that the authors have done an admirable job of exploring how to explain the modifications to movement kinematics that they observed on orbit within the constraints of the optimal control theory applied to a simplified model of the human motor system. While I fully appreciate the value of such models to provide insights into questions of human sensorimotor behaviour, to draw firm conclusions on what humans are actually experiencing based only on manipulations of the computational model, without testing the model's implicit assumptions and without considering the actual neurophysiological and biomechanical mechanisms, can be misleading. One way to do this could be to examine these questions through extensions to the model used in the simulations (changing activation dynamics of the torque generators, allowing for potential backward motion of the shoulder and trunk, etc.). A better solution would be to emulate the physiological and biomechanical conditions on Earth (supporting the arm against gravity to reduce muscle tone, placing the subject on a moveable base that requires that the body be stabilized with the other hand) in order to distinguish the hypothesis of an underestimation of mass vs. other potential sources of under-actuation and other potential effects of weightlessness on the body.

      In sum, my opinion is that the authors are relying too much on a theoretical model as a ground truth and thus overstate their conclusions. But to provide a convincing argument that humans truly underestimate mass in weightlessness, they should consider more judiciously the neurophysiology and biomechanics that fall outside the purview of the simplified model that they have chosen. If a more thorough assessment of this nature is not possible, then I would argue that a more measured conclusion of the paper should be 1) that the authors observed modifications to movement kinematics in weightlessness consistent with an under-actuation for the intended motion, 2) that a simplified model of human physiology and biomechanics that incorporates principles of optimal control suggest that the source of this under-actuation might be an underestimation of mass in the computation of an appropriate feedforward motor command, and 3) that other potential neurophysiological or biomechanical effects cannot be excluded due to limitations of the computational model.

    4. Author response:

      The following is the authors’ response to the original reviews.

      General recommendations (from the Reviewing Editor):

      The reviewers discussed the revision at length, and all were appreciative of the revisions to the paper. Nonetheless, they agreed that the evidence against alternative hypotheses was not yet decisive, and it may not be possible to provide the evidence needed given the difficulty of acquiring this data. Thus they feel that a more nuanced interpretation of the data and tempering of the conclusions is necessary. These points are described in more detail in the reviewer-specific comments in the Public reviews.

      We thank the editor and the reviewers for their constructive discussion. In this revision, we have adopted these recommendations: we have tempered our conclusions and removed binary framing, taking into consideration that other alternative explanations might exist. We have also expanded the Discussion to consider additional potential mechanisms and added corresponding limitations. We also changed the paper title to avoid strong inference; the new title is “Evidence that humans underestimate body mass in microgravity: kinematic signatures in reaching movements during spaceflight”.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have conducted substantial additional analyses to address the reviewers' comments. However, several key points still require attention. I was unable to see the correspondence between the model predictions and the data in the added quantitative analysis. In the rebuttal letter, the delta peak speed time displays values in the range of [20, 30] ms, whereas the data were negative for the 45° direction. Should the reader directly compare panel B of Figure 6 with Figure 1E? The correspondence between the model and the data should be made more apparent in Figure 6. Furthermore, the rebuttal states that a quantitative prediction was not expected, yet it subsequently argues that there was a quantitative match. Overall, this response remains unclear.

      We thank the reviewer for raising the question about Figure 6B. We would like to clarify that the phrase "quantitative match" in the summary of our previous rebuttal letter was a wording error; in fact, the subsequent detailed responses consistently and correctly described the comparison as qualitative. We apologize for the confusion this may have caused, and address this point below.

      First, we have revised the manuscript to clarify this point. We have added the following statement: "We note that these correlations evaluate the directional trend rather than the absolute magnitude of the effects; a precise quantitative match is not expected given the simplifications of the two-joint arm model." in the main text.

      Second, we have replaced Figure 6 with a revised version that presents model-predicted Δ values and experimentally observed Δ values side by side, allowing for a more intuitive visual comparison. As shown in the updated figure, the directional trends are broadly consistent (amplitude changes and timing shifts are rank-ordered by movement direction in both model and data), while the absolute magnitudes do not precisely match. We believe this layout makes the intended comparison more transparent.

      As discussed in our previous response and noted above, a precise quantitative match is not expected given our model's simplifications, and this level of qualitative comparison is consistent with established practice in similar modeling studies (e.g., Gaveau et al., 2016).

      Regarding the negative Δ peak speed time at 45°: as shown in our statistical analyses (Figure 4A, Figure 5F), there was no significant timing change at 45°. The negative value reflects a small, non-significant mean difference. The key pattern, that timing advance increases for directions associated with higher effective inertia, holds for the 90° and 135° directions, which is the directional trend our analysis was designed to capture.

      A follow-up question concerns the argument about strategic slowing. The authors argue that this explanation can be rejected because the timing of peak speed should be delayed, contrary to the data. However, there appears to be a sign difference between the model and the data for the 45° direction, which means that it was delayed in this case. Did I understand correctly? In that regard, I believe that the hypothesis of strategic slowing cannot yet be firmly rejected and the discussion should more clearly indicate that this argument is based on some, but not all, directions.

      I agree with the authors on the importance of the mass underestimation hypothesis, and I am not particularly committed to the strategic slowing explanation, but I do not see a strong argument against it. If the conclusion relies on the sign of the delta peak speed, then the authors' claims are not valid across all directions, and greater caution in the interpretation and discussion is warranted. Regarding the peak acceleration time, I would be hesitant to draw firm conclusions based on differences smaller than 10 ms (Figures R3 and 6D).

      The authors state in the rebuttal that the two hypotheses are competing. This is not accurate, as they are not mutually exclusive and could even vary as a function of movement direction. The abstract also claims that the data "refutes" strategic slowing, which I believe is too strong. The main issue is that, based on the authors' revised manuscript, the lack of quantitative agreement between the model and the data for the mass underestimation hypothesis is considered acceptable because a precise quantitative match is not expected, and the predictions overall agree for some (though not all) directions and phases (excluding post-in). That is reasonable, but by the same logic, the small differences between the model prediction and the strategic slowing hypothesis should not be taken as firm evidence against it, as the authors seem to suggest. In practice, I recommend a more transparent and cautious interpretation to avoid giving readers the false impression that the evidence is decisive. The mass underestimation hypothesis is clearly supported, but the remaining aspects are less clear, and several features of the data remain unexplained.

      We thank the reviewer for this critical assessment. We acknowledge that our previous framing was too binary, and we agree that strategic slowing and mass underestimation are not mutually exclusive. We would like to clarify our view: we did not find evidence supporting strategic slowing (e.g., slower reaction times, symmetric velocity/acceleration peaks), whereas we did find evidence supporting mass underestimation (asymmetric peaks, unchanged reaction times, more submovements). This is not a case of rejecting one hypothesis to affirm the other; our data simply do not support one while providing positive evidence for the other. We do not rule out the possibility that both mechanisms could operate together, though we note that our data did not reveal evidence supporting strategic slowing in the current reaching task.

      We also agree that the lack of significant timing changes at 45° limits the scope of our argument against strategic slowing in that direction. However, the null result at 45° likewise cannot serve as positive evidence for strategic slowing. As discussed in our previous revision and in the Discussion, this null effect may arise because 45° reaches are predominantly single-joint (as evidenced by their characteristic curvature patterns), making them less suitable for modeling with a simplified two-link arm model than the 90° and 135° directions.

      In line with these considerations, we have made the following revisions to the manuscript:

      (1) We have removed binary framing throughout, replacing claims of mutual exclusivity or outright rejection of strategic slowing with more measured language. For example, "refutes" in the abstract has been changed to "These findings provide support for the body mass underestimation hypothesis while being inconsistent with the strategic slowing hypothesis." The two hypotheses are no longer presented as mutually exclusive, and strategic slowing is now characterized as insufficient to fully explain the direction-dependent pattern, rather than ruled out entirely.

      (2) We have revised the conclusion. The concluding paragraph no longer presents an either-or outcome. We describe the direction-dependent under-actuation pattern, note that it strongly supports mass underestimation while not being readily explained by a uniform strategic adjustment, and acknowledge that other factors may also contribute. A new limitation paragraph discusses the simplified nature of our model and acknowledges that other neurophysiological and biomechanical factors cannot be excluded.

      Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model adds confidence to the proposed conclusions.

      Compared to the previous version, the authors have thoroughly addressed my concerns. The model is now clear and well-articulated, and alternative hypotheses have been ruled out convincingly. The paper is improved and suitable for publication in my opinion, making a significant contribution to the field.

      Strengths:

      Comprehensive analysis of a unique data set of reaching movement in microgravity

      Use of a sensible and well-thought experimental approach

      State-of-the-art analyses of main kinematic parameters

      Computational model simulations of arm reaching to test alternative hypotheses and support the mass underestimation one

      This work has no major weakness as it stands, and the discussion provides a fair evaluation of the findings and conclusions.

      We thank the reviewer for the supportive feedback, and we are grateful for the earlier comments that helped us improve the manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude in favor of the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited and the manuscript is well written.

      Weaknesses:

      I nevertheless am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      To strengthen the conclusions, I feel that the following points would need to be addressed:

      We thank the reviewer for the insightful critique and constructive suggestions. Following the reviewer's advice, we have re-framed our Introduction and Discussion to present mass underestimation as a plausible mechanism identified by our simplified model, while explicitly acknowledging other potential factors. Below we address each point in detail.

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treating the arm as a second-order low pass filter (Eq. 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limb's damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for un-adapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      The authors attempt to differentiate their study from previous studies where limb neuromechanical impedance was shown to be modified in weightlessness by emphasizing that in the current study the movements were rapid and the initial movement is "feedforward". But this incorrectly implies that the limb's mechanical response to the motor command is determined only by active feedback mechanisms. In fact:

      (a) All commands to the muscle pass through the motor neurons. These neurons receive descending activations related not only to the volitional movement, but also to the dynamic state of the body and the influence of other sensory inputs, including the vestibular system. A decrease in descending influences from the vestibular organs will lower the background sensitivity to all other neural influences on the motor neuron. Thus, the motor neuron may be less sensitive to the other volitional and reflexive synaptic inputs that it may receive.

      (b) Muscle tone plays a significant role in determining the force and the time course of the muscle contraction. In a weightless environment, where tonic muscle activity is likely to be reduced, there is the distinct possibility that muscles will react more slowly and with lower amplitude to an otherwise equivalent descending motor command, particularly in the initial moments before spinal reflexes come into play. These, and other neuronal mechanisms could lead to the "under-actuation" effect observed in the current study, without necessarily being reflective of an underestimation of mass per se.
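
      The under-actuation argument in point (1) can be made concrete with a toy calculation. This is only a sketch under strong assumptions, not the authors' two-joint optimal control model: the hand is a point mass, the planned trajectory is a minimum-jerk profile, the command is purely feedforward, and the masses, duration and distance are hypothetical values chosen for illustration.

```python
def min_jerk_accel(t, duration, distance):
    """Acceleration of a minimum-jerk point-to-point movement at time t."""
    tau = t / duration
    return distance / duration**2 * (60 * tau - 180 * tau**2 + 120 * tau**3)

def peak_speed(true_mass, believed_mass, duration=0.5, distance=0.12, steps=5000):
    """Peak speed when the feedforward force is planned with believed_mass
    but acts on true_mass (no feedback, no muscle dynamics)."""
    dt = duration / steps
    speed, peak = 0.0, 0.0
    for i in range(steps):
        t = i * dt
        # Force computed from the planned acceleration and the believed mass.
        force = believed_mass * min_jerk_accel(t, duration, distance)
        # The actual acceleration depends on the true mass.
        speed += force / true_mass * dt
        peak = max(peak, speed)
    return peak

# Hypothetical masses in kg, for illustration only.
nominal = peak_speed(true_mass=2.0, believed_mass=2.0)
underestimated = peak_speed(true_mass=2.0, believed_mass=1.4)
ratio = underestimated / nominal
print(nominal, underestimated, ratio)
```

      In this idealized setting the peak speed scales with the ratio of believed to true mass. The sketch deliberately omits damping, stiffness, activation dynamics and reflexes, which are the factors points (a) and (b) identify as alternative sources of the same under-actuation.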

      The reviewer raises an important point that the observed underactuation may not necessarily reflect mass underestimation per se. It could also arise from changes in the time constants of the control system, tonic muscle activation levels, vestibular descending inputs, or altered spinal reflex gains. We agree that our simplified model does not capture these neuromuscular factors, and we have made several revisions to address this concern.

      In the Discussion (paragraph 4), we have added a new substantive section discussing how reduced tonic muscle activity, diminished vestibular inputs to motor neurons, and altered muscle activation dynamics (Fisk et al., 1993) may contribute to the observed under-actuation independently of mass misestimation. We argue that while these factors likely affect motor output, they would be expected to produce a relatively uniform effect across movement directions, as tonic muscle activation and vestibular descending inputs are not specific to a particular reaching direction. In contrast, the direction-dependent pattern of our results, with greater effects for directions involving higher effective mass, is more naturally explained by a misrepresentation of inertial properties than by a uniform change in neuromuscular excitability. Nevertheless, we explicitly acknowledge that these mechanisms may act in concert with mass underestimation, and that our current data cannot fully disentangle them.

      Additionally, the paragraph discussing proprioceptive mechanisms (paragraph 6 of Discussion) now opens with the conditional framing "If mass underestimation contributes to the observed underactuation," and closes by noting that the same proprioceptive degradation could affect motor output through other pathways, such as reducing tonic muscle activation or altering spinal reflex gains, independent of any explicit misrepresentation of body mass.

      We have also added a new limitation (the fourth in the Limitations section) explicitly acknowledging that our model treats muscles as ideal torque generators and does not capture potential changes in muscle activation dynamics, damping, or reflex gains that may occur in microgravity. Future studies combining detailed musculoskeletal modeling with direct measurements of muscle activation, joint impedance, and trunk kinematics would be needed to distinguish between mass underestimation and other sources of underactuation.

      That said, the assumption of relatively preserved muscle properties is partly supported by the available evidence. A systematic review of simulated microgravity studies found that upper limb maximal voluntary contraction remained mostly unchanged for up to 45 days of unloading, and that upper limb muscles declined substantially more slowly than lower limb and trunk muscles (Winnard et al., 2019). A more recent review similarly reported that upper limb muscle outcomes are less affected by microgravity exposure (Bosutti et al., 2025). This is also consistent with our own unpublished observations in Chinese astronauts, which did not indicate an obvious decline in upper limb force output. While these findings do not rule out subtler changes in muscle tone or activation dynamics, they suggest that gross alterations in upper limb neuromuscular capacity are unlikely to be the primary driver of the underactuation we observed.

      Refs.

      Winnard, A., Scott, J., Waters, N., Vance, M., & Caplan, N. (2019). Effect of time on human muscle outcomes during simulated microgravity exposure without countermeasures—systematic review. Frontiers in physiology, 10, 1046.

      Bosutti, A., Ganse, B., Maffiuletti, N. A., Wüst, R. C., Strijkers, G. J., Sanderson, A., & Degens, H. (2025). Microgravity‐induced changes in skeletal muscle and possible countermeasures: What we can learn from bed rest and human space studies. Experimental Physiology.

      (2) The subject's body in weightlessness is much more sensitive to reaction forces in interactions with the environment in the absence of the anchoring effect of gravity pushing the body into the floor and in the absence of anticipatory postural adjustments that typically accompany upper-limb motions in Earth gravity in order to maintain an upright posture. The authors dismiss this possibility because the taikonauts were asked to stabilize their bodies with the contralateral hand. But the authors present no evidence that this was sufficient to maintain the shoulder and trunk at a strictly constant position, as is supposed by the simplified biomechanical model used in their optimal control framework. Indeed, a small backward motion of the shoulder would result in a smaller acceleration of the fingertip and a smaller extent of the initial ballistic motion of the hand with respect to the measurement device (the tablet), consistent with the observations reported in the study. Note that stability of the base might explain why 45° movements were apparently less affected in weightlessness, according to many of the reported analyses, including those related to corrective movements (Fig. 5 B, C, F; Fig. 6D), than the other two directions. If the trunk is being stabilized by the left arm, the same reaction forces on the trunk due to the acceleration of the hand will result in less effective torque on the trunk, given that the reaction forces act with a much smaller moment arm with respect to the left shoulder (the hand movement axis passes approximately through the left shoulder for the 45° target) compared to either the forward or rightward motions of the hand.

      The reviewer raises an important point about the potential influence of reaction forces on trunk and shoulder stability in microgravity. We have revised the relevant Discussion paragraph to address this concern more thoroughly.

      We would like to clarify that, in addition to stabilizing the body with the left hand grasping a fixed bar, the taikonauts’ feet were also constrained with foot straps, providing multi-point stabilization. Furthermore, the reviewer's trunk displacement hypothesis predicts that the 45° direction should be systematically less affected across all kinematic measures. However, while 45° did not show significant changes in the timing of kinematic peaks, it did show significant changes in movement duration, peak acceleration, and peak speed comparable to the other directions. This dissociation is difficult to reconcile with a uniform trunk displacement artifact, but is consistent with a direction-dependent inertial effect.

      We acknowledge that we did not directly measure trunk or shoulder kinematics, although our setup provided multi-point stabilization as far as was practical, and we have added this as a limitation in the revised Discussion.

      (3) The above is exacerbated by potential changes in the frictional forces between the fingertip and the tablet. The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact can be expected to be quite different than on the ground. While these forces may be low on Earth, the fact is that we do not know what forces the taikonauts used on orbit. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. Indeed, given the increased instability of the body and the increased uncertainty of movement direction of the hand, taikonauts may have been induced to apply greater forces against the tablet in order to maintain contact in weightlessness, which would in turn slow the motion of the finger on the tablet and increase the reaction forces acting on the trunk. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors.

      We agree that in microgravity, taikonauts must actively press on the screen to maintain contact, potentially altering normal forces and thus friction compared to ground conditions. We have acknowledged this point in the revised Discussion. However, we note several reasons why friction is unlikely to be the dominant factor. First, the tablet uses a capacitive touchscreen, which registers touch through changes in electrical capacitance and does not require substantial normal force to maintain contact. Second, typical tangential friction forces during touchscreen interaction range from 0.1 to 0.5 N (Ayyildiz et al., 2018), which are small compared to the 10–15 N required to accelerate the arm during reaching. Third, touchscreen performance has been shown to be largely unaffected during long-duration spaceflight (Holden et al., 2023). Finally, and importantly, the friction hypothesis does not readily account for the direction-specific pattern of effects we observed. While we cannot exclude a contribution of altered friction, particularly in interaction with the direction-dependent effective mass, its magnitude makes it unlikely to account for the observed kinematic changes.
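
      The magnitude comparison in the second argument can be spelled out as a short calculation. The sketch uses only the two force ranges quoted in the response; the function name `friction_fraction_bounds` is introduced here for illustration.

```python
def friction_fraction_bounds(friction_range_n, arm_force_range_n):
    """Smallest and largest ratio of friction force to arm-acceleration force."""
    lowest = min(friction_range_n) / max(arm_force_range_n)
    highest = max(friction_range_n) / min(arm_force_range_n)
    return lowest, highest

# Ranges as quoted in the response above (newtons).
friction_range_n = (0.1, 0.5)
arm_force_range_n = (10.0, 15.0)

lowest, highest = friction_fraction_bounds(friction_range_n, arm_force_range_n)
print(f"friction is {lowest:.1%} to {highest:.1%} of the accelerating force")
```

      On these figures friction amounts to a few percent of the accelerating force at most. This bound assumes the quoted ground-based friction range still applies on orbit, which is the assumption the reviewer questions.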

      Ref:

      Ayyildiz, M., Scaraggi, M., Sirin, O., Basdogan, C., & Persson, B. N. J. (2018). Contact mechanics between the human finger and a touchscreen under electroadhesion. Proceedings of the National Academy of Sciences of the United States of America, 115(50), 12668–12673.

      Holden, K., Greene, M., Vincent Cross, E., Sandor, A., Thompson, S., Feiveson, A., & Munson, B. (2023). Effects of long-duration microgravity and gravitational transitions on fine motor skills. Human Factors, 65(6), 1046-1058.

      I feel that the authors have done an admirable job of exploring how to explain the modifications to movement kinematics that they observed on orbit within the constraints of the optimal control theory applied to a simplified model of the human motor system. While I fully appreciate the value of such models to provide insights into questions of human sensorimotor behaviour, to draw firm conclusions on what humans are actually experiencing based only on manipulations of the computational model, without testing the model's implicit assumptions and without considering the actual neurophysiological and biomechanical mechanisms, can be misleading. One way to do this could be to examine these questions through extensions to the model used in the simulations (changing activation dynamics of the torque generators, allowing for potential backward motion of the shoulder and trunk, etc.). A better solution would be to emulate the physiological and biomechanical conditions on Earth (supporting the arm against gravity to reduce muscle tone, placing the subject on a moveable base that requires that the body be stabilized with the other hand) in order to distinguish the hypothesis of an underestimation of mass vs. other potential sources of under-actuation and other potential effects of weightlessness on the body.

      In sum, my opinion is that the authors are relying too much on a theoretical model as a ground truth and thus overstate their conclusions. But to provide a convincing argument that humans truly underestimate mass in weightlessness, they should consider more judiciously the neurophysiology and biomechanics that fall outside the purview of the simplified model that they have chosen. If a more thorough assessment of this nature is not possible, then I would argue that a more measured conclusion of the paper should be 1) that the authors observed modifications to movement kinematics in weightlessness consistent with an under-actuation for the intended motion, 2) that a simplified model of human physiology and biomechanics that incorporates principles of optimal control suggest that the source of this under-actuation might be an underestimation of mass in the computation of an appropriate feedforward motor command, and 3) that other potential neurophysiological or biomechanical effects cannot be excluded due to limitations of the computational model.

      We appreciate the reviewer's thoughtful assessment. We fully agree that a simplified computational model should not be treated as ground truth, and that the neurophysiology and biomechanics beyond the computational model must be carefully considered.

      As detailed in our responses above, we have substantially revised the Discussion to address each of these concerns—including new discussions of neuromuscular factors, more balanced treatment of trunk stability and friction, conditional framing of the mass underestimation interpretation, and a new limitation on model simplifications. The conclusion has been restructured following the reviewer's recommended framework.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      If possible and allowed, the authors are strongly encouraged to consider sharing this unique dataset. Making the data publicly available alongside the paper could foster future studies and further accelerate research in this area.

      We sincerely thank the reviewer for this suggestion. The ground control data and all analysis code will be made publicly available alongside the Version of Record.

      However, unfortunately, the raw in-flight data from the taikonaut cohort cannot be made publicly available due to confidentiality regulations of China's manned space program; access for scientific research requires approval from the China Astronaut Research and Training Center and can be requested through the corresponding author.

    1. eLife Assessment

      This study combines mathematical models and experimental data to analyse the emergence of heterogeneity within clonal NK cell responses during antigen-specific cell expansion. It comprises different experimental data and extensively explores various mathematical models, to study NK cell turnover during acute immune responses and homeostatic turnover within murine cytomegalovirus infection (MCMV). This solid study presents valuable findings and provides relevant insights on heterogeneous NK cell development.

    2. Reviewer #1 (Public review):

      Summary:

      The objective of this study was to infer the population dynamics (rates of differentiation, division and loss) and lineage relationships of NK cell subsets during an acute immune response and under homeostatic conditions.

      Strengths:

      A rich dataset and a detailed analysis of a particular class of stochastic models.

      Weaknesses: (relating to initial submission)

      The stochastic models used are quite simple; each population is considered homogeneous with first-order rates of division, death, and differentiation. In Markov process models such as these there is no dependence of cellular behavior on its history of divisions. In recent years models of clonal expansion and diversification, in the settings of T and B cells, have progressed beyond this picture. So I was a little surprised that there was no mention of the literature exploring the role of replicative history in differentiation (e.g. Bresser Nat Imm 2022), nor of the notion of family 'division destinies' (either in division number, or the time spent proliferating, as described by the Cyton and Cyton2 models developed by Hodgkin and collaborators; e.g. Heinzel Nat Imm 2017). The emerging view is that variability in clone (family) size may arise predominantly from the signals delivered at activation, which dictate each precursor's subsequent degree of expansion, rather than from the fluctuations deriving from division and death modeled as Poisson processes.

      As you pointed out, the Gerlach and Buchholz Science papers showed evidence for highly skewed distributions of family sizes, and correlations between family size and phenotypic composition. Is it possible that your observed correlations could arise if the propensity for immature CD27+ cells to differentiate into mature CD27- cells increases with division number? The relative frequency of the two populations would then also be impacted by differences in the division rates of each subset - one would need to explore this. But depending on the dependence of the differentiation rate on division number, there may be parameter regimes (and timepoints) at which the more differentiated cells can predominate within large clones even if they divide more slowly than their immature precursors. One might not then be able to rule out the two-state model. I would like to see a discussion or rebuttal of these issues.

      Comments on revised version.

      I am happy with the latest revisions that the authors have made.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The objective of this study was to infer the population dynamics (rates of differentiation, division and loss) and lineage relationships of NK cell subsets during an acute immune response and under homeostatic conditions.

      Strengths:

      A rich dataset and a detailed analysis of a particular class of stochastic models.

      Weaknesses: (relating to initial submission)

      The stochastic models used are quite simple; each population is considered homogeneous with first-order rates of division, death, and differentiation. In Markov process models such as these there is no dependence of cellular behavior on its history of divisions. In recent years models of clonal expansion and diversification, in the settings of T and B cells, have progressed beyond this picture. So I was a little surprised that there was no mention of the literature exploring the role of replicative history in differentiation (e.g. Bresser Nat Imm 2022), nor of the notion of family 'division destinies' (either in division number, or the time spent proliferating, as described by the Cyton and Cyton2 models developed by Hodgkin and collaborators; e.g. Heinzel Nat Imm 2017). The emerging view is that variability in clone (family) size may arise predominantly from the signals delivered at activation, which dictate each precursor's subsequent degree of expansion, rather than from the fluctuations deriving from division and death modeled as Poisson processes.

      As you pointed out, the Gerlach and Buchholz Science papers showed evidence for highly skewed distributions of family sizes, and correlations between family size and phenotypic composition. Is it possible that your observed correlations could arise if the propensity for immature CD27+ cells to differentiate into mature CD27- cells increases with division number? The relative frequency of the two populations would then also be impacted by differences in the division rates of each subset - one would need to explore this. But depending on the dependence of the differentiation rate on division number, there may be parameter regimes (and timepoints) at which the more differentiated cells can predominate within large clones even if they divide more slowly than their immature precursors. One might not then be able to rule out the two-state model. I would like to see a discussion or rebuttal of these issues.

      Comments on revisions:

      (1) The authors have put in a lot of effort to address the reviews and have explored alternative models carefully.

      We appreciate the reviewers’ comments.

      (2) In the sections relating to homeostasis and the endogenous response, as far as I can tell you are estimating net growth rates (the k parameters) throughout - this is to be expected if you're working with just cell numbers and no information relating to proliferation. In these sections there are many places where you refer to proliferation rates and death rates when I think you just mean net positive or net negative growth rates. It's important to be precise about this even if the language can get a bit repetitive. (These net rates of growth or loss relate to clonal rather than cellular dynamics, which may be worth explaining). Later, you do use data relating to dead cells, which in principle can be used to get independent measures of death rates, but these data were not used in the fitting.

      We have modified the main text to address the comment.
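
      The reviewer's point about net rates in comment (2) can be shown with a minimal sketch. It assumes a single homogeneous population with first-order division and death, and the rate values are hypothetical. For such a linear birth-death process the mean cell number depends only on the difference between the two rates.

```python
import math

def expected_clone_size(n0, division_rate, death_rate, t):
    """Mean cell number of a linear birth-death process at time t."""
    net_rate = division_rate - death_rate
    return n0 * math.exp(net_rate * t)

# Two hypothetical parameter sets (per day) sharing the same net rate of 0.6.
fast_turnover = expected_clone_size(1, division_rate=1.0, death_rate=0.4, t=8)
slow_turnover = expected_clone_size(1, division_rate=0.7, death_rate=0.1, t=8)

print(fast_turnover, slow_turnover)
```

      Both parameter sets give the same mean trajectory, so mean cell counts alone constrain only the net rate. Separating division from death requires additional information, such as the proliferation or death measurements mentioned in the review.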

      (3) There is so much evidence that T and B cell differentiation are often contingent on division that it would be very reasonable to consider it as a possibility for NK cells too. (Differentiation could be asymmetric, as you explored, or simply symmetric with some probability per division). These processes can be cast into simple ODE models but no longer allow you to aggregate division and death rates - so for parameter estimation you need to add measures of proliferation (Ki67 or similar) or death. This may be worth some discussion?

      We have modified the main text (lines 242-245) to address the comment.

      Reviewer #2 (Public review):

      Summary:

      Wethington et al. investigated the mechanistic principles underlying antigen-specific proliferation and memory formation in mouse natural killer (NK) cells following exposure to mouse cytomegalovirus (MCMV), a phenomenon predominantly associated with CD8+ T cells. Using a stochastic modeling approach, the authors aimed to develop a quantitative model of NK cell clonal dynamics during MCMV infection. Starting from a single immature Ly49+CD27+ NK cell, a two-state linear model (with a death variant) explained the negative correlation between clone size at 8 dpi and the CD27+ fraction, but failed to reproduce the first and second moments of CD27+ and CD27− NK cell populations at 8 dpi. To address this limitation, the authors added an intermediate maturation state, yielding a three-stage model (CD27+Ly6C− → CD27−Ly6C− → CD27−Ly6C+) that fits the first and second moments under two constraints: CD27+ NK cells proliferate faster than CD27− NK cells, and clone size is negatively correlated with the CD27+ fraction (upper bound of −0.2). The model predicts high proliferation in the intermediate state and high death in mature CD27−Ly6C+ cells, and it was validated using Adams et al. (2021) NK reporter mice tracking CD27+/− populations after tamoxifen, allowing discrimination between bone marrow-derived and pre-existing peripheral NK cells. To test the prediction that mature CD27− NK cells have a higher death rate, the authors measured Ly49H+ NK cell viability in the mouse spleen at different time points post-MCMV infection. Data confirmed lower viability of mature (CD27−) than immature (CD27+) cells during days 4-8 post-infection, and a model variant supported that higher CD27− death increases their proportion in the dead cell compartment. 
      Altogether, the authors propose a three-stage quantitative model of antigen-specific expansion and maturation of naïve Ly49H+ NK cells with the trajectory CD27+Ly6C− (immature) → CD27−Ly6C− (mature I) → CD27−Ly6C+ (mature II), highlighting high proliferation in the mature I state and increased death in the mature II state.

      Strengths:

      Models explaining correlations and first and second moments, supported by analytical investigations, stochastic simulations, and model selection, identify key processes in antigen-specific NK expansion and maturation. The work distinguishes expansion, contraction, and memory in NK cells from CD8+ T cells and informs NK therapy development.

      Weaknesses (relating to initial submission):

      The conclusions of this paper are largely supported by the available data. However, a comparative analysis with more recent works in the field would be desirable. Clarifications:

      (1) Initial Conditions and Grassmann Data: The Grassmann data is used solely as a constraint, while the simulated values of CD27+/CD27− cells could have been directly fitted to the Grassmann data, which assumes a 1:1 ratio of CD27+/CD27− at t = 0. This would allow an alternative initial condition rather than starting from a single CD27+ cell.

      (2) Correlation Coefficients in the Three-State Model: Although the parameter scan of the three-stage model (Figure 2) demonstrates the potential for negative correlations between colony size and the fraction of CD27+ cells, the calculated correlation coefficients using the fitted parameter values are not shown. Including these would validate that the fitted parameters lie in the negative-correlation regime.

      (3) Viability Dynamics and Adaptive Response: The authors measured the time evolution of CD27+/− dynamics and viability over 30 days post-infection (Figure 4). It would be valuable to test whether the three-state model can reproduce the adaptive response of CD27− cells to MCMV infection, particularly the observed drop in CD27− viability at 5 dpi and its rebound at 8 dpi. Demonstrating this would test whether the model can simultaneously explain viability dynamics and moment dynamics, and would enable sensitivity analysis of CD27− viability with respect to model parameters.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) line 175 - Here I think you have only ruled out the two state model with no death, and not the two state model in general?

      Edited the sentence to address the comment.

      (2) Figures 2 and 5 - the phenotypes (CD27+ Ly6C-, etc.) should be clearly labeled above each cell type. Fig 1 could be improved in the same way.

      Done.

    1. eLife Assessment

      By investigating spine nanostructure and dynamics across multiple genetic mouse models for neurodevelopmental disorders, this important study has the potential to uncover convergent or divergent synaptic phenotypes that may be specifically associated with autism versus schizophrenia risk. The imaging and overall breadth of the methods are convincing. The purely in vitro nature of the study slightly limits the generalisability of the findings.

    2. Reviewer #1 (Public review):

      Summary:

      Kashiwagi et al. undertook a population analysis of dendritic spine nanostructure applied to the objective grouping of 8 mouse models of neuropsychiatric disorders. They report that spine morphology in cultured hippocampal neurons shows a higher similarity among schizophrenia mouse models (compared with autism spectrum disorder (ASD) mouse models) and identify an effect of Ecrg4 (encoding small secretory peptides) on spine dynamics and shape in these models.

      Strengths:

      The study developed a method for objectively comparing spine properties in primary hippocampal neuron cultures from 8 mouse models of psychiatric disorders at the population level using high-resolution structured illumination microscopy (SIM) imaging. This novel technique identified two distinct groups of mouse models according to the population-level spine properties: those with ASD-related gene mutations and those with schizophrenia-related gene mutations. Functional studies, including gene knockdown and overexpression experiments, identified an effect of Ecrg4 on the spine phenotype of the schizophrenia model mice.

      Weaknesses:

      The main weakness is that the study is wholly in vitro, using cultured hippocampal neurons. The authors present this as an advantage, however, arguing that spine morphology as measured in a reduced culture system can demonstrate direct effects of gene mutations on neuronal phenotypes in the absence of indirect influences from nonneuronal cells or specific environments.

    3. Reviewer #2 (Public review):

      Okabe and colleagues build on a super-resolution-based technique they have previously developed in cultured hippocampal neurons, improving the pipeline and using it to analyze spine nanostructure differences across 8 different mouse lines with mutations in autism or schizophrenia (Sz) risk genes/pathways. It is a worthy goal to try to use multiple models to examine potential convergent (or not) phenotypes, and the authors have made a good selection of models. They identify some key differences between the autism versus the Sz risk gene models, primarily that dendritic spines are smaller in Sz models and (mostly) larger in autism risk gene models. They then focus on three models (2 Sz - 22q11.2 deletion, Setd1a; 1 ASD - Nlgn3) for timelapse imaging of spine dynamics, and together with computational modelling provide a mechanistic rationale for the smaller spines in Sz risk models. Bulk RNA sequencing of all 8 model cultures identifies several differentially expressed genes which they go on to test in cultures, finding that Ecrg4 is upregulated in several Sz models and its misexpression recapitulates spine dynamics changes seen in the Sz mutants, while knockdown rescues spine dynamics changes in the Sz mutants. Overall, these have the potential to be very interesting findings and useful for the field. My major concerns from the initial manuscript, especially regarding cherry picking and circularity, have been addressed with revised analytical approaches. I have some remaining minor comments.

      (1) The comparison between two wild-type samples versus wild-type-mutant samples is helpful - I think this could be added to the manuscript.

      (2) For results of timelapse imaging - please spell out in the results section the direction of change (lines 270 - 277).

      (3) Using linear mixed effect models for statistical analysis is a significant improvement. While a sample size (n) of mice = 3 is not ideal, I think given the multiple different mouse lines used and intensity of analysis, this is probably the best that can be done, although further validation in larger samples eventually is to be hoped for.

      (4) The revised text is much improved, but I still think the authors should be upfront somewhere in the text that the schizophrenia-associated genes can only confer biased risk for schizophrenia (and that the clinical phenotype can also include autism). As I said before, I think this is the best we can do and I agree with their choices, but it is important not to overstate the link. The differences they see make it clear that these are still relevant distinctions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kashiwagi et al. undertook a population analysis of dendritic spine nanostructure applied to the objective grouping of 8 mouse models of neuropsychiatric disorders. They report that spine morphology in cultured hippocampal neurons shows a higher similarity among schizophrenia mouse models (compared with autism spectrum disorder (ASD) mouse models), and identify an effect of Ecrg4 (encoding small secretory peptides) on spine dynamics and shape in these models.

      Strengths:

      The study developed a method for objectively comparing spine properties in primary hippocampal neuron cultures from 8 mouse models of psychiatric disorders at the population level using high-resolution structured illumination microscopy (SIM) imaging. This novel technique identified two distinct groups of mouse models according to the population-level spine properties: those with ASD-related gene mutations and those with schizophrenia-related gene mutations. Functional studies, including gene knockdown and overexpression experiments, identified an effect of Ecrg4 on the spine phenotype of the schizophrenia model mice.

      We thank the reviewer for finding our strategy novel and useful for identifying molecules associated with the spine phenotype in schizophrenia-related mouse models.

      Weaknesses:

      The main weakness is that the study is wholly in vitro, using cultured hippocampal neurons. The authors present this as an advantage, however, arguing that spine morphology as measured in a reduced culture system can demonstrate direct effects of gene mutations on neuronal phenotypes in the absence of indirect influences from non-neuronal cells or specific environments.

      We appreciate this reviewer's concern about the limitation of cultured hippocampal neurons in extracting disease-related spine phenotypes. While we fully recognize this limitation, we consider that this in vitro system has several advantages that contribute to translational research on mental disorders.

      First, our culture system has been shown to support the development of spine morphology similar to that of the hippocampal CA1 excitatory synapse in vivo. High-resolution imaging techniques confirmed that the in vitro spine structure was highly preserved compared with in vivo preparations (Kashiwagi et al., Nature Communications, 2019). The present study used the same culture system and SIM imaging. Therefore, the difference we detected in samples derived from disease models is likely to reflect impairment of molecular mechanisms underlying native structural development in vivo.

      Second, super-resolution imaging of thousands of spines in tissue preparations under precisely controlled conditions cannot be practically applied using currently available techniques. The advantage of our imaging and analytical pipeline is its reproducibility, which enabled us to compare the spine population data from eight different mouse models without normalization.

      Third, a reduced culture system can demonstrate the direct effects of gene mutations on synapse phenotypes, independent of environmental influences. This property is highly advantageous for screening chemical compounds that rescue spine phenotypes. Neuronal firing patterns and receptor functions can also be easily controlled in a culture system. The difference in spine structure between ASD- and schizophrenia-related mouse models is valuable information to establish a drug screening system.

      Fourth, establishing an in vitro system for evaluating synapse phenotypes could reduce the need for animal experiments. Researchers should be aware of the 3Rs principles. In the future, combined with differentiation techniques for human iPS cells, our in vitro approach will enable the evaluation of disease-related spine phenotypes without the need for animal experiments. The effort to establish a reliable culture system should not be eliminated.

      We modified our text to have a balanced discussion on both advantages and disadvantages of the in vitro culture system in the study of mental disorder mouse models, as follows:

      "Finally, while the spine phenotype identified in the human postmortem brain undoubtedly resulted from complex interactions among genetic background, environmental influences, and regulation by non-neuronal cells, data from pure neuronal cultures are more likely to reflect the direct effects of schizophrenia-related gene mutations on synaptic functions. This property may be advantageous for identifying synaptic molecules that regulate synapse phenotypes in schizophrenia-related mouse models. However, the phenotype observed in the culture system requires confirmation using in vivo experiments of mouse models or human tissue samples. Efficient in vitro screening combined with reliable in vivo evaluation of synapses will facilitate translational research on mental disorders."

      Another weakness is that CaMKIIαK42R/K42R mutant mice are presented as a schizophrenia model, the authors justifying this by saying that "CaMKII-related signaling pathway disruption has been implicated in the working memory deficits found in schizophrenia patients". Since mutations in CAMK2A cause autosomal dominant intellectual developmental disorder-53 (OMIM 617798) and autosomal recessive intellectual developmental disorder-63 (OMIM 618095), and mice carrying the CAMK2A E183V mutation exhibit ASD-related synaptic and behavioral phenotypes (PMID: 28130356), I think it's stretching credibility to refer to the CaMKIIαK42R/K42R mice as a schizophrenia model.

      We agree with this reviewer that CAMK2A mutations in humans are linked to multiple mental disorders, including developmental disorders, ASD, and schizophrenia. Association of gene mutations with the categories of mental disorders is not straightforward, as the symptoms of these disorders also overlap with each other. For the CaMKIIα K42R/K42R mutant, we considered the following points in its characterization as a model of mental disorder. Analysis of CaMKIIα +/- mice in Dr. Tsuyoshi Miyakawa's lab has provided evidence linking reduced CaMKIIα to schizophrenia-related phenotypes (Yamasaki et al., Mol Brain 2008; Frankland et al., Mol Brain Editorial 2008). It is also known that the CaMKIIα R8H mutation in the kinase domain is linked to schizophrenia (Brown et al., 2021). Both CaMKIIα R8H and CaMKIIα K42R mutations are located in the N-terminal domain and eliminate kinase activity. On the other hand, the representative CaMKIIα E183V mutation identified in ASD patients exhibits unique characteristics, including reduced kinase activity, decreased protein stability and expression levels, and disrupted interactions with ASD-associated proteins such as Shank3 (Stephenson et al., 2017). Importantly, the reduced number of dendritic spines in neurons expressing CaMKIIα E183V is a property opposite to that of the CaMKIIα K42R/K42R mutant, which showed increased spine density (Koeberle et al. 2017).

      References related to this discussion.

      (1) Yamasaki et al., Mol Brain. 2008 DOI: 10.1186/1756-6606-1-6

      (2) Frankland et al. Mol Brain. 2008 DOI: 10.1186/1756-6606-1-5

      (3) Stephenson et al., J Neurosci. 2017 DOI: 10.1523/JNEUROSCI.2068-16.2017

      (4) Koeberle et al. Sci Rep. 2017 DOI: 10.1038/s41598-017-13728-y

      (5) Brown et al., iScience. 2021 DOI: 10.1016/j.isci.2021.103184

      We fully agree with the reviewer that different CAMK2A mutations likely cause distinct phenotypes observed in the broad spectrum of mental disorders. In the revised manuscript, we include a discussion of the relevant literature to categorize this mouse model appropriately.

      "CaMKII-related signaling pathway disruption has been implicated in the working memory deficits found in schizophrenia patients [45,46]. CAMK2A mutations in humans are linked to multiple mental disorders, including developmental disorders, ASD, and schizophrenia [47]. The K42R mutation of CAMK2A does not correspond to any known human genetic variant, but the CAMK2A R8H mutation is linked to schizophrenia [48]. Both R8H and K42R mutations in the N-terminal domain of CaMKIIα eliminate kinase activity; these mutations may have a similar impact on human mental disorders."

      Although the manuscript is largely well written, there are some instances of ambiguous/unspecific language. This extends to the title (Decoding Spine Nanostructure in Mental Disorders Reveals a Schizophrenia-Linked Role for Ecrg4), which gives no indication that the work was in vitro on cultured neurons derived from mouse models.

      We appreciate the reviewer for pointing out the lack of information about the experimental system in the title of this manuscript. According to the suggestion of the reviewer, we modified the title as "Decoding spine nanostructure in cultured neurons derived from mouse models of mental disorder reveals a schizophrenia-linked role for Ecrg4".

      Reviewer #2 (Public review):

      Okabe and colleagues build on a super-resolution-based technique that they have previously developed in cultured hippocampal neurons, improving the pipeline and using it to analyze spine nanostructure differences across 8 different mouse lines with mutations in autism or schizophrenia (Sz) risk genes/pathways. It is a worthy goal to try to use multiple models to examine potential convergent (or not) phenotypes, and the authors have made a good selection of models. They identify some key differences between the autism versus the Sz risk gene models, primarily that dendritic spines are smaller in Sz models and (mostly) larger in autism risk gene models. They then focus on three models (2 Sz - 22q11.2 deletion, Setd1a; 1 ASD - Nlgn3) for time-lapse imaging of spine dynamics, and together with computational modelling provide a mechanistic rationale for the smaller spines in Sz risk models. Bulk RNA sequencing of all 8 model cultures identifies several differentially expressed genes, which they go on to test in cultures, finding that Ecrg4 is upregulated in several Sz models and its misexpression recapitulates spine dynamics changes seen in the Sz mutants, while knockdown rescues spine dynamics changes in the Sz mutants. Overall, these have the potential to be very interesting findings and useful for the field. However, I do have a number of major concerns.

      We thank the reviewer for evaluating our findings as potentially very interesting and useful.

      (1) The main finding of spine nanostructure changes is done by carrying out a PCA on various structural parameters, creating spine density plots across PC1 and PC2, and then subtracting the WT density plot from the mutant. Then, spines in the areas with obvious differences only are analyzed, from which they derive the finding that, for example, spine sizes are smaller. However, this seems a circular approach. It is like first identifying where there might be a difference in the data, then only analyzing that part of the data. I welcome input from a statistician, but to me, this is at best unconventional and potentially misleading. I assume the overall means are not different (although this should be included), but could they look at the distribution of sizes and see if these are shifted?

      We appreciate the reviewer's concern regarding our analysis of spine population data. The intention of pre-selecting the areas showing differences between wild-type and mutant was to make a direct comparison between two subareas (one is enriched with wild-type spines and the other is enriched with mutant spines) and clarify that the spines of schizophrenia-related mouse models were smaller than wild-type spines. Conventional methods of comparing the total spine population using simple size parameters are not useful for this purpose, as shown in Supplementary Figure 2.

      To clarify the reviewer's concern, we revised the analysis of the spine population data for both Figure 3 and Figure 8.

      Figure 3: We first divided the feature space projected onto PC1 and PC2 into four areas with distinct structural properties: (1) small and short, (2) small and long, (3) large and short, and (4) large and long. Next, we calculated the normalized spine counts in the four areas for both wild-type and mutant spines and obtained the relative ratio (mutant/wild-type) for each area. As we performed three independent SIM imaging experiments (in one, we imaged both wild type and mutant culture dishes prepared from the same pregnant mouse), there are three independent datasets from 8 mouse models.

      We found that the spine ratio (mutant/wild-type) only in area 2 (small and long spines) differed significantly between genotypes. This result is shown in Fig. 3 and explained in the text. The spine ratios in areas 1 and 3 did not show a clear relationship to the genotypes, while the ratio in area 4 showed the opposite trend to that in area 2. The opposite trend between areas 2 and 4 indicates enrichment of both small and long spines in schizophrenia-related mouse models, consistent with our previous analysis.
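
      The area-wise comparison described for Figure 3 can be illustrated with a minimal sketch. The function name, the split positions on PC1 and PC2, and the assignment of axes to volume and length are assumptions made for illustration only; the response does not state how the four areas were delimited.

```python
import numpy as np

def quadrant_ratios(wt_scores, mut_scores, pc1_split=0.0, pc2_split=0.0):
    """Return mutant/wild-type ratios of normalized spine counts in four areas.

    wt_scores, mut_scores: arrays of shape (n_spines, 2) holding PC1 and PC2.
    The split values and the axis-to-feature mapping are hypothetical.
    """
    def fractions(scores):
        scores = np.asarray(scores, dtype=float)
        big = scores[:, 0] >= pc1_split     # assumed: PC1 tracks spine volume
        long_ = scores[:, 1] >= pc2_split   # assumed: PC2 tracks spine length
        counts = np.array([
            np.sum(~big & ~long_),  # area 1: small and short
            np.sum(~big & long_),   # area 2: small and long
            np.sum(big & ~long_),   # area 3: large and short
            np.sum(big & long_),    # area 4: large and long
        ], dtype=float)
        return counts / counts.sum()        # normalize by total spine count

    # Undefined (division by zero) if the wild-type sample has an empty area.
    return fractions(mut_scores) / fractions(wt_scores)

# Synthetic example: the mutant cloud is shifted toward "small and long".
rng = np.random.default_rng(0)
wt = rng.normal(size=(2000, 2))
mut = wt + np.array([-0.3, 0.3])
ratios = quadrant_ratios(wt, mut)
```

      In this synthetic case the ratio for area 2 rises above 1 while the ratio for area 3 falls below 1. One such vector of four ratios per independent culture would give the three data points per mouse model mentioned above.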

      Figure 8: In this analysis, we aimed to evaluate the rescue effect of Ecrg4 shRNA relative to that of control shRNA. If Ecrg4 shRNA is effective, the spine population enriched in the control shRNA condition should be reduced in the Ecrg4 shRNA condition. To confirm this point in the revised manuscript, we first defined areas in the projected PC1-PC2 plane showing either enrichment or depletion of spines in the control shRNA condition (spine numbers increasing or decreasing by more than 3 × SD). We next measured the difference in spine numbers between the control and Ecrg4 shRNA conditions in either enriched or depleted areas. The expectation is that Ecrg4 shRNA treatment reduces the extent of both enrichment and depletion. The effect was significant in both the 22qdel and Setd1a mouse models, as indicated by permutation tests. This analysis was explained in the revised manuscript.
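
      A label-permutation test of the general kind mentioned above can be sketched as follows. The test statistic (difference in group means), the two-sided form, and the input values are assumptions for illustration; the response does not specify the exact statistic or permutation scheme that was used.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    def mean_diff(a, b):
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)        # reassign condition labels at random
        if abs(mean_diff(pooled[:n_a], pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical values, e.g. excess spine counts in an enriched area.
control_shrna = [5.1, 4.8, 5.5, 5.0, 5.3, 4.9, 5.2, 5.4]
ecrg4_shrna = [3.0, 3.4, 2.9, 3.1, 3.3, 2.8, 3.2, 3.5]
p_value = permutation_p_value(control_shrna, ecrg4_shrna, n_perm=2000)
```

      The choice of permutation unit (spine, cell, or culture) determines which null hypothesis is tested, so it should match the level at which the experiment was replicated.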

      (2) Despite extracting 64 parameters describing spine structure, only 5 of these seemed to be used for the PCA. It should be possible to use all parameters and show the same results. More information on PC1 and PC2 would be helpful, given that the rest of the paper is based on these - what features are they related to?

      We thank the reviewer for the advice on providing the rationale for parameter selection in PCA. We divided spines into 160-nm segments along their long axis, and the spine segments were used to calculate the 64 parameters, which include volume of each spine segment (20 segments), convex hull volume of each spine segment (20 segments), and convex hull ratio of each spine segment (20 segments). As most spines are shorter than 0.16 × 20 = 3.2 μm, these segment-related parameters contain a large fraction of zero values, which affect the proper calculation of principal components. Therefore, we selected two parameters that reflect the principal structural features (length and volume), together with three other parameters that were mutually independent and also independent from the first two parameters (pairwise correlation coefficients < 0.3). These selection criteria were described in the original manuscript. We also confirmed that PCA using all 64 parameters yields a cross correlation map similar to that shown in Fig. 2B.
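
      The selection rule described above (two seed parameters plus further parameters with pairwise correlation coefficients below 0.3, followed by PCA) can be sketched as follows. The greedy selection order, the use of absolute Pearson correlation, z-scoring before PCA, and all function and feature names are assumptions for illustration rather than the authors' implementation.

```python
import numpy as np

def select_low_correlation(X, names, seeds, max_abs_r=0.3):
    """Keep the seed features, then greedily add any feature whose absolute
    Pearson correlation with every feature kept so far is below max_abs_r."""
    r = np.abs(np.corrcoef(X, rowvar=False))
    kept = [names.index(s) for s in seeds]
    for j in range(X.shape[1]):
        if j not in kept and np.all(r[j, kept] < max_abs_r):
            kept.append(j)
    return [names[j] for j in kept]

def pca_scores(X, n_components=2):
    """Project z-scored features onto the leading principal components (SVD)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:n_components].T

# Synthetic example with one redundant feature (a noisy copy of volume).
rng = np.random.default_rng(1)
n = 500
length = rng.normal(size=n)
volume = rng.normal(size=n)
X = np.column_stack([length, volume,
                     volume + 0.1 * rng.normal(size=n),
                     rng.normal(size=n)])
names = ["length", "volume", "volume_copy", "other"]
selected = select_low_correlation(X, names, seeds=["length", "volume"])
scores = pca_scores(X[:, [names.index(s) for s in selected]])
```

      Here the redundant feature is dropped and the independent one is retained, so the PC1-PC2 projection is computed from three features.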

      Author response image 1.

      We provided additional information in the Materials and Methods section of the revised manuscript.

      As described previously, the pattern of four areas with distinct spine structures (1. small and short, 2. small and long, 3. large and short, 4. large and long) supports the idea that the PC1-PC2 plane reflects the relationship between spine volume and length (Fig. 3A and B).

      These specific features could then be analyzed in the full dataset, without doing the cherry picking above.

      We provided the dataset for the relative enrichment of spine counts across four areas of the PC1-PC2 plane in Fig. 3A and B. This analysis provides a comprehensive view of spine population properties related to spine volume and length, without relying on a pre-set region of interest.

      It would also be helpful to demonstrate whether PC1 and 2 differ across groups - for example, the authors could break their WT data into 2 subsets and repeat the analysis.

      We noticed differences in the pattern of spine distribution across the PC1-PC2 planes in each experiment. The subtraction of the distributional data between wild-type and mutant samples effectively cancels out such differences. In general, the difference between two wild-type samples is smaller than that between wild-type and mutant samples, as shown in Author response image 2.

      Author response image 2.

      We added a description of variation across groups to the revised manuscript.

      (3) Throughout the paper, the 'n' used for statistical analysis is often spine, which is not appropriate. At a minimum, cell should be used, but ideally a nested mixed model, which would take into account factors like cell, culture, and animal, would be preferable. Also, all of these factors should be listed, with sufficient independent cultures.

      We agree that nested mixed models are more appropriate for evaluating genotype effects in most of our datasets. We confirm that the results of statistical analysis using nested mixed models were consistent with our previous conclusions in most cases.

      Figure 3: We performed three independent primary cultures of embryonic hippocampal tissue with genotypes of both wild-type and mutant from the same pregnant mice for each mouse model. In our new Figure 3, each data point represents an independent culture experiment, and group comparisons were performed using one-way ANOVA followed by Tukey's post hoc test. In this analysis, statistical analysis using neurons as units of 'n' is not possible, as the number of spines measured from a single neuron is insufficient to generate the density map shown in Figure 3. The statistical analysis was described in the revised text. The details of experimental conditions related to Figure 3 are provided in Supplementary Table 1.

      Figure 5A-C: We analyzed spine turnover rate using a linear mixed-effects model with genotype as a fixed effect and plate, cell, and dendrite as nested random effects. In both 22q deletion model and Setd1a model, there were significant effects of genotype (F(1,25) = 5.79, p = 0.024 for 22q deletion model and F(1,22) = 7.33, p = 0.013 for Setd1a model). In contrast, Nlgn3 mutant neurons did not show a significant difference (F(1,14) = 1.35, p = 0.26). This analysis was described in the revised text.

      Figure 5D-F: Spine lifetime was analyzed using a linear mixed-effects model accounting for the hierarchical structure of the data (spines nested within dendrites, cells, and culture plates). The analysis revealed a significant effect of genotype in both the 22q deletion and Setd1a mutants (22qdel mutant: F(1,336) = 5.33, p = 0.022; Setd1a mutant: F(1,282) = 6.38, p = 0.012). The neurons of both mutants exhibited significantly longer spine lifetimes compared with wild-type neurons (22qdel mutant: ratio = 1.28, 95% CI 1.04–1.58; Setd1a mutant: ratio = 1.35, 95% CI 1.07–1.70). In contrast, the Nlgn3 mutation did not significantly alter spine lifetime (ratio = 0.86, 95% CI 0.61–1.22; F(1,220) = 0.69, p = 0.41). This analysis was described in the revised text.

      Figure 5G-I: Spine volume trajectories were analyzed using linear mixed-effects models incorporating nested random effects (spine/dendrite/cell/culture plate) to account for the hierarchical structure of the data. In the 22q deletion model, newly formed spines were significantly smaller than those in wild-type neurons (genotype effect: p < 0.001). The spines in Setd1a mutant neurons also displayed significantly smaller volume than those in wild-type neurons (p < 10⁻⁷). There were also differences in the temporal profiles of spine growth in these two mutants (p < 0.001). In contrast, newly formed spines in the Nlgn3 mutant neurons were significantly larger than those in wild-type neurons (p < 10⁻⁴) with preserved time-course of spine growth. This analysis was described in the revised text.

      Figure 5J-L: Similar analyses using linear mixed-effects models incorporating nested random effects (spine within dendrite within cell within culture plate) identified significantly smaller initial spine size in the 22q deletion model (p < 10⁻⁶), while no significant differences in the initial spine volume were found for Setd1a mutants. The temporal trajectories of spine shrinkage before their loss were also not significantly altered in either the 22qdel or the Setd1a mutants. The Nlgn3 mutant showed a significantly different time-course of spine shrinkage (p < 0.05), while the initial spine size was not altered. This analysis was described in the revised text.

      Figure 7A overexpression dataset: We analyzed plate-averaged lifetime values using a linear mixed-effects model with treatment as a fixed effect. There exists a significant main effect of treatment (F(3,8) = 4.59, p = 0.038), with post hoc examination showing a significant increase in lifetime by Ecrg4 overexpression (β = 0.49 ± 0.16 SE, t(8) = 3.16, p = 0.013). Figure 7A shRNA dataset: We also applied a linear mixed-effects model for plate-averaged lifetime values with treatment as a fixed effect. The analysis revealed no significant effect of treatment (F(2,6) = 0.29, p = 0.76).

      The analyses of overexpression and shRNA datasets were described in the revised text.

      Figure 8: As in Figure 3, we performed three independent primary cultures of embryonic hippocampal tissue with genotypes of both wild-type and mutant from the same pregnant mice for each mouse model. The culture plates were transfected with either a control shRNA or an Ecrg4 shRNA construct. Each data point represents an independent culture experiment, and the effect of Ecrg4 shRNA relative to that of control shRNA was evaluated using a permutation test. The data analysis was described in the revised text. The details of experimental conditions related to Figure 8 are provided in Supplementary Table 1.

      (4) The authors should confirm that all mutants are also on the C57BL/6J background, and clarify whether control cultures are from littermates (this would be important). Also, are control versus mutant cultures done simultaneously? There can be significant batch effects with cultures.

      The mutant mice we used in this study are on a C57BL/6J or C57BL/6N background. It is known that C57BL/6J and C57BL/6N mice exhibit distinct phenotypes across a range of physiological, biochemical, and behavioral systems. However, it is unlikely that our analysis is affected by differences between C57BL/6J and C57BL/6N, as we compared wild-type and mutant littermates on the same genetic background. This experimental design also reduces batch effects between culture preparations. This point was described in the revised text.

      (5) The spine analysis uses cultures from 18-22 DIV - this is quite a large range. It would be worth checking whether age is a confounder or correlated with any parameters / principal components.

      We described in the Methods section that culture samples were processed for imaging at 18-22 DIV. However, all the SIM imaging experiments for eight mutant mouse models were performed on samples fixed at DIV 19. The wide range of imaging experiments (DIV 18-22) includes test samples we used to optimize imaging conditions. In the revised manuscript, we specified the timing of SIM imaging.

      (6) The computational modelling is interesting, but again, I am concerned about some circularity. Parameter optimization was used to identify the best fit model that replicated the spine turnover rates, so it is somewhat circular to say that this matched the observations when one of these is the turnover rate.

      We appreciate the reviewer's comment on some circularity of the argument. We agree that the turnover rate is already incorporated into the simulation model and is not an appropriate criterion for the evaluation. We modified the text accordingly.

      It is more convincing for spine density and size, but why not go back and test whether parameter differences are actually seen - for example, it would be possible to extract the probability of nascent spine loss, etc.

      We thank the reviewer for giving this important suggestion. The probability of nascent spine loss is an important parameter, and we initially attempted to estimate it from the original data set. However, the upper limit of our time-lapse imaging is 24 h, which is insufficient to distinguish stable and nascent spines clearly. The difficulty of extracting all the necessary parameters for spine remodeling is our motivation for starting this computational modelling.

      More compelling would be to repeat the experiments and see if the model still fits the data. In the interpretation (line 314-318) it is stated that '... reduced spine maturation rate can account for the three key properties of schizophrenia-related spines...', which is interesting if true, but it has just been stated that the probability of spine destabilization is also higher in mutants (line 303) - the authors should test whether if the latter is set to be the same as controls whether all the findings are replicated.

      As suggested by the reviewer, we set the probability of spine destabilization equal across wild-type and mutant models and repeated the simulations. The results indicate that this modification has small effects on spine density (0.61 vs 0.62), spine turnover rate (0.22 vs 0.21), fraction of small spines (0.21 vs 0.20), and mean spine size (0.37 vs 0.36). We described this point in the revised manuscript.

      (7) No validation for overexpression or knockdown is shown, although it is mentioned in the methods - please include.

      As suggested by the reviewer, we validated overexpression and knockdown. The results are summarized in Supplementary Figure 8.

      Supplementary Figure 8A-C shows the immunocytochemistry of anti-Ecrg4, anti-Cip4, and anti-NPAS4 for the confirmation of overexpression of these molecules.

      Supplementary Figure 8D-E shows the confirmation of the appropriate size of exogenously expressed Ecrg4, Cip4, and NPAS4 by immunoblotting. (previous Supplementary Figure 10F is now Supplementary Figure 8E).

      Supplementary Figure 8F-H indicates the efficient knockdown of exogenously expressed Met-GFP, ARHGAP15-GFP, and Ecrg4-HA by respective shRNA constructs in COS-7 cells. (previous Supplementary Figure 10G is now Supplementary Figure 8H)

      Also, for the knockdown, a scrambled shRNA control would be preferable.

      We used Stealth RNAi Negative Control Duplexes (Invitrogen) as the shRNA control in this study. To confirm that this RNAi sequence does not affect spine turnover, we performed timelapse imaging of neurons transfected with GFP alone or with GFP and the Stealth RNAi Negative Control. No detectable change in spine turnover was observed (Supplementary Figure 8I), indicating that this RNAi control sequence is suitable for our study.

      (8) The finding regarding Ecrg4 is interesting, but showing that some Ecrg4 is expressed at boutons and spines and some in DCVs is not enough evidence to suggest that it is actively involved in the regulation of synapse formation and maturation (line 356).

      To reveal the active roles of Ecrg4 in spine regulation, we exogenously applied a synthetic Ecrg4 peptide to wild-type neurons and monitored both spine density and turnover rate after Ecrg4 application. The Ecrg4 application increased the spine turnover rate, whereas samples treated with the scrambled peptide did not. This result supports the active role of Ecrg4 in regulating spine turnover. The data were added as Supplementary Figures 9F and G.

      (9) The same caveats that apply to the analysis also apply to the Ecrg4 rescue. In addition, while for 22q the control shRNA mutant vs WT looks vaguely like Figure 2, Setd1a looks completely different.

      We thank the reviewer for pointing out the apparent difference in the pattern of spine population data between Figure 2 and Figure 8. We performed SIM analysis using DiI-labeled neurons in Figure 2, whereas the data in Figure 8 are derived from GFP-expressing neurons. The images of cell-surface labeling and cytoplasmic labeling cannot be analyzed in the same way, as it is necessary to adjust parameters in SIM image processing and PCA-based dimensional reduction. Consequently, the distribution of the spine population projected onto the PC1-PC2 plane differs between DiI-labeled neurons and GFP-expressing neurons. To facilitate the comparison of PCA analysis applied to GFP-expressing neurons, we replaced the weight matrix for GFP-expressing neurons with that previously calculated for the DiI-labeled neurons. This adjustment increased the similarity of the data distributions shown in Figures 2 and 8. The explanation for the different patterns in the spine population map between Figure 2 and Figure 8 was added to the revised text. The related explanation for the data processing was described in the Materials and Methods.

      And if rescued, surely shRNA in the mutant should now resemble control in WT, so there shouldn't be big differences, but in fact, there are just as many differences as comparing mutant vs wild-type? Plus, for spine features, they only compare mutant rescue with mutant control, but this is not ideal - something more like a 2-way ANOVA is really needed. Maybe input from a statistician might be useful here?

      We appreciate the reviewer's important comment and agree that the analytical approach used in the original manuscript was not optimal. We therefore revised our analysis to examine whether the difference observed between wild-type and mutant neurons was reduced by suppression of Ecrg4 expression.

      To this end, we first identified two regions in the PC1–PC2 plane where mutant spines were either enriched or depleted relative to wild-type neurons (Areas A and B). We then counted the number of spines located in Areas A and B in control shRNA-treated mutant neurons (normalized spine counts XA and XB). Next, we quantified spine counts in the same areas using data from Ecrg4-suppressed mutant neurons (normalized spine counts YA and YB). If XA > YA and XB < YB, suppression of Ecrg4 would indicate a shift toward rescue of the phenotype observed in control shRNA-treated mutant neurons. Indeed, the datasets were consistent with this shift in relative spine counts.

      To determine whether these differences exceeded those expected from random variation in spine counts, we performed a permutation test. Specifically, spine identities were randomly shuffled between the two conditions while preserving the total number of spines in each dataset. The observed differences were then compared with the distribution obtained from the permuted datasets to assess statistical significance.

      We found that all three culture replicates showed statistical significance in both areas A and B for both the 22qdel and Setd1a mutations. This analysis is described in the Result section.
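
      The label-shuffling test described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes each spine has already been reduced to a Boolean flag (inside or outside an area on the PC1-PC2 plane), that the "normalized spine count" is the fraction of spines inside the area, and that the test is one-sided. The function names, the number of permutations, and the toy counts are all hypothetical.

```python
import random


def fraction_in_area(flags):
    """Normalized spine count: share of spines that fall inside the area."""
    return sum(flags) / len(flags)


def permutation_test(control_flags, knockdown_flags, n_perm=2000, seed=0):
    """One-sided permutation test on fraction(control) - fraction(knockdown).

    Each flag is True when a spine lies inside the area of interest.
    Condition labels are shuffled across the pooled spines while the
    number of spines per condition is kept fixed.
    """
    rng = random.Random(seed)
    observed = fraction_in_area(control_flags) - fraction_in_area(knockdown_flags)
    pooled = list(control_flags) + list(knockdown_flags)
    n_control = len(control_flags)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = fraction_in_area(pooled[:n_control]) - fraction_in_area(pooled[n_control:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Toy, made-up data for "Area A" (enriched in mutants): 60 of 500 spines in
# control-shRNA mutant neurons versus 30 of 500 after Ecrg4 knockdown.
control_in_A = [True] * 60 + [False] * 440
knockdown_in_A = [True] * 30 + [False] * 470
observed_diff, p = permutation_test(control_in_A, knockdown_in_A)
print(f"X_A - Y_A = {observed_diff:.3f}, permutation p = {p:.4f}")
```

      For an area where mutant spines are depleted (Area B, where a rescue predicts XB < YB), the same function could be called with the two arguments swapped so that the tested difference is YB - XB.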

      (10) Although this is a study entirely focused on spine changes in mouse models for Sz, there is no discussion (or citation) of the various studies that have examined this in the literature. For example, for Setd1a, smaller spines or reduced spine densities have been described in various papers (Mukai et al, Neuron 2019; Chen et al, Sci Adv 2022; Nagahama et al, Cell Rep 2020).

      We appreciate the reviewer's suggestion to include a discussion of schizophrenia-related mouse models. We added more information related to the Setd1a mouse model to the Discussion section.

      "Population-level spine properties were more homogeneous in schizophrenia models (those with gene mutations implicated in schizophrenia) than in the other 4 models studied, in part due to a shared tendency for smaller spines. This observation is consistent with previous studies on Setd1a mutant mice, which showed reduced spine width, decreased mushroom-type spines, and lower spine density in the prefrontal cortex [43,56,57]. In contrast to these findings, several previous studies reported reduced numbers of small spines in the postmortem cortical tissues of schizophrenia patients [22,58]."

      (11) There is a conceptual problem with the models if being used to differentiate autism risk from Sz risk genes. It is difficult to find good mouse models for Sz, so the choice of 22q11.2del and Setd1a haploinsufficiency is completely reasonable. However, these are both syndromic. 22qdel syndrome involves multiple issues, including hearing loss, delayed development, and learning disabilities, and is associated with autism (20% have autism, as compared to 25% with Sz). Similarly, Setd1a is also strongly associated with autism as well as Sz (and also involves global developmental delay and intellectual disability). While I think this is still the best we can do, and it is reasonable to say that these models show biased risk for these developmental disorders, it definitely can't be used as an explanation for the higher variability seen in the autism risk models.

      We appreciate the reviewer's suggestion for more careful consideration of the interpretation of phenotypes in mouse models, with regard to their relation to clinical phenotypes in human patients. According to the suggestion of the reviewer, we modified the relevant text as follows:

      "The nanoscale features of dendritic spines in ASD-associated mouse models were more variable than those in schizophrenia-associated mouse models. This difference may be related to the broader clinical spectrum of ASD, which ranges from mild impairments in social skills to severe intellectual disability. The four ASD-associated mouse models examined in this study, Nlgn3<sup>R451C/(y or R451C)</sup>, Syngap1<sup>+/-</sup>, POGZ<sup>Q1038R/+</sup>, and 15q11-13<sup>dup/+</sup>, may represent subgroups with different levels of hippocampal dysfunction. Among the four ASD-associated mouse models, 15q11-13<sup>dup/+</sup> showed population-level spine properties closer to those of the schizophrenia models. To understand this similarity, further analysis of neural circuit changes in both ASD- and schizophrenia-associated mouse models will be necessary. Analysis of the relationships between rare genetic variants and synapse phenotypes in mouse models may contribute to their eventual categorization. This information should be useful to understand the underlying mechanisms of the broader clinical spectrum of ASD."

      (12) I am not convinced that using dissociated cultures is 'more likely to reflect the direct impact of schizophrenia-related gene mutations on synaptic properties' - first, cultures do have non-neuronal cells, although here glial proliferation was arrested at 2 days, glia will be present with the protocol used (or if not, this needs demonstrating).

      In our culture system, the density of non-neuronal cells is low, and most neurons are not in direct contact with non-neuronal cells. We reported this method in Nat. Neurosci. 1999, where we utilized this culture system to visualize GFP-tagged PSD-95 in neurons using recombinant adenovirus. Because recombinant adenovirus shows higher infection efficiency in glial cells, it was essential for us to establish a culture condition that isolates neurons from glial cells.

      Second, activity levels will affect spine size, and activity patterns are very abnormal in dissociated cultures, so it is very possible that spine changes may not translate into in vivo scenarios. Overall, it is a weakness that the dissociated culture system has been used, which is not to say that it is not useful, and from a technical and practical perspective, there are good justifications.

      We appreciate the reviewer's comment on the advantages and disadvantages of using an in vitro culture system. This comment aligns with the first reviewer's. We modified our text to have a balanced discussion on the role of the in vitro culture system in the study of mental disorder mouse models as follows:

      "Finally, while the spine phenotype identified in the human postmortem brain undoubtedly resulted from complex interactions among genetic background, environmental influences, and regulation by non-neuronal cells, data from pure neuronal cultures are more likely to reflect the direct effects of schizophrenia-related gene mutations on synaptic functions. This property may be advantageous for identifying synaptic molecules that regulate synapse phenotypes in schizophrenia-related mouse models. However, the phenotype observed in the culture system requires confirmation using in vivo experiments of mouse models or human tissue samples. Efficient in vitro screening combined with reliable in vivo evaluation of synapses will facilitate translational research on mental disorders."

      (13) As a minor comment, the spine time-lapse imaging is a strength of the paper. I wonder about the interpretation of Figure 5. For example, the results in Figure 5G and J look as if they may be more that the spines grow to a smaller size and start from a smaller size, rather than necessarily the rate of growth.

      We thank the reviewer for the insightful comment. In the revised manuscript, we analyzed the time-lapse data using linear mixed-effects models incorporating nested random effects (spine/dendrite/cell/culture plate). This analysis suggested a difference in the initial size of spines. This point is described in the revised manuscript as follows:

      "Schizophrenia-associated mouse models showed higher similarity in spine morphology, driven by reduced size and growth of nascent spines."

      "We further compared the initial increase in spine volume between genotypes (Figure 5G-I). Linear mixed-effects models incorporating nested random effects revealed significantly smaller initial spine volumes in both 22q11.2<sup>del/+</sup> and Setd1a<sup>+/-</sup> models (genotype effect: p < 0.001 for 22q11.2<sup>del/+</sup> and p < 10<sup>-7</sup> for Setd1a<sup>+/-</sup>). The spines in both mutants also displayed a significant reduction in spine volume increase (p < 0.001). In contrast, newly formed spines in the Nlgn3<sup>R451C/(y or R451C)</sup> neurons were significantly larger than those in wild-type neurons (p < 10<sup>-4</sup>) with preserved time-course of spine growth."

      We tested whether the initial size difference in spines can be incorporated into the computational simulation. However, due to the large variability in the initial spine size, it was difficult to perform parameter optimization in the model with additional factors. Therefore, we did not further pursue this possibility in this revision. This point is described in the revised text.
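
      For readers unfamiliar with the nested random-effects structure mentioned in this response (spine within dendrite within cell within culture plate), one plausible way to write such a model is sketched below. The response does not state the exact fixed-effect terms, so the genotype main effect (read here as initial size) and the genotype-by-time interaction (read here as growth rate) are assumptions, as are all symbol names.

```latex
% V: volume of spine s on dendrite d of cell c in culture plate p at time t
% G: genotype indicator (0 = wild-type, 1 = mutant); assumed coding
\begin{aligned}
V_{pcds}(t) &= \beta_0 + \beta_G\,G + \beta_t\,t + \beta_{Gt}\,G\,t
             + u_{p} + u_{pc} + u_{pcd} + u_{pcds} + \varepsilon_{pcds}(t),\\
u_{p} &\sim \mathcal{N}(0,\sigma_{\text{plate}}^2),\quad
u_{pc} \sim \mathcal{N}(0,\sigma_{\text{cell}}^2),\quad
u_{pcd} \sim \mathcal{N}(0,\sigma_{\text{dendrite}}^2),\quad
u_{pcds} \sim \mathcal{N}(0,\sigma_{\text{spine}}^2),\\
\varepsilon_{pcds}(t) &\sim \mathcal{N}(0,\sigma^2).
\end{aligned}
```

      Under this reading, a non-zero \beta_G corresponds to a difference in initial spine volume between genotypes, and a non-zero \beta_{Gt} to a difference in the rate of volume change.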

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript would be strengthened if the following issues were adequately addressed:

      (1) It would be helpful to know more about the in/ex vivo dendritic spine phenotype of the mouse models of neuropsychiatric disorders, to allow readers to judge whether and how the in vitro spine phenotype in hippocampal neuronal cultures overlaps with/replicates the spine phenotype within the mouse brain.

      We appreciate this comment, but our currently available data are insufficient to specify the difference between in vitro and in vivo spine phenotypes. Our previous study, published in Nat. Commun. (2019), provided data showing that the overall distribution of spine size is similar between in vivo and in vitro conditions in the mouse hippocampus.

      (2) Although the manuscript is largely well written, there are instances of ambiguous language, particularly when describing the spine phenotypes. For example, we are told that "ASD mouse models showed a tendency of decreasing spine subpopulation with small volumes." This description and other examples should be expressed more clearly.

      Following the reviewer's suggestions, we revised the text to improve clarity. We modified the sentence "ASD mouse models showed a tendency of decreasing spine subpopulation with small volumes" to "ASD-related mouse models showed an opposite spine phenotype." To avoid possible confusion for readers, we have revised several sentences in the text to clarify the intended meaning.

      Also, I question whether the word "decoding", meaning to convert (a coded message) into intelligible language, is the most appropriate for the title and abstract.

      The original meaning of the word "decoding" is the conversion of a coded message into an intelligible form; however, in this study, we use the term in a broader sense, referring to the extraction of latent population-level properties of dendritic spines from multidimensional structural parameters. We believe this usage is consistent with its common use in neuroscience and systems biology, where "decoding" often refers to inferring underlying biological states or information from complex datasets.

      (3) The authors should reconsider whether CaMKIIαK42R/K42R mice should be described as a schizophrenia model, when mutations in CAMK2A are known to cause autosomal dominant intellectual developmental disorder-53 (OMIM 617798) and autosomal recessive intellectual developmental disorder-63 (OMIM 618095), and mice carrying the CAMK2A E183V mutation exhibit ASD-related synaptic and behavioral phenotypes (PMID: 28130356).

      We provided a detailed answer to this question in the previous part of the rebuttal.

      (4) The title doesn't adequately summarise the contents of the manuscript. It should mention mice/mouse models and cultured neurons.

      We also responded to this request in the previous part of the rebuttal.

      Reviewer #2 (Recommendations for the authors):

      (1) Please provide a supplementary table with all DEGs. Also, DEGs are listed if present in 'more than 2' models - does this mean they had to be in 3 or more? Please clarify.

      According to the reviewer's suggestion, we added data on DEGs shared by >2 mouse models in Supplementary Figure 7. We also added Supplementary Tables 2 and 3 for all DEGs. The phrase "in more than 2 models" means "in 3 or 4 models".

      (2) There are several references to 'schizophrenia mouse models' - it is worth rephrasing this to make clear that these are not mice with schizophrenia.

      We replaced the expression "schizophrenia (or ASD) mouse models" with "schizophrenia (or ASD)-associated mouse models" or similar appropriate wording throughout the manuscript.

      (3) Line 66: 'a recent...' - 2014 is not really recent.

      We removed the word "recent" from the sentence.

      (4) Figure S1: The legend says A-D, but they are not on the figure. Also, make clear whether this data is only WT data - it seems to be from disorder models, with 4 colors for each model - please clarify.

      We changed the sentence from "shown as A to D" to "shown as A to C". The datasets in Supplementary Figure 1 are wild-type only. Each graph uses four colors to represent wild-type data from four imaging datasets obtained from different mouse models. Graphs A to C correspond to spine length, surface area, and volume, respectively.

      (5) Methods, line 680-4: More detail here would be helpful.

      We added more explanation for the generation of subtraction maps.

      (6) Line 193: Make it clear this is hippocampal in the main text.

      We added "cultures of embryonic hippocampi" to the text.

      (7) Figure 5, D-F: Make clear that these are transient spines (as per main text)

      We added "Lifetimes of transient spines" to both the main text and figure legend.

      (8) Figure 6B: More detail is needed; no idea what this is - no axis label. D - also not clear what numbers on the y-axis mean. E - color scale??

      We added details to the figure legend, the axis labels for Figures 6B and 6D, and the color scale for Figure 6E.

      (9) Supplementary Figure 9 - not clear what matrices are actually showing, nor what the scale refers to - is this the number of shared DEGs? If so, please make it clearer.

      The matrices show the shared DEG numbers, as shown in their titles. The scale indicates DEG numbers. We added the explanation of the color code to the figure legend.

      (10) Please make clear in the main text that Ecrg4 affected the turnover rate. It would be good to measure other parameters as well.

      We added the phrase "a significant increase in spine turnover rate by Ecrg4 overexpression" to the main text.

      (11) Figure 7: Suggest to label C on images as well, so obvious which is GFP/anti-HA overlay (and respective colors) and which is anti-HA staining.

      We added the labels with respective colors to Figure 7.

      (12) Ecrg4 is a precursor protein that is cleaved to produce several hormone-like peptides. Where is the HA tag - so which cleavage products will it label? Any antibodies that work in immunocytochemistry?

      HA tag was attached to the C-terminal domain. We predict that anti-HA binds to four cleavage products (the full-length Ecrg4, Augurin, Argilin, and Δ16). Among several commercially available antibodies, only the SIGMA product could detect cells expressing Ecrg4-HA by immunocytochemistry.

      (13) Supplementary Figure 10: Synaptosome would be a good addition.

      We isolated the fraction of synaptosomes using Syn-PER™ Synaptic Protein Extraction Reagent in Supplementary Figure 9A. We added this explanation to the Materials and Methods section.

    1. eLife Assessment

      This study by Roseby and colleagues shows that region-specific mechanosensation - especially anterior-dorsal inputs - controls larval self-righting, and links this to Hox gene function in sensory neurons. The work is important for understanding how body plan cues shape sensorimotor behaviour, and the experimental toolkit will be of use to others. The strength of evidence is compelling with respect to the assays developed and the involvement of the anterior region; the evidence is more limited with respect to the dorso-ventral organization of sensory inputs in that region and the mechanism by which Hox genes contribute to the process. These findings will be of broad interest to researchers studying neural circuits, developmental genetics, and the evolution of behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      Roseby and colleagues report on a body region-specific sensory control of the fly larval righting response, a body contortion performed by fly larvae to correct their posture from an inverted (dorsal side down) position. This is an important topic because of the general need for animals to locomote in the correct orientation and the clever and broadly useful methodologies used in this paper to uncover the sensory triggers for the behavior, including a body region-specific optogenetic approach along different axial positions of the larva, region-specific manipulation of surface contacts with the substrate, and a 'water unlocking' technique to initiate righting behaviors, all strengths of the manuscript. The authors found that multidendritic neurons, particularly the daIV neurons, are necessary for righting behavior. The contribution of daIV neurons had been shown by the authors in a prior paper (Klann et al, 2021), but that study had used constitutive neuronal silencing. Here the authors used acute inactivation to confirm this finding. Additionally, the authors describe an important role for anterior sensory neurons. They move on to test the genetic basis for righting behavior and, consistent with the regional specificity they observe, implicate sensory neuron expression of Hox genes Antennapedia and Abdominal-b in self-righting.

      Strengths:

      Strengths of this paper include the important question addressed and the elegant and innovative combination of methods, which led to clear insights into the sensory biology of self-righting and links between body plan and nervous system function that will be useful for others in the field. The manuscript is very clearly written and couched in interesting biology.

      Limitations:

      There are several important questions for future study that, left unresolved, do not diminish the significance of this manuscript. These include the cellular and developmental basis for Hox gene action, the contributions of dorsal and ventral regions of the animal in righting, and the regional contributions of other sensory cell types in the righting response.

      Comments on revised version.

      The authors have addressed my major concerns.

    3. Reviewer #2 (Public review):

      Summary

      This work explores the relationship between body structure and behavior by studying self-righting in Drosophila larvae, a conserved behavior that restores proper orientation when turned upside-down. The authors first introduce a novel "water unlocking" approach to induce self-righting behavior in a controlled manner. Then, they develop a method for region-specific inhibition of sensory neurons revealing that anterior, but not posterior, sensory neurons are essential for proper self-righting. Deep-learning-based behavioral analysis shows that anterior inhibition prolongs self-righting by shifting head movement patterns, indicating a behavioral switch rather than a mere delay. Additional genetic and molecular experiments demonstrate that specific Hox genes are necessary in sensory neurons, underscoring how developmental patterning genes shape region-specific sensory mechanisms that enable adaptive motor behaviors.

      Strengths

      The work by Roseby et al. is notable for its elegant experimental design, the development of innovative methods that are likely to benefit the fly behavior community, and the strong experimental support for its conclusions. The manuscript is clearly written, well structured, and presents thoughtfully designed experiments that have been further improved in the revised version. This updated manuscript includes a comprehensive set of behavioral experiments using an additional Gal4 line (ppk-Gal4), which yields confirmatory results and strengthens support for the original hypothesis. It also incorporates quantification of Gal4 line strength, improvements to existing figures, the addition of new figures, and overall refinement of the text.

      Weakness:

      A remaining limitation of this manuscript is the lack of a cellular and mechanistic analysis explaining how Hox genes give rise to the observed behavioral phenotypes. The authors note that this question is being addressed in an ongoing follow-up study, which will expand the project to examine the roles of all Hox genes across the sensory system and to characterize their expression patterns within each of its subcomponents, with the aim of providing mechanistic insight. I look forward to seeing this work in a future manuscript.

      Comments on revised version.

      I have no further recommendations for the authors; most of my comments and questions have been satisfactorily addressed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      Strengths of this paper include the important question addressed and the elegant and innovative combination of methods, which led to clear insights into the sensory biology of self-righting, and that will be useful for others in the field. This is a substantial contribution to understanding how animals correct their body position. The manuscript is very clearly written and couched in interesting biology.

      Limitations:

      (1.1) The interpretation of functional experiments is complicated by the proposed excitatory and inhibitory roles of dorsal and ventral sensory neuron activity, respectively. So, while silencing of an excitatory (dorsal) element might slow righting, silencing of inputs that inhibit righting could speed the behavior. Silencing them together, as is done here, could nullify or mask important D-V-specific roles. Selective manipulation of cells along the D-V axis could help address this caveat.

      We highly appreciate the thoughtful comments by Rev1 pointing out the relative simplicity of our current inferences regarding the role of dorsal vs. ventral substrate contact, and agree with the suggestion that cells along the DV axis could have diverse roles in their contribution to self-righting. In this context, we wish to point out two aspects, one theoretical and one practical. Regarding theory, our view is that this may not be a simple case of “excitation vs. inhibition”, but rather one in which the coordinated and dynamic activity of distributed sensory neurons promotes differential action selection in alignment with environmental conditions – a framework that could involve many different behaviours with a still uncertain level of granularity (e.g., is self-righting different if the larva is rotated to 160º instead of exactly 180º?). Regarding the practical aspect, while this area represents a fascinating point for future investigation, it is currently limited by technological development, particularly in the context of this study where a relatively low-cost implementation has been used to probe the AP axis. Investigation of the DV axis would require further technological development, since optogenetic light would need to be precisely delivered from the side rather than from underneath, with a greater degree of resolution compared to the AP axis given the much smaller width of the larva (~120-140µm) relative to its length (~550-600µm). Therefore, whilst we appreciate these comments and suggestion, we believe this line of experiments is ideal for a follow-up investigation, rather than being implemented in the current study.

      (1.2) Prior studies from the authors implicated daIV neurons in the righting response. One of the main advances of the current manuscript is the clever demonstration of region-specific roles of sensory input. However, this is only confirmed with a general md driver, 109(2)80, and not with the subset-specific Gal4, so it is not clear if daIV sensory neurons are also acting in a regionally-specific manner along the A-P axis.

      To address this interesting and important comment by Rev1 we have carried out a new experiment using an alternative driver to 109(2)80-Gal4 and testing the impact of these manipulations on larval behaviour. The revised version of our MS includes a new figure Supp Fig S3 which shows self-righting times when using the ppk-Gal4 driver with the opto-axial technique. As observed with the 109(2)80-Gal4 driver, self-righting was delayed in anterior but not posterior inhibition conditions, suggesting the daIV neurons act in a region-specific manner to trigger postural control behaviour.

      We have also conducted a head casting analysis in the ppk domain; in another new figure, Supp Fig S7, we show that head casting behaviour is increased in the same manner as with the 109(2)80-Gal4 driver.

      These new panels and figures are cited within the sub-sections entitled “Optogenetic inhibition of anterior but not posterior multidendritic neurons delays self-righting” and “Inhibition of anterior multidendritic neurons is associated with increased head casting during self-righting”, on pages 25 and 28, respectively. We are grateful to Rev1 for this suggestion, which we consider qualitatively improves our paper.

      (1.3) The manuscript is narrowly focused on sensory neurons that initiate righting, which limits the advance given the known roles for daIV neurons in righting. With the suite of innovative new tools, there is a missed opportunity to gain a more general understanding of how sensory neurons contribute to the righting response, including promoting and inhibiting righting in different regions of the larva, as well as aspects of proprioceptive sensing that could be necessary for righting and account for some of the observed effects of 109(2)80.

      Once again, we appreciate this interesting comment by Rev1. We feel our study provides novelty in understanding how sensory neurons in different body regions contribute to the induction of the behaviour. We developed new technology to show that the activity of anterior sensory neurons is essential for normal righting and inhibiting this activity leads to a switch to a different behavioural regime. We feel this represents a substantial advancement in our understanding of how this behaviour is initiated that has not been previously described. Whilst we also appreciate there is likely to be a substantial role of proprioception in self-righting behaviour, our work here focuses on the external stimuli that elicit self-righting, as a detailed understanding of proprioception would be out of scope and require the development of further techniques to manipulate and measure larval posture. As detailed in the above comment, we feel that the more targeted investigation of daIV neurons can also shed some light on the cell-type specificity and inputs to the self-righting induction process.

      (1.4) Although the authors observe an influence of Hox genes in righting, the possible mechanisms are not pursued, resulting in an unsatisfying conclusion that these genes are somehow involved in a certain region-specific behavior by their region-specific expression. Are the cells properly maintained upon knockdown? Are axon or dendrite morphologies of the cells disrupted upon knockdown?

      We agree with this comment in that further investigating the effects of Hox expression on localised aspects of the sensory system poses an interesting line of investigation. Indeed, we are currently conducting a full-scale analysis of Hox gene effects across the sensory field. As things stand, it is not clear how Hox gene expression could affect local sensory processes, a mechanism which could involve morphological changes, changes in neuronal excitability (e.g. due to changes in channel expression), synapse formation and/or efficiency, cell development and identity, and/or combinations of these effects, amongst other possibilities. It is clear that a complete and satisfying investigation of this mechanism for each of the Hox genes would pose a substantial amount of work so, while we acknowledge the merit of Rev1’s comment, we consider that adding a cellular-mechanistic analysis of Hox effects is out of scope for the present study and shall constitute a central matter for a follow-up study emerging from current projects. We think that our data on Hox expression/function as reported here should serve to open up the analysis of genetic regulation of local sensory function, an area in which we are currently working very actively.

      (1.5) There could be many reasons for delays in righting behavior in the various manipulations, including ineffective sensory 'triggering', incoherent muscle contraction patterns, initiation of inappropriate behaviors that interfere with righting sequencing, and deficits in sensing body position. The authors show that delays in righting upon silencing of 109(2)80 are caused by a switch to head casting behavior. Is this also the case for silencing of daIV neurons, Hox RNAi experiments, and silencing of CO neurons? Does daIII silencing reduce head casting to lead to faster righting responses?

      This is an insightful comment. In the revised version of the manuscript, we do indeed show that anterior inhibition of daIV neurons leads to the same head casting behaviour as with the 109(2)80 domain, which we interpret as an inability of the larvae to sense the underlying substrate (see page 28). We hope the new data addresses this comment, at least to an extent. While we acknowledge it would also be insightful to run this behavioural analysis for other experimental conditions, such as the daIII inhibition and Hox RNAi lines, these experiments pose a specific technical difficulty: the behavioural analysis relies on a deep neural network (DNN) which was trained solely on recordings of the opto-axial technique, meaning it does not translate well to other experimental situations. This problem is further compounded by the use of L1 larvae, which means recording resolution is insufficient to accurately define the body landmarks used in the posture tracking at a smaller scale. Therefore, the recourse for identifying behavioural changes is manual observation, which we feel is too inconsistent to address a quantitative question like this.

      (1.6) 109(2)80 is expressed in a number of central neurons, so at least some of the righting phenotype with this line could be due to silenced neurons in the CNS. This should at least be acknowledged in the manuscript and controlled for, if possible, with other Gal4 lines.

      We thank the reviewer for making this interesting comment. We have added a phrase to the section “Conditional inhibition of multidendritic neurons delays self-righting” (p21) which acknowledges the presence of 109(2)80 expression in the CNS (as reported by Hughes and Thomas). We agree that ideally, a variety of sensory Gal4 lines would be used to check for consistency of the effects. However, it is also important to note that 109(2)80 is one of the only available Gal4 lines with near sole md neuron expression, as other Gal4s also drive expression strongly in external sensory cells for example. Thus, re-running experiments with these other lines – which would involve a substantial investment of time and resources – would not be an ideal strategy. We feel that the new observation of (very) similar axial results using the ppk-Gal4, which does express solely in the daIV neurons, better helps to confirm the specificity of the findings to multidendritic neurons.

      Other points:

      (1.7) Interpretation of roles of Hox gene expression and function in righting response should consider previous data on Hox expression and function in multidendritic neurons reported by Parrish et al. Genes and Development, 2007.

      We thank Rev1 for pointing out this study, which is definitely important to discuss given our results on Hox genes. To address this gap, we have added a paragraph in the Discussion (p37) to discuss the documented effects of Hox genes on da neuron dendritic morphology and how our results can be interpreted in light of this.

      (1.8) The daIII silencing phenotype could conceivably be explained if these neurons act as the ventral inhibitors. Do the authors have evidence for or against such roles?

      This is another interesting suggestion. If the daIII neurons were to fulfil this role, then in theory, their inhibition would result in self-righting behaviour under conditions of combined dorsal and ventral substrate contact. This is not an experiment we performed, so we are currently unable to confirm or rule out this possibility. However, we note from casual observation that daIII inhibition does not cause larvae to spontaneously self-right. As mentioned above, our view is not one in which the system has “dorsal/ventral stimulators/inhibitors” for a given behaviour, but that action selection proceeds according to a coordination of many (dynamic) contextual cues. Given the new results with the axial inhibition of daIV neurons (see above) it might be more parsimonious to suggest that these “tiling” neurons are primarily responsible for detecting substrate contact around the full circumference of the animal, rather than this involving different cell types according to the different sides of the body.

      Reviewer #2 (Public review):

      Strengths:

      The work of Roseby et al. does what it says on the tin. The experimental design is elegant, introducing innovative methods that will likely benefit the fly behavior community, and the results are robustly supported, without overstatement.

      Weaknesses:

      The manuscript is clearly written, flows smoothly, and features well-designed experiments. Nevertheless, there are areas that could be improved. Below is a list of suggestions and questions that, if addressed, would strengthen this work:

      (2.1) Figure 1A illustrates the sequence of self-righting behavior in a first instar larva, while the experiments in the same figure are performed on third instar larvae. It would be helpful to clarify whether the sequence of self-righting movements differs between larval stages. Later on in the manuscript, experiments are conducted on first instar larvae without explanation for the choice of stage. Providing the rationale for using different larval stages would improve clarity.

      This is a very interesting point raised by Rev2. Most of our previous work on self-righting (e.g. Picao-Osorio et al. 2015 Science; Picao-Osorio, Baldaia et al. 2017 Genetics; Klann et al. 2021 Journal of Neuroscience) was focused on the first instar larva (L1) because this early stage: (i) represents the simplest form of all larval stages, (ii) allows meaningful comparisons with late embryonic processes guiding the development and physiology of the nervous system, and (iii) captures the system in a relatively naïve state that has had limited, if any, exposure to external stimuli. Although these attributes remain valid for the investigation of the sensory stimuli that trigger self-righting, the regional physical measurements and manipulations used in this study (surface contact, opto-axial technique, deep neural network analysis) would be impossible to implement in the early forms of the larva simply due to their reduced size. For this reason, we employed L3s, whose larger dimensions enabled the development and use of the sophisticated regional stimulation techniques reported here. Yet, as Rev2 rightly points out, we return to the late embryo and early L1 at the point of conducting gene expression analyses as these are optimised for those early stages. The selection of larval stage according to experiment relies on the fact that all forms of the larva display self-righting (Issa, Picao-Osorio, et al. 2019 Current Biology), that SR does not differ according to larval stage and that the characterisation of the structure of the nervous system across larval stages has shown a large level of similarity and consistent topographically arranged connectivity between identified neurons (Gerhard et al. 2017 eLife).

      (2.2) What was the genotype of the larvae used for the initial behavioral characterization (Figure 1)? It is assumed they were wild type or w1118, but this should be stated explicitly. This also raises the question of whether different wild-type strains exhibit this behavior consistently or if there is variability among them. Has this been tested?

      Thank you to the reviewer for pointing this out. The genotype for Figure 1 was w<sup>1118</sup>; this has now been added to the figure legend and the results section. Although in this study we did not explicitly compare self-righting (SR) performance in wild type/control genotypes (as we are internally consistent in using w<sup>1118</sup>), based on previous data collected in our lab we know that self-righting times are similar and very consistent amongst inbred control lines such as w<sup>1118</sup>, yw, and Oregon Red. Furthermore, when comparing SR times between these inbred populations and a highly polymorphic outbred Drosophila population (Martins et al. 2013 PLoS Pathogens), we observed that their SR time (i.e. 6.14s ± 1.06) was not significantly different from the inbred lines (p<0.05, U test) (Picao-Osorio, J. 2014 Doctoral Thesis, Chapter 4, p112).

      (2.3) Could the observed slight leftward bias in movement angles of the tail (Figure 1I and S1) be related to the experimental setup, for example, the way water is added during the unlocking procedure? It would be helpful to include some speculation on whether the authors believe this preference to be endogenous or potentially a technical artifact.

      This is an interesting comment, and we recognise that lateral manipulation biases in self-righting could indeed reflect experimental limitations or biological tendencies. At this point we cannot interpret these results as formal evidence of chirality, given that they may reflect subtle aspects of the micromanipulation of specimens. We are currently developing a motorised platform to conduct self-righting tests, which, when fully developed, should help address the chirality question.

      (2.4) The genotype of the larvae used for Figure 2 experiments is missing.

      Thank you for pointing this out. These were again w<sup>1118</sup> larvae; this detail has now been added to the figure legend and the main text.

      (2.5) The experiment shown in Figure 2E-G reports the proportion of larvae exhibiting self-righting behavior. Is the self-righting speed comparable to that measured using the setup in Figure 1?

      Thank you for pointing this out. We have now added average self-righting times to the figure legends of Figures 1 and 2. Self-righting times for the dorsal + ventral contact conditions were notably longer than for dorsal-only cases, which were also slightly longer than the “standard” case. This is perhaps to be expected, as the larvae are encountering unusual and ambiguous situations. We suggest the extra time could reflect an additional decision-making step or action flip-flopping process, or simply physical constraints on the movement (for example, not being able to use some parts of the body).

      (2.6) Line 496 states: "However, the effect size was smaller than that for the entire multidendritic population, suggesting neurons other than the daIVs are important for self-righting". Although I agree that this is the more parsimonious hypothesis, an alternative interpretation of the observed phenomenon could be that the effect is not due to the involvement of other neuronal populations, but rather to stronger Gal4 expression in daIVs with the general driver compared to the specific one. Have the authors (or someone else) measured or compared the relative strengths of these two drivers?

      We agree with this suggestion and to address this concern, we have added as part of our new figure Supp. Fig. S3, a dedicated panel S3C showing fluorescence measurements from ddaC using the 109(2)80-Gal4 and ppk-Gal4 lines. We found no difference in tdTomato fluorescence intensity, suggesting equal expression strength across the two Gal4 drivers. Our new results for axial daIV inhibition are also consistent with this effect size difference, further suggesting that inhibition of all md neurons poses stronger challenges for self-righting compared to the daIV neurons alone.

      (2.7) Is there a way to quantify or semi-quantify the expression of the Hox genes shown in Figure 6A? Also, was this experiment performed more than once (are there any technical replicates?), or was the amount of RNA material insufficient to allow replication?

      Unfortunately, we only had limited amounts of mRNA extracted from FACS-sorted 109(2)80>GFP cells to feed our reverse transcriptase reactions and used much of these samples for the experiment reported. Following Rev2's suggestion we went back to our freezers, recovered traces of the samples used in the original experiment, and attempted a new amplification; despite this effort, this new experiment was unsuccessful. We feel that the main point deduced from the original experiment is valid in that we obtained amplicons of the expected size for all the Hox transcripts analysed and that for those cases in which we observed biological effects – i.e. Antp and Abd-B – we corroborated protein expression in the 109(2)80 domain using immunohistochemistry. We are currently expanding this project, examining the roles of all Hox genes across the entire sensory system, and shall report the expression patterns of all Hox genes in each of the subcomponents of the sensory system in the future.

      (2.8) Since RNAi constructs can sometimes produce off-target effects, it is generally advisable to use more than one RNAi line per gene, targeting different regions. Given that Hox genes have been extensively studied, the RNAis used in Figure 6B are likely already characterized. If this were the case, it would strengthen the data to mention it explicitly and provide references documenting the specificity and knockdown efficiency of the Hox gene RNAis employed. For example, does Antp RNAi expression in the 109(2)80 domain decrease Antp protein levels in multidendritic anterior neurons in immunofluorescence assays?

      We used the TRiP RNAi lines, specifically the Valium10 selection available from the Bloomington Stock Centre. Unfortunately, there is not much information on how specific the Hox RNAi lines are or whether they might have off-target effects.

      (2.9) In addition to increasing self-righting time, does Antp downregulation also affect head casting behavior or head movement speed? A more detailed behavioral characterization of this genetic manipulation could help clarify how closely it relates to the behavioral phenotypes described in the previous experiments.

      This would be an interesting line of investigation. As described in a previous comment, this is currently unfeasible for us given some important differences between experiments, including larval stage and recording conditions. We have added some speculative comments to the manuscript describing the larval behaviour under Hox RNAi.

      (2.10) Does down-regulation of Antp in the daIV domain also increase self-righting time?

      Given the new results with axial effects of daIV neurons, we also sought to address this point with a new series of experiments expressing Hox RNAi constructs in the ppk-Gal4 domain. The new data is shown in a new figure (Figure S8) displaying self-righting times for ppk-Gal4-Hox-RNAi. Interestingly, we found no effect of any RNAi expression on self-righting times, suggesting that md types other than daIVs are under Hox regulation that is important for self-righting.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers were enthusiastic about the value and quality of this study by Roseby and colleagues. There were two main issues that emerged from the reviews that we're highlighting for the authors to address, should they choose to:

      (1) A little more cell-type resolution of the anterior region

      The anterior region includes a lot of sensory neurons that may be contributing to the effect. Some sensory neurons (e.g., daIV) have been implicated in righting - are these the ones carrying the anterior signal? Are dorsal sensory neurons promoting righting and ventral ones stalling it?

      We are not suggesting a complete sensory-neuron mapping in the anterior region. Instead, we propose the authors conduct a focused check: repeat the axial inhibition with a daIV-specific driver (same photomask assay) to show the A-P effect within the implicated class, and, if possible, replicate one key result with an alternative broad md driver to address Gal4 strength/off-target expression.

      As mentioned above (see Rev1 comment) we have indeed carried out a new experiment using an alternative driver to 109(2)80-Gal4 and testing the impact of these manipulations on larval behaviour. The revised version of our MS includes a new figure Supp Fig S3 which shows self-righting times when using the ppk-Gal4 driver with the opto-axial technique. As with the 109(2)80-Gal4 driver, self-righting was delayed in anterior but not posterior inhibition conditions, suggesting the daIV neurons specifically act in a region-specific manner to trigger postural control behaviour.

      Furthermore, in another new figure, Supp Fig S7, we show that head casting behaviour is also increased in the same manner as with the 109(2)80-Gal4 driver. These new panels and figures are cited within the sub-sections entitled “Optogenetic inhibition of anterior but not posterior multidendritic neurons delays self-righting” and “Inhibition of anterior multidendritic neurons is associated with increased head casting during self-righting”, on pages 25 and 28, respectively. We are grateful to R1 for this suggestion, which we consider qualitatively improves our paper.

      (2) The Hox section: to strengthen this section, we recommend:

      (a) Confirm specificity/efficacy of knockdown (e.g., Antp protein reduction in targeted md neurons and a second RNAi line if available).

      This is a reasonable comment. For our experiments, we selected a UAS-Antp<sup>RNAi</sup> line (Bloomington #27675) given that this construct has been: (i) utilised in several previous studies as the main and single line to interfere with Antp expression (e.g. Baek et al. 2013 Development; Paul et al. 2021 Nature Comms) and (ii) shown to display a consistent reduction in Antp protein levels of approximately 50% (see Poliacikova et al. 2024 Science Adv.). Furthermore, previous work comparing #27675 with other UAS-Antp<sup>RNAi</sup> lines has demonstrated that all available lines lead to a similar level of reduction in protein expression, although the #27675 line exhibits the most consistent effects (lower variability) (Poliacikova et al. 2024 Science Adv.). Unfortunately, at this point in time, we do not have the capacity to conduct new experiments with other RNAi lines, but we consider that the information and arguments mentioned above should be reassuring about our choice of a reasonable and previously validated method to interfere with Antp expression.

      (b) Perform one temporal control (GAL80^ts) or a simple rescue, to separate developmental vs acute roles.

      This is a good and interesting suggestion, but we consider that the discrimination between developmental and physiological effects falls outside the scope of this study. Indeed, experiments of this kind are currently being conducted in our lab as part of a wider examination of Hox gene roles in the sensory system.

      (c) Place the results clearly in the context of prior work (e.g., Parrish 2007), so the mechanism isn't left hanging.

      This is an important point, and we have now done this. Many thanks for pointing this out.

      Reviewer #1 (Recommendations for the authors):

      (1.1) A Gal4 line for the pannier dorsal specification gene shows expression in dorsal sensory neurons, as described in Galindo et al., Development, 2023, and could help tease apart dorsal v. ventral contributions.

      This is an interesting suggestion. However, we understand that the pannier (pnr) Gal4 line mentioned in Galindo et al. 2023 is an enhancer trap inserted in the pnr locus which drives expression in neural as well as non-neural tissues such as the embryonic dorsal ectoderm (see: Calleja et al. 1996 Development; Stronach et al. 2014 Genetics). Although, as Rev1 rightly indicates, this line also labels dorsal cluster sensory neurons, including ddaC (cIV) and ddaF (cIII) neurons, the fact that the line displays expression in non-neural tissues makes its use in behavioural experiments difficult, as non-neural effects might affect the behavioural patterns studied. A possible way to instrument the pnrGal4 tool into behavioural analyses might involve the creation of the necessary variants to implement a split-Gal4 approach, but this, we believe, unfortunately falls out of the scope of this study.

      (1.2) Potential roles for daII neurons and daI neurons are not examined. Drivers have been described for daII neurons, and there are drivers that will target a majority of proprioceptive md neurons, so these could be examined to complete the analysis started here.

      This is another interesting suggestion by Rev1, but we consider that the fine-grain mapping of effects mediated by sensory neuron sub-classes falls outside the scope of this study aimed at mapping sensory regional effects on self-righting. This does not take the merit of the suggestion away, and indeed, experiments of this kind are currently being conducted in our lab as part of a comprehensive examination of Hox gene roles in the sensory system.

      (1.3) To account for 109(2)80 off-targets, the authors could consider other lines that silence most or all md neurons (clh201-Gal4; 5-40-Gal4; 21-7-Gal4) that could at least have different central off-targets. Some other lines are broad somatosensory system drivers but sensory-specific (pebbled-Gal4).

      This is an interesting comment, and so are the suggestions made. Although to include this kind of verification would be interesting, when carrying out our experiments, we did not observe any central expression at all. Also, to repeat all our experiments in which we use the established and validated 109(2)80 line using instead these four Gal4 lines, is unfortunately out of scope for us at this point in time. We will nonetheless consider these comments by Rev1 in future extensions of our work.

      (1.4) There is a typo on line 481; it should be "other".

      We are grateful to R1 for pointing this out. This has now been amended.

      Reviewer #2 (Recommendations for the authors):

      (2.1) Lines 91-92 cite references describing self-righting behavior across different animal groups, which is illustrated in Figure 1B. It would be helpful to indicate these references directly in the figure. For example, instead of using dots to denote their presence (which are, in a way, redundant since the behavior is reported in all groups), numbers or letters could be used to refer to the specific papers describing them.

      Thank you for this suggestion. We have now replaced the original dots with an abridged citation of a key paper providing evidence in that specific animal group, e.g. Smith et al. 1997; Rogers et al. 2015.

      (2.2) In Figure 1A, the diagrams illustrate the two large dorsal tracheae, which nicely indicate the larva's orientation. However, since they are drawn in a very light gray, they can be difficult to distinguish without zooming in. It might improve clarity if the tracheae were made slightly more prominent.

      Thank you for this suggestion. We have now implemented this change.

      (2.3) In Figure 1E, the dotted line and green bar mark the segment of the recording corresponding to self-righting, which is then quantified in Figure 1G. Was the same procedure applied when analyzing tail speed, or was it limited to head speed? Figure 1F does not show a dotted line or green bar, which is confusing; it would be helpful to clarify the reason for this discrepancy. Also, in Figure 1G, there is an inset showing photos of the movement sequence with the green bar and the caption 'Trimmed to SR sequence,' which implies to me that for tail speed, the 0.75-1 segment of the recording was also used for quantification. I suggest adding the dotted line and green bar to Figure 1F and removing this inset from Figure 1G, as it appears quite small and disrupts the layout of the figure. If it is retained, the figure legend should explicitly refer to the inset.

      Thank you for pointing this out. We have amended these figures as suggested.

      (2.4) In Figures 1 and 2, the box plots include the individual data points, whereas Figures 3 and S2 do not. For data transparency, it would be important to show the individual measurements here as well. I strongly recommend adding them to the figure, or alternatively providing a clear rationale in the text for not doing so.

      Thank you for mentioning this. The reason data points are not shown in Fig 3 or S2 is because the variance extends the scale and compresses the box making it illegible. To make this clear we now explain this in the figure legends.

      (2.5) In Figures 4 and 5, the distribution of self-righting times from the optogenetic inhibition experiments is shown using bar graphs rather than box plots, as in the previous figures. This choice obscures the data distribution, since all bars reach down to zero. Replacing the bar graphs in Figures 4 and 5 with box plots would more clearly convey the experimental results.

      We thank Rev2 for this comment, which gives us an opportunity to clarify the matter. Distributions of SR times are drawn with bars because we compare means +/- variance in the analysis, and not medians +/- IQR as is done in the other experiments. The choice of visualisation reflects the analysis, which is what is recommended by statisticians. In addition, we show the individual observations, meaning the distribution can be observed. We hope that it is now clear that we are not obscuring any distributions.
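
      As an illustration of the distinction drawn above between the two kinds of summary (mean with a spread measure versus median with IQR), the following minimal sketch computes both for one set of values. The function name `summarize` and the example times are hypothetical and are not data from the study; standard deviation is used here as the spread measure, which is an assumption about what "variance" refers to.

```python
import statistics


def summarize(times):
    """Return mean/SD and median/IQR for a list of self-righting times (s)."""
    # Quartile cut points using the statistics module's default method
    q1, _q2, q3 = statistics.quantiles(times, n=4)
    return {
        "mean": statistics.mean(times),
        "sd": statistics.stdev(times),      # sample standard deviation
        "median": statistics.median(times),
        "iqr": q3 - q1,                     # interquartile range
    }


# Hypothetical values with one slow larva, NOT measurements from the paper
example_times = [4.0, 5.0, 6.0, 7.0, 30.0]
summary = summarize(example_times)
print(summary)
```

      With a skewed sample like this one, the mean sits well above the median, which is why the choice of summary and the choice of plot type are usually made together.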

      (2.6) Figure 6 would benefit from some reorganization. Panel A is very small and dense with information, making it difficult to interpret without significant zooming. In particular, the FACS graph is nearly impossible to read, as the axes remain unclear even when enlarged. It might be best to either remove this graph and replace it with a cartoon version of FACS-sorted populations, and reorganize the figure to ensure legibility. Additionally, the current layout progresses from the bottom up, which takes time to follow. Comprehension could be improved if the sequence began with the larva dissection placed in the top left area of the figure, where readers typically look first (I appreciate that this is mentioned in the figure legend; however, a different layout might present the information more effectively).

      We appreciate the constructive spirit of this comment and have indeed considered Rev2's suggestions, including drafting new layouts of this figure. After all this experimentation, we remain of the view that the original presentation is probably the best trade-off between size and clarity, offering more space for the appreciation of confocal imaging and its interpretation.

      Minor corrections:

      (1) Throughout the text, the word Drosophila appears sometimes in italics and sometimes in regular font; please standardize its formatting for consistency.

      Amended

      (2) Line 179: the use of three hyphens in the sentence "minimum --- in all cases < 30 s --- to avoid larval desiccation" is unusual; exchanging them for commas or brackets is advised.

      Amended

      (3) Line 183: in w1118, the numbers are usually in superscript (not subscript), and the w should be italicized.

      Amended

      (4) In line 783, there is an incorrect space between "is" and the comma in "...repertoire, which is , in...".

      Amended

      (5) In Figure 2G, the left panel appears partially cut off, which makes the text at the edges difficult to read. It might help to adjust the panel so that all labels are fully visible.

      Done

      (6) In the current version of the manuscript, Figure 5 is presented before Figure 4, which is confusing.

      This has been amended.

      (7) Two videos are included in the supplementary material, but I could not find any reference to them in the main text of the manuscript.

      This has been amended.

    1. eLife Assessment

      This study presents a valuable finding on the mutational and expression profiles of the zinc finger genes ZNF217, ZNF750, and ZNF703 in Kenyan women with breast cancer. The evidence supporting the claims of the authors is solid. The work will be of interest to scientists or clinicians working in the field of diagnosis and detection for breast cancer.

    2. Reviewer #2 (Public review):

      Summary:

      The authors sought to characterize the somatic mutation landscape and gene expression profiles of Kenyan breast cancer patients. By comparing Whole Exome Sequencing (WES) and RNA-seq data from 23 paired tumor-normal samples against The Cancer Genome Atlas (TCGA) cohorts, the study specifically aimed to highlight the role of the ZNF gene family.

      Strengths:

      The study addresses a critical gap in genomic research by focusing on an underrepresented African population, which is essential for achieving global health equity in oncology.

      Weaknesses:

      The cohort is relatively small for definitive landscape characterization. The study fails to explore the mechanistic link between identified somatic mutations and observed aberrant gene expression.

      Impact and Utility:

      The impact of this work is currently limited. While the data adds to the growing repository of African genomic samples, the lack of novelty and mechanistic insight reduces its utility for the broader scientific community. To be clinically valuable, the study would need to offer more robust, unbiased profiling that could eventually inform population-specific diagnostics or therapies.

      Additional Context:

      Breast cancer in African populations often presents with different clinical trajectories compared to Western cohorts. While any data from these regions is vital, "landscape" studies require high statistical power and unbiased analysis to differentiate true population-specific drivers from noise or small-sample variance. Without a clear regulatory mechanism linking mutations to phenotypes, the findings remain preliminary observations.

    3. Reviewer #3 (Public review):

      Summary:

      This revised study analyzes the somatic mutational profiles and transcriptomic expression of three zinc-finger genes (ZNF217, ZNF703, ZNF750) in 23 Kenyan women with breast cancer, using whole-exome sequencing and RNA-sequencing of paired tumor-normal tissues. A total of 358 somatic mutations were detected, and all three genes were significantly upregulated in tumors compared to normal tissues (ZNF217 showing the most prominent difference). The findings provide preliminary evidence for the identification of diagnostic/prognostic biomarkers or therapeutic targets in sub-Saharan African populations.

      Strengths:

      The study's key strengths lie in its focus on an underrepresented Kenyan cohort, addressing a critical gap in sub-Saharan African breast cancer genomic research. It integrates DNA-level mutation analysis with RNA-level expression data, leveraging standardized bioinformatics pipelines and rigorous quality control to deliver detailed insights into mutation types, functional impacts, and amino acid changes.

      Comments on revised version:

      After careful revision by the authors, the manuscript has become more rigorous. The limitations including small sample size and lack of functional validation are properly acknowledged, and conclusions are prudently presented as hypothesis‑generating rather than causal claims. Meanwhile, strengthened multi‑omics analyses, TCGA validation, logical reorganization of results and improved figure presentation further enhance the reliability of this work.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Research scope

      The results primarily focus on mutations in ZNF217, ZNF703, and ZNF750, with limited correlation analyses between mutations and gene expression. The rationale for focusing only on these genes is unclear. Given the availability of large breast cancer cohorts such as TCGA and METABRIC, the authors should compare their mutation profiles with these datasets. Beyond European and U.S. cohorts, sequencing data from multiple countries, including a recent Nigerian breast cancer study (doi: 10.1038/s41467-021-27079-w), should also be considered. Since whole-exome sequencing was performed, it is unclear why only four genes were highlighted, and why comparisons to previous literature were not included.

      We have significantly strengthened the biological and clinical rationale for focusing on these three genes in the Introduction. Specifically, we now clearly justify their selection based on distinct functional roles: ZNF217 (oncogene, 20q13 amplification); ZNF703 (luminal subtype oncogenic driver); ZNF750 (tumor suppressor involved in differentiation). We have also explicitly defined the knowledge gap: lack of mutation and expression data for these genes in African populations, particularly Kenyan cohorts.

      Importantly, we have now incorporated comparative analysis with TCGA data in the Results. This includes a new section on “Recurrent mutations and comparison with TCGA”, a new table (Table 6), and a curated dataset (Supplementary Table S4).

      (2) Language and Style Issues

      There are many typos and clear errors in the main text (e.g. (ref)).

      Additionally, several statements read unnaturally. For example:

      "Investigators uncovered 170 mutations ..." should instead be phrased as "We identified 170 mutations ...."

      "The research team ..." should be rephrased as "Our team ...."

      The manuscript has undergone comprehensive language editing throughout the revised draft.

      (3) Methods and Data Analysis Details

      The methods section is vague, with general descriptions rather than specific details of data processing and analysis. The authors should provide:

      (a) Parameters used for trimming, mapping, and variant calling (rather than referencing another paper such as Tang et al. 2023).

      (b) Statistical methods for somatic mutation/SNP detection.

      (c) Details of RNA purification and RNA-seq library preparation.

      Without these details, the reproducibility of the study is limited.

      We have fully revised and substantially expanded the Methods section to improve clarity, transparency, and reproducibility. In the revised manuscript, we now provide explicit details of all key analytical steps. These include quality control procedures using FastQC and MultiQC, as well as read trimming parameters implemented in Trimmomatic (leading and trailing quality <3, sliding window 4:15, and minimum read length of 36 bp). We also clearly describe alignment of reads to the hg38 reference genome using BWA-MEM, followed by somatic variant calling using MuTect2 in paired tumor–normal mode with incorporation of a Panel of Normals (PON). Variant filtering criteria are now explicitly stated, including minimum read depth (≥10), base quality (≥20), and variant allele fraction (≥0.05), and functional annotation was performed using VEP (v108).
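
      The filtering thresholds quoted above (read depth ≥10, base quality ≥20, variant allele fraction ≥0.05) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `VariantCall` record, its field names, and the example calls are hypothetical, and a real workflow would apply such filters to MuTect2 VCF output rather than to hand-built records.

```python
from dataclasses import dataclass

# Thresholds as stated in the response; everything else here is illustrative.
MIN_DEPTH = 10          # minimum total read depth at the site
MIN_BASE_QUALITY = 20   # minimum base quality
MIN_VAF = 0.05          # minimum variant allele fraction


@dataclass
class VariantCall:
    """Hypothetical record for one candidate somatic variant."""
    gene: str
    depth: int           # total reads covering the site
    alt_reads: int       # reads supporting the variant allele
    base_quality: float  # representative base quality at the site

    @property
    def vaf(self) -> float:
        # Variant allele fraction = supporting reads / total reads.
        return self.alt_reads / self.depth if self.depth else 0.0


def passes_filters(call: VariantCall) -> bool:
    """Return True only if the call meets all three thresholds."""
    return (
        call.depth >= MIN_DEPTH
        and call.base_quality >= MIN_BASE_QUALITY
        and call.vaf >= MIN_VAF
    )


# Made-up example calls, one passing and three failing a single criterion each.
calls = [
    VariantCall("ZNF217", depth=40, alt_reads=6, base_quality=30.0),   # passes
    VariantCall("ZNF703", depth=8, alt_reads=4, base_quality=35.0),    # depth too low
    VariantCall("ZNF750", depth=100, alt_reads=3, base_quality=28.0),  # VAF too low
    VariantCall("ZNF750", depth=50, alt_reads=10, base_quality=15.0),  # quality too low
]
kept = [c for c in calls if passes_filters(c)]
```

      Requiring all three criteria at once is the conservative reading of the description; whether the authors combined them this way is an assumption.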

      In addition, we have included details on variant validation through visualization in the Integrative Genomics Viewer (IGV), as well as RNA-seq processing steps using STAR for alignment, featureCounts for quantification, and DESeq2 for normalization and differential expression analysis. Statistical analyses are now clearly described, including the use of paired tests and Benjamini–Hochberg correction for multiple testing. Collectively, these additions directly address the reviewer’s concerns by ensuring that all analytical procedures are transparently reported and fully reproducible.
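
      For readers unfamiliar with the Benjamini–Hochberg correction mentioned above, the adjustment can be sketched in a few lines. This is a generic illustration of the procedure, assuming a plain list of raw p-values; it is not the authors' code, and DESeq2 performs this adjustment internally. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Genes whose adjusted p-value falls below a chosen threshold (0.05 here).
significant = [i for i, p in enumerate(adj) if p < 0.05]
```

      Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.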

      (4) Data Reporting

      This study has the potential to provide a valuable resource for the field. However, data-sharing plans are unclear. The authors should:

      (a) Deposit sequencing data in a public repository.

      (b) Provide supplementary tables listing all detected mutations and all differentially expressed genes (DEGs).

      (c) Clarify whether raw or adjusted p-values were used for DEG analysis.

      (d) Perform DEG analyses stratified by breast cancer subtypes, since differential expression was observed by HER2 status, and some zinc finger proteins are known to be enriched in luminal subtypes.

      We have improved data transparency and reporting in the revised manuscript. All sequencing data are now publicly available, with whole-exome sequencing (WES) data deposited in the Sequence Read Archive (SRA; PRJNA913947) and RNA-seq data available in the Gene Expression Omnibus (GEO; GSE225846). In addition, we have provided comprehensive Supplementary Materials to support reproducibility and facilitate further analysis, including detailed mutation summaries (Table S1), mutation positions (Table S2), amino acid changes (Table S3), the curated TCGA comparison dataset (Table S4), protein domain annotations (Table S5), and the combined gene expression and clinical dataset (Table S6).

      We have also clarified key aspects of the statistical analysis, including the use of Benjamini–Hochberg adjusted p-values and the thresholds applied for significance. Furthermore, in response to reviewer comments regarding subtype-specific analyses, we have explicitly addressed in the Discussion why subtype-stratified differential expression analysis was not performed, noting that the limited sample size would reduce statistical power and increase the risk of overinterpretation. Together, these revisions enhance the transparency, accessibility, and interpretability of the study.

      (5) Mutation Analysis

      Visualizations of mutation distribution across protein domains would greatly strengthen interpretation. Comparing mutation distribution and frequency with published datasets would also contextualize the findings.

      We have substantially enhanced the mutation analysis by incorporating several new figures and complementary analyses that provide deeper biological interpretation. Specifically, we added Figure 1 to summarize mutation burden, coding consequences, and prevalence; Figure 2 to illustrate the nucleotide substitution spectrum; Figure 3 to map mutations across protein domains; Figure 4 to assess functional enrichment and mutation composition; and Figure 5 to highlight recurrent mutations.

      Reviewer #2 (Public review):

      Weaknesses:

      The current cohort size is relatively small to reach significant findings, and targeted exploration on ZNF family without emphasizing the reason or clinical significance hinders the overall significance of the entire work.

      We acknowledge the limitation posed by the relatively small cohort size and have addressed this concern in several ways in the revised manuscript. First, we have explicitly stated this limitation in the Discussion section. We have also reframed the study as a pilot and population-specific exploratory analysis to better reflect its scope. To strengthen the overall significance, we integrated both mutation and gene expression data, incorporated comparisons with TCGA datasets, and emphasized the importance of African-specific genomic insights. Importantly, we highlight that this study provides novel data from an underrepresented population, which represents a key contribution to the field.

      Reviewer #3 (Public review):

      Weaknesses:

      The author has enhanced the descriptive depth of the study by adding details on mutations, expression subgroup analyses, and functional annotations but has not addressed the core weaknesses of small cohort size and lack of functional validation. While the revised version is more comprehensive in cataloging molecular alterations, it remains confined to descriptive analysis, with no substantial improvement in the reliability or generalizability of its conclusions.

      We have addressed this concern by clearly acknowledging the key limitations of the study, including the absence of functional validation, the relatively small sample size, and the limited generalizability of the findings. In response, we have refined our interpretation to avoid causal claims and instead present the results as hypothesis-generating. We have also expanded the Discussion to include future research directions, recommending functional validation studies, multi-omics approaches, and validation in larger, more diverse cohorts.

      In addition, we have strengthened the robustness of the study by incorporating comparisons with TCGA data, providing more detailed mutation classification, and integrating genomic and transcriptomic analyses. Beyond addressing reviewer comments, we have further improved the manuscript by reorganizing the Results section to follow a clear and logical flow—from mutation burden and spectrum to protein-level distribution, functional enrichment, recurrent mutations, and TCGA comparison. We have also improved figure quality and labeling to meet journal standards, added clear and consistent figure captions, and ensured alignment between the text, figures, and tables throughout the manuscript.

      We sincerely thank the reviewers for their valuable feedback, which has significantly improved the quality and rigor of this work.

    1. eLife Assessment

      In this important theoretical contribution, the authors study the evolution of large microbial populations competing for resources in the challenging and relevant regime of overlapping ecological and evolutionary timescales. The modeling approach is overall convincing, although its presentation would benefit from clarifications, e.g. on assumptions and approximations. The results will be of broad interest to researchers in evolutionary biology, ecology and microbiology.

    2. Reviewer #1 (Public review):

      Summary:

      This important study performs a theoretical analysis of the evolutionary dynamics of strains under a classical resource competition model to understand how clonal interference and diversification of resource preferences interact to structure microbial population genetic structure. They find that in large asexual populations evolving in relevant parameter regimes, where evolutionary and ecological time scales overlap, populations are characterized by a small number of ecotypes, which are groups of strains that share a given resource preference, whose dynamics in the long run are dominated by priority effects.

      Strengths:

      The manuscript constitutes a novel and sound contribution to theory in ecology and evolution, under relevant parameter regimes which have been previously overlooked due to the complexities they bring, i.e. when the weak mutation regime breaks down. Here, the authors make a considerable step forward by taking advantage of analytical advances in the population genetics theory of clonal interference in recent years (a traveling fitness wave moving at a constant average speed v), which they apply to resource competition models typically studied in ecology.

      The main insights in the derivations shown in the supplementary text are clearly summarized in Figure 2 of the main manuscript, where the different phases of the somewhat counterintuitive dynamics of the strategic mutations in the model are quantified.

      Weaknesses:

      Despite its many merits, I believe the manuscript can profit from a few clarifications as I point out below:

      (1) I think the authors should make explicit in the abstract of the paper that they study a stair to heaven fitness landscape and that the rate of beneficial mutations does not slow down.

      (2) Evolution is elegantly incorporated in the resource consumption model by assuming two classes of mutations: strategic mutations and constitutively beneficial mutations. I believe that the biological meaning of these different types should be better explained. Specifically, on pages 3 and 4, the authors state that strategy mutations "alter resource uptake strategy and potentially its overall magnitude as well", whereas the other type is "only tangentially related to resource consumption (e.g. eliminating a pathway that is not necessary in the current environment)." I find this a bit strange since this is a model of resource competition, and I would assume that the latter type of mutations would be neutral. Maybe I am not reading this well, and the meaning of the mutations, as well as their assumed rates, could be clarified with some examples as the authors state that these mutations are routinely observed in microbial evolution experiments.

      (3) The authors discuss the theoretical results obtained in the light of the famous Lenski experiment, where ecotype formation is observed in some populations. However, in the mentioned example, cross-feeding was the mechanism involved. Since in their model, unlike in other models, cross-feeding is not considered, I found this example to be misplaced. In addition, in the Lenski experiment, a single (and essential) resource is present in the environment, so the assumptions of the model do not appear to apply. On the other hand, in Herron and Doebeli's experiments, two resources (substitutable) were present, so a comparison with their experimental results would be more appropriate.

      (4) The paper should also discuss deleterious mutations, which I did not see mentioned anywhere.

    3. Reviewer #2 (Public review):

      Summary:

      In "Ecological diversification in rapidly evolving populations", the authors use a consumer-resource model with competition for 2 different resources to study diversification for cases in which ecology and evolution are separated (weak-mutation limit) and when they overlap. They find the potential for the timing of a mutation (and not just its associated fitness) to confer an advantage against fitter strains (which they call "priority effect"), and the aggregation of dominant trait values that lead to the definition of "ecotypes" that discretize and structure the community.

      Strengths:

      The authors introduce detailed analytical calculations in the limit of overlapping ecology and evolution, which is a case that typically eludes analysis. The work also pays particular attention to the timing of "invasion" by a mutation, whereas most approaches focus on the long-term outcome of evolution (e.g. fixation of a trait value).

      Weaknesses:

      The model makes important assumptions that limit its generality considerably. In particular, the two "evolving traits" defined in the model are very specific and by no means the simplest possible resource competition evolutionary model that the authors claim it to be. The manuscript is not clear enough to be reproducible, and the authors do not discuss in sufficient depth the huge amount of work that is presented in the manuscript. The bibliography omits important work focused on diversification emerging from eco-evolutionary interactions similar to the ones studied in the manuscript.

    4. Author response:

      We thank the Editor and the Reviewers for their detailed and constructive feedback. We look forward to submitting a revised version of the manuscript that addresses their comments and suggestions, with a special focus on clarifying the assumptions and implications of our analysis. In particular, we will aim to demonstrate that (i) many of our qualitative findings -- and even some quantitative results -- extend beyond the simplest two-resource case considered in the main text, and (ii) they can also be generalized to account for simple forms of cross-feeding. We hope that these changes will help to illustrate the broader applicability of our underlying mathematical framework.

    1. eLife Assessment

      This valuable study demonstrates that the inner membrane protease YME1 contributes to the formation of mitochondrial-derived compartments in yeast through the modulation of both the lipid transporter UPS2 and the MICOS complex. The evidence supporting this model is solid, although this manuscript could be improved by providing additional evidence supporting the independent roles for UPS2 and MICOS regulation in this process. This work will be of interest to cell biologists, biochemists, and geneticists interested in understanding the molecular basis of mitochondrial regulation and function.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Balasubramaniam and colleagues continue this group's efforts to understand mitochondrial-derived compartments (MDCs) that bud off from yeast mitochondria in response to metabolic stress. In a previous genetic screen, they identified Ups lipid transfer proteins and the AAA-protease Yme1 as components that modulate MDC formation. In this study, the authors link these observations by showing that Yme1 modulates levels of Ups1, Ups2, as well as MICOS complex members in the mitochondrial proteome. Using genetic approaches, they then show that Yme1's role on MDCs is dependent on its catalytic activity (via an inactive mutant) and that YME1 shows genetic interactions with UPS1/2 and MIC10/MIC60. The overall model is that Yme1 activity responds to metabolic cues and acts via proteolysis of these two distinct mitochondrial machineries to regulate MDC biogenesis.

      Strengths:

      The strengths of the study are its integration of mitochondrial proteomics with strong genetic approaches, as well as synergy with the authors' previous studies on the role of lipids in MDC biogenesis. The work is overall well carried out and experiments are thoughtfully discussed.

      Weaknesses:

      The major weaknesses are a lack of mechanistic resolution surrounding the model, e.g., proposed or tested mechanisms by which Yme1 activity is regulated by metabolic cues, or how Ups1/2 activity and the MICOS contribute to MDC generation. The authors acknowledge these as open questions, but addressing them would still enhance the significance of the study.

    3. Reviewer #2 (Public review):

      In this manuscript, the authors report a novel regulation of the outer mitochondrial membrane remodeling domains called mitochondria-derived compartments, MDCs. The team has previously established the main principles behind this recently identified quality control pathway, but the mechanisms that control MDC formation remain incompletely understood. Using the baker's yeast model, the authors identify the conserved mitochondrial protease Yme1 as a crucial factor that regulates MDC formation. Mechanistically, Yme1's proteolytic function controls the levels of Ups1 and Ups2 lipid transfer proteins and the components of the membrane organizing complex called MICOS, thus providing a plausible model as to how Yme1-dependent proteolysis permits MDC formation through the removal of lipid and MICOS-dependent constraints. Finally, the authors show that this Yme1-mediated activity is also defined by metabolic conditions. In principle, this study is interesting and novel, and holds potential to provide new insights into the regulation of the MDC pathway that emerged as a new fundamental mitochondrial quality control mechanism. However, the following points should be carefully addressed.

      Major points:

      (1) Yme1 has been previously shown to regulate mitochondria-specific autophagy through Atg32 processing. Given the high similarity of the MDC pathway to piecemeal autophagy and the fact that both pathways share some of the core components, the authors should address the involvement of Atg32 in their model. It would also be important to include a brief discussion addressing the differences between piecemeal autophagy and the MDC pathway.

      (2) The Rpt3 (P215L) expression experiment is interesting, but appears to be somewhat superficial due to the unclear mechanism by which the mitochondrial network morphology is restored in these cells. Could this result be replicated in the dnm1∆ mgm1∆ double deletion mutant, which is a well-established model for mitochondrial network restoration?

      (3) Figure 3E. The changes in PE levels appear to be minor. While statistically significant, the observed differences may not be physiologically relevant. More in-depth lipidomic analysis data should be presented to substantiate the authors' argument and better address the questions at hand. Related to that, could PE or PA supplementation stimulate MDC formation?

      (4) The connection between rapamycin treatment and Yme1-regulated MDC formation is unclear and puzzling and needs to be explained better.

      (5) The MICOS complex is clearly involved in the regulation of MDC, but the manuscript misses the mark on providing compelling evidence and a clear explanation as to how MICOS contributes to said regulation.

      Minor points:

      (1) The authors should discuss potential reasons for the dramatically different rates of MDC formation in the S288C and W303 background cells. Does this have anything to do with generally more robust mitochondrial functions in the latter cells?

      (2) Proper statistical analyses should be provided for all the graphs presented.

      (3) The authors should include Yme1 immunoblots to confirm the identity of strains being studied and validate the presence or overexpression of Yme1 and its catalytic mutant in their experiments.

    4. Reviewer #3 (Public review):

      Summary:

      Since describing MDCs over a decade ago, the lab of the corresponding author, Hughes, has been at the forefront of further characterizing these structures. Here, they follow up on recent work (PMID: 38497895), where a screen identified Yme1 as a potential regulator of MDCs. After confirming that Yme1-ko prevents MDCs that are usually induced via various established treatments (Rapamycin, cycloheximide, Concanavalin A), the authors confirmed that the proteolytic activity of Yme1 is required. Next, using proteomics, they identified how loss of Yme1 impacts the mitochondrial proteome with and without Rapamycin treatment to induce MDCs. From this result and based on insight from other published data implicating lipids, they focused initially on the lipid transfer protein Ups2, a known target of Yme1. Here, they showed that loss of Ups2 could partially rescue MDC formation in Yme1-ko cells. To look for other Yme1 targets that might also be involved in MDC formation, next, they investigated the MICOS complex, which was also notable in their proteomics data. They then showed that inhibiting MICOS also partially restored MDC formation in Yme1-ko cells. They then tested the combined effects of Ups2 and MICOS inhibition on MDCs, which was limited by the fact that the combination of full MICOS disruption, Ups2-KO, and Yme1-KO was not viable. To circumvent this limitation, they investigated the knockout of individual MICOS subunits in combination with Ups2 and/or Yme1. Finally, they showed that growth conditions also mediate MDC formation in the context of Yme1 overexpression. In rich media, Yme1 overexpression induces MDCs on its own. However, this induction is lost upon amino acid starvation, suggesting that there are still other as-yet-unidentified factors regulating the formation of MDCs.

      Strengths:

      The authors use unbiased approaches and genetic models to begin unraveling a novel regulatory role of Yme1 in the formation of MDCs.

      Weaknesses:

      (1) The authors find both Ups1 and Ups2 in their screens, but only focus on Ups2 in this paper. It would be good to know why they did not also investigate Ups1, and its other protease Atp23, which could potentially act similarly to Yme1, or even rescue the loss of Yme1.

      (2) I'm not convinced that the data support the notion that Ups2 and MICOS have distinct effects on MDCs. In Figure S3C-D, there is no statistical analysis to indicate whether the small differences between the MICOS-ko and the double knockout are significant. If MICOS-ko and Ups2-ko were acting through different mechanisms, one would expect their combination to be additive; this does not appear to be the case, as both single deletions and the double deletion all cause similar levels of MDCs (~30-40%). Rather, this result is what you would expect if they were working through the same mechanism. There also does not appear to be an additive effect in Figure 4F-G, when using the mic60-ko rather than the complete MICOS-ko. In this regard, the authors note in their discussion that 'loss of MICOS may disrupt membrane associations or alter lipid distribution between mitochondrial subcompartments' (lines 390-392). The latter situation seems like it would be the same mechanism as Ups2 and would more accurately explain their findings.

      (3) The manuscript is missing key data confirming the re-expression or overexpression of Yme1 protein (Figure 1 E/G and Figure 5A). It is important to know the relative levels of expression of the re-expressed proteins to each other and to endogenous Yme1.

      (4) Some clarification of the details for metabolically restrictive conditions would be helpful.

      (5) Beyond just the presence/absence of MDCs, does more detailed quantification of their size/shape reveal any subtle differences between conditions?

    5. Author response:

      We thank the editors and reviewers for their thoughtful and constructive evaluation of our manuscript. We are pleased that the reviewers found the study valuable and the evidence supporting a role for Yme1 in MDC formation solid. As described below, we plan to modify the manuscript to clarify the lipid model, better explain the relationship between Ups-family proteins and MICOS, distinguish MDC formation from Atg32-dependent mitophagy, clarify metabolic conditions, add statistical analyses where missing, and strengthen Yme1 validation with immunoblotting.

      eLife Assessment

      This valuable study demonstrates that the inner membrane protease YME1 contributes to the formation of mitochondrial-derived compartments in yeast through the modulation of both the lipid transporter UPS2 and the MICOS complex. The evidence supporting this model is solid, although this manuscript could be improved by providing additional evidence supporting the independent roles for UPS2 and MICOS regulation in this process. This work will be of interest to cell biologists, biochemists, and geneticists interested in understanding the molecular basis of mitochondrial regulation and function.

      We appreciate this positive assessment and agree that the roles of Ups-family lipid transport and MICOS in MDC regulation could be expanded further. This will be an important topic for future studies, especially with regard to how MICOS contributes to MDC formation. In the current revision, we will add new genetic data focused on PA-linked lipid metabolism through the yeast Pah1/Lipin pathway, which we think will help strengthen and clarify the lipid arm of the model. Our current interpretation is that Yme1-regulated Ups-family lipid transport and MICOS may both influence a shared mitochondrial membrane state that permits MDC formation. This interpretation is consistent with our genetic data and with known connections between Ups proteins, MICOS, and mitochondrial membrane organization.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Balasubramaniam and colleagues continue this group's efforts to understand mitochondrial-derived compartments (MDCs) that bud off from yeast mitochondria in response to metabolic stress. In a previous genetic screen, they identified Ups lipid transfer proteins and the AAA-protease Yme1 as components that modulate MDC formation. In this study, the authors link these observations by showing that Yme1 modulates levels of Ups1, Ups2, as well as MICOS complex members in the mitochondrial proteome. Using genetic approaches, they then show that Yme1's role on MDCs is dependent on its catalytic activity (via an inactive mutant) and that YME1 shows genetic interactions with UPS1/2 and MIC10/MIC60. The overall model is that Yme1 activity responds to metabolic cues and acts via proteolysis of these two distinct mitochondrial machineries to regulate MDC biogenesis.

      Strengths:

      The strengths of the study are its integration of mitochondrial proteomics with strong genetic approaches, as well as synergy with the authors' previous studies on the role of lipids in MDC biogenesis. The work is overall well carried out and experiments are thoughtfully discussed.

      Weaknesses:

      The major weaknesses are a lack of mechanistic resolution surrounding the model, e.g., proposed or tested mechanisms by which Yme1 activity is regulated by metabolic cues, or how Ups1/2 activity and the MICOS contribute to MDC generation. The authors acknowledge these as open questions, but addressing them would still enhance the significance of the study.

      We thank the reviewer for the positive assessment, and we agree that the upstream regulation of this response remains an important open question. Yme1-dependent MDC regulation could involve changes in Yme1 activity, substrate accessibility, or broader changes in mitochondrial lipid and protein organization. Fully resolving how metabolic state gates this response will require future work, likely outside the scope of the current study.

      We also agree that the manuscript would benefit from a more developed discussion of how lipid changes could contribute to MDC formation. Our prior work showed that reduced mitochondrial PE promotes MDC formation, whereas cardiolipin is required for MDC biogenesis (Xiao et al., 2024). We proposed that reduced PE changes the membrane environment of mitochondrial outer membrane proteins, potentially affecting their stability, abundance, insertion, or lateral organization within the membrane. Such changes could increase the pool of proteins available for sorting into MDCs or make the outer membrane more permissive for domain formation. In the revision, we will connect this model more directly to Yme1-dependent regulation of Ups-family lipid transport.

      We will also expand the model to incorporate PA-linked metabolism. We did not initially focus heavily on Ups1 because complete loss of UPS1, or loss of downstream cardiolipin synthesis through CRD1, blocks MDC formation because cardiolipin is required. Thus, complete disruption of Ups1-dependent lipid transport may obscure the effects of more moderate changes in PA flux. To address this, we will include additional lipid measurements and new genetic data targeting PA metabolism through the yeast Pah1/Lipin pathway. Because Pah1 converts PA to DAG, this provides a way to alter PA-linked metabolism without simply eliminating cardiolipin synthesis. Our new data suggest that PA accumulation or altered PA-linked lipid flux may also promote MDC formation. Together, these findings support a broader model in which reduced PE and increased PA alter both the organization of OMM proteins and the physical properties of the membrane, including curvature and domain formation, thereby creating a membrane state that is more permissive for MDC biogenesis.

      Reviewer #2 (Public review):

      In this manuscript, the authors report a novel regulation of the outer mitochondrial membrane remodeling domains called mitochondria-derived compartments, MDCs. The team has previously established the main principles behind this recently identified quality control pathway, but the mechanisms that control MDC formation remain incompletely understood. Using the baker's yeast model, the authors identify the conserved mitochondrial protease Yme1 as a crucial factor that regulates MDC formation. Mechanistically, Yme1's proteolytic function controls the levels of Ups1 and Ups2 lipid transfer proteins and the components of the membrane organizing complex called MICOS, thus providing a plausible model as to how Yme1-dependent proteolysis permits MDC formation through the removal of lipid and MICOS-dependent constraints. Finally, the authors show that this Yme1-mediated activity is also defined by metabolic conditions. In principle, this study is interesting and novel, and holds potential to provide new insights into the regulation of the MDC pathway that emerged as a new fundamental mitochondrial quality control mechanism. However, the following points should be carefully addressed.

      Major points:

      (1) Yme1 has been previously shown to regulate mitochondria-specific autophagy through Atg32 processing. Given the high similarity of the MDC pathway to piecemeal autophagy and the fact that both pathways share some of the core components, the authors should address the involvement of Atg32 in their model. It would also be important to include a brief discussion addressing the differences between piecemeal autophagy and the MDC pathway.

      We agree that this is an important point. The reason we did not focus on Atg32 in the current manuscript is that we previously investigated the relationship between MDC formation and Atg32-dependent mitophagy and found that Atg32 is dispensable for MDC formation (Hughes et al., 2016). Based on that result, we do not anticipate that Atg32 is required for the Yme1-dependent MDC phenotypes described here. This is also consistent with the different growth conditions associated with these pathways: Atg32-dependent mitophagy is stimulated under respiratory or post-diauxic conditions, whereas MDCs do not form under the respiratory conditions that stimulate Atg32-dependent mitophagy (Hughes et al., 2016; Raghuram and Hughes, 2024).

      We will clarify this distinction in the revised manuscript. In addition, to be thorough, we plan to generate and test the Atg32-GFP variant previously shown to block Yme1-dependent Atg32 processing and mitophagy (Wang et al., 2013). This will allow us to test directly whether preventing Yme1-dependent Atg32 cleavage affects MDC formation. If successful and interpretable, we will include these data in the revised manuscript.

      (2) The Rpt3 (P215L) expression experiment is interesting, but appears to be somewhat superficial due to the unclear mechanism by which the mitochondrial network morphology is restored in these cells. Could this result be replicated in the dnm1∆ mgm1∆ double deletion mutant, which is a well-established model for mitochondrial network restoration?

      We agree that the Rpt3(P215L) experiment is best viewed as a morphology control. The purpose was to test whether abnormal mitochondrial morphology alone explains the MDC defect in yme1Δ cells. Because Rpt3(P215L) improved mitochondrial morphology but did not restore MDC formation, we interpret this as evidence that morphology alone is not sufficient.

      We attempted to generate the requested dnm1Δ mgm1Δ yme1Δ triple-mutant combination, but that strain combination has not been viable in our hands. However, we do have dnm1Δ data showing that altering mitochondrial structure can rescue some morphological features but does not restore MDC formation in yme1Δ cells. We will include these data where appropriate and clarify that this experiment is intended as a morphology control.

      (3) Figure 3E. The changes in PE levels appear to be minor. While statistically significant, the observed differences may not be physiologically relevant. More in-depth lipidomic analysis data should be presented to substantiate the authors' argument and better address the questions at hand. Related to that, could PE or PA supplementation stimulate MDC formation?

      We agree that additional lipid data would strengthen this part of the manuscript. We initially streamlined the lipid section because we had previously examined the lipid requirements for MDC formation in detail, showing that reduced mitochondrial PE can promote MDC formation, whereas cardiolipin is required (Xiao et al., 2024). However, the current study would benefit from a broader analysis of the lipid changes associated with Yme1-dependent regulation.

      In the revision, we will expand the lipid data to include additional lipid species and incorporate these results into the model. We will also add new genetic data targeting PA metabolism through the yeast Pah1/Lipin pathway. Together, these data suggest that PA accumulation or altered PA-linked lipid flux may also contribute to MDC formation. This supports a broader lipid-balance or lipid-shunting model in which reduced PE, increased PA, or altered lipid distribution between mitochondrial membranes could influence OMM remodeling through effects on membrane curvature, OMM protein organization, or mitochondrial membrane contacts.

      We agree that direct PE or PA supplementation would be a valuable experiment. We have attempted lipid supplementation but have not been able to deliver these lipids effectively to yeast cells in a way that produces interpretable results. We are therefore focusing on lipid profiling and genetic approaches that alter lipid metabolism inside the cell.

      (4) The connection between rapamycin treatment and Yme1-regulated MDC formation is unclear and puzzling and needs to be explained better.

      We agree that this connection is not fully clear. In this manuscript, rapamycin is used primarily as a robust MDC-inducing condition. Our data do not define the full pathway connecting TORC1 inhibition to Yme1-dependent mitochondrial remodeling.

      In the revision, we will either clarify this point or reduce the emphasis on rapamycin as a mechanistic entry point. Our current interpretation is that rapamycin creates a metabolic/mitochondrial state in which Yme1-dependent remodeling of lipid and membrane-organization pathways becomes important for MDC formation. Whether this involves direct regulation of Yme1, altered substrate availability, altered membrane composition, or a combination of these remains open.

      (5) The MICOS complex is clearly involved in the regulation of MDC, but the manuscript misses the mark on providing compelling evidence and a clear explanation as to how MICOS contributes to said regulation.

      We agree that the mechanism by which MICOS regulates MDC formation remains an important open question and will be a major focus of future work. Our current data show that MICOS perturbation can partially restore MDC formation in yme1Δ cells, supporting a role for MICOS in this pathway. This analysis was motivated in part by the incomplete genetic suppression achieved through the lipid pathway alone, which suggested that additional Yme1-regulated factors contribute to MDC formation.

      MICOS therefore represents a strong candidate for this additional regulatory input. However, defining whether MICOS acts through lipid distribution, OMM-IMM organization, membrane architecture, or another mechanism will require a deeper investigation than is possible within the scope of the current study. We will clarify this point in the revised manuscript and present the current findings as the beginning of a broader investigation into how MICOS contributes to MDC biogenesis.

      Minor points:

      (1) The authors should discuss potential reasons for the dramatically different rates of MDC formation in the S288C and W303 background cells. Does this have anything to do with generally more robust mitochondrial functions in the latter cells?

      We agree this is worth discussing. One likely explanation is that the difference reflects broader differences in mitochondrial activity and metabolic state between these strain backgrounds. We and others have shown that W303 cells have more robust respiratory mitochondrial function than BY/S288C-derived cells, and in our hands W303 also shows lower MDC formation. This fits our broader model that MDCs are favored in glucose-grown or metabolically perturbed cells and do not form under respiratory conditions (Raghuram and Hughes, 2024). We do not yet know the genetic basis for this difference, so we will present this as an interesting future direction.

      (2) Proper statistical analyses should be provided for all the graphs presented.

      We will add statistical analyses where missing.

      (3) The authors should include Yme1 immunoblots to confirm the identity of strains being studied and validate the presence or overexpression of Yme1 and its catalytic mutant in their experiments.

      We agree that direct validation of Yme1 protein levels will strengthen the manuscript. Our quantitative mitochondrial proteomics already confirms strong depletion of Yme1 in yme1Δ cells, and we will also include quantitative proteomics showing increased Yme1 abundance in the overexpression strain. In addition, we have now obtained a Yme1 antibody from a colleague and will include immunoblots validating Yme1 loss, re-expression, catalytic mutant expression, and overexpression where appropriate.

      Reviewer #3 (Public review):

      Summary:

      Since describing MDCs over a decade ago, the lab of the corresponding author, Hughes, has been at the forefront of further characterizing these structures. Here, they follow up on recent work (PMID: 38497895), where a screen identified Yme1 as a potential regulator of MDCs. After confirming that Yme1-ko prevents MDCs that are usually induced via various established treatments (Rapamycin, cycloheximide, Concanavalin A), the authors confirmed that the proteolytic activity of Yme1 is required. Next, using proteomics, they identified how loss of Yme1 impacts the mitochondrial proteome with and without Rapamycin treatment to induce MDCs. From this result and based on insight from other published data implicating lipids, they focused initially on the lipid transfer protein Ups2, a known target of Yme1. Here, they showed that loss of Ups2 could partially rescue MDC formation in Yme1-ko cells. To look for other Yme1 targets that might also be involved in MDC formation, next, they investigated the MICOS complex, which was also notable in their proteomics data. They then showed that inhibiting MICOS also partially restored MDC formation in Yme1-ko cells. They then tested the combined effects of Ups2 and MICOS inhibition on MDCs, which was limited by the fact that the combination of full MICOS disruption, Ups2-KO, and Yme1-KO was not viable. To circumvent this limitation, they investigated the knockout of individual MICOS subunits in combination with Ups2 and/or Yme1. Finally, they showed that growth conditions also mediate MDC formation in the context of Yme1 overexpression. In rich media, Yme1 overexpression induces MDCs on its own. However, this induction is lost upon amino acid starvation, suggesting that there are still other as-yet-unidentified factors regulating the formation of MDCs.

      Strengths:

      The authors use unbiased approaches and genetic models to begin unraveling a novel regulatory role of Yme1 in the formation of MDCs.

      Weaknesses:

      (1) The authors find both Ups1 and Ups2 in their screens, but only focus on Ups2 in this paper. It would be good to know why they did not also investigate Ups1, and its other protease Atp23, which could potentially act similarly to Yme1, or even rescue the loss of Yme1.

      We agree that Ups1 and Atp23 are important to consider. We initially focused on Ups2 because its deletion partially restores MDC formation in yme1Δ cells and because of its connection to mitochondrial PE synthesis, which we had previously shown to regulate MDC formation (Xiao et al., 2024). Ups1 is more difficult to assess genetically because complete loss of UPS1, or of downstream cardiolipin synthesis through CRD1, blocks MDC formation due to the requirement for cardiolipin. Thus, an ups1Δ phenotype cannot readily reveal whether a more moderate reduction in Ups1 activity, and the resulting accumulation or redistribution of PA, might promote MDC formation.

      In the revision, we will explain this rationale and include new genetic data targeting PA metabolism through the yeast Pah1/Lipin pathway. This provides a way to test the contribution of PA accumulation without simultaneously eliminating cardiolipin synthesis, and our initial results support a role for PA-linked lipid remodeling in partially bypassing the requirement for Yme1. We will also discuss Atp23 as a potentially important regulator of Ups1 and PA metabolism. A full investigation of Atp23 will be an important direction for future work.

      (2) I'm not convinced that the data support the notion that Ups2 and MICOS have distinct effects on MDCs. In Figure S3C-D, there is no statistical analysis to indicate whether the small differences between the MICOS-ko and the double knockout are significant. If MICOS-ko and Ups2-ko were acting through different mechanisms, one would expect their combination to be additive; this does not appear to be the case, as both single deletions and the double deletion all cause similar levels of MDCs (~30-40%). Rather, this result is what you would expect if they were working through the same mechanism. There also does not appear to be an additive effect in Figure 4F-G, when using the mic60-ko rather than the complete MICOS-ko. In this regard, the authors note in their discussion that 'loss of MICOS may disrupt membrane associations or alter lipid distribution between mitochondrial subcompartments' (lines 390-392). The latter situation seems like it would be the same mechanism as Ups2 and would more accurately explain their findings.

      This is a very good point, and we agree with the reviewer’s interpretation. The lack of strong additivity is consistent with Ups2 and MICOS acting within the same pathway or converging on a shared mechanism, rather than representing two separate mechanisms of MDC regulation. We did not intend to imply that these must be independent pathways. In the revised manuscript, we will ensure that the text reflects this interpretation and will add statistical analyses to the relevant comparisons.

      (3) The manuscript is missing key data confirming the re-expression or overexpression of Yme1 protein (Figure 1 E/G and Figure 5A). It is important to know the relative levels of expression of the re-expressed proteins to each other and to endogenous Yme1.

      We agree that direct validation of Yme1 protein levels is important. Our quantitative mitochondrial proteomics already confirms strong depletion of Yme1 in yme1Δ cells, and we will also include quantitative proteomics showing increased Yme1 abundance in the overexpression strain. In addition, we have now obtained a Yme1 antibody from a colleague and will add immunoblots validating Yme1 loss, re-expression, catalytic mutant expression, and overexpression.

      (4) Some clarification of the details for metabolically restrictive conditions would be helpful.

      Thanks for this suggestion. We will clarify these conditions throughout the manuscript and figure legends and will define exactly what we mean by low-amino-acid, amino-acid-free, synthetic, and rich media conditions. More broadly, MDC formation is strongly influenced by media composition and mitochondrial metabolic state. MDCs form less efficiently in synthetic media and do not form under conditions that promote respiratory mitochondrial function (Raghuram and Hughes, 2024).

      (5) Beyond just the presence/absence of MDCs, does more detailed quantification of their size/shape reveal any subtle differences between conditions?

      This is an interesting question. In our hands, MDC size and shape are variable and appear strongly influenced by mitochondrial fission/fusion state. Conditions that favor more fused mitochondrial networks can produce larger MDC-like structures, whereas fragmented networks can produce smaller structures. So far, we have not found a simple size or shape metric that explains the Yme1/Ups2/MICOS phenotypes better than MDC frequency.

      We will clarify this point in the revised manuscript and avoid implying that MDC frequency captures every possible morphological difference. More detailed morphometric analysis of MDC size, topology, and maturation state will be an important future direction, especially as we connect lipid remodeling to membrane curvature and MDC biogenesis.

      References

      Hughes, A.L., Hughes, C.E., Henderson, K.A., Yazvenko, N., and Gottschling, D.E. 2016. Selective sorting and destruction of mitochondrial membrane proteins in aged yeast. eLife. 5. doi: 10.7554/eLife.13943.

      Raghuram, N., and Hughes, A.L. 2024. Amino acids trigger MDC-dependent mitochondrial remodeling by altering mitochondrial function. bioRxiv. 2024.07.09.602707. doi: 10.1101/2024.07.09.602707.

      Wang, K., Jin, M., Liu, X., and Klionsky, D.J. 2013. Proteolytic processing of Atg32 by the mitochondrial i-AAA protease Yme1 regulates mitophagy. Autophagy. 9(11):1828–1836. doi: 10.4161/auto.26281.

      Xiao, T., English, A.M., Wilson, Z.N., Maschek, J.A., Cox, J.E., and Hughes, A.L. 2024. The phospholipids cardiolipin and phosphatidylethanolamine differentially regulate MDC biogenesis. Journal of Cell Biology. 223(5). doi: 10.1083/jcb.202302069.

    1. eLife Assessment

      This important study investigates the peptide-binding principles of promiscuous chicken MHC molecules. The data from crystallography, mass spectrometry, and modeling are convincing. However, the presentation would benefit from streamlining and clear links between data and conclusions. This paper will be of broad interest to immunologists and those interested in vaccine development.

    2. Reviewer #1 (Public review):

      Summary:

      Combining in vitro refolding, SEC-based assembly assays, peptide-library screening, MALDI-TOF, LC-MS/MS, structural analysis and immunopeptidomics, this manuscript investigates the peptide-binding principles of the promiscuous chicken MHC-I molecule BF2*21:01.

      Strengths:

      Although the peptide motif of BF2*21:01 is highly complex, this manuscript identified several principles, including a preference for 10-mer peptides, co-variation between P2 and Pc-2, effects of P3 and Pc-3, and a strong cellular preference for Leu at Pc. The results are important for avian MHC biology and poultry vaccine epitope prediction.

      Weaknesses:

      The manuscript is sometimes difficult to follow because the authors present a large amount of peptide-library, structural and immunopeptidomics data, without always clearly explaining how these datasets support the proposed simplifying principles.

      Major Issues - Points Requiring Clarification or Additional Support:

      (1) (Lines 282-301, 537-545)

      The immunopeptidomics conclusions are mainly based on one B21 cell line with one biological replicate and at least two technical replicates. Given the complexity of the BF2*21:01 peptide repertoire, this is a major limitation. The authors should either provide additional biological replicates or clearly state this limitation in the Abstract, Results and Discussion.

      (2) (Lines 290-313)

      The B21 cell preparations contain both BF2 and the lowly expressed BF1 molecule. Some peptides, especially 8-mers or peptides with atypical motifs, may derive from BF1*21:01. The authors should clarify how BF2*21:01-bound peptides were distinguished from possible BF1-derived peptides, or interpret the immunopeptidomics motif more cautiously. The authors should also provide or cite evidence confirming the B21 haplotype identity of the cell line and chicken materials used for immunopeptidomics.

      (3) (Lines 217-221, 243-253)

      The authors acknowledge that MALDI-TOF cannot reliably distinguish peptide combinations with identical or similar masses, nor determine residue positions in some cases. Therefore, MALDI-TOF results should not be overinterpreted as precise evidence for residue preference. The authors should clearly indicate which conclusions are supported by LC-MS/MS.

      (4) (Lines 297-301, 316-330)

      The authors suggest that longer peptides may bulge in the middle or extend out of the groove at the C-terminal end. The rationale for the C-terminal extension is not clearly explained. Why is the C-terminal extension considered rather than the N-terminal extension? If the binding register is uncertain, long peptides should be analyzed separately from canonical-length peptides.

      (5) (Lines 406-439)

      In vitro assembly assays show that several hydrophobic residues can be tolerated at Pc, whereas immunopeptidomics shows a strong Leu preference at this position. The authors should clarify whether this Leu preference reflects intrinsic BF2*21:01 binding specificity, TAP-mediated peptide transport, antigen processing, peptide loading, or a cell-line-specific effect. Additional experimental support, such as TAP transport analysis, would strengthen this conclusion.

      (6) (Lines 172-178, 243-279, 442-457)

      The structural analysis explains some residue combinations, such as Arg at P2 with Glu at Pc-2 or Trp at Pc. However, the structural interpretation is not fully integrated with the large-scale peptide library and immunopeptidomics results. Representative high- and low-frequency combinations should be discussed structurally.

      (7) The inference of co-variation between P2 and Pc-2, as well as the modulatory effects of P3 and Pc-3, should be better explained. At present, some conclusions appear to be based mainly on residue-frequency patterns, and the logical connection between these observations and the proposed binding principles is not always clear. Statistical analyses, such as mutual information, chi-square tests or permutation tests, and representative structural explanations would strengthen this conclusion.
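
      As a rough illustration of the kind of analysis suggested in point (7), the sketch below computes the mutual information between the residues at P2 and Pc-2 across a peptide list and estimates a permutation p-value by shuffling the Pc-2 column. It is a minimal sketch under stated assumptions, not the authors' method: the function names, the indexing convention (P2 as the second residue, Pc-2 as the third residue from the C-terminus) and the toy peptides are all hypothetical and are not taken from the manuscript.

```python
import math
import random
from collections import Counter


def p2_pc2_pairs(peptides):
    """Return (P2, Pc-2) residue pairs, assuming P2 is the second residue
    and Pc-2 is the third residue counted from the C-terminus."""
    return [(p[1], p[-3]) for p in peptides]


def mutual_information(pairs):
    """Mutual information (in bits) between the two members of each pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (left[a] * right[b]))
    return mi


def permutation_p_value(pairs, n_perm=2000, seed=0):
    """Observed MI plus a one-sided permutation p-value.

    The Pc-2 column is shuffled relative to the P2 column, which keeps
    both marginal residue frequencies fixed and destroys any coupling.
    """
    rng = random.Random(seed)
    observed = mutual_information(pairs)
    left = [a for a, _ in pairs]
    right = [b for _, b in pairs]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(right)
        # small tolerance so floating-point ties count as "at least as large"
        if mutual_information(list(zip(left, right))) >= observed - 1e-12:
            hits += 1
    # add-one correction avoids reporting a p-value of exactly zero
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy 10-mers with perfectly coupled P2 / Pc-2 residues.
    toy = ["GRAAAAAEAL"] * 10 + ["GDAAAAAKAL"] * 10
    mi, p = permutation_p_value(p2_pc2_pairs(toy))
    print(f"MI = {mi:.3f} bits, permutation p = {p:.4f}")
```

      Shuffling one column is one simple choice of null model; for real immunopeptidomics data, peptides of different lengths would probably need to be analyzed separately, and the result could be cross-checked against a chi-square test on the same contingency table.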

    3. Reviewer #2 (Public review):

      Summary:

      The study presents an in-depth analysis of the peptide repertoire bound by a promiscuous chicken MHC molecule using mass spectrometry, X-ray crystallography and modelling. While the MHC can bind a very diverse set of peptides, the authors have found some new rules that govern peptide binding to this MHC that could help to build a predictive model to study the repertoire of pathogen-derived peptides.

      Strengths:

      The study uses a range of well-performed experiments across multiple techniques and provides an in-depth analysis of the peptide repertoire, including peptide sequences, length, preferred residues, stability and MHC presentation.

      Weaknesses:

      The data overall support the analysis and conclusion well. The only caveat is linked to Figure 4, which does not describe the stability of the peptide-MHC complex, but instead shows refold yield, and the two are not always linked.

    4. Author response:

      eLife Assessment

      This important study investigates the peptide-binding principles of promiscuous chicken MHC molecules. The data from crystallography, mass spectrometry, and modeling are convincing. However, the presentation would benefit from streamlining and clear links between data and conclusions. This paper will be of broad interest to immunologists and those interested in vaccine development.

      Overall, we are delighted and grateful to the eLife editors and the two reviewers for the careful and thoughtful assessments and reviews of our paper. We are glad that the strengths of the paper were apparent and appreciated. And of course, every paper has weaknesses, especially for a story as complex as this one.

      We are making only minor changes in our revision, so we would be happy if the editors decide to evaluate the revised manuscript without involving the reviewers further.

      Before answering the comments and questions directly, perhaps a few points would help clarify why the paper is as it is.

      First, the experiments cover over three decades of work, with the first gas phase sequencing results done in 1992. Unlike some of the chicken class I alleles which immediately gave completely clear stringent motifs (B4, B12 and B15 in Wallny et al 2006 PNAS, B19 in Han et al 2023 J Immunol), we harvested nothing but confusion from the B21 class I results (Fig. 1). Initially, we thought that the lack of a clear motif for B21 was due to multiple well-expressed class I molecules, but only one dominantly-expressed class I molecule was found (Wallny et al 2006 PNAS, Shaw et al 2007 J Immunol) and, to our surprise, bacterially-expressed BF2*21:01 heavy chain and β2-microglobulin refolded with two synthetic peptides without sequence in common, and the crystal structures showed that this molecule remodeled the binding site to accommodate two such disparate peptides (Koch et al 2008 Immunity). This was the beginning of our understanding of the spectrum of class I alleles from promiscuous generalists to fastidious specialists, which we have explored in a series of further papers (in particular, Chappell et al 2015 eLife, Tregaskes et al 2016 PNAS, Kaufman 2018 Trends Immunol, Tregaskes and Kaufman 2022 Mol Immunol).

      Second, over these many years, we continued to explore the binding properties of BF2*21:01 in ever more detail, resulting in the current manuscript. We learned only slowly how to probe this unexpected promiscuity, unprecedented in the MHC literature, so that the experiments proceeded with our best understanding at the time, including taking advantage of new approaches as they became available. Each experiment built on the previous set of experiments and each brought us closer to an understanding.

      Third, having amassed a collection of data, we chose eLife exactly because it allows us to present the entire story from beginning to end without compromise, not just the highlights with the major points illustrated by a few main figures and with the supporting data in many supplementary figures. We include all the data, because it is all part of the story, and so that interested researchers can look at the data from their own perspective. Although mostly we provide bar graphs, we include the raw data (or close to it) for the final experiments (illustrated by Figs. 10 and 18) in the single supplementary data spreadsheet, so these can be assessed easily by others in the field, perhaps using approaches that we may not feel competent to perform.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Combining in vitro refolding, SEC-based assembly assays, peptide-library screening, MALDI-TOF, LC-MS/MS, structural analysis and immunopeptidomics, this manuscript investigates the peptide-binding principles of the promiscuous chicken MHC-I molecule BF2*21:01.

      Strengths:

      Although the peptide motif of BF2*21:01 is highly complex, this manuscript identified several principles, including a preference for 10-mer peptides, co-variation between P2 and Pc-2, effects of P3 and Pc-3, and a strong cellular preference for Leu at Pc. The results are important for avian MHC biology and poultry vaccine epitope prediction.

      Weaknesses:

      The manuscript is sometimes difficult to follow because the authors present a large amount of peptide-library, structural and immunopeptidomics data, without always clearly explaining how these datasets support the proposed simplifying principles.

      We are delighted and grateful to Reviewer 1 for the careful and thoughtful comments and questions concerning our manuscript. We are glad that the strengths of the paper were apparent and appreciated, and acknowledge the weaknesses that come with such a complex story with experiments performed over decades.

      Major Issues - Points Requiring Clarification or Additional Support:

      (1) (Line 282-301, 537-545)

      The immunopeptidomics conclusions are mainly based on one B21 cell line with one biological replicate and at least two technical replicates. Given the complexity of the BF2*21:01 peptide repertoire, this is a major limitation. The authors should either provide additional biological replicates or clearly state this limitation in the Abstract, Results and Discussion.

      This limitation is clearly stated in lines 537-545, as part of a paragraph covering the various ways in which the data presented in this manuscript could be improved. In fact, we have performed immunopeptidomics of several different B21 cell types, with many replicates and found similar data as presented, giving us confidence in our interpretations. However, these other experiments belong in different stories, so it is not appropriate that the data be reported in this manuscript.

      (2) (Lines 290-313)

      The B21 cell preparations contain both BF2 and the lowly expressed BF1 molecule. Some peptides, especially 8-mers or peptides with atypical motifs, may derive from BF1*21:01. The authors should clarify how BF2*21:01-bound peptides were distinguished from possible BF1-derived peptides, or interpret the immunopeptidomics motif more cautiously. The authors should also provide or cite evidence confirming the B21 haplotype identity of the cell line and chicken materials used for immunopeptidomics.

      The concern about the contribution of BF1*21:01 to the immunopeptidomics is clearly stated in the manuscript, both lines 290-313 and as part of the paragraph describing the limitations of the experiments (lines 542-543). In fact, the expression of BF1 molecules has long been known to be less than 10% of BF2 molecules at the RNA level, and much less at the protein level (Wallny et al 2006 PNAS, Shaw et al 2007 J Immunol). The proportion of 8mers identified by immunopeptidomics is also low (Fig. 14), and it is not impossible that most 8mers are due to BF1*21:01. We have used assembly assays with peptide libraries, immunopeptidomics and a crystal structure to determine the peptide motif for typical BF1 molecules, of which BF1*21:01 is one and found it may contribute to 8mer peptides but very seldom to longer peptides. This work is unpublished but gives us confidence that the characteristics of BF2*21:01 are not misrepresented by the data in this manuscript.

      The sources of the chicken samples and the cell lines are described in detail under Materials and Methods (lines 577-590), citing relevant publications. 

      (3) (Lines 217-221, 243-253)

      The authors acknowledge that MALDI-TOF cannot reliably distinguish peptide combinations with identical or similar masses, nor determine residue positions in some cases. Therefore, MALDI-TOF results should not be over-interpreted as precise evidence for residue preference. The authors should clearly indicate which conclusions are supported by LC-MS/MS.

      As described, the experiments follow each other in temporal sequence, so that we started with single peptides, then peptide libraries that varied in one position, then peptide libraries that varied in two positions, first analysed by MALDI-TOF and later by LC-MS/MS. The final experiment (Fig. 10, with the original data in the supplementary spreadsheet) directly compares MALDI-TOF and LC-MS/MS results for six peptide libraries, so that the strength of the evidence for residue preference is clear. Throughout the manuscript, we do our best not to overstate conclusions based on the data of any particular experiment.

      (4) (Lines 297-301, 316-330)

      The authors suggest that longer peptides may bulge in the middle or extend out of the groove at the C-terminal end. The rationale for the C-terminal extension is not clearly explained. Why is the C-terminal extension considered rather than the N-terminal extension? If the binding register is uncertain, long peptides should be analyzed separately from canonical-length peptides.

      When the first sequence of a chicken class I cDNA was determined, an immediate mystery was why one of the so-called invariant residues that coordinate the N- and C-termini of the bound peptide is not conserved (Kaufman et al 1992 J Immunol). In fact, this residue, Tyr at position 86 in HLA-A2 and at the equivalent position in all mammalian classical class I molecules, is an Arg in the classical class I molecules of all non-mammalian vertebrates and is common with class II molecules (Kaufman et al 1995 Semin Immunol). Similar to class II molecules, this Arg in chicken class I molecules allows the peptide to extend out of the C-terminus, as shown by a crystal structure (Xiao et al 2018 J Immunol). The concern that we might be misidentifying the C-terminal amino acid was the basis for the analysis in Figs. 23 and 24, but in the absence of crystal structures, we are not able to provide a final answer to this question. Perhaps relevant is the fact that a chicken class II molecule can bind exactly the same peptide in two conformations, one with a canonical 9mer core and the other with an unexpected 10mer core (Goryanin et al 2026 J Virol).

      By contrast, N-terminal extensions are only found for some class I alleles and thus far depend on the substitution of small amino acid sidechains for W166 (Li et al 2011 J Virol for bovine, Ma et al 2020 J Immunol for Xenopus, Wei et al 2022 J Immunol for ovine). Thus far, no chicken BF2 sequences have this substitution, consonant with the many crystal structures, including those for BF2*21:01 (Koch et al 2008 Immunity, Chappell et al 2015 eLife, this manuscript). However, in unpublished data, we find that most BF1 sequences have sequence differences that could allow N-terminal extensions, although we have no crystal structures to support this possibility.

      (5) (Lines 406-439)

      In vitro assembly assays show that several hydrophobic residues can be tolerated at Pc, whereas immunopeptidomics shows a strong Leu preference at this position. The authors should clarify whether this Leu preference reflects intrinsic BF2*21:01 binding specificity, TAP-mediated peptide transport, antigen processing, peptide loading, or a cell-line-specific effect. Additional experimental support, such as TAP transport analysis, would strengthen this conclusion.

      The preference for Leu at the final position of the peptide by immunopeptidomics of the B21 cell line is strong but not absolute and is certainly affected at the least by the length of the peptide (Figs. 23 and 24). Unpublished immunopeptidomics results (mentioned above) show that this is not a cell line-specific result. The evidence from assembly assays of various peptides is that several hydrophobic amino acids are tolerated with sufficient stability of BF2*21:01 that they are detected in the assay (Figs. 3, 5, 9 and 10). Thermostability assays (Fig. 6) show that peptides with these same hydrophobic amino acids are stable to at least body temperature of chickens. These experiments show that such stability is peptide-dependent (that is, whether a particular amino acid is tolerated depends on the stability conferred by the rest of the peptide). Finally, peptide translocation assays using B21 cells have been done (Tregaskes et al 2016 PNAS) and show that peptides with several hydrophobic amino acids can be pumped into the lumen of the endoplasmic reticulum. However, the assays are with single synthetic peptides, so the data are not extensive enough to separate the effects of the final amino acid from the rest of the peptide. Certainly, peptides with amino acids other than Leu at the C-terminus can be translocated. So, it is not yet clear at which point the preference for Leu at the C-terminus of the peptide arises.

      (6) (Lines 172-178, 243-279, 442-457)

      The structural analysis explains some residue combinations, such as Arg at P2 with Glu at Pc-2 or Trp at Pc. However, the structural interpretation is not fully integrated with the large-scale peptide library and immunopeptidomics results. Representative high- and low-frequency combinations should be discussed structurally.

      Six crystal structures show that BF2*21:01 remodels the binding to accommodate a variety of anchor residues (Koch et al 2008 Immunity, Chappell et al 2015 eLife). These crystal structures are representative of sequences found by the immunopeptidomics from very frequent (H-E at roughly 15% 8-12mers) to moderately frequent (E-L at roughly 6% 8-12mers) to infrequent (N-F, A-D and E-D at roughly 1.5%, 1.6% and 0.7% 8-12mers) based on Fig. 18. All but one of the structures have Leu at the C-terminus, with the last one having Val, which is found but not frequently by immunopeptidomics.

      Similar numbers are found by LC-MS/MS of double-substitution libraries of the two original peptide sequences in Fig. 10 with H-E found frequently (8.1% in P390, 3.8% in P498) and the others infrequently (0.1, 0.9, 1.0, 0.3% in P390, 0, 1.4, 1.0, 0.3% in P498), as calculated from the numbers in the Supplementary data spreadsheet. As discussed in the manuscript, for single-substitution peptide libraries of the two original peptides, Ile/Leu at the C-terminus was very frequent but at the same or slightly less level as Phe, with Met less frequent and Val even less so (Fig. 7).

      In addition, there are two more structures along with models explicitly testing some substitutions (Fig. 5). Attempting more current modelling approaches, we found AlphaFold 3 was unable to correctly predict most of the conformations that are found in the crystal structures of BF2*21:01, so we don’t feel confident in using them to predict unknown structures of this kind.

      (7) The inference of co-variation between P2 and Pc-2, as well as the modulatory effects of P3 and Pc-3, should be better explained. At present, some conclusions appear to be based mainly on residue-frequency patterns, and the logical connection between these observations and the proposed binding principles is not always clear. Statistical analyses, such as mutual information, chi-square tests or permutation tests, and representative structural explanations would strengthen this conclusion.

      We endeavored to do our best to explain the data, our interpretations and our reasoning, so we apologise if we have not managed to be as clear as might be desired. We have included as close to raw data as possible for the LC-MS/MS and MALDI-TOF (Fig. 10) and for the immunopeptidomics (Figs. 14 and 18) in the Supplementary Data spreadsheet, exactly so that competent practitioners can carry out further analyses (including the sophisticated statistical tests mentioned).
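
      As a rough illustration of the kind of test the reviewer mentions, the sketch below computes the mutual information between two peptide positions and estimates a permutation p-value by shuffling one position. The residue pairs are invented toy data, not values from the Supplementary Data spreadsheet, and the function names are hypothetical; it is a minimal sketch under those assumptions rather than the analysis used in the manuscript.

```python
import math
import random
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) between the two members of each pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return mi


def permutation_p_value(pairs, n_permutations=2000, seed=0):
    """Return the observed MI and the share of shuffles with MI at least as large."""
    rng = random.Random(seed)
    observed = mutual_information(pairs)
    first = [a for a, _ in pairs]
    second = [b for _, b in pairs]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(second)  # breaks any pairing between the two positions
        if mutual_information(list(zip(first, second))) >= observed - 1e-12:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: (P2, Pc-2) residue pairs, NOT taken from the manuscript.
toy_pairs = [("H", "E")] * 30 + [("E", "L")] * 30 + [("H", "L")] * 5 + [("E", "E")] * 5
observed_mi, p_value = permutation_p_value(toy_pairs)
print(f"MI = {observed_mi:.3f} bits, permutation p = {p_value:.4f}")
```

      In practice the pairs would be read from the peptide lists, and the test would be repeated for each pair of positions with a correction for multiple comparisons.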

      Reviewer #2 (Public review):

      Summary:

      The study presents an in-depth analysis of the peptide repertoire bound by a promiscuous chicken MHC molecule using mass spectrometry, x-ray crystallography and modelling. While the MHC can bind a very diverse set of peptides, the authors have found some new rules that govern peptide binding to this MHC that could help to build a predictive model to study the repertoire of pathogen-derived peptides.

      Strengths:

      The study uses a range of well-performed experiments across multiple techniques and provides an in-depth analysis of the peptide repertoire, including peptide sequences, length, preferred residues, stability and MHC presentation.

      Weaknesses:

      The data overall support the analysis and conclusion well. The only caveat is linked to Figure 4, which does not describe the stability of the peptide-MHC complex, but instead shows refold yield, and the two are not always linked.

      We are grateful for the clear understanding of the strengths of the work. With regards to Fig. 4, we agree with the reviewer that there are differences in refold yield but that measure may not be correlated with stability of the peptide-MHC complex. However, we were basing our interpretation of stability on the position and quality of the monomer peak, as illustrated by the trace in Fig. 2, in which a sharp peak at the monomer position represents a stable complex (as seen for the 10 and 11mer peptides) and later peaks represent unstable complexes falling apart during the chromatography (as seen for the 7, 8 and 9mer peptides).

    1. eLife Assessment

      This study presents a valuable contribution to comparative cognitive neuroscience by directly mapping functional homologues of the human multiple-demand network in macaques using a matched spatial maze task. However, the evidence is incomplete due to methodological asymmetries in task design and preprocessing parameters that warrant careful consideration. The work will be of interest to researchers studying the evolution of cognitive control and cross-species neuroimaging.

    2. Reviewer #1 (Public review):

      Summary:

      The "multiple-demand" (MD) system is a well-known finding of human brain imaging and is thought to play a central role in cognitive control. To directly compare the MD system in humans and monkeys, Mione et al. used functional magnetic resonance imaging to measure whole-brain activation in a multi-step saccadic maze task. In humans, the authors found a distributed pattern of brain activity that closely matches the canonical MD network and extends to adjacent regions of dorsal attention and other networks. While there was good correspondence between monkey and human data, differences were also notable in the lateral frontal cortex, the dorsal parietal cortex, and the sensorimotor cortex.

      Strengths:

      Though previous data hint at a corresponding network in the macaque, there has been no direct comparison to human data. This study provides a direct cross-species comparison with whole-brain data from fMRI, and the findings suggest an extended and strongly interconnected brain network recruited by increased cognitive challenge.

      Weaknesses:

      In previous human imaging, the MD system is defined by overlapping activation for many kinds of cognitive demands. In the present work, however, the authors used just a single task. Although there is some overlap between the putative monkey MD network and the canonical MD network identified in human imaging, there should be caution in linking current findings to the MD system based on limited task events.

    3. Reviewer #2 (Public review):

      Summary:

      Mione et al. aim to resolve a long-standing question in comparative neuroscience: whether the macaque brain contains a functional analogue to the distributed human multiple-demand (MD) network. To address this, the authors employ a direct cross-species fMRI comparison using a multi-step saccadic maze task in humans and a simplified two-step version in macaques. By contrasting goal-directed navigation against a control condition that requires similar motor responses but no strategic planning, the study isolates the neural signatures of cognitive control across species.

      Strengths:

      The most compelling aspect of this work is its methodological alignment. Previous attempts to compare these systems often relied on comparisons of human BOLD signals and macaque single-unit recordings. By running parallel fMRI protocols, the authors establish a shared measurement basis that allows for a more direct comparison. The resulting activation maps clearly demonstrate conserved network topology across dorsomedial frontal, lateral and medial parietal, and insular cortices. Combining these results with recent research on functional and structural connectivity further supports the idea that these networks evolved across species and provides a helpful starting point for future comparative studies. The findings will be highly useful for researchers investigating the evolutionary origins of domain-general cognitive control, as well as for neuroimaging methodologists developing cross-species alignment pipelines.

      Weaknesses:

      However, there are several differences in how the two groups were studied that make it harder to compare the results precisely. The human task mixed 2-, 4-, and 6-step trials within the same experimental blocks, whereas macaques performed only 2-step trials. This design difference likely places human participants in a state of sustained proactive cognitive control (Braver, 2012), as they must remain prepared for highly demanding trials at any moment. This elevated baseline arousal may artificially inflate MD network activation during the simpler 2-step trials in humans, making direct magnitude comparisons with the macaque data difficult. Additionally, the general linear model combined correct and error trials into a single regressor. Given that macaques exhibited substantially higher error rates, this approach risks diluting task-specific planning signals with activity related to error monitoring and reward prediction errors. The preprocessing pipeline also applied a 4 mm full-width half-maximum smoothing kernel to macaque data acquired at 1.5 mm resolution. Relative to the smaller size of the macaque brain, this kernel is quite large and likely blurs fine-grained topographical distinctions. This may partly explain why the macaque lateral frontal cortex shows a single dorsal activation patch rather than multiple discrete patches seen in humans. Furthermore, there is concerning inter-individual variability in the macaque data. Normally, a functional network like the MD system is identified by consistent activation across all individuals. In this study, however, the two monkeys show substantially different activation maps and behavioral patterns. This lack of consistency renders the group-level results questionable, as it is unclear whether the group-level map represents a unified biological system or merely an average of disparate individual maps. Finally, the subcortical activations shown in Figure 7 require more precise anatomical localization to confidently distinguish cerebellar nodes from adjacent brainstem structures.
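
      For scale, the size of the smoothing kernel relative to the voxel grid can be worked out directly. The sketch below assumes an isotropic Gaussian kernel and isotropic 1.5 mm voxels (the review states only the 4 mm FWHM and the 1.5 mm resolution); the variable names are illustrative.

```python
import math

FWHM_MM = 4.0   # smoothing kernel quoted in the review
VOXEL_MM = 1.5  # acquisition resolution quoted in the review (assumed isotropic)

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma, i.e. about 2.355 * sigma.
sigma_mm = FWHM_MM / (2.0 * math.sqrt(2.0 * math.log(2.0)))
fwhm_voxels = FWHM_MM / VOXEL_MM
sigma_voxels = sigma_mm / VOXEL_MM

print(f"sigma = {sigma_mm:.2f} mm ({sigma_voxels:.2f} voxels)")
print(f"FWHM  = {fwhm_voxels:.2f} voxels")
```

      Under these assumptions the kernel spans roughly 2.7 voxels at half maximum, with a standard deviation of about 1.7 mm.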

      The authors demonstrate a broad functional correspondence between human and macaque cognitive control networks, moving the field beyond speculative homology. The data suggest that an extended, interconnected network is recruited by cognitive challenge in both species; however, the strength of this claim is limited by the inter-individual variability and methodological constraints noted above. Assertions of precise topological equivalence should therefore be tempered. The absence of ventrolateral prefrontal and strong dorsal parietal activations in the macaque group analysis may reflect genuine biological differences, but could also stem from limited statistical power, excessive smoothing, or task-design asymmetries. While the overall conclusions are plausible, they would be significantly strengthened by a more explicit discussion of these limitations and additional analytical clarifications regarding individual-level consistency.

    1. eLife Assessment

      This valuable study provides the first broad cross-species evolutionary analysis of the pir multigene family in malaria parasites, showing that the family evolved through rapid duplication and loss while retaining a small number of conserved orthologs with essential functions. The authors identify pirC1 as a key determinant of parasite growth across multiple Plasmodium species. However, the work remains incomplete because the mechanistic role of PIRC1 and its precise subcellular localization are not directly resolved.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Essential function reflected in the phylodynamics of a multigene family - the pir genes of malaria parasites" by Jackson and colleagues investigates the global phylogeny of pir genes across 14 Plasmodium species and one Hepatocystis species. The authors also focus on the functional characterization of the conserved ortholog pirC1 and claim that pirC1 is not the founder of the family and that it plays an essential role in blood-stage growth.

      Strengths:

      Overall, the manuscript is well written and interesting, as it combines comparative genomics and evolutionary analysis with functional experiments. The phylogenetic analysis is rigorous and represents a major strength of the manuscript.

      Weaknesses:

      The general conclusions regarding the potential function of this gene family are not fully supported by the data presented. The manuscript moves too quickly from growth phenotype and localization studies to a specific mechanistic model. The discussion argues that PIRC1 may be involved in nutrient acquisition, host sensing, or metabolic support, but the data provided do not directly support these functions, and the manuscript in its present form remains speculative. Although the manuscript includes some experimental results, it lacks direct mechanistic validation of the specific functions of the pir genes, including pirC1. In its current form, the study does not yet establish a definitive role for pirC1 in metabolic processes.

    3. Reviewer #2 (Public review):

      Summary:

      This is an extensive study using phylogenetic comparison across multiple Plasmodium species to gain new insights into their evolutionary pathways and the potential function of pir. In addition to establishing a framework to identify related orthologues across species, as well as expanding paralogous families within a species, the work also focuses on understanding the loss and gain of different PIRs and how this indicates a relative lack of functional constraints and essentiality for most members of the gene family.

      The authors provide evidence that at least pirC has a conserved function and plays an important role in parasite growth in multiple species.

      While this study represents a significant effort and does provide interesting new insights that would help our understanding of this complex gene family in the future, it has a number of limitations.

      Strengths:

      Extensive and thorough phylogenetic analysis that is supported by some biological validation. Provides an indication that the PIR gene family has limited biological constraints and evolved independently across different species, leading to rapid expansion and deletion of orthologous groups. Identified pirC as a functional and important member of the family that is conserved across the species.

      Weaknesses:

      The phylogenetic tree is based on a truncated sequence that focuses on the more conserved parts of the pir sequence. This could potentially lead to missing the key functional drivers of evolution. The biological validation of the role of pirC has some inconsistencies that need to be addressed.

    4. Reviewer #3 (Public review):

      This paper aims to classify, from an evolutionary perspective, the multigene family PIR found in malaria parasites infecting rodents and Old World monkeys, and to link this classification to functional diversification. The authors also hypothesize that PIR members conserved across species play important roles in parasite survival, and seek to clarify their functions.

      To achieve these aims, the authors comprehensively analyze the evolution of PIR genes using genomic and transcriptomic information from many malaria parasite species. They focus on PIRC1, a member conserved across species, and attempt to clarify its function in rodent and simian malaria parasites by examining the phenotypes of parasites in which the corresponding genetic locus has been disrupted. They also attempt to determine its localization using PIRC1 tagged with an epitope sequence. However, although the locus-disrupted parasites appear to show an approximately 50% reduction in growth rate, this effect seems to be overestimated. Another weakness is that the cause of the reduced growth rate has not been clarified. The localization analysis also remains insufficiently conclusive.

      Therefore, I consider that the first half of the paper, consisting of the bioinformatics analyses, achieves the objective of comprehensively summarizing PIR and may become a reference paper for discussing the evolution and function of the PIR gene family. On the other hand, regarding the function of PIRC1, no clear conclusion can be drawn from the results presented, and several additional experiments are necessary.

      My major comments are as follows.

      (1) The claim that the failure of eight disruption attempts indicates that pirC1 is essential is too strong.

      Lines 319-321: The authors argue that a total of eight failed attempts to disrupt the pirC1 locus using two different construct designs suggest that pirC1 is essential in P. berghei. However, the failure of these attempts could also reflect technical issues with the construct design itself, such as the length of the homologous regions used for recombination, which are approximately 650 bp. Therefore, it is an overstatement to conclude that "pirC1 is essential for P. berghei blood-stage growth." Given that parasites with disruption of the corresponding locus could be obtained in both P. chabaudi and P. knowlesi, a more appropriate statement would be that "pirC1 is important for P. berghei blood-stage growth."

      (2) The data on the mCherry-expressing P. berghei line shown in Supplementary Figure 11 are insufficient.

      (a) Panel C: Southern blot analysis

      To conclusively identify the lower band in panel C as chromosome 1, additional probes specific to genes located on chromosomes 1 and 2 would be required. In addition, a parental parasite control should also be included. The Southern blot image of the parental parasite should show only a single band at the higher position, with no band at the lower position. Probes specific to chromosomes 1 and 2 would help demonstrate that the lower band corresponds to chromosome 1, rather than chromosome 2.

      To this end, the authors could describe the result as follows:

      "In the parental parasite, only a single band corresponding to chromosome 7 was detected, indicating that the smaller chromosome was genetically modified. The size of the lower band detected with the dhfr probe was identical to that of the band detected with the control chromosome 1 probe, but distinct from that detected with the chromosome 2 probe, indicating that chromosome 1 was modified."

      That said, this chromosome-level Southern blot analysis is not sufficient to demonstrate that the target PBANKA_0100500 locus was specifically modified. The authors should provide more direct evidence showing that the PBANKA_0100500 locus, rather than another genomic locus, was modified. For example, Southern blot analysis after restriction enzyme digestion would provide more definitive evidence. Diagnostic PCR may also provide more specific evidence.

      (b) Panel D: Flow cytometry analysis

      To allow a more accurate interpretation of the percentage of mCherry-positive cells, flow cytometry data for the parental parasite line should also be presented.

      (3) There are unclear points in the PCR results shown in Supplementary Figure 12.

      Supplementary Figure 12: In panel B, a PCR product should also be amplified from dPCHAS_0101200 using the P1-P3 primer pair. Why is this band absent? The authors should provide the uncropped electrophoresis image so that the larger band can be seen. In addition, if labels 1 and 2 indicate independent clones, this should be stated in the figure legend.

      (4) The growth rates of P. chabaudi and P. knowlesi parasites with disruption of the PIRC1 gene locus should be quantitatively analyzed.

      The growth rates of P. chabaudi and P. knowlesi are described only qualitatively, but they should be evaluated quantitatively. In Figure 4A, the parasitemia of wild-type P. chabaudi increases from approximately 6.1% on day 6 to approximately 15.6% on day 8, corresponding to a 3.8-fold increase. However, because parasite growth may already be affected by immune-mediated suppression at this stage, this value should be regarded as a minimum estimate. In contrast, the mutant increases from approximately 3.2% on day 8 to approximately 6.8% on day 10, corresponding to a 2.1-fold increase. Based on these values, the daily growth rate of the mutant appears to be reduced to at least approximately 56% of that of the wild type. Similarly, from the growth curve of P. knowlesi in Fig. 5A, the DMSO-treated group appears to increase approximately two-fold per day, whereas the rapamycin-treated group increases only approximately one-fold per day. Thus, P. knowlesi also appears to show an approximately 50% reduction in growth rate. Taken together, both P. chabaudi and P. knowlesi appear to reproducibly show an approximately 50% reduction in growth capacity. A reduction of this magnitude is difficult to describe as a "severe growth defect"; a more appropriate wording would be simply that the parasites "showed a growth defect." In addition, the terms "a severe growth defect" and "essential" appear to be overstated throughout the manuscript, and the wording should be toned down. Finally, I recommend presenting Figure 4A and Figure 5A on a logarithmic scale so that the trend in growth rates can be more intuitively appreciated from the graphs.

      (5) The evidence that disruption of the PIRC1 gene locus in P. knowlesi does not affect erythrocyte invasion is weak.

      The authors describe that "the developmental cycle of the parasites lacking PIRC1 is slightly longer than that of parasites that produce PIRC1 (lines 383-384)," and appear to support this interpretation with data showing that "mutant parasites are significantly smaller than wild-type parasites (line 414)" and that "the DNA content in ML10-arrested parasites lacking PIRC1 is lower than that of DMSO-treated parasites (lines 417-418)" at 24 hours after invasion. However, a slightly longer developmental cycle alone does not seem sufficient to explain a 50% growth reduction.

      I think the erythrocyte invasion capacity has not been quantitatively evaluated, and therefore, the evidence supporting the conclusion that the phenotype of P. knowlesi parasites with disruption of the PIRC1 gene locus is unrelated to erythrocyte invasion is weak. The authors should assess invasion efficiency using purified merozoites. For P. chabaudi, it should also be possible to apply an in vitro or in vivo erythrocyte invasion assay similar to that used for other rodent malaria parasites, and this should be evaluated as well.

      (6) The authors should examine whether disruption of the PIRC1 gene locus results in a phenotype characterized by a reduced number of merozoites.

      Alternatively, the reduced DNA content in ML10-arrested parasites lacking PIRC1 (lines 416-417) could suggest that the number of merozoites formed per schizont may be reduced. To clarify this point, the authors should assess whether the number of merozoites per schizont is altered in P. knowlesi (and P. chabaudi) parasites lacking PIRC1.

      (7) The authors propose the possibility that PIRC1 expressed in merozoites is released after invasion; however, the evidence that PIRC1 localizes to intracellular organelles is weak.

      Line 333: "a peripheral pattern around the parasite" is indicative of parasite plasma membrane, PV, or PVM. ", indicative of a parasitophorous vacuole (PV) or parasitophorous vacuole membrane (PVM) location" should be amended to ", indicative of parasite plasma membrane, a parasitophorous vacuole (PV) or parasitophorous vacuole membrane (PVM) location". In the Figure S14 image, red signals are uniformly detected from the merozoites formed in the schizont stage parasite (not really microorganelle patterns), but not from the PVM surrounding the schizont, suggesting parasite plasma membrane localization, not PVM. I agree that the signal is detected from the compartments extending into the iRBC cytosol, which may be difficult to explain if it is located on the parasite plasma membrane, but how frequently were such images seen?

      Figure 4D. In the images of liver-stage schizonts, AMA1 does not appear to localize to the micronemes in mature merozoites, suggesting this image is an immature schizont. Although PIRC1 appears to be expressed in liver-stage schizonts, it is difficult to clearly determine whether it localizes to intracellular organelles or to the parasite plasma membrane.

      To clarify the above points, the authors should examine whether PIRC1 is detected in intracellular organelles or around the merozoites by analyzing its localization in purified merozoites.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Essential function reflected in the phylodynamics of a multigene family - the pir genes of malaria parasites" by Jackson and colleagues investigates the global phylogeny of pir genes across 14 Plasmodium species and one Hepatocystis species. The authors also focus on the functional characterization of the conserved ortholog pirC1 and claim that pirC1 is not the founder of the family and that it plays an essential role in blood-stage growth.

      Strengths:

      Overall, the manuscript is well written and interesting, as it combines comparative genomics and evolutionary analysis with functional experiments. The phylogenetic analysis is rigorous and represents a major strength of the manuscript.

      Weaknesses:

      The general conclusions regarding the potential function of this gene family are not fully supported by the data presented. The manuscript moves too quickly from growth phenotype and localization studies to a specific mechanistic model. The discussion argues that PIRC1 may be involved in nutrient acquisition, host sensing, or metabolic support, but the data provided do not directly support these functions, and the manuscript in its present form remains speculative. Although the manuscript includes some experimental results, it lacks direct mechanistic validation of the specific functions of the pir genes, including pirC1. In its current form, the study does not yet establish a definitive role for pirC1 in metabolic processes.

      The reviewer is correct that there is no definitive proof for the function of the PIRC1 protein. We speculate that this protein is involved in a metabolic process based on the mutant phenotype: small, poorly developed parasites that do not produce the same amount of DNA as wildtype parasites (and hence likely fewer merozoites). That this occurs in an in vitro culture of Plasmodium knowlesi rules out a role in the interaction with the host organism, such as sequestration or facilitating passage through the spleen. The localization of the protein outside of the parasite is consistent with a role in nutrient uptake, but we agree that additional experiments are required to determine the role of the protein definitively. We aim to look at the differences in the transcriptome and the metabolome to gain more insight into the pirC1 phenotype; this should reveal metabolic deficiencies in the mutant parasite.

      Reviewer #2 (Public review):

      Summary:

      This is an extensive study using phylogenetic comparison across multiple Plasmodium species to gain new insights in relation to their evolutionary pathways and the potential function of pir. In addition to establishing a framework to identify related orthologues across species as well as expanding paralogue families within a species, the work also focuses on understanding loss and gain of different PIRs and how this indicates a relative lack of functional constraints and essentiality for most members of the gene family.

      The authors provide evidence that at least pirC has a conserved function and plays an important role in parasite growth in multiple species.

      While this study represents a significant effort and does provide interesting new insights that would help our understanding of this complex gene family in the future, it has a number of limitations.

      Strengths:

      Extensive and thorough phylogenetic analysis that is supported by some biological validation. Provides an indication that the PIR gene family has limited biological constraints and evolved independently across different species, leading to rapid expansion and deletion of orthologous groups. Identified pirC as a functional and important member of the family that is conserved across the species.

      Weaknesses:

      The phylogenetic tree is based on a truncated sequence that focuses on the more conserved parts of the pir sequence. This could potentially lead to missing the key functional drivers of evolution. The biological validation of the role of pirC has some inconsistencies that need to be addressed.

      The reviewer is correct. We do not use the repetitive parts of the pir gene sequences for the phylogeny. We define these as the ‘distal variable’ and ‘proximal’ domains of the protein in Fig. S1, results text and supplementary results. We remove these parts from the alignment because they are only nominally homologous (they cannot be aligned) and so break the basic assumption of phylogenetic analysis. Amino acid repeats evolve quickly and are homoplasic (their similarities do not reflect ancestry), so omitting them is correct and makes the phylogeny more reliable. While these features do not contribute to the phylogenetic estimate, we propose in the results text and Fig. S3, in agreement with the reviewer, that they are an important demonstration of how pirs have differentiated and what is different between the subfamilies. The reviewer is also correct that we have considered the whole gene sequence when comparing AlphaFold predictions and in selection analyses of closely related sequences (in these cases, the repeat sequences can be aligned).

      A structural prediction for the sequence used in the alignment would mostly reflect the distal conserved domain but would be misleading because the alignment combines conserved regions that are not physically attached in reality. We will clarify these points.

      Reviewer #3 (Public review):

      This paper aims to classify, from an evolutionary perspective, the multigene family PIR found in malaria parasites infecting rodents and Old World monkeys, and to link this classification to functional diversification. The authors also hypothesize that PIR members conserved across species play important roles in parasite survival, and seek to clarify their functions.

      To achieve these aims, the authors comprehensively analyze the evolution of PIR genes using genomic and transcriptomic information from many malaria parasite species. They focus on PIRC1, a member conserved across species, and attempt to clarify its function in rodent and simian malaria parasites by examining the phenotypes of parasites in which the corresponding genetic locus has been disrupted. They also attempt to determine its localization using PIRC1 tagged with an epitope sequence. However, although the locus-disrupted parasites appear to show an approximately 50% reduction in growth rate, this effect seems to be overestimated. Another weakness is that the cause of the reduced growth rate has not been clarified. The localization analysis also remains insufficiently conclusive.

      Therefore, I consider that the first half of the paper, consisting of the bioinformatics analyses, achieves the objective of comprehensively summarizing PIR and may become a reference paper for discussing the evolution and function of the PIR gene family. On the other hand, regarding the function of PIRC1, no clear conclusion can be drawn from the results presented, and several additional experiments are necessary.

      My major comments are as follows.

      (1) The claim that the failure of eight disruption attempts indicates that pirC1 is essential is too strong.

      Lines 319-321: The authors argue that a total of eight failed attempts to disrupt the pirC1 locus using two different construct designs suggest that pirC1 is essential in P. berghei. However, the failure of these attempts could also reflect technical issues with the construct design itself, such as the length of the homologous regions used for recombination, which are approximately 650 bp. Therefore, it is an overstatement to conclude that "pirC1 is essential for P. berghei blood-stage growth." Given that parasites with disruption of the corresponding locus could be obtained in both P. chabaudi and P. knowlesi, a more appropriate statement would be that "pirC1 is important for P. berghei blood-stage growth."

      It is correct that we cannot rule out that the inability to delete the pirC1 gene in Plasmodium berghei is unrelated to an essential function. We are happy to change the text to the suggested description.

      (2) The data on the mCherry-expressing P. berghei line shown in Supplementary Figure 11 are insufficient.

      (a) Panel C: Southern blot analysis

      To conclusively identify the lower band in panel C as chromosome 1, additional probes specific to genes located on chromosomes 1 and 2 would be required. In addition, a parental parasite control should also be included. The Southern blot image of the parental parasite should show only a single band at the higher position, with no band at the lower position. Probes specific to chromosomes 1 and 2 would help demonstrate that the lower band corresponds to chromosome 1, rather than chromosome 2.

      To this end, the authors could describe the result as follows:

      "In the parental parasite, only a single band corresponding to chromosome 7 was detected, indicating that the smaller chromosome was genetically modified. The size of the lower band detected with the dhfr probe was identical to that of the band detected with the control chromosome 1 probe, but distinct from that detected with the chromosome 2 probe, indicating that chromosome 1 was modified."

      That said, this chromosome-level Southern blot analysis is not sufficient to demonstrate that the target PBANKA_0100500 locus was specifically modified. The authors should provide more direct evidence showing that the PBANKA_0100500 locus, rather than another genomic locus, was modified. For example, Southern blot analysis after restriction enzyme digestion would provide more definitive evidence. Diagnostic PCR may also provide more specific evidence.

      Although we are confident that the parasites have been modified in the expected way, we are planning to generate PCR data confirming that the mCherry tag is correctly integrated into PBANKA_010050.

      (b) Panel D: Flow cytometry analysis

      To allow a more accurate interpretation of the percentage of mCherry-positive cells, flow cytometry data for the parental parasite line should also be presented.

      We will repeat the flow cytometry experiments and include a wildtype strain in the analysis.

      (3) There are unclear points in the PCR results shown in Supplementary Figure 12.

      Supplementary Figure 12: In panel B, a PCR product should also be amplified from dPCHAS_0101200 using the P1-P3 primer pair. Why is this band absent? The authors should provide the uncropped electrophoresis image so that the larger band can be seen. In addition, if labels 1 and 2 indicate independent clones, this should be stated in the figure legend.

      We will gladly supply the full, uncropped electrophoresis image and we will clarify what the numbers indicate in the legend.

      (4) The growth rates of P. chabaudi and P. knowlesi parasites with disruption of the PIRC1 gene locus should be quantitatively analyzed.

      The growth rates of P. chabaudi and P. knowlesi are described only qualitatively, but they should be evaluated quantitatively. In Figure 4A, the parasitemia of wild-type P. chabaudi increases from approximately 6.1% on day 6 to approximately 15.6% on day 8, corresponding to a 3.8-fold increase. However, because parasite growth may already be affected by immune-mediated suppression at this stage, this value should be regarded as a minimum estimate. In contrast, the mutant increases from approximately 3.2% on day 8 to approximately 6.8% on day 10, corresponding to a 2.1-fold increase. Based on these values, the daily growth rate of the mutant appears to be reduced to at least approximately 56% of that of the wild type. Similarly, from the growth curve of P. knowlesi in Fig. 5A, the DMSO-treated group appears to increase approximately two-fold per day, whereas the rapamycin-treated group increases only approximately one-fold per day. Thus, P. knowlesi also appears to show an approximately 50% reduction in growth rate. Taken together, both P. chabaudi and P. knowlesi appear to reproducibly show an approximately 50% reduction in growth capacity. A reduction of this magnitude is difficult to describe as a "severe growth defect"; a more appropriate wording would be simply that the parasites "showed a growth defect." In addition, the terms "a severe growth defect" and "essential" appear to be overstated throughout the manuscript, and the wording should be toned down. Finally, I recommend presenting Figure 4A and Figure 5A on a logarithmic scale so that the trend in growth rates can be more intuitively appreciated from the graphs.
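
      The arithmetic behind this comparison can be sketched as follows. This is an illustrative calculation only: the function and variable names are hypothetical, the mutant percentages are those quoted in the comment, and the wild-type fold increase is taken as stated (3.8-fold) rather than recomputed from the quoted percentages.

```python
def fold_increase(start_pct, end_pct):
    """Fold change in parasitemia between two time points."""
    if start_pct <= 0:
        raise ValueError("starting parasitemia must be positive")
    return end_pct / start_pct


# Mutant values quoted in the comment: about 3.2% on day 8, about 6.8% on day 10.
mutant_fold = fold_increase(3.2, 6.8)

# Wild-type fold increase over two days, taken as stated in the comment.
wild_type_fold = 3.8

# Mutant growth expressed as a fraction of wild-type growth over the same span.
relative_growth = mutant_fold / wild_type_fold

print(f"mutant fold increase: {mutant_fold:.2f}")
print(f"mutant relative to wild type: {relative_growth:.0%}")
```

      Under these inputs the mutant reaches a little over half of the wild-type fold increase, which is the basis for the "approximately 50% reduction" wording in the comment.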

      It should be possible to determine the growth rate of the wildtype and mutant P. knowlesi parasites. In addition, we can change the text to reflect that although there is a growth phenotype in the two species in which we obtained mutants, the parasites do have the capacity to replicate. Note that in the case of P. knowlesi, the parasite numbers in vitro do not increase, hence any additional factors that decrease the growth rate, such as the immune system and spleen, will lower the reproductive rate further and render the mutant parasite unable to proliferate.

      (5) The evidence that disruption of the PIRC1 gene locus in P. knowlesi does not affect erythrocyte invasion is weak.

      The authors describe that "the developmental cycle of the parasites lacking PIRC1 is slightly longer than that of parasites that produce PIRC1 (line 383-384)," and appear to support this interpretation with data showing that "mutant parasites are significantly smaller than wild-type parasites (line 414)" and that "the DNA content in ML10-arrested parasites lacking PIRC1 is lower than that of DMSO-treated parasites (line 417-418)" at 24 hours after invasion. However, a slightly longer developmental cycle alone does not seem sufficient to explain a 50% growth reduction.

      I think the erythrocyte invasion capacity has not been quantitatively evaluated, and therefore, the evidence supporting the conclusion that the phenotype of P. knowlesi parasites with disruption of the PIRC1 gene locus is unrelated to erythrocyte invasion is weak. The authors should assess invasion efficiency using purified merozoites. For P. chabaudi, it should also be possible to apply an in vitro or in vivo erythrocyte invasion assay similar to that used for other rodent malaria parasites, and this should be evaluated as well.

      We can further investigate the invasion phenotype of the mutant P. knowlesi parasites. The presence of a clear phenotype during the intraerythrocytic stage indicates that the protein also has a role after invasion, but we agree that determining the effect on invasion directly will be useful.

      Alternatively, the reduced DNA content in ML10-arrested parasites lacking PIRC1 (lines 416-417) could suggest that the number of merozoites formed per schizont may be reduced. To clarify this point, the authors should assess whether the number of merozoites per schizont is altered in P. knowlesi (and P. chabaudi) parasites lacking PIRC1.

      We aim to count merozoites and the level of invasion, which will allow us to determine the reproductive rate of the mutant parasites.

      (7) The authors propose the possibility that PIRC1 expressed in merozoites is released after invasion; however, the evidence that PIRC1 localizes to intracellular organelles is weak.

      Line 333: "a peripheral pattern around the parasite" is indicative of parasite plasma membrane, PV, or PVM. ", indicative of a parasitophorous vacuole (PV) or parasitophorous vacuole membrane (PVM) location" should be amended to ", indicative of parasite plasma membrane, a parasitophorous vacuole (PV) or parasitophorous vacuole membrane (PVM) location". In the Figure S14 image, red signals are uniformly detected from the merozoites formed in the schizont stage parasite (not really microorganelle patterns), but not from the PVM surrounding the schizont, suggesting parasite plasma membrane localization, not PVM. I agree that the signal is detected from the compartments extending into the iRBC cytosol, which may be difficult to explain if it is located on the parasite plasma membrane, but how frequently were such images seen?

      To determine the localization of the protein in the merozoite, we will image P. knowlesi merozoites.

      Figure 4D. In the images of liver-stage schizonts, AMA1 does not appear to localize to the micronemes in mature merozoites, suggesting this image is an immature schizont. Although PIRC1 appears to be expressed in liver-stage schizonts, it is difficult to clearly determine whether it localizes to intracellular organelles or to the parasite plasma membrane.

      This is a valuable comment. It is difficult, if not impossible, to determine the exact localization of the protein at this stage, irrespective of the exact stage of the parasite. It is clear from the images that the protein is not secreted at this stage. The main aim of the experiment was to determine whether the protein is produced by the parasite during the liver stage, which the results confirm.

      To clarify the above points, the authors should examine whether PIRC1 is detected in intracellular organelles or around the merozoites by analyzing its localization in purified merozoites.

      This we aim to do.

    1. eLife Assessment

      This important manuscript presents the Crunchometer, an open-source and low-cost acoustic system for high-resolution quantification of biting and chewing in mice. The work addresses a need for reliable measures of food consumption and feeding microstructure, and the tool has broad relevance for studies of ingestive behavior, appetite circuits, hypothalamic function, and pharmacological interventions. The evidence supporting the methodological advance is convincing, and the Crunchometer outputs were carefully validated against human observer scoring, reliably distinguished biting and chewing events, and captured changes in feeding behavior across different foods, physiological states, and semaglutide treatment. The study also demonstrates that the system can reveal biologically meaningful features of feeding, including meal structure, bite and chew dynamics, and altered consumption patterns after pharmacological manipulation. A significant additional contribution is the identification of previously unrecognized meal-related neurons in the lateral hypothalamus, providing novel circuit-level insight into solid food consumption and naturalistic feeding behavior. Although some neuroscience conclusions remain more preliminary than the methodological validation, the study provides strong evidence for the utility of the Crunchometer and will be of interest to researchers studying ingestive behavior, hypothalamic circuits, and metabolic regulation.

    2. Reviewer #1 (Public review):

      This is an interesting and valuable paper by Gil-Lievana, Arroyo et al. that presents an open-source method (the "Crunchometer") for quantifying biting and chewing behavior in mice using audio detection. The work addresses an important and unmet need in the field: quantitative measures of feeding behavior with solid foods, since most prior approaches have been limited to liquids. The authors make a clear and compelling case for why this problem is important, and I fully agree with their motivation.

      The system is carefully validated against human-scored video data and is shown to be at least as accurate, and in some cases more accurate, than human observers. This is a major strength of the study. I also particularly appreciate the demonstration of the technology in the context of LHA circuitry, which nicely illustrates its utility and importance for mechanistic studies of feeding. I also appreciate the ability to readily time lock neural data to individual crunches. Overall, the manuscript is well executed and represents a useful contribution to the field.

      Comments on revised version.

      The revised manuscript has addressed my minor initial concerns. I appreciate that the sample size was increased for the recording experiments.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to develop and validate the Crunchometer, a low-cost, open-source acoustic system designed to overcome the limitations of existing methods for studying feeding behavior in rodents. Their goal was to provide a tool that could precisely capture the microstructure of solid food intake, something often overlooked in favor of liquid-based assays, while being affordable, scalable, and compatible with neural recording techniques. By doing so, they aimed to enable detailed analysis of how physiological states, drugs, and specific neural circuits shape naturalistic feeding behaviors.

      Strengths:

      (1) Introduces a low-cost, open-source acoustic tool for measuring solid food intake, filling a critical gap left by expensive and proprietary systems.

      (2) Makes the method easily adoptable across labs with detailed setup instructions and shared benchmark datasets.

      (3) Provides high temporal precision for detecting bite events compared to human observers.

      (4) Successfully distinguishes feeding microstructure (bites, bouts, IBIs, gnawing vs. consumption) with greater objectivity than manual annotation.

      (5) Demonstrates compatibility with electrophysiology and calcium imaging, enabling fine-scale alignment of neural activity with feeding behavior.

      (6) Effectively discriminates between fed vs. fasted states, validating physiological sensitivity.

      (7) Captures pharmacological effects of semaglutide, although this is really just reduced feeding and associated readouts (bouts, latency, etc.).

      (8) Has potential to distinguish consummatory vs. non-consummatory behaviors (e.g., food spillage, gnawing); however, the current SVM model struggles to separate biting from gnawing due to similar acoustic profiles, and manual validation is still required.

      (9) Provides potential for closed-loop experiments.

      Weaknesses:

      (1) Some neuroscience findings (calcium imaging of GABAergic vs. glutamatergic neurons) are based on small pilot samples (n=2 mice per condition), limiting generalizability.

      (2) Chemogenetic and pharmacological experiments used small cohorts, raising statistical power concerns.

      (3) Correlation with actual food intake is modest and sometimes less accurate than human observers.

      (4) Sensitive to hoarding behavior, which can reduce detection accuracy and requires manual correction for misclassifications (e.g., tail movements, non-food noises). However, these limitations are discussed and not ignored.

      Comments on revised version.

      The authors have addressed all my comments and have put forth a creative, accurate approach to assessing food intake in rodents.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable manuscript presents an open-source and low-cost acoustic system for quantifying biting and chewing in mice. The approach is carefully validated against human observers, demonstrating strong methodological reliability and enabling high-resolution analysis of feeding microstructure. The tool has broad relevance for studies of appetite circuits and pharmacological interventions. An important contribution is the identification of previously unrecognized "meal-related" neurons in the lateral hypothalamus, providing novel biological insight into solid food consumption. While the support for the methodological advances is compelling and robust, some circuit-level conclusions are preliminary or incomplete, relying on small pilot samples and manual classification, and should be interpreted with caution. This paper will be of interest to those interested in ingestive behavior and/or the hypothalamus.

      We thank the reviewers for their careful reading and constructive comments, which have substantially strengthened the manuscript. In the revised version, we have addressed every suggestion and introduced the following major additions: New experiments. We added one additional Vglut2 mouse to the calcium imaging cohort, achieving 386 neurons (Figure 8), and three naive Vgat mice with unilateral DREADD injections (Supplementary Fig. 5-1). New analyses. We performed ROC analyses on all feeding- and licking-related responses of n = 79 LH GABAergic and n = 386 LH glutamatergic neurons (Figures 7D-F and 8D-F). We also characterized the robustness of the Crunchometer to additive white-noise injection (Supplementary Fig. 1-2). New supplementary material. Three new supplementary figures have been added in total (Supplementary Figs. 1-2, 5-1, and 6-1). Supplementary Fig. 6-1 provides instructions for building a 1-Hz pulse generator that blinks an LED in synchrony with the video. Software improvements. We upgraded the original MATLAB scripts to an App GUI version, migrated the full codebase from MATLAB to Python, and packaged it as fully standalone executables for macOS (Apple Silicon) and Windows, both of which run without a MATLAB license.

      Our point-by-point responses to the reviewers' comments are in red below. Deletions are omitted for brevity. We hope that the revisions fully address the points raised and render the manuscript suitable for publication.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting and valuable paper by Gil-Lievana, Arroyo et al. that presents an open-source method (the "Crunchometer") for quantifying biting and chewing behavior in mice using audio detection. The work addresses an important and unmet need in the field: quantitative measures of feeding behavior with solid foods, since most prior approaches have been limited to liquids. The authors make a clear and compelling case for why this problem is important, and I fully agree with their motivation.

      The system is carefully validated against human-scored video data and is shown to be at least as accurate, and in some cases more accurate, than human observers. This is a major strength of the study. I also particularly appreciate the demonstration of the technology in the context of LHA circuitry, which nicely illustrates its utility and importance for mechanistic studies of feeding. I also appreciate the ability to readily time-lock neural data to individual crunches. Overall, the manuscript is well-executed and represents a useful contribution to the field.

      We thank you for your appreciation of the Crunchometer and its alignment with ephys:

      To further facilitate alignment with neuronal activity, we have now also included a schematic diagram of the pulse generator used to blink an LED in synchronization with the video (see the new Supplementary Fig. 6-1).

      The comments I have are largely minor and should be straightforward to address:

      (1) The authors should report sample sizes for all mouse cohorts, either alongside the statistics or in the figure legends for mean data.

      We apologize for this oversight. We have now included all sample sizes in the figure captions.

      (2) Clarification is needed as to whether crunch detection fidelity is influenced by the hardness or softness of the food. The focus here is on standard pellets, with some additional high-fat pellet data, but it would be useful to know how generalizable the method is across different textures.

      We thank the reviewer for this important observation. Because the Crunchometer depends on bites generating an audible acoustic signal, food hardness directly impacts detection fidelity. Hard, brittle foods are readily detected, whereas soft foods such as jelly, pudding, or peanut butter are unlikely to produce a reliably detectable signal. This is a genuine scope limitation of the method, and we now make it explicit in the manuscript (see below).

      Regarding the two diets used in our study, Chow and HFD pellets differ only slightly in consistency, with HFD being marginally softer. These differences proved too subtle to separate acoustically: the intensity (dB) and spectral content of bites on the two diets were closely overlapping. Accordingly, when we trained an SVM on audio features alone, it could not reliably discriminate Chow from HFD bites.

      Importantly, the Crunchometer does not need to resolve food identity from sound, because audio and video play complementary roles in the system: the acoustic channel confirms that a bite occurred, while the mouse's position within the food-specific ROI determines which food was consumed. This division of labor is what allows per-diet attribution despite acoustically similar pellets.

      We have added to the Results section:

      “The Crunchometer, therefore, does not need to infer food identity acoustically: audio confirms that a bite occurred, and the mouse's position within a food-specific ROI identifies which food was consumed. This design enables per-diet attribution even for pellets with indistinguishable crunch signatures.”
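
      The division of labor described here (audio detects the bite, position identifies the food) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the rectangular ROI layout, the per-frame position list, the frame rate, and all names are hypothetical and are not taken from the Crunchometer codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ROI:
    """Axis-aligned rectangle in video pixel coordinates (hypothetical layout)."""
    food: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def attribute_bites(bite_times, positions, rois, fps):
    """Label each acoustically detected bite with the food whose ROI holds the mouse.

    bite_times: bite onsets in seconds, as detected on the audio channel.
    positions:  one (x, y) tracked position per video frame.
    Returns a list of (time, food), with food set to None outside every ROI.
    """
    labelled = []
    for t in bite_times:
        # Nearest video frame to the bite onset, clamped to the last frame.
        frame = min(int(round(t * fps)), len(positions) - 1)
        x, y = positions[frame]
        food = next((r.food for r in rois if r.contains(x, y)), None)
        labelled.append((t, food))
    return labelled


# Toy example: 1 s at the chow ROI, 1 s at the HFD ROI, 1 s in between.
rois = [ROI("chow", 0, 0, 100, 100), ROI("hfd", 200, 0, 300, 100)]
positions = [(50, 50)] * 30 + [(250, 50)] * 30 + [(150, 50)] * 30
labelled = attribute_bites([0.5, 1.5, 2.5], positions, rois, fps=30)
print(labelled)
```

      In this toy case the third bite falls outside both ROIs and is left unattributed, which mirrors the requirement that each food be presented at a separate, trackable location.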

      We fully agree with the reviewer that the study of solid-food consumption should not be restricted to standard murine diets. Foods with naturalistic textures, for example, the Granny Smith apple, chocolate, and salted peanuts used by O'Connell et al. (2025), span a much wider range of hardness and elasticity than Chow vs. HFD, and would likely generate more clearly differentiated acoustic signatures. We hypothesize that the Crunchometer could generalize to such foods to the extent that each food produces a clear and distinct acoustic pattern, and even where acoustic signatures overlap, ROI-based spatial attribution would continue to resolve food identity as long as each food is presented at a separate, trackable location.

      To make this scope explicit for readers, we have added the following clarification to the Behavioral Protocol section:

      "Our study is limited to the acoustic detection of standard Chow and HFD pellets, both of which exhibit a firm, brittle consistency. Future work should evaluate the fidelity of the Crunchometer across a broader range of food textures, encompassing varying degrees of hardness and elasticity, as explored by O'Connell et al. (2025)."

      (3) The authors should comment on how susceptible the Crunchometer is to background noise. For example, how well does it perform in the presence of white noise, experimenter movement, or other task-related sounds?

      We thank the reviewer for this valuable comment. The Crunchometer performs reliably in controlled, low-noise environments, but like any acoustic detection system, it is vulnerable to interference from sounds whose spectral content overlaps with the bite-related frequency band (500–950 Hz). To quantify this vulnerability, we stress-tested both the threshold-based and SVM-based detection methods by adding white noise to the original audio recordings at progressively decreasing amplitudes and measuring how detection performance degraded as the signal-to-noise ratio decreased. We found that the threshold-based method was more robust to white-noise contamination than the SVM-based method, maintaining acceptable detection performance at lower SNR values before degrading [see the new Supplementary Fig. 1-2].

      First, the white noise amplitude is generated as follows:

      Here, L_noise is the desired amplitude of the white noise in dB. The audio signal was then range-normalized to its absolute maximum value, and the white noise was added at the desired amplitude, as shown by the following formula:
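
      The two steps described above (range normalization, then addition of white noise at a chosen level) can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function name `add_white_noise`, the use of Gaussian noise, and the dB-to-linear conversion `10 ** (L_noise / 20)` relative to a full scale of 1.0 are all assumptions, and the noise amplitude is taken here to be the standard deviation of the noise.

```python
import random


def add_white_noise(audio, noise_level_db, seed=None):
    """Range-normalize `audio`, then add Gaussian white noise.

    `noise_level_db` plays the role of L_noise. The conversion from dB to a
    linear amplitude (10 ** (dB / 20), full scale = 1.0) is an assumption.
    """
    rng = random.Random(seed)
    # Range normalization: divide by the absolute maximum of the signal.
    peak = max((abs(x) for x in audio), default=0.0)
    scale = 1.0 / peak if peak > 0 else 1.0  # leave silent input unchanged
    # Linear noise amplitude (used as the standard deviation of the noise).
    noise_amplitude = 10.0 ** (noise_level_db / 20.0)
    return [x * scale + noise_amplitude * rng.gauss(0.0, 1.0) for x in audio]
```

      Under these assumptions, raising `noise_level_db` toward 0 dB lowers the SNR of the returned signal, so a stress test would rerun the detector on the output at each level.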

      (4) Chemogenetic activation of LHA GABAergic neurons is used. DREADD-based activation may strongly drive these neurons in a way that is not directly comparable to optogenetic or more physiological manipulations. While I do not think additional experiments are required, it would strengthen the discussion to briefly acknowledge this limitation.

      We thank the Reviewer for this thoughtful observation, which we agree with. Chemogenetic activation of LHA GABAergic neurons via DREADDs does not reproduce the physiological firing dynamics of these neurons along several dimensions: it imposes a sustained, tonic drive lasting hours after CNO administration; it likely produces firing rates above the endogenous range; and it lacks the fine temporal structure (phasic bursts and behaviorally phase-locked activity) that these neurons exhibit during natural feeding episodes.

      We recognize, however, that this limitation is not unique to chemogenetics. Optogenetic approaches likewise fail to reproduce endogenous activity, as they impose synchronous, high-frequency activation patterns on a single cell type that are unlikely to occur under physiological conditions. Moreover, as we previously described in a phenomenon our laboratory termed optoception (Luis-Islas et al., 2022), optogenetic stimulation can itself generate signals perceptible to the animal, adding a further interpretive caveat. Thus, both techniques depart from physiological activity.

      For these reasons, we interpret our findings as evidence that activation of LHA GABAergic neurons is sufficient to drive the observed behavioral effects, without claiming that the endogenous firing pattern encodes these behaviors in the same manner or with the same dynamics imposed by our manipulation. We have now added a brief statement to the Discussion acknowledging this limitation explicitly:

      “A methodological consideration is that chemogenetic activation via DREADDs imposes a sustained, supra-physiological drive that does not reproduce the temporal structure of endogenous LHA GABAergic activity during feeding; optogenetic manipulations share analogous limitations (see optoception; Luis-Islas et al., 2022). Our findings, therefore, establish that activation of this neuronal population is sufficient to produce uncontrolled feeding and gnawing, without implying that its endogenous firing encodes them in the same manner.”

      Reviewer #2 (Public review):

      Summary:

      This manuscript introduces the Crunchometer, a low-cost, open-source acoustic platform for monitoring the microstructure of solid food intake in mice. The Crunchometer is designed to overcome the limitations of existing methods for studying feeding behavior in rodents. The goal was to provide a tool that could precisely capture the microstructure of solid food intake, something often overlooked in favor of liquid-based assays, while being affordable, scalable, and compatible with neural recording techniques. By doing so, the authors aimed to enable detailed analysis of how physiological states, drugs, and specific neural circuits shape naturalistic feeding behaviors.

      Strengths:

      The study's strengths lie in its clear innovation, methodological rigor in validation against human annotation, and demonstration of broad utility across behavioral and neuroscience paradigms. The approach addresses a significant methodological gap in the field by moving beyond liquid-based feeding assays and provides an accessible tool for precisely dissecting ingestive behavior. The system is validated across multiple contexts, including physiological state (fed vs. fasted), pharmacological manipulation (semaglutide), and circuit-level interventions (chemogenetic activation of LH neurons), and is further shown to integrate seamlessly with both electrophysiology and calcium imaging.

      (1) Introduces a low-cost, open-source acoustic tool for measuring solid food intake, filling a critical gap left by expensive and proprietary systems.

      (2) Makes the method easily adoptable across labs with detailed setup instructions and shared benchmark datasets.

      (3) Provides high temporal precision for detecting bite events compared to human observers.

      (4) Successfully distinguishes feeding microstructure (bites, bouts, IBIs, gnawing vs. consumption) with greater objectivity than manual annotation.

      (5) Demonstrates compatibility with electrophysiology and calcium imaging, enabling fine-scale alignment of neural activity with feeding behavior.

      (6) Effectively discriminates between fed vs. fasted states, validating physiological sensitivity.

      (7) Captures the pharmacological effects of semaglutide, although this is really just reduced feeding and associated readouts (bouts, latency, etc).

      (8) Has potential to distinguish consummatory vs. non-consummatory behaviors (e.g., food spillage, gnawing); however, the current SVM model struggles to separate biting from gnawing due to similar acoustic profiles, and manual validation is still required.

      (9) Provides potential for closed-loop experiments.

      Weaknesses:

      Several limitations temper the strength of the conclusions: the supervised classifier still requires manual correction for gnawing, generalizability across different setups is limited, and the neuroscience findings, particularly calcium imaging of GABAergic and glutamatergic neurons, are based on small pilot samples. These issues do not undermine the value of the tool, but mean that the neural circuit findings should be interpreted as preliminary.

      We sincerely thank the Reviewer for the careful and generous reading of our manuscript, and particularly for recognizing the methodological gap that the Crunchometer seeks to fill. We appreciate the acknowledgment that the tool's validation spans physiological, pharmacological, and circuit-level contexts, and that its integration with electrophysiology and calcium imaging was considered seamless. The Reviewer has also accurately identified the three main limitations of the current version of the platform, which we address in turn below:

      (1) The supervised SVM classifier still requires manual correction for gnawing.

      We agree with the Reviewer. The acoustic signatures of biting (consummatory) and gnawing (non-consummatory manipulation of the pellet) share overlapping linear spectrotemporal features that our SVM exploits for discrimination. This overlap reflects a genuine biomechanical similarity (both involve incisor contact with the pellet surface) rather than a shortcoming of the classifier per se. We are addressing this limitation in ongoing work toward Crunchometer 2.0, which will incorporate more sophisticated deep learning algorithms, such as ResNet, to better exploit non-linear features. We are also collecting a larger database of bite, gnawing, and environmental noise sounds across different setups, microphones, and conditions to train new algorithms that discriminate between gnawing and biting and generalize more robustly across microphones and behavioral setups. This effort will also be important for developing a closed-loop version of the Crunchometer that detects bites in real time and triggers an actuator (e.g., a laser). We agree, however, that for the present manuscript, gnawing classification remains the weakest link in the pipeline.

      Nevertheless, we think that having a human in the loop is an advantage (not a disadvantage) of the equipment, as it improves the quality of database curation. No matter how sophisticated future algorithms become, human intervention will remain essential. To this end, we have now developed a human-validation GUI that further facilitates human revision of snippets through an intuitive, easy workflow, reducing human effort (Author response image 1).

      Author response image 1.

      The visual validator GUI allows a human to verify and reclassify snippets into the correct category in a friendly interface.

      (2) Generalizability across different setups is limited.

      This is a fair concern, and one we have taken seriously, as noted above. The acoustic signal captured by the Crunchometer is inherently sensitive to the geometry and material of the box, microphone placement, the ambient noise floor of the vivarium or experimental room, and the hardness of the specific pellet batch. To mitigate this, we have 1) released the full hardware specifications and bill of materials so that other laboratories can reproduce the acquisition geometry, and 2) provided the benchmark dataset and trained classifier weights so that groups using comparable setups can deploy the tool directly. We acknowledge that the SVM does not always generalize across setups. In this regard, we have now shown that the threshold method is more resistant to white-noise contamination (see new Supplementary Fig. 1–2), and in our experience it performs robustly across the multiple setups and conditions we have tested. More importantly, improved algorithms are currently under development in our laboratory.

      (1) Some neuroscience findings (calcium imaging of GABAergic vs. glutamatergic neurons) are based on small pilot samples (n=2 mice per condition), limiting generalizability.

      (3) The neuroscience findings (calcium imaging of GABAergic and glutamatergic LH neurons) are based on small pilot samples.

      The Reviewer is correct, and we appreciate the comment. We explicitly state in the Results and Discussion that these findings are presented as preliminary, and we fully agree with the Reviewer that they do not undermine the value of the Crunchometer. The calcium imaging experiments were designed as a proof-of-concept to demonstrate that the temporal precision of the Crunchometer is sufficient to align neural activity with individual bite events, rather than as a definitive circuit-level characterization of LH GABAergic and glutamatergic populations during feeding. Nevertheless, we have now increased the number of Vglut2 mice by 1, bringing the total number of glutamatergic neurons to 386. We have also performed a formal quantification of all the experiments recorded in Vgat (n=2, 3 sessions, 79 neurons) and Vglut2 (n=3, 6 sessions, 386 neurons) mice. This new analysis uncovers neurons selectively tuned to liquid, solid, and both food types. A fully powered characterization of these two populations is planned in our laboratory, pending funding, and will be reported in a dedicated follow-up study.

      (2) Chemogenetic and pharmacological experiments used small cohorts, raising statistical power concerns.

      The chemogenetic experiments were conducted with a modest sample size (n = 4 bilaterally infected mice). Nevertheless, the data revealed a robust, reproducible behavioral effect consistent across all four subjects. The primary aim of this study was to illustrate the potential utility of the Crunchometer using complementary experimental approaches, including chemogenetic activation of GABAergic neurons in the lateral hypothalamic area (LHA). To further address this concern, we have now included three additional transgenic mice with unilateral infections and obtained results comparable to those of the bilateral condition. These new data are presented in a new supplementary figure comparing unilateral and bilateral infections (Supplementary Fig. 5-1). Notably, chemogenetic activation of LHA GABAergic neurons promoted eating-related consummatory behaviors to a similar extent under both unilateral and bilateral DREADD activation. Accordingly, we have now added the following text to the Results section:

      “Notably, unilateral DREADD infections in a separate cohort of naïve Vgat-cre mice (n=3) yielded results comparable to bilateral infections. While the effect size was slightly reduced with unilateral infection, the difference between the two conditions was not statistically significant (Supplementary Fig. 5-1).”

      (3) Correlation with actual food intake is modest and sometimes less accurate than human observers.

      We agree that this result highlights the complexity of feeding behavior, influenced by factors such as hoarding and spillage. The threshold method detects feeding behavior solely based on the magnitude of bite-related sounds (e.g., when the mouse bites the pellet close to the microphone), whereas human observers incorporate additional visual information to infer feeding behavior even in the absence of detectable chewing sounds, introducing variability in detection criteria. Although the number of bouts identified by the Threshold method was comparable to those annotated by human observers, the estimated duration (Bout Size) of those detections differed. This discrepancy likely reflects some inconsistency in the detection criteria among human observers and delays in identifying the onset. Moreover, instances of mice chewing pellets without consuming them (i.e., spillage) were observed. These events were often misclassified as feeding bouts, resulting in false positives for both the threshold method and human observers.

      (4) Sensitive to hoarding behavior, which can reduce detection accuracy and requires manual correction for misclassifications (e.g., tail movements, non-food noises). However, these limitations are discussed and not ignored.

      We thank the reviewer for this constructive comment and for acknowledging that we explicitly discuss these limitations rather than overlook them. Indeed, gnawing and hoarding behaviors (together with tail movements and non-food noises) are factors that can reduce the accuracy of feeding detection. Even with the Crunchometer, an accurate measurement of solid-food consumption therefore remains challenging, which further supports the inclusion of a human-in-the-loop step to ensure a high-quality, well-curated database. Accordingly, we have added the following sentence to the Results section:

      "This human validation was essential for ensuring the high fidelity of our behavioral database and mitigating the inherent limitations of automated classification."

      Conclusion:

      Overall, this is an exciting and impactful methodological advance that will likely be widely adopted in the field. I recommend minor revisions to clarify the limits of classifier generalizability, better contextualize the small-sample neuroscience findings as pilot data, and discuss future directions (e.g., real-time closed-loop applications).

      We thank you for your constructive comments.

      Reviewer #3 (Public review):

      Summary:

      The manuscript provides detailed information on the construction of open-source systems to monitor ingestive behavior with low-cost equipment. Overall, this is a welcome addition to the arsenal of equipment that could be used to make measurements. The authors show interesting applications with data that reveal important neurophysiological properties of neurons in the lateral hypothalamus. The identification of previously unknown "meal-related" neurons in the LH highlights the utility of the device and is a novel insight that should spark further investigation on the LH. This manuscript and videos provide a wealth of useful information that should be a must-read for anyone in the ingestive behavior or hypothalamus fields.

      A scholarly introduction to the history and utility of various ways feeding is measured in rodents is provided. One point - the microstructure of eating solid food - has been studied extensively (for one of many studies, see https://doi.org/10.1371/journal.pone.0246569 ). However, I agree that the crunchometer will allow for more people to access recordings during food intake and temporally lock consummatory behavior to neural activity.

      We apologize for this oversight. This is indeed an important reference for the microstructure of eating solid food in a social context. We have now cited it in the Introduction: “Food intake in social contexts is a more ethologically valid model, in which radio-frequency identification (RFID) transponders enable the simultaneous assessment of feeding behavior across multiple mice in a single box (Rathod and Fulvio, 2021).”

      Questions on results:

      (1) It is unclear why 10% sucrose solution was used as a liquid instead of water, given that the study is focusing on the solid food source.

      One motivation for using sucrose rather than water alone was to create a highly palatable environment and to test whether mice would prefer palatable liquid sucrose over HFD. However, the choice of liquid stimulus will ultimately depend on the end user and the specific experimental conditions of each lab implementing the Crunchometer. Future versions of the apparatus could also incorporate multiple sippers to deliver several tastants alongside solid food.

      (2) It is unclear how essential the human verification is in the pipeline - results for Figure 1 keep referring to the verification as essential. Is that dispensable once the ML algorithms have been trained?

      Human validation, also referred to as a human-in-the-loop approach, is a deliberate design feature of the Crunchometer rather than a limitation (also see answer to Reviewer 2). The outputs of machine-learning algorithms, no matter how accurate, require expert corroboration to confirm or reject the specific behaviors under study, particularly when the behavioral repertoire is as heterogeneous as feeding (which encompasses sniffing, gnawing, biting, hoarding, and manipulating the food item). For this reason, we view human oversight as a safeguard for scientific rigor that remains valuable even as more advanced algorithms (e.g., deep learning and convolutional neural networks) are incorporated into future versions of the pipeline. As noted above, we have implemented a graphical user interface (GUI) that enables batch sorting and rapid inspection of multiple snippets (using a photographic montage view strategy), substantially reducing manual curation time.

      (3) The ability to extrapolate food quantity consumed is limited, with high variability. This limitation does not undercut the utility of the crunchometer, but should be highlighted as one of the parameters that are not suitable for this system. This limitation should be added to the limitations section.

      We thank the reviewer for this constructive observation. We fully agree that, although the Crunchometer reliably detects feeding events and their temporal microstructure (bouts, meals, and latencies), extrapolating absolute food quantity consumed from acoustic signals is indirect and carries substantial variability and should not be the primary readout for studies that require precise gravimetric measurements. As recommended, we have now explicitly listed this limitation in the Limitations section of the Discussion:

      "While the Crunchometer provides accurate temporal detection of bites and feeding microstructure, the estimation of absolute food mass consumed from bite-related acoustic signals shows considerable variability across trials and subjects. This limitation arises from individual differences in gnawing patterns, food fragmentation, and hoarding behavior. Accordingly, the Crunchometer is best suited for analyses of feeding dynamics and behavioral microstructure, whereas studies requiring precise quantification of ingested mass should complement the system with direct gravimetric measurements, for example, real-time weighing of feeders."

      (4) The ability to discriminate between gnawing and consummatory behavior is a strength (Figure 5), and these findings are important. However, it is unclear what can be made of mice that have 'gnawing' behavior in the fasted state (like in Figure 3). It seems they would need to be eliminated from the analysis with this tool?

      We apologize for this misunderstanding. We have now more clearly indicated in Figure 3A that the cumulative feeding time reflects only Chow and HFD feeding bouts, excluding gnawing.

      We now state: “The lower panel shows the cumulative feeding time (only for Chow and HFD pellets, gnawing is excluded) over a two-hour session for the fed (green) and fasted (purple) groups (n = 6 mice).”

      Under normal physiological conditions, gnawing is an infrequent behavior in rodents. In our study, however, its frequency increased in the fasted state, a change possibly attributable to heightened stress. This behavior was further exacerbated by chemogenetic manipulation, which drove it to non-physiological levels.

      (5) Why is there a post-semaglutide fed group and not a fasted group in Figure 4? It seems both would have been interesting, as one could expect an effect on feeding even 24h after semaglutide treatment. This would help parse the preference better because the animals eat such a small amount of semaglutide, that it is hard to compare to the fasted condition with saline treatment.

      We thank the reviewer for this insightful suggestion. It would have been interesting to include a fasted post-semaglutide group, as it could provide relevant information about the lasting effect of an acute administration of semaglutide. However, we decided not to include this additional experimental condition because the semaglutide-treated fasted mice displayed markedly reduced food intake during the experimental session. An additional post-semaglutide fasted session would have required prolonged food restriction (at least 24 hours), which we consider an unnecessarily stressful condition for the mice. Therefore, we decided to feed the mice once the experiment was completed. Nevertheless, we believe that comparing food intake (grams) between the fed group shown in Figure 3C and the post-semaglutide fed group reported in Figure 4D provides insight into the lasting effect of semaglutide. The comparison reveals a marked reduction in food intake in the post-semaglutide fed mice relative to the fed group, suggesting that acute administration of semaglutide suppresses feeding behavior for up to 24 hours.

      (6) The identification of 'meal-related' neurons in the LH is another strength of the manuscript. Although there is currently insufficient data, could similar recordings be used to give a neurophysiological definition of a 'meal' duration/size? Typically, these were somewhat arbitrarily defined behaviorally. Having a neural correlate to a 'meal' would be a powerful tool for understanding how meals are involved in overall caloric intake.

      We thank the reviewer for this insightful suggestion. We agree that the traditional behavioral criteria for defining meals, typically derived from log-survivor analyses of inter-pellet or inter-lick intervals, are operationally useful but ultimately arbitrary, and that a neurophysiologically grounded definition would be a valuable complement for the field.
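
      To make the interval-based behavioral criterion concrete, the following Python sketch groups event timestamps into meals using a single maximum-gap rule. It is an illustration only, not the procedure used in the manuscript: the function name `segment_meals` and the parameter `max_gap_s` are hypothetical, and in practice the gap criterion would be read off a log-survivor curve of the inter-event intervals.

```python
def segment_meals(event_times, max_gap_s):
    """Group event timestamps (seconds) into meals.

    A new meal starts whenever the gap to the previous event exceeds
    `max_gap_s` (a hypothetical, user-chosen criterion).
    Returns a list of (onset, offset, n_events) tuples.
    """
    meals = []
    for t in sorted(event_times):
        if meals and t - meals[-1][-1] <= max_gap_s:
            meals[-1].append(t)  # event continues the current meal
        else:
            meals.append([t])    # gap too long: open a new meal
    return [(m[0], m[-1], len(m)) for m in meals]
```

      The arbitrariness discussed here is concentrated in `max_gap_s`: changing it merges or splits meals without any reference to physiology.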

      Our current dataset was not designed to formally establish such a definition, and we want to be cautious about the logic of the problem: validating a neural criterion solely against the behavioral one it would replace is circular. A genuinely neural definition of a meal would need to be anchored to independent criteria, for example, its ability to predict the latency and size of the subsequent meal, its correspondence with post-prandial satiety markers, or its response to anorectic agents such as GLP-1 receptor agonists. This is a methodologically nontrivial undertaking that we believe deserves a dedicated follow-up study.

      As preliminary evidence that such a problem is tractable, we note that the meal-related LH neurons identified here display sustained activity with onset and offset dynamics that broadly parallel the behaviorally defined meal boundaries (Figure 6), suggesting that meal structure is reliably encoded at the population level. A related approach, using neural activity to segment ingestive behavior at finer temporal scales, has been successful in our previous work on licking microstructure in the nucleus accumbens (Tellez et al., 2012), and we consider the present findings a natural extension of that line of research to the larger meal timescale.

      (7) The conclusion in the title of Figure 8 is premature, given the pilot nature and small number of neurons and mice sampled.

      We appreciate this comment and agree with the reviewer. Accordingly, we have performed additional experiments on the Vglut2 glutamatergic population, in some cases using three-plane recordings, which substantially increased the yield to 386 glutamatergic neurons. As the reviewer anticipated, we observed a broad diversity of response profiles in this population, including neurons selective for liquid licking, for solid food intake, and for both food types. We also formally quantified these responses using ROC analysis, applying the same procedure to the Vgat GABAergic neurons (n = 79). These new findings have been incorporated into the revised manuscript (Results and Discussion). We thank the reviewer for prompting this extension of the analysis (see Manuscript).
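
      For readers unfamiliar with ROC-based response quantification, a generic area-under-the-ROC-curve (auROC) computation is sketched below in Python. The response does not specify how the ROC analysis was implemented, so this is an assumption-laden illustration: the function name `auroc` and the baseline-versus-event framing are hypothetical, and a significance criterion (for example, a permutation test) would still be needed and is not shown.

```python
def auroc(baseline, event):
    """Area under the ROC curve for `event` versus `baseline` samples.

    Equals the probability that a randomly drawn event-window value exceeds
    a randomly drawn baseline value, with ties counted as one half.
    """
    wins = 0.0
    for e in event:
        for b in baseline:
            if e > b:
                wins += 1.0
            elif e == b:
                wins += 0.5
    return wins / (len(event) * len(baseline))
```

      Under this convention, values near 1 indicate activity that increases during the event window, values near 0 indicate a decrease, and 0.5 indicates no discriminability. Applying the function separately to licking and to solid-food epochs would yield one value per neuron and food type.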

      Conclusion:

      Overall, this report on the Crunchometer is well done and provides a valuable tool for all who study food intake and the behaviors around food intake. Clarification or answers to the points above will only further the utility and understanding of the tool for the research community. I am excited to see the future utility of this tool in emerging research.

      We sincerely thank the Reviewer for these kind and encouraging words, and for the constructive feedback provided throughout the review. The clarifications and additional analyses prompted by these comments have substantially improved the manuscript, and we share the Reviewer's enthusiasm about the potential of the Crunchometer to contribute to future research on feeding behavior.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The authors have done a phenomenal job with the Introduction, highlighting the need for this tool, citing the history of feeding measurement systems and their relative strengths and weaknesses.

      Thank you for your comment; we greatly appreciate your positive feedback.

      (2) A limitation of Automated Pellet Dispensers is the possibility that the animals fail to consume the pellet after it has been retrieved from and registered by the device, potentially constraining accuracy.

      We address this issue in the Introduction. Specifically, we wrote:

      “Current methods to monitor feeding behavior could be classified into four different classes…3) Automated Pellet Dispensers: Often integrated into operant conditioning chambers, these devices provide a controlled way of delivering food pellets. While devices like the open-source Feeding Experimentation Device (FED3) (Ali and Kravitz, 2018; Matikainen-Ankney et al., 2021), a pellet dispenser, are useful for measuring reinforcement, they alter the natural feeding patterns of mice; for example, requiring a simple action, such as a nose-poke, can reduce overeating and weight gain in mice (Barrett et al., 2025). A further limitation is that FED3 may overestimate consumption if an animal retrieves and registers a pellet without actually consuming it. A significant strength of this method is its ability to enable closed-loop optogenetic stimulation concurrent with neuronal recordings.”

      (3) I really appreciate the data in Figure 2G, where they displayed the results of an "outlier" animal, as behavior is extremely variable, and it's useful to see how this system deals with the variability of the subjects. This is again highlighted by mouse number 5 in Figure 3A, which exhibited profound gnawing behavior.

      We thank the reviewer for this positive comment. Our decision to include the outlier animal in Fig. 2G and to report the atypical gnawing behavior of mouse 5 in Fig. 3A reflects a deliberate commitment to documenting inter-individual variability, which we consider a core strength rather than a limitation of behavioral work. We believe that such cases are particularly informative for evaluating the robustness of automated monitoring systems under behavioral-lab conditions.

      (4) It would be useful to know if the mice had prior exposure to HFD, as I found it surprising that many animals consumed the chow at all, sometimes completely ignoring the HFD (fasted mouse 3). I only ask because in our experience, mice with constant exposure to both HFD and chow predominantly, if not always, consume the HFD over chow. This could have something to do with the way the food substrates are presented in this chamber.

      We thank the reviewer for this point. Mice in this experiment did receive prior exposure to both Chow and HFD during the habituation phase, with at least two 30-min sessions in the experimental chamber with both diets available (no video was collected at this stage). The Chow and HFD feeders were identical in geometry, position, and accessibility, so we do not consider either environmental novelty or spatial bias to be the main driver of the pattern. Rather, we interpret the strong chow preference of fasted mouse 3 as a case of residual neophobia toward the HFD pellet. Since performing these experiments, we have refined our habituation protocol: pre-exposing animals to a single HFD pellet in their home cage, a familiar and safe environment, prior to any chamber session, greatly mitigates HFD neophobia in our hands. Familiarity with the novel food in a safe context thus appears to be the critical factor, rather than the duration of exposure in the experimental chamber. We have added this refinement to the Methods as a recommendation for future users of the Crunchometer.

      “Behavioral protocol. All mice were habituated to the Crunchometer for 2 days before the recording session. Each habituation session lasted 30 minutes, during which two food pellets were placed in the chamber: one standard Chow pellet (LabDiet 5008) and one highly palatable high-fat diet (HFD) pellet (Research Diet, D12451). As a practical note, we recommend allowing the HFD to equilibrate to room temperature before the experiment and pre-exposing mice to a single HFD pellet in their home cage to attenuate neophobia prior to testing.”

      (5) The authors claim saline or semaglutide was administered immediately before the start of the behavioral experiment, but given the time it takes for this drug to blunt appetite, I was somewhat surprised it led to such a rapid decrease in both chow and HFD intake. Could the authors comment on this? How quickly do these animals experience the malaise associated with these drugs? Also, this dose seems to be on the very high side, so I imagine it's making the animals feel quite sick and is probably a big reason why the effects last so long into the post-sem measurements. Was bodyweight tracked across this treatment? I'm not so convinced that sema treatment led to a loss of strong HFD preference, as the chow intake was already very low to begin with, and as mentioned above, it looks like the drug just led to a cessation of all intake. I'd just tamp down this claim of preference switch. It clearly reduced intake of both substrates, it's just harder to detect for the chow because it was already so low to begin with.

      Thank you for these comments. We agree with the Reviewer and have toned down the claim regarding a switch in HFD/chow preference. In the revised Results section, we now explicitly acknowledge that further characterization is needed using chronic semaglutide treatment. Specifically, we added the following sentence:

      "Future studies should use the Crunchometer to characterize changes in HFD/chow preference during 24-h monitoring under chronic semaglutide treatment."

      In addition, we administered a single subcutaneous dose of semaglutide at 30 nmol/kg (0.123 mg/kg), following the protocol described by Zhang et al. (2023). In their study, pharmacokinetic analyses showed that plasma concentrations, measured by an ELISA assay that immunoreacts with both growth differentiation factor 15 (GDF15) and the intact N-terminal region of glucagon-like peptide-1 (GLP-1), increased shortly after administration of the 30 nmol/kg dose in C57BL/6 mice. Peak plasma concentration (Cmax = 43.1 nmol/L) was reached at 6.7 hours (Tmax), and levels returned to baseline by 24 hours post-administration, indicating complete drug clearance. Although this dose is relatively high, it was intentionally selected to produce a robust acute response from a single administration, as our objective was to assess the drug’s effects within a short, 2-hour observational window. Under these conditions, we observed a rapid reduction in food intake immediately following the onset of Crunchometer recording. While we do not exclude the possibility that these effects could be more pronounced over longer observation periods or with chronic dosing regimens, our study was strictly limited to a single acute exposure.
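
      As a quick check of the unit conversion quoted above (30 nmol/kg stated as 0.123 mg/kg), the arithmetic can be sketched as follows. This is a minimal illustration only: the molar mass of semaglutide (about 4113.58 g/mol) is an assumption introduced here and is not stated in the response, and the function name is hypothetical.

```python
# Illustrative unit conversion. The molar mass below is an assumption,
# not a value taken from the manuscript or the author response.
SEMAGLUTIDE_MOLAR_MASS_G_PER_MOL = 4113.58  # assumed molar mass of semaglutide


def nmol_per_kg_to_mg_per_kg(dose_nmol_per_kg, molar_mass_g_per_mol):
    """Convert a molar dose (nmol/kg) to a mass dose (mg/kg)."""
    # nmol -> mol (x 1e-9), mol -> g (x molar mass), g -> mg (x 1e3)
    return dose_nmol_per_kg * 1e-9 * molar_mass_g_per_mol * 1e3


dose_mg_per_kg = nmol_per_kg_to_mg_per_kg(30, SEMAGLUTIDE_MOLAR_MASS_G_PER_MOL)
print(f"{dose_mg_per_kg:.3f} mg/kg")
```

      Under that assumed molar mass, the result agrees with the mass dose given in the response to three decimal places.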

      Although semaglutide is known to suppress food intake through multiple mechanisms, including stress and malaise measured by conditioned taste aversion and release of stress hormones (Teixidor-Deulofeu et al., 2025), we do not believe that discomfort or malaise played a significant role in our study. While the mice did reduce their food intake during semaglutide administration, this reduction persisted for at least 24 hours after the final dose, at which point the drug was no longer present, suggesting a satiety-driven effect rather than one mediated by aversion. Consistent with this, previous studies have demonstrated that semaglutide continues to suppress food intake even when the aversive pathway mediated by Area Postrema GLP1R neurons is inhibited. Although blocking this pathway reduces flavour aversion, the anorexic effect remains, indicating that suppression of intake can be driven by satiety independently of nausea or malaise (Huang et al., 2024). In summary, although we selected a relatively high dose to ensure a detectable acute effect within our experimental window, this choice was grounded in previously published data, and our findings are consistent with established mechanisms of action for semaglutide.

      Additionally, body weight data have now been included in Figure 4D. We observed a similar body weight loss of approximately 5% on the first day of drug administration, consistent with the findings reported by Zhang et al. (2023).

      (6) The authors demonstrate that CNO administration prompted a significant increase in liquid sugar intake in the last panel of Figure 5F, as confirmation that LH GABAergic neurons are implicated in processing reward. However, given the above results, it seems likely that these mice will drink anything, including water (when not thirsty, thus in a non-rewarding scenario) or possibly aversive agents like quinine.

      This is an interesting question, and we agree with the Reviewer. The original discovery by Jennings and Stuber showed that optogenetic activation of these GABAergic neurons induces voracious feeding and that Vgat mice kept licking for liquid rewards in an appetitive task (Jennings et al., 2015). We also acknowledge that prior work has shown LH GABAergic neuron activation can drive consumption of non-caloric and biologically irrelevant stimuli, including wood gnawing, water, or saccharin (Navarro et al., 2016). However, several lines of evidence support a role in reward/palatability processing rather than purely indiscriminate consumption. Our own lab (Garcia et al., 2021) showed that activation of LH Vgat+ neurons increased quinine intake only during water deprivation; in sated animals, activation failed to promote quinine intake. Instead, these neurons promoted overconsumption of sucrose when available, leading us to conclude that LH Vgat+ neurons increase the drive to consume the nearest food, but this drive is potentiated by the palatability of the tastant. In non-human primates, LH GABA activation drives goal-directed eating predominantly for palatable food (Ha et al., 2024), supporting a reward-related function across species. Together, these findings indicate that while LH GABAergic activation does broadly promote consumption, the selectivity toward palatable stimuli observed in Figure 5F is consistent with a reward-related function.

    1. eLife Assessment

      This important study examines the role of TNF in modulating energy metabolism during parasite infection. The authors perform an elegant set of studies combining genetics, small molecule perturbation, and phenotypic experiments to highlight a role for glycolysis and glucose transport in control of parasitemia. This solid work integrates an interesting set of observations that will be of interest to the Plasmodium and pathogenesis communities with an expanded set of experiments.

    2. Reviewer #2 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers.]

      Summary:

      The premise of the manuscript by Matteucci et al. is interesting and elaborates a mechanism via which TNFa regulates monocyte activation and metabolism to promote murine survival during Plasmodium infection. The authors show that TNF signaling (via an unknown mechanism) induces nitrite synthesis, which (via a yet unknown mechanism) stabilizes the transcription factor HIF1a. Furthermore, HIF1a (via an unknown mechanism) increases GLUT1 expression and increases glycolysis in monocytes. The authors demonstrate that this metabolic rewiring towards increased glycolysis in a subset of monocytes is necessary for monocyte activation, including cytokine secretion, and parasite control.

      Strengths:

      The authors provide elegant in vivo experiments to characterize metabolic consequences of Plasmodium infection, and isolate cell populations whose metabolic state is regulated downstream of TNFa. Furthermore, the authors tie together several interesting observations to propose an interesting model.

      Weaknesses:

      The authors show that TNFa induces GLUT1 in monocytes, but do not show a direct role for GLUT1 or glucose uptake in monocytes in host resistance to infection.

    3. Author response:

      The following is the authors’ response to the previous reviews

      We thank the reviewers for their careful evaluation and constructive comments throughout the two rounds of revision. We hope that the revisions have satisfactorily addressed all concerns and that the manuscript is now suitable for publication.

      This novel contribution highlights the role of the pro-inflammatory cytokine TNF in the pathogenesis of and resistance to Plasmodium chabaudi infection in mice. While aspects of this response have been previously described, this study is the first to link the TNF–iNOS–HIF-1α axis to the in vivo mediation of malaria disease through its involvement in glucose metabolism. Despite well-documented metabolic alterations during malaria, including hypoglycemia and hyperlactatemia, the mechanisms underlying these changes and their relationship to host immune responses remain poorly understood. Addressing this gap is essential for elucidating how metabolic adaptation shapes disease outcomes during Plasmodium infection.

      In response to the reviewer’s comments, we have revised the Abstract, Introduction, and Discussion to clearly distinguish between:

      Previously established mechanisms (TNF–iNOS–HIF-1α–glycolysis axis), and

      The novel contribution of our study (its in vivo integration during Plasmodium infection and association with host resistance).

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      The premise of the manuscript by Matteucci et al. is interesting and elaborates a mechanism via which TNFa regulates monocyte activation and metabolism to promote murine survival during Plasmodium infection. The authors show that TNF signaling (via an unknown mechanism) induces nitrite synthesis, which (via a yet unknown mechanism) stabilizes the transcription factor HIF1a. Furthermore, HIF1a (via an unknown mechanism) increases GLUT1 expression and increases glycolysis in monocytes. The authors demonstrate that this metabolic rewiring towards increased glycolysis in a subset of monocytes is necessary for monocyte activation, including cytokine secretion, and parasite control.

      Strengths:

      The authors provide elegant in vivo experiments to characterize metabolic consequences of Plasmodium infection, and isolate cell populations whose metabolic state is regulated downstream of TNFa. Furthermore, the authors tie together several interesting observations to propose an interesting model regarding

      Weaknesses:

      The main conclusion of this work - that "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF1a axis plays a key role in host resistance to Plasmodium infection" is unsubstantiated. The authors show that TNFa induces GLUT1 in monocytes, but never show a direct role for GLUT1 or glucose uptake in monocytes in host resistance to infection (nor the hypoglycemia phenotype they describe).

      We thank the reviewer for this important comment and for highlighting the need to clarify the mechanistic link between TNF-driven metabolic rewiring and host resistance to Plasmodium infection. As noted in our first revision, our primary objective was to investigate how TNF integrates systemic and cellular metabolic responses during infection in vivo. We demonstrate that glucose uptake is significantly increased in spleen and liver during infection in a partially TNF-dependent manner, and that TNF promotes GLUT1 expression (main glucose transporter in immune cells) and glycolysis specifically in monocytic cells. Importantly, to directly address the role of TNF signaling in myeloid cells, we also observed the same phenotype (higher parasitemia, but absence of hypothermia and hypoglycemia) in mice with conditional deletion of TNF receptor 1 in lysozyme M–expressing cells (TNFR1^ΔLyz2) (Figure 4P–R), thereby validating in a cell-specific context the findings previously observed in mice with global TNFR1 deficiency. Together, these findings support a functional link between TNF signaling in monocytes, induction of GLUT1-dependent glucose metabolism, and the regulation of both systemic metabolic responses and host resistance during experimental malaria.

      While we agree that we do not demonstrate a cell-intrinsic role for GLUT1 in monocytes, multiple lines of evidence in our study support the functional relevance of glycolytic metabolism downstream of the TNF–iNOS–HIF-1α axis.

      (1) First, we show that Pc infection results in a marked increase in glucose uptake in the spleen and liver, but not in skeletal muscle or adipose tissues (Figure 2K), and that this effect is absent in TNFR-/- mice (Figure 2L), indicating a TNF-dependent and tissue-specific metabolic reprogramming. We have also clarified in the Discussion that this process appears to be insulin-independent and likely driven by pro-inflammatory signals.

      (2) Second, we show that the TNF–iNOS–HIF-1α axis induces GLUT1 expression in monocytic cells (Figures 4M, 5D, 6L). This supports a model in which these cells contribute to the observed systemic metabolic changes.

      (3) Third, we also observed a similar phenotype, characterized by higher parasitemia but absence of hypothermia and hypoglycemia, in mice with conditional deletion of TNF receptor 1 in lysozyme M–expressing cells (TNFR1^ΔLyz2) (Figure 4P–R), thereby validating in a cell-specific context the findings previously observed in mice with global TNFR1 deficiency. These findings indicate that disruption of glycolysis phenocopies key aspects of the TNF-driven metabolic and immunological response to infection.

      (4) Finally, we demonstrate that glycolytic metabolism is functionally relevant for host resistance. Pharmacological inhibition of glycolysis in vivo using 2-DG led to increased parasitemia (Figure 6O), resembling the impaired parasite control observed in HIF-1α^ΔLyz2, TNFR-/-, and iNOS-/- mice. These findings indicate that disruption of glycolysis phenocopies key aspects of the TNF–iNOS–HIF-1α axis deficiency, supporting the conclusion that this pathway is required to sustain glycolytic metabolism and effective parasite control during infection.

      Regarding the hypoglycemia phenotype and resistance, our previous study (PMID: 29805094) demonstrates that TNF-driven inflammation regulates systemic glucose metabolism during Plasmodium chabaudi infection. We showed that infection-induced hypoglycemia correlates with TNF levels and is associated with changes in parasite development. Specifically, leukocytes primed with IFNγ display increased expression of glucose metabolism and inflammatory genes, and TNFα-induced hypoglycemia is linked to the accumulation of non-proliferative trophozoite forms, whereas parasite replication (schizogony) occurs during host feeding. These findings indicate that blood glucose availability, regulated by TNF, directly influences parasite growth dynamics and infection outcome. Although the cellular mechanisms were not addressed in that study, our current work builds on these findings by identifying the TNF–iNOS–HIF-1α axis as a driver of GLUT1-dependent glycolysis in monocytes, linking systemic metabolic changes to a cell-intrinsic mechanism that contributes to host resistance.

      We agree that directly establishing the cell-intrinsic contribution of GLUT1 would require dedicated genetic approaches (e.g., conditional deletion in monocytes), which are beyond the scope of the present study. 

      Comments on revisions:

      The demonstration that the established TNF-iNOS-HIF-1α-glycolysis axis operates in vivo during P. chabaudi infection is valuable and relevant. However, it constitutes contextual validation and must be carefully described as such. This distinction, i.e., "what has already been shown vs. what is new", is not consistently reflected in the framing of the manuscript, raising concerns of overstatement. This is particularly evident in the abstract and other conclusive statements, where mechanistic novelty is implied even when the underlying pathways/mechanisms are already known. To improve the manuscript, all sentences that refer to already established findings should be accurately described as such.

      For example, the abstract states: "Here, we show that TNF signaling hampers physical activity, food intake, and energy expenditure while enhancing glucose uptake by the liver and spleen as well as controlling parasitemia in P. chabaudi-infected mice." In this sentence, the effects of TNF signaling on physical activity, food intake, energy expenditure, glucose metabolism and control of parasitemia are unequivocally established and therefore do not, in themselves, constitute new findings. Feeding behavior, not cell-intrinsic metabolism, may drive glycemic differences.

      We thank the reviewer for this comment and for highlighting the importance of distinguishing systemic metabolic effects from cell-intrinsic mechanisms. We have now revised the manuscript to more consistently distinguish between previously established mechanisms and our novel findings, particularly in the Abstract and other summary statements, to avoid any potential overstatement.

      We also would like to emphasize that, in both the Introduction and Discussion, we explicitly acknowledge that key components of the TNF–iNOS–HIF-1α–glycolysis axis have been previously described. In the Introduction, we cite studies demonstrating that TNF can induce glucose uptake and metabolic reprogramming in immune cells (refs. 14–17), as well as the role of HIF-1α as a central regulator of glycolysis and inflammation in myeloid cells (refs. 21–28). Similarly, in the Discussion, we detail prior evidence that TNF induces iNOS-derived RNI (refs. 51–54), that RNI stabilizes HIF-1α (ref. 52), and that HIF-1α drives the expression of glycolytic genes including GLUT1 (refs. 55–57). We also cite studies showing that TNF contributes to parasite control and glucose metabolism in malaria (refs. 58–61).

      Importantly, while these pathways have been described in other contexts, their integration and functional relevance in vivo during Plasmodium infection, particularly in the context of host systemic metabolism and monocytic cell function, have not been previously demonstrated. Our study addresses this gap by showing that this axis operates during P. chabaudi infection and links inflammatory signaling to both cellular metabolic reprogramming and organismal metabolic changes.

      Specifically, we demonstrate that TNF signaling drives increased glucose uptake in spleen and liver in a tissue-specific manner, promotes GLUT1 expression and glycolysis in monocytic cells, and that disruption of this axis (genetically or pharmacologically via glycolysis inhibition) impairs parasite control. In addition, we provide evidence connecting these cellular processes to systemic metabolic alterations, including hypoglycemia.

      The authors propose that TNF signaling leads to GLUT1 upregulation (in inflammatory monocytes, MO-DCs, and within the liver and spleen) during Plasmodium infection, and that this results in increased glucose uptake contributing to systemic hypoglycemia. While this is an intriguing hypothesis, we urge the authors to consider an alternative explanation that, at present, is not adequately ruled out. Given that glycemia serves as a central functional readout in the manuscript, this distinction is essential to clarify.

      The observed regulation of glycemia is likely not a direct consequence of increased glucose uptake by immune cells or by tissues but may instead reflect broader differences in disease severity across genotypes. The iNOS KO, TNFR KO, and HIF-1α^ΔLyz2 mice likely experience a dampened inflammatory response, which would blunt infection-induced anorexia and help preserve overall metabolic homeostasis. This alternate interpretation is supported by the authors' metabolic cage data showing increased physical activity in TNFR KO mice and the elevated food intake shown in Figure 2B.

      We thank the reviewer for this important point regarding the potential contribution of feeding behavior and systemic energy balance to the observed metabolic phenotypes. This possibility has already been explicitly incorporated into the revised manuscript. We have revised the Discussion to state that the hypoglycemia observed during infection likely reflects both systemic changes in energy balance and TNF-driven metabolic reprogramming in immune cells, rather than a single isolated mechanism. Specifically, we had already added the following statement to the Discussion:

      “Although restored physical activity, food consumption and energy expenditure in knockout mice may contribute to the observed systemic metabolic parameters by altering energy balance, these effects are not mutually exclusive with the TNF-driven, cell-intrinsic metabolic mechanisms described here”.

      In addition, we note that under naive conditions, we did not observe differences between genotypes in physical activity, food intake, energy expenditure, respiratory exchange ratio, or glycemia. These findings support that baseline metabolic parameters are comparable and that the differences observed during infection arise in the context of TNF-dependent inflammatory responses. During infection, although TNFR-deficient mice display increased food intake and activity, these differences arise in the context of altered inflammatory signaling. Therefore, rather than being mutually exclusive, behavioral and metabolic changes are likely coordinated downstream of TNF signaling.

      Furthermore, our data using pharmacological inhibition of glycolysis (2-deoxy-D-glucose) demonstrate that disruption of glycolytic metabolism results in increased parasitemia and reduced lactate levels, recapitulating key aspects of the phenotype observed in TNFR-/-, iNOS-/-, and HIF-1αΔLyz2 mice. This supports a functional role for glycolytic metabolism in host response, beyond differences in feeding behavior.

      Since anorexia and energy expenditure are tightly coupled to the inflammatory milieu, it is plausible that these behavioral and systemic differences, not monocyte or tissue GLUT1 expression per se, are the primary contributors to the observed glycemic patterns. To support their current interpretation, the authors should perform a pair-feeding experiment in which (at least) TNFR KO mice are restricted to the same food intake as infected WT controls. This would help disentangle whether differences in glycemia truly reflect immune-driven metabolic rewiring or are secondary to differences in caloric intake.

      We thank the reviewer for this suggestion. We agree that pair-feeding experiments would provide an additional layer of control to isolate the contribution of caloric intake. However, we note that:

      (1) Baseline metabolic equivalence in naive animals argues against intrinsic differences in energy balance.

      (2) The observed phenotypes occur in the context of infection-driven inflammation, where anorexia is itself a TNF-dependent host response.

      (3) Our data support a model in which behavioral changes and metabolic rewiring are integrated components of the host response rather than independent variables.

      Importantly, our data already support a role for TNF-driven metabolic rewiring beyond feeding behavior, as inhibition of glycolysis with 2-deoxy-D-glucose recapitulates the impaired parasite control observed in genetic models. In addition, as discussed in the manuscript, systemic factors such as food intake are not mutually exclusive with cell-intrinsic metabolic mechanisms.

      We therefore consider that pair-feeding experiments are beyond the scope of the present study.

      The contribution of monocyte-specific glucose metabolism to host resistance remains unresolved.

      We appreciate the authors' effort to address the mechanistic role of glycolysis in host resistance using in vivo 2-deoxyglucose (2DG) treatment. However, I would like to point out that while this experiment is informative, it does not fully resolve the specific concern raised regarding the cell-intrinsic role of TNF-induced glycolysis in monocytes. 2DG acts systemically, inhibiting glycolysis across a wide range of cell types, including hepatocytes, endothelial cells, lymphocytes, and myeloid populations. The observed increase in parasitemia following 2DG treatment may therefore reflect the broad importance of glycolysis for host defense, or alternatively, may result from elevated circulating glucose levels induced by 2DG (PMID: 35841892), which could enhance parasite growth by increasing nutrient availability. As a result, this experiment does not allow for a specific conclusion about the requirement for TNF-driven metabolic reprogramming in monocytes.

      We thank the reviewer for this comment regarding the interpretation of the 2-deoxyglucose (2DG) experiments. We agree that systemic 2DG treatment does not allow cell-specific conclusions, as it broadly inhibits glycolysis across multiple cell types. Accordingly, these data are interpreted as supporting a role for glycolysis in host defense at the organismal level, rather than as direct evidence for a monocyte-intrinsic requirement of TNF-driven metabolic reprogramming.

      At the same time, our study includes cell-specific analyses that support the engagement of this pathway in myeloid populations. In particular, we observe increased GLUT1 expression in CD11b+ cells within both the liver and spleen during infection, with marked upregulation in monocyte-derived dendritic cells (MO-DCs). Importantly, this induction is not observed in the corresponding knockout models, supporting the idea that TNF signaling is required for this metabolic adaptation in these cells in vivo. Consistent with this, we validated that both parasitemia and systemic glucose levels in TNFR1^ΔLyz2 mice phenocopy those observed in TNFR-deficient animals, reinforcing the contribution of myeloid TNF signaling to the metabolic and disease outcomes.

      In addition, our in vitro data demonstrate increased GLUT1 expression in WT monocytes but not in cells lacking components of the TNF–iNOS–HIF-1α axis, further supporting a pathway-specific effect. Given that GLUT1 is the primary glucose transporter in immune cells, these combined in vivo and in vitro findings, together with the 2DG experiments, provide strong evidence supporting our proposed model. 

      We agree that directly establishing a monocyte-intrinsic role would require targeted genetic approaches, which are beyond the scope of the present study.

    1. eLife Assessment

      This valuable study characterizes the emergence of the membrane-associated periodic cytoskeleton (MPS) in the axons of human motor neurons derived from induced pluripotent stem cells. Super-resolution imaging of beta-II spectrin provides convincing evidence for the patterned assembly of spectrin-poor gaps and spectrin-rich MPS in the medial region of the axons and its enhancement by the kinase inhibitor staurosporine. The data advocates against gap formation by axonal degeneration or cytoskeleton disassembly in a continuous MPS. Instead, a continuous MPS may result from nascent MPS patches and their maturation, a model that would benefit from live imaging for validation.

    2. Reviewer #1 (Public review):

      The authors have presented a revised version of their investigation into the Membrane Associated Periodic Skeleton (MPS) in iPSC-derived human motor neurons. As mentioned in the earlier report, the main observation reported in this article, the occurrence of a patch-and-gap arrangement of the MPS, is very interesting. The real puzzle is whether, and if so how, this structure coarsens over time to produce a continuous MPS.

      Following suggestions from reviewers, the authors attempted live cell imaging, but the results were not consistent enough and the authors point out difficulties in obtaining sufficient numbers and possible artefacts of over-expression. This investigation would have been much stronger with live cell imaging data on the dynamics of patch and gap structures.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BetaII-spectrin, along with Alpha-spectrin, forms horizontal linkers between F-actin rings spaced 180 nm apart in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS), which are critical for maintenance of the integrity, size and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      Strengths:

      The experiments are well designed and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well executed. The use of biochemical assays to explore the role of calpains is appropriate and well designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation, using photobleaching and live imaging, to validate the proposed model of spectrin capture from the gaps into adjacent patches. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a timepoint when neurodegeneration is expected to start.

      Comment on revised version.

      The authors have given a point-by-point response to all the reviewers' concerns. They have also adequately addressed the concerns I raised. I have no further concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge of MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al. use human iPSC-derived motor neurons, a highly relevant neuron type, to further the current knowledge of MPS biology.

      (3) The quality of the images provided, specifically those involving super-resolution, is of a high standard and adequately supports the conclusions of the authors.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      (2) One technical challenge that limits more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Statement

      This valuable study characterizes the emergence of the membrane-associated periodic cytoskeleton (MPS) in the axons of human motor neurons derived from induced pluripotent stem cells. Super-resolution imaging of beta-II spectrin provides convincing evidence for the patterned assembly of spectrin-poor gaps and spectrin-rich MPS in the medial region of the axons and its enhancement by the kinase inhibitor staurosporine. The data advocates against gap formation by cytoskeleton disassembly in a continuous MPS. Instead, a continuous MPS may result from nascent MPS patches and their maturation, a model that would benefit from live imaging for validation.

      (R1) We thank the reviewers and editor for their constructive and thoughtful feedback. We are pleased the reviewers found our evidence to be convincing and that our study provides a valuable framework for understanding the complex dynamics of MPS assembly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ever since the surprising discovery of the membrane-associated periodic skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that the axonal diameter often remains uniform across segments that show a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      (R2) We thank the reviewer for the detailed and accurate description of the data shown and its relevance to further our understanding of MPS assembly mechanism and function.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      (R3) We thank the reviewer for the positive comments on the manuscript and on the convenience of the experimental system developed to further study the dynamics of MPS assembly. We hope others turn to motor neurons to explore cortical cytoskeleton biology and shed light on their susceptibility in various degenerative diseases.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      (R4) Because staurosporine may hit various kinases relevant to the phenomenon under study, we did not elaborate too deeply on the likely targets in the discussion. We have, however, included the possibility that the relevant kinase in this matter could be PKC, in light of the new study published while our manuscript was under revision (Heller et al., 2025) (see the second-to-last paragraph in the Discussion section). Staurosporine represented a convenient initial approach that allowed us to find the phenomenon, and we are now conducting new studies dissecting the molecular pathways involved. However, the extent of such studies lies beyond the scope of the present report.

      See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      (R5) While we acknowledge this possibility, we believe that, even if such shape changes occurred, they could not contribute to our observations of the gaps-and-patches phenomenon, since the latter persisted long (hours and days) after any gross manipulation of media. Moreover, fixed samples observed under DIC, confocal or STED showed no evidence of such beading. We do refer to a characteristic local enlargement that was very localized and very low in number (see Fig. 1C and E, and Suppl. Fig. 1C and E); we don't believe these enlargements are transient, and they do not resemble the structure referred to as beading. Structurally, beading is essentially different, since it appears as rows of consecutive “beads” along long stretches, where small, round enlargements of axonal caliber are arranged consecutively, resembling pearls on a string. As mentioned by the reviewer, beading can occur transiently when media osmolarity is drastically changed (rarely done in cell culture manipulations) or non-transiently when axons are undergoing degeneration. Indeed, to prevent gross changes in osmolarity, our routine fixative is 4% PFA and 4% sucrose in PBS. In any case, we did not observe signs of beading in the cultures used for this study.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      (R6) Our stainings are for tubulin protein isoforms beta-III and alpha-II. That is, they would label microtubules, but free tubulin as well. Hence we don't think this is evidence for “patchy microtubules”. The slight decrease in intensity for tubulin within gaps is indeed something to investigate, and can indicate that tubulin prefers to accumulate within patches.

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      (R7) We agree that there is an apparent discrepancy. However, one has to take into account that these axons are still elongating even at 2 weeks in culture and beyond. Hence, at any time point, there is a recently added axonal compartment with low βII-spectrin and no organized MPS. Also, any account of the dynamical evolution of the gaps-and-patches structure has to consider the rate of βII-spectrin supply and transport. If supply falls below a given threshold, more gaps are expected, given that the new, more distant parts of the axons have a lower supply of βII-spectrin. To explore this formally, we are working on simulations of these multifactorial dynamic systems which, together with key experimental observations, should improve our understanding of our model of MPS assembly in growing axons. However, findings from this project will be the subject of another manuscript.

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      (R8) The results of the co-treatment with Latrunculin A and Staurosporine are indeed intriguing, and provide clear evidence that the gap-and-patch pattern arises from local assembly of the MPS, requiring newly formed actin filaments. On the other hand, the fact that F-actin within the pre-formed MPS seems unaffected is not surprising. There are many different populations of F-actin in axons (i.e. MPS rings, longitudinal filaments, actin patches, actin trails), all of which have a different rate of monomer turnover. Latrunculin A affects filaments indirectly. The target of Latrunculin A is not actin filaments, but free monomers. Monomer sequestration ultimately affects actin filaments: filaments are constantly exchanging monomers, but, devoid of free monomers, filaments get shorter and eventually disappear. The drastic decrease in global F-actin in LatA-treated axons reflects that. The fact that F-actin in the MPS is preserved shows that these filaments are stable: if a filament is not losing monomers in the time frame of the treatment, it remains unaffected. This subject is extensively covered in the 8th paragraph of the Discussion section.

      We have not used Jasplakinolide. The expected outcome would not mimic that of Latrunculin A, since Jasplakinolide has a different mechanism of action (i.e. it binds and stabilizes the actin filament).

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

      (R9) We agree with the reviewer's interpretation. A virtue of our experimental model and our interpretations of the observations in fixed cells is that it gives rise to informative questions such as the ones posed by the reviewer. See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BII-spectrin, along with Alpha-spectrin, forms horizontal linkers between 180nm spaced F-actin rings in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      (R10) We thank the reviewer for the detailed and accurate description of the data shown.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      (R11) We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

      (R12) See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.

      We don't rule out the possibility that axons carrying familial ALS mutations will show defects in MPS formation and/or stability when observed at longer culture times, or under culture conditions that promote neuronal aging (Guix et al., 2021). Thus, we continue to work with these cells, but the goal of such a project lies well beyond the primary message of the present manuscript, as we discuss in the second paragraph of the Discussion section.

      Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge on MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al use human IPS-derived motor neurons, a highly relevant neuron type, to further the current knowledge on MPS biology.

      (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.

      (R13) We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      (R14) Throughout the project, various gaps-and-patches parameters were measured in different conditions and stainings. In all these examinations, the only parameter that changed considerably was their abundance. While this suggests that the gaps-and-patches features are comparable between control and staurosporine-treated cells, we acknowledge, as a general caution regarding negative data, that subtle qualitative differences cannot be entirely ruled out. We have now emphasized this possibility in the 9th paragraph of the Discussion section.

      (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

      (R15) See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers all agree that the work would strongly benefit from live imaging to assess the maturation dynamics of the gap/patch pattern.

      (R16) Reviewers agreed that some of the conclusions of our manuscript would benefit from live imaging for validation. Various anticipated technical and biological challenges kept us from pursuing these approaches in this initial study on human motor neurons. To mention the most important: from previous work in our labs, these cells are difficult to transfect at 2 weeks in culture. Also, ectopically expressed tagged βII-spectrin escapes normal expression control, and it has been noticed that ectopic expression leads to protein localization that does not necessarily reflect the endogenous distribution, or produces cellular responses that preclude the observation of the phenomena under study. These difficulties in studying over-expressed tagged βII-spectrin have been reported in the field, with mentions that the analysed axons were those expressing “low levels of the construct” (Boyer et al., 2026; Zhong et al., 2014; Zhou et al., 2022). Taking this into account, we did not anticipate including live imaging within the goals of the present project. However, given the positive comments and reception of our conclusions, we attempted this challenging and risky approach. To that end, we used a C-terminus tagged mouse βII-spectrin-GreenLantern plasmid to transfect our cells (a kind gift from Dr. Subhojit Roy, UCSD, USA). After 3 rounds of differentiating cells and trying various combinations of plasmid quantity, lipofectamine-to-DNA ratios and times of transfection (amongst other parameters), we obtained an extremely low transfection efficiency, and the few expressing neurons showed a distribution of βII-spectrin-GreenLantern that did not match our observations of immunolocalization of endogenous βII-spectrin. Taking all this into account, the present version of the manuscript will not include live-cell imaging of expressed tagged βII-spectrin.

      Given that reviewers found that some statements in the initial submission would have been better supported by live imaging, we made changes in the manuscript to acknowledge the limitations of inferring dynamic mechanisms from fixed samples (see, for example, the last sentences of the 5th paragraph of the Discussion section). That said, we hope in the future to overcome these experimental challenges and establish live imaging of βII-spectrin in neurons. For example, to avoid unregulated transgene expression, Heller and colleagues recently generated a βII-spectrin-mNeonGreen conditional knock-in (cKI) mouse, consisting of a LoxP-flanked alternative final exon of endogenous βII-spectrin with a C-terminal mNeonGreen fusion that is expressed upon Cre expression (Heller et al., 2025). The implementation and further development of such approaches will be very helpful in new studies on the dynamics of βII-spectrin and the MPS as a whole. However, the scale of work needed to implement those approaches makes them stand-alone projects.

      Reviewer #1 (Recommendations for the authors):

      In the section "The MPS is absent in beta-II spectrin gaps", the authors mention that the presence of MPS in patches suggests that the axons are not undergoing degeneration. I don't think this is a good criterion to use, despite the citations they take support from.

      (R17) We agree with the reviewer's suggestion: given the unlikely connection between the cited developmental axon degeneration process in sensory neurons and the possible axon degeneration in long-term cultures of human iPSC-derived motor neurons studied here, we have removed the sentence in question.

      The authors show that degradation by proteases does not happen in their case. In this regard, they may want to discuss the recent article by Heller et al, Science 2025 (https://doi.org/10.1126/science.adn6712) and Hofmann et al, Sci. Rep., 2022 (https://doi.org/10.1038/s41598-022-18562-5)

      (R18) By western blot analysis, we did not see evident changes in proteolysis-derived fragments. However, it is likely that, even when protease inhibitors produce a phenotype, the accumulation of protein fragments is below the sensitivity of western blots. We were expecting gross changes observable by western blot if proteolysis explained gap formation.

      Calpain and caspase activity has been shown to be relevant in different aspects of MPS biology. To the works cited by the reviewer, one now has to add the very recent work by Fei and colleagues (Fei et al., 2026). We have modified part of the Discussion section to analyse our results in this broader context.

      Briefly, Hofmann and colleagues found that acute treatment with calpain inhibitors right before axotomy led to an increase in the percentage of periodic βII-spectrin (referred to by the authors as “periodicity”) in the regenerated axons over a 2-hour period. Interestingly, the βII-spectrin patches they describe at distal portions did not increase in number, but they increased in size. This indicates that, in the particular situation of axonal regeneration, calpain activity puts a brake on MPS formation within patches. This invited us to re-examine our own protease inhibition experiments and measure patch length in them. The new results are shown in Supplementary Fig. 6 and further analysed in the Discussion section. In summary, our changes were much less notable than the ones found in regenerating axons, but follow the same trend: protease inhibitors made patches longer.

      On the other hand, Heller and colleagues found in live-imaging studies that calpain activity contributes to the steady-state dynamics of βII-spectrin exchange in a mature MPS lattice. More recently, Fei and colleagues found that caspase or calpain inhibition does not change the steady-state organization of a mature MPS lattice when observing treated axons in fixed samples. Fei and colleagues find a relevant role for calpains whenever massive endocytosis (of any kind) is engaged experimentally. Interestingly, all these studies, including ours, examined the roles of calpains in the MPS in different scenarios. When examined in detail, we don’t believe that these results contradict one another, and a complete picture of the roles of calpains (and caspases) in MPS assembly, growth, maintenance and remodeling will have to take into account all the above-mentioned results, including ours. All these analyses are now included in the Discussion section.

      Minor comments:

      (1) "Recently, it was proposed that this continuous MPS organization arises from the coalescence of discontinuous "patches" of incomplete MPS units that originate in the distal axon and migrate proximally (Zhong et al. 2014)." Please check the citation. Should it be Hofmann et al. 2022?

      (R19) The reviewer is correct. The proper citation has now been included.

      (2) Is there an established link between ALS and spectrin? I would suggest decreasing the emphasis on this as no clear conclusions are achieved.

      (R20) As stated in the text, the study of ALS mutations is justified on two grounds. First, there are several tubulin and other cytoskeletal proteins whose mutations are linked to ALS (Castellanos-Montiel et al., 2020), and microtubule dynamics has been shown to affect the cortical skeleton (Qu et al., 2017). Second, since human motor neurons are affected in ALS, we thought that a complete characterization of the βII-spectrin cortical cytoskeleton in these cells should include ALS-related mutations. We have now included a basic MPS description for the TDP43 and SOD1 mutations (Suppl. Fig. 5).

      The aspect of ALS-related mutations only occupies two short paragraphs in the main text and some panels in Supplementary information. To follow the suggestions by the Reviewer, we have downplayed the relative relevance of these results in the text, without compromising the amount of data we show.

      (3) There is a typo in the approximate symbol used for 150 kDa in the section where calpain and caspase activity is reported.

      (R21) Typo corrected.

      (4) Please add the Latrunculin concentration used in the main text, as it makes it easier for the reader.

      (R22) Done.

      (5) In the Discussion, paragraph starting with "We further showed ...", there is a typo where Zhong et al is cited.

      (R23) Corrected.

      (6) Supplementary Figure 1B: attachment instead of 'atachment'.

      (R24) Corrected.

      (7) Include DIVs or time in the schematic. It is easier for the reader to understand.

      (R25) We have now included time references in schematics of Suppl. Fig1B.

      (8) Supplementary Figure 1C

      Unable to distinguish βII-spectrin and βIII-tubulin in the merged image. Separate figure panels will help.

      (R26) The merged images in the reconstructions are merely to better show the tracing of individual axons at such low magnification. Relevant portions with only the βII-spectrin channel are shown in C1 and C2. Separated individual channels are shown elsewhere across the manuscript.

      (9) Supplementary Figure 4D

      Why is there so much cleavage product for αII-spectrin across DMSO and treatment? It varied over batches as well. Doesn't this mean that αII-spectrin is going through more proteolytic cleavage? Why?

      (R27) The amount of cleavage product for αII-spectrin is not a surprise to us. For instance, although calpains and caspases can potentially process both α- and β-spectrin, in in vivo scenarios where calpain activity is triggered there are many more fragments of α-spectrin being produced (Czogalla & Sikorski, 2005). In addition, our immunofluorescence staining of cleaved αII-spectrin with the SNTF antibody (Fig. 4C) parallels the findings by western blot: high levels of cleaved αII-spectrin across treatments. A similarly strong staining using this antibody has recently been shown in the intact axon (Heller et al., 2025). It will be interesting in the future to address whether these fragments have any biological significance beyond being mere byproducts of αII-spectrin processing.

      Reviewer #2 (Recommendations for the authors):

      Suggestions for improving the quality of the manuscript:

      (1) Live imaging in combination with FRAP assays will help define whether the capture of spectrin from gaps into patches is true. Fixed neurons only provide static information and may not reflect real-time physiological effects.

      (R28) See R16 regarding possible live-imaging experiments using tagged βII-spectrin constructs.

      (2) Could the presence of F-actin trails in axons facilitate the formation of patches? Will the use of formin/Arp2/3 inhibitors rescue the effect of staurosporine, similar to Latrunculin A?

      (R29) Very interesting suggestion. It is likely that different pools of F-actin contribute to the dynamic of MPS formation, and actin trails are definitely worth investigating in this context.

      (3) Figure 8 lacks a latrunculin A treated condition? Why is this not present?

      (R30) The quantification of that treatment was excluded for space and readability. We have now included the values of the LatA + DMSO group in Fig. 8C and D and rearranged the whole figure.

      (4) Does neuronal stimulation have any effect (KCl treatment) on gaps and patches?

      (R31) Very interesting suggestion. Unfortunately, we have not examined whether neuronal stimulation affects any parameter of the gaps-and-patches structure.

      (5) Please check the manuscript for typos and reference insertion points in the text. More than a couple were noted.

      (R32) We have corrected typos.

      Reviewer #3 (Recommendations for the authors):

      This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (1) One major concern is the assumption that staurosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin, solely based on their morphological similarity. This should be further discussed in the manuscript. Further analysis of additional cytoskeleton components, including microtubules in staurosporine-treated neurons, could also be provided.

      (R33) See R14.

      (2) In Figure 1E, betaIII-tubulin and NF-H seem to accumulate in betaII-spectrin-rich axonal enlargements. If these are patches, how do you reconcile this finding with Figure 2C-D, where NF-M and alphaII-tubulin are not specifically enriched in betaII-spectrin patches?

      (R34) We actually show that axonal enlargements and patches are structurally unrelated, in many aspects. We mention these axonal enlargements as a way to perform an exhaustive characterization of all βII-spectrin features found in these axons.

      (3) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence. This should be further discussed in the revised version of the manuscript.

      (R35) The limitation of the experimental approach is now further discussed (see for example last sentences on 5th paragraph of the Discussion section).

      (4) On a more general note, the title of some of the Results sub-sections could be revised to convey the findings of those sub-sections and not the Methods that were used (example: "Quantitative and Qualitative analyses of betaII-spectrin distribution....").

      (R36) According to the suggestion, we have changed the title of this subsection.

      References

      Boyer, N. P., Sharma, R., Wiesner, T., Parperis, C., Delamare, A., Pelletier, F., Jullien, N., Bhatt, A. M., Parra-Rivas, L. A., Kearney, P. J., Shavarebi, F., Leterrier, C., & Roy, S. (2026). Spectrin condensates provide a nidus for assembling the axonal membrane-associated periodic skeleton. iScience, 29(1), 114454. https://doi.org/10.1016/j.isci.2025.114454

      Castellanos-Montiel, M. J., Chaineau, M., & Durcan, T. M. (2020). The Neglected Genes of ALS: Cytoskeletal Dynamics Impact Synaptic Degeneration in ALS. Frontiers in Cellular Neuroscience, 14, 594975. https://doi.org/10.3389/fncel.2020.594975

      Czogalla, A., & Sikorski, A. F. (2005). Spectrin and calpain: A “target” and a “sniper” in the pathology of neuronal cells. Cellular and Molecular Life Sciences: CMLS, 62(17), 1913–1924. https://doi.org/10.1007/s00018-005-5097-0

      Guix, F. X., Capitán, A. M., Casadomé-Perales, Á., Palomares-Pérez, I., López Del Castillo, I., Miguel, V., Goedeke, L., Martín, M. G., Lamas, S., Peinado, H., Fernández-Hernando, C., & Dotti, C. G. (2021). Increased exosome secretion in neurons aging in vitro by NPC1-mediated endosomal cholesterol buildup. Life Science Alliance, 4(8), e202101055. https://doi.org/10.26508/lsa.202101055

      Heller, E., Kurup, N., & Zhuang, X. (2025). The membrane skeleton is constitutively remodeled in neurons by calcium signaling. Science (New York, N.Y.), 389(6760), eadn6712. https://doi.org/10.1126/science.adn6712

      Qu, Y., Hahn, I., Webb, S. E. D., Pearce, S. P., & Prokop, A. (2017). Periodic actin structures in neuronal axons are required to maintain microtubules. Molecular Biology of the Cell, 28(2), 296–308. https://doi.org/10.1091/mbc.E16-10-0727

      Zhong, G., He, J., Zhou, R., Lorenzo, D., Babcock, H. P., Bennett, V., & Zhuang, X. (2014). Developmental mechanism of the periodic membrane skeleton in axons. eLife, 3, e04581. https://doi.org/10.7554/eLife.04581

      Zhou, R., Han, B., Nowak, R., Lu, Y., Heller, E., Xia, C., Chishti, A. H., Fowler, V. M., & Zhuang, X. (2022). Proteomic and functional analyses of the periodic membrane skeleton in neurons. Nature Communications, 13(1), 3196. https://doi.org/10.1038/s41467-022-30720-x

    1. eLife Assessment

      This manuscript describes convincing and very interesting findings that substantially advance our understanding of a major research question on the role of Cx32 hemichannels in the Schwann cell paranode. It provides an interdisciplinary integration of imaging, in silico approaches, and functional data. This important study proposes a new mechanism with profound physiological relevance and provides new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves.

    2. Reviewer #1 (Public review):

      The manuscript by Butler et al. explores a novel physiological role for connexin 32 (Cx32) hemichannels in Schwann cells of peripheral nerves. Building on the authors' prior work on CO<sub>2</sub>-sensitive gating of connexin hemichannels, this study proposes that axonal activity-dependent mitochondrial CO<sub>2</sub> production promotes the opening of Cx32 hemichannels in adjacent Schwann cells, a process regulated by carbonic anhydrase (CA) activity and AQP1. This work reveals a new form of intercellular communication that may contribute to the regulation of conduction velocity.

      The authors aimed to determine whether CO<sub>2</sub> acts as an activity-dependent signal in peripheral nerves through activation of Cx32 hemichannels in myelinating Schwann cells. The study is strengthened by the use of complementary techniques, including in silico approaches, pharmacological manipulation, dye uptake assays, calcium imaging, adenoviral delivery of dominant-negative Cx32 constructs targeted to Schwann cells, and extracellular recordings in isolated sciatic nerves. Together, these methods allow the authors to connect molecular mechanisms with tissue-level function.

      The study has a few technical limitations, and some aspects of the interpretation require caution. Limitations in antibody specificity complicate interpretation of the precise distribution of the signaling pathway components studied here. Dye uptake into the outer myelin layer is consistent with hemichannel opening, but it does not by itself prove that Cx32 directly mediates the observed permeability changes. Similarly, Ca<sup>2+</sup> signals associated with Cx32 activation could reflect direct Ca<sup>2+</sup> permeability through Cx32 or secondary activation of other Ca<sup>2+</sup> entry or release pathways. Finally, hemichannel opening is assessed primarily using FITC uptake, which may not fully capture the complexity of Cx32 gating or distinguish between different conductive states.

      Overall, the authors provide substantial evidence that activity-dependent CO<sub>2</sub> production can influence Schwann cells through a pathway involving CA, AQP1, and Cx32. The results support the broad conclusions of the study, although some direct mechanistic links require further validation. The work is likely to have an important impact because it proposes a novel role for CO<sub>2</sub> as a local signaling molecule in peripheral nerves and may provide new insight into how Schwann cells detect axonal activity and regulate peripheral nerve physiology.

      Comments on revised version.

      The authors have addressed all of my concerns. The manuscript is now much improved and reads very well. Congrats to all the research team.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Butler et al. explores a novel physiological role for connexin 32 (Cx32) hemichannels in Schwann cells at peripheral nerves. Building on the authors' prior work on CO<sub>2</sub>-sensitive gating of connexins, this study proposes that mitochondrial CO<sub>2</sub> production dependent on neuronal activity promotes the opening of Cx32 hemichannels in the paranode, which in turn modulates neuronal activity by reducing conduction velocity. This hypothesis is addressed using a multifaceted approach that includes immunofluorescence microscopy, dye uptake assays, calcium imaging, computational modeling, and extracellular recordings in isolated sciatic nerves.

      Among the strengths of the study are the interdisciplinary integration of imaging, in silico approaches, and functional data. Also, this study proposes a new mechanism with profound physiological relevance. Specifically, Butler et al. provide new insights into glial modulation of electrical conduction in sensory/motor myelinated nerves.

      In the current state, the study has some limitations. The evidence linking Cx32 to the observed dye uptake and conduction velocity changes relies primarily on pharmacological inhibition with carbenoxolone, which lacks specificity. The imaging data show overlapping marker signals that preclude the anatomical distinction between nodes and paranodes. FITC uptake, while convincing to test Cx32 hemichannel gating, lacks spatial-temporal information and validation of distribution and localization to viable intracellular compartments. Moreover, while the findings are intriguing, functional proof that Cx32 regulates conduction velocity through ATP release or other downstream effects remains incomplete. Further work using targeted genetic tools, live-tissue imaging, and additional controls would strengthen the mechanistic conclusions.

      Overall, the manuscript offers compelling preliminary evidence that supports a new role for Cx32 in peripheral nerve physiology and raises important questions for future investigation.

      We thank the reviewer for their comments and agree that the evidence for involvement of Cx32 is indirect. We have now used viral expression of Cx32<sup>DN</sup> in SCs to remove CO<sub>2</sub> sensitivity from the endogenous Cx32 to strengthen this link. We have reviewed our presentation of the morphology in terms of the node/paranode/juxtaparanode distribution and adjusted accordingly. We have added new data using GCaMP transduced into Schwann cells that provides the live-tissue imaging that the reviewer requests.

      Reviewer #2 (Public review):

      Summary:

      This article aims to demonstrate that local production of CO<sub>2</sub> at the axonal node opens Cx32 hemichannels in the Schwann cell paranode, and that CO<sub>2</sub> diffuses through the AQP1 channel to reach Cx32 and trigger its opening. The authors also present evidence supporting a physiological role for this regulatory mechanism. They propose that CO<sub>2</sub>-dependent Cx32 activation mediates activity-dependent Ca<sup>2+</sup> influx into the paranode, and by increasing the leak current across the myelin sheath, it contributes to a slowing of action potential conduction velocity.

      The study presents a very interesting and novel mechanism for the physiological regulation of Cx32 hemichannels. The findings are relevant to the field, and the methods and results are of good quality, with some improvements in interpretation and explanation required, and some minor experimental suggestions.

      Strengths:

      The article is solid in terms of the novelty of the findings and relevance for the physiology of myelinated axons. In addition, it is of major interest for the Connexin field because it explores a physiological way to open Cx32 hemichannels. The experiments are well elaborated, and most of them are sufficient for the main points described by the authors. The finding that nervous activity will trigger the mechanism of hemichannel opening by CO<sub>2</sub> is probably the most relevant biological mechanism derived from this article.

      Weaknesses:

      Throughout the manuscript, the authors interpret their findings as if the described mechanism specifically occurs in the node and paranode regions. However, there is no direct evidence identifying the precise site of CO<sub>2</sub> production or the activation site of Cx32 hemichannels. Therefore, statements such as the one in the title ("activity-dependent CO<sub>2</sub> production in the axonal node opens Cx32 in the Schwann cell paranode") should be reconsidered or removed, as they may be misleading and are not essential to the interpretation of the data. In addition, the participation of aquaporin AQP1 as the main conduit for CO<sub>2</sub> diffusion through the plasma membrane could have another interpretation.

      We thank the reviewer for their comments and agree that we do not have direct evidence for the site of CO<sub>2</sub> production or the site of activation of Cx32 hemichannels. This direct evidence is extremely difficult to obtain, and we therefore depend on indirect arguments. Mitochondria represent the major source of CO<sub>2</sub>, and their distribution will therefore indicate where CO<sub>2</sub> is likely to be produced. We agree that this is not essential to the interpretation of the data and have adjusted the text as recommended. We have added a section to the Discussion to consider this point in more detail. The reviewer alludes to a reported interaction between AQP1 and NaV1.8 as a possible alternative interpretation. We can confidently rule this out as the AQP1 blocker has no effect on the compound action potential.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main comments:

      (1) While the imaging system used in this study is technically capable of resolving nodes and paranodes, interpretation depends critically on marker specificity and tissue orientation. In some figures, markers such as Caspr or KCNA2 appear to partially overlap with KCNQ2 or the putative axonal node, which could reflect biological proximity but may also result from incomplete spatial separation in the z-dimension or the curvature of teased fibers. Similarly, Cx32 immunoreactivity or FITC signal is occasionally seen within nodal gaps, raising questions about how accurately these data support the authors' hypothesis. Additionally, while the authors claim that AQP1 is localized in nodes, the data suggest the opposite. Clarifying these patterns using fluorescence intensity line scans or additional nodal markers such as Nav1.6 or Ankyrin G would help distinguish overlapping signals from true domain-specific localization and reinforce the spatial conclusions of the study.

      We have changed our presentation of the localisation studies. We have concentrated on colocalization of Cx32 and AQP1 (now Fig 2) and moved the other studies to supplements to this figure. While we have retained the same images of Cx32 and AQP1 localisation, we have emphasized that these are SIM images and thus higher resolution than conventional LSM images, and also from a single optical plane. We have also clarified that the colocalization studies are restricted to analysis of the node/paranode regions.

      (2) To strengthen the conclusion that Cx32 specifically mediates the observed dye uptake, additional data or an alternative approach would be valuable. One feasible, though technically demanding, strategy would be the use of AAV-mediated delivery of Cx32-targeting shRNA directly into the sciatic nerve, ideally under a Schwann cell-specific promoter. This approach could achieve localized, cell-type-specific knockdown of Cx32 within a relevant time frame. Alternatively, the authors are encouraged to consider using additional pharmacological inhibitors to exclude the contribution of other conduction pathways, such as pannexin channels. These complementary strategies would reduce the interpretive ambiguity associated with non-specific blockade.

      We agree that this is desirable and have used Cx32<sup>DN</sup> under the control of the Mpz promoter (delivered by AAV via intraneural injection). This approach has several advantages: the Cx32<sup>DN</sup> subunit coassembles with endogenous Cx32<sup>WT</sup>, and the heteromeric assemblies lack CO<sub>2</sub> sensitivity (first shown in Butler & Dale, 2023; the same strategy was used with Cx26 to demonstrate its role in the control of breathing, van de Wiel, 2020). This is a new figure (Fig 9). We have included supplemental figures with Fig 9 to document the coassembly of Cx32<sup>DN</sup> with Cx32<sup>WT</sup> by FRET.

      These new data test a very specific hypothesis: that CO<sub>2</sub> binding to Cx32 is responsible for the CO<sub>2</sub> sensitivity of the nerve. By comparing transduced and non-transduced fibres in the same nerve, we find that Cx32<sup>DN</sup> essentially abolishes activity-dependent loading of FITC into the Schwann cells.

      (3) Related to FITC experiments: Assuming the hypothesis of the authors is correct and CO<sub>2</sub> release is restricted to the node, one should expect that if the major source of CO<sub>2</sub> is in the nodal mitochondria, the hemichannels adjacent to the node will open first, assuming the spatial-temporal diffusion of CO<sub>2</sub>. To demonstrate this point, I would strongly suggest performing tissue imaging with real-time dye uptake. This approach should capture the FITC wave starting from the Cx32 channel opening in the paranode, as expected. Visualization of uptake in fixed and sectioned tissue is not the ideal approach to detect functional hemichannel opening in intact, viable cells, and at this point, they do not demonstrate that the uptake occurs in the node. From my perspective, if real-time experiments using isolated axons are feasible, it would make this paper more solid.

      The suggested method is not practical, as the FITC in solution will be fluorescent and thus obscure the entry of FITC into the paranode. We have, however, expressed GCaMP8 under the control of the Mpz promoter; it is expressed at paranodes and gives a CO<sub>2</sub>- and activity-dependent Ca<sup>2+</sup> signal at the paranode. This gives a real-time measure of the effect of CO<sub>2</sub> on the nerve. The GCaMP8 signal is enhanced by AZ and blocked by TC AQP1-1 (see below).

      (4) In Figure 5, Supplement 1, the authors present data using GRAB-ATP to suggest that Cx31.3 hemichannels do not release ATP under CO<sub>2</sub> stimulation. However, control experiments with GRAB-ATP alone (without Cx31.3 expression) are not shown, and parallel conditions with Cx32-expressing cells are lacking. Including these controls would strengthen the manuscript. Finally, testing the permeability of Cx31.3 to FITC directly, using the same conditions as in the main experiments, would clarify whether the discrepancy reflects differences in molecular permselectivity or CO<sub>2</sub> sensitivity.

      Figure 5 supplement 1 does show GRAB<sub>ATP</sub> alone without Cx31.3 expression (in the box plot). However, we have now added raw traces for this to the figure in panel B. CO<sub>2</sub>-dependent and voltage-dependent ATP release via Cx32 has been previously shown in two papers (Butler & Dale 2023, Frontiers Cell Neurosci; Lovatt et al 2025, J Biol Chem). The Cx32<sup>DN</sup> result (above) further eliminates any contribution of Cx31.3.

      (5) Suggestion: It would be valuable to explore whether the proposed mechanism is conserved across both motor and sensory neurons, as this would broaden its physiological relevance. Since the sciatic nerve contains both fiber types, selective analysis or comparative data could clarify whether hemichannel activity is differentially regulated or restricted to a specific neuronal subtype.

      This is a great idea, but well beyond the scope of this paper. In an ex vivo preparation it would be very difficult to selectively stimulate the sensory vs motor fibres.

      Suggestions to improve data presentation and other minor comments:

      (1) Reduce/reorganize the figures to make the paper straightforward. For example, (a) immunofluorescence data showing the CO<sub>2</sub> signaling machinery could be represented in one single figure; (b) Figure 1 could include all the findings and keep it as a final figure to summarize what the authors claim.

      We thank the reviewer for these suggestions. We prefer to keep Fig 1 up front to have our hypothesis clear for the reader to assist their interpretation as they go through the paper. We have altered the balance of figure supplements and main figures that document the immunolocalisation studies to concentrate on the main areas of novelty (AQP1 and Cx32 colocalisation and CA localisation).

      (2) The following phrase in the Results section is incomplete: "There was colocalization between Cx32 and CytC in the Schwann cell paranode, and (Fig 2, mean; 95% confidence interval, M1: 0.314; 0.198, 0.431 and M2: 0.261; 0.165, 0.357)."

      We have corrected this.

      Additionally, the three values for M1 and M2 should be clearly defined and contextualized. In the current state, I couldn't understand them.

      The three values are mean and lower and upper 95% confidence limit:

      M1: mean 0.314; 95% CI, 0.198 to 0.431

      We have now made this clearer in the text.
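      The "mean; lower limit, upper limit" reporting format can be illustrated with a minimal sketch. The response does not state how the authors computed their intervals, so the normal-approximation interval used here is an assumption, and the input values and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level=0.95):
    """Return (mean, lower, upper) for a normal-approximation confidence interval.

    Assumption: a z-based interval around the sample mean; the method the
    authors actually used is not stated in the response.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate a spread")
    m = mean(values)
    sem = stdev(values) / sqrt(n)              # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return m, m - z * sem, m + z * sem


def format_ci(label, values):
    """Render a result in the explicit 'mean; 95% CI, lower to upper' form."""
    m, lower, upper = mean_ci(values)
    return f"{label}: mean {m:.3f}; 95% CI, {lower:.3f} to {upper:.3f}"


# Hypothetical per-image coefficients, for illustration only.
example_m1 = [0.21, 0.35, 0.28, 0.40, 0.33]
print(format_ci("M1", example_m1))
```

      Spelling out the label for each number avoids the ambiguity the reviewer ran into with three unlabelled values.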

      (3) It is unclear whether the authors calculate Manders' coefficients across the whole image or selectively at the node/paranode. Clarifying this would help interpret the specificity of co-localization claims.

      The Manders’ coefficients were selectively calculated at the node/paranode and we have amended the text to clarify this.
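      Why the analysis region matters can be shown with a minimal sketch of Manders' coefficients restricted to a region of interest. This is not the authors' pipeline: the thresholds, the flat pixel lists and the `manders` function are hypothetical, and the denominators use total intensity inside the region, which is one common convention.

```python
def manders(ch1, ch2, roi, thr1=0.0, thr2=0.0):
    """Compute Manders' M1 and M2 over the pixels where roi is True.

    M1: fraction of channel-1 intensity in pixels where channel 2 exceeds thr2.
    M2: fraction of channel-2 intensity in pixels where channel 1 exceeds thr1.
    Inputs are equal-length flat sequences (hypothetical layout).
    """
    total1 = total2 = coloc1 = coloc2 = 0.0
    for a, b, inside in zip(ch1, ch2, roi):
        if not inside:
            continue  # ignore pixels outside the analysed region
        total1 += a
        total2 += b
        if b > thr2:
            coloc1 += a
        if a > thr1:
            coloc2 += b
    m1 = coloc1 / total1 if total1 else 0.0  # 0.0 when the channel is empty
    m2 = coloc2 / total2 if total2 else 0.0
    return m1, m2


# Hypothetical four-pixel image; the last pixel lies outside the region.
ch1 = [10, 0, 5, 8]
ch2 = [0, 4, 5, 9]
roi = [True, True, True, False]

print(manders(ch1, ch2, roi))         # restricted to the region
print(manders(ch1, ch2, [True] * 4))  # whole image
```

      The two calls give different values for the same image, which is why stating the analysis region is needed to interpret the coefficients.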

      (4) It is possible that mislocalization of CytC and SFXN1 could reflect antibody unspecificity or post-isolation alterations in protein distribution (e.g., apoptosis or stress). The authors briefly discussed this observation, but it could be a good idea to consider the use of an additional antibody to validate mitochondria localization.

      Apoptosis or stress is unlikely as the isolated nerves were fixed immediately after isolation with little dissection prior to fixation.

      The SFXN1 antibody was validated by Fowler et al 2013, and IP-HTMS confirmed SFXN1 as an interacting partner with Cx32. In this paper they also described SFXN1 as being present at the plasma membrane, the speculation being that it was taken there by Cx32.

      We think this is probably a valid result and we have further cited the Fowler et al 2013 paper in our discussion of this point.

      (5) Figure 4: The legend states: "Arrow heads indicate the node, and arrows depict the outer myelin." However, no arrows are visible in the figure. Please check.

      Corrected.

      (6) Figure 5: For consistency, indicate in panel N that the TRPA1 inhibitor was applied in the presence of 70 mmHg PCO<sub>2</sub>, as indicated for cbx in the same panel.

      Done

      (7) Figure 5 Supplement 1: Normalization using 1 concentration of ATP could not be appropriate if the sensor-dependent signal is not linear. If possible, authors should make a concentration-response curve and fit the data using the appropriate equation.

      Over the range in which we are measuring ATP (low µM), the GRAB<sub>ATP</sub> response is approximately linear, which allows a single-point calibration; we documented this in Butler and Dale 2023. This is also shown in the original paper describing GRAB<sub>ATP</sub> (Wu et al 2022 Neuron). We have clarified this point in the methods by referring to these papers.
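      The logic of a single-point calibration can be sketched as follows, assuming the sensor response is proportional to ATP concentration over the measured range. The function name and the numbers are hypothetical and are not taken from the paper.

```python
def single_point_calibration(signal, ref_signal, ref_conc_um):
    """Convert a sensor signal to a concentration in µM using one reference point.

    Assumption: the response is linear through the origin, so
    concentration = signal * (reference concentration / reference signal).
    """
    if ref_signal == 0:
        raise ValueError("reference signal must be non-zero")
    return signal * ref_conc_um / ref_signal


# Hypothetical values: a 3 µM ATP reference gives a normalised response of 1.0.
print(single_point_calibration(0.5, ref_signal=1.0, ref_conc_um=3.0))
```

      If the response saturated within the measured range, this proportional scaling would underestimate higher concentrations, which is the concern behind the reviewer's request for a concentration-response curve.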

      (8) Figure 6: The increase in FITC signal could represent a basal uptake over time. Authors should clarify the magnitude/rate of the basal uptake. Another option is showing a picture of the uptake using the control frequency at a time of 10 min. Legend: It is not clear in panel C if this picture corresponds to frequency stimulation. If so, it would be beneficial to specify the time.

      Could dye loading in this Fig simply be time-dependent rather than stimulation-dependent? Our data show that this is not the case: the dye loading controls of Fig 5A were exposed to FITC for 10 mins at 35 mmHg PCO<sub>2</sub>, and very little loading is apparent. We now explicitly make this point in the text. Our use of Cx32<sup>DN</sup> also eliminates this explanation, by demonstrating the necessity of CO<sub>2</sub> binding to Cx32 for dye loading to occur.

      As there is no panel C in this figure, we assume the referee means panel B and have added the frequency of stimulation and time duration used to achieve the loading.

      (9) Please revise the legend of Figure 7. It seems to refer to a previous version of the manuscript's figure.

      Thanks for pointing this out. We omitted giving a letter to one of the panels and we have corrected this so that legend and figure now correspond.

      (10) Figures 10 and 11. Please consider including a bright field image or indicating with an arrow where the node and/or paranode is located.

      The old Fig 11 has been omitted. The old Fig 11 is now Fig 10. Unfortunately, we cannot add a bright field image as we did not save these in this experiment.

      (11) Figure 11. The authors could consider doing this experiment in the presence of Cx32 blockers to strengthen their conclusion.

      We have decided to remove this figure, as the information it contains is shown in the new GCaMP8 figure (Fig 12).

      (12) Figure 12: Calcium signal increases in different areas beyond the ROI. Not clear that the calcium signal is restricted to the node, as shown in previous figures. Please clarify if the preparation is different.

      We agree that this is a limitation – there is a lot of out of focus light due to Fluo4 being membrane permeable and loading many fibres within the nerve (potentially both axon and Schwann cell). Importantly, this phenomenon occurs in the in-focus ROI (for which we show BF image).

      As we think this is basically a limitation of using Fluo4-AM, we have now produced better data using GCaMP8 under the Mpz promoter (new Fig 12). This expresses at the paranode and in far fewer fibres so the resolution of the recordings is better. We have added these new data into the main body of the paper and relegated the Fluo4 data as a figure supplement to Fig 12 that provides independent supporting information.

      (13) Figure 13: Please indicate the stimulation frequency. The authors could consider attaching Figure 7 Supplement 1 to this figure to make the manuscript straightforward.

      Frequency now indicated.

      With regard to the original Figure 7 supplement 1, thanks for this suggestion. After consideration, we have split this up and attached it as figure supplements to the relevant figures (Figure 6 and Figure 8). We have added equivalent data to Fig 7 (effect of H<sub>2</sub>O<sub>2</sub>). We think this simplifies presentation for the readers.

      (14) Figure 7 Supplement 1 and Figure 8 Supplements: Please indicate trace colors in panel A of these figures. Also, correct the spelling issue in the legend of Figure 8 Supplement 1 (for panel B).

      Corrected

      (15) Statistical clarifications: The authors should specify which experimental groups were included in some statistical analysis where p-values are reported, but the information about which groups are compared is missing.

      Corrected

      Reviewer #2 (Recommendations for the authors):

      (1) Localization of CO<sub>2</sub> production and Cx32 activation

      Throughout the manuscript, the authors interpret their findings as if the described mechanism specifically occurs in the node and paranode regions. However, there is no direct evidence identifying the precise site of CO<sub>2</sub> production or the activation site of Cx32 hemichannels. Therefore, statements such as the one in the title ("activity-dependent CO<sub>2</sub> production in the axonal node opens Cx32 in the Schwann cell paranode") should be reconsidered or removed, as they may be misleading and are not essential to the interpretation of the data.

      We agree that we have not shown this. We now exercise more caution in the description of the results and discuss this point.

      (2) Figures 2 and 3 - Cx32, mitochondria, and AQP1 localization

      In Figures 2 and 3, it is difficult to clearly discern the localization of Cx32, mitochondria, and AQP1 in the nodal and paranodal regions. The addition of zoomed-in images and 3D reconstructions (or at least orthogonal views) would greatly help clarify whether these components are indeed localized to the axon or Schwann cell, and whether they are specifically enriched in nodal or paranodal domains. As currently presented, the images suggest that all components of this "triad" are broadly distributed within the cells, not restricted to, nor particularly enriched in, nodal or paranodal areas. This observation further supports the concern raised in point 1.

      We have revised our presentation of the localisation to make it clearer and added a section to the discussion to consider this point more fully. We now explicitly mention that these are SIM images from a single optical plane, so the colocalization is genuine. We have also clarified that the calculation of Manders’ coefficients was performed only at the node/paranode regions. However, we accept that these components are distributed more widely than the node/paranode.

      (3) Figure 5 - Clarify legend labels

      In the graph shown in Figure 5, the legend would benefit from more descriptive labeling of the experimental groups. For clarity, indicate that FCCP was applied alone, and that HC-030031 was co-applied with high PCO<sub>2</sub>, to simplify interpretation for the reader.

      Corrected

      (4) Additional experiment to block mitochondrial CO<sub>2</sub> production

      An experiment should be added to completely or significantly inhibit mitochondrial CO<sub>2</sub> production, for example, by combining FCCP treatment with a TCA cycle inhibitor such as fluoroacetate. This would more directly demonstrate that CO<sub>2</sub> generation is required for hemichannel opening during FCCP treatment. It is important to control for this because FCCP can increase ROS production as a result of compensatory metabolic activity (i.e., increased NADH/FADH<sub>2</sub> generation). Since Cx32 hemichannels are known to be modulated by ROS, and can also regulate mitochondrial ROS production, it is crucial to distinguish the role of CO<sub>2</sub> from that of ROS in these experiments.

      Thanks for this great comment, as it gave us the idea of linking activity-dependent (rather than FCCP-evoked) gating of Cx32 to the TCA cycle and, as the reviewer says, CO<sub>2</sub> generation more directly. As fluoroacetate is only effective at inhibiting the TCA cycle in glial cells, we used H<sub>2</sub>O<sub>2</sub> at 50 µM which is highly effective at blocking aconitase in neurons (Tretter & Adam-Vizi, 2000). This greatly reduced FITC dye loading in response to activity. We now include these data in the paper (Fig 7).

      We note that our new data with Cx32<sup>DN</sup> further establishes the link to CO<sub>2</sub> as opposed to ROS.

      Furthermore, to complement the experiments involving carbonic anhydrase (CA) manipulation, additional controls or mechanistic validation may be necessary to support the conclusions drawn.

      We think that our use of Cx32<sup>DN</sup> greatly strengthens our conclusions that CO<sub>2</sub> is the messenger from the axon that gates Cx32 in the paranode.

      (5) AQP1 and Na<sup>+</sup> channel interaction - alternative interpretation

      It has been reported that AQP1 interacts with voltage-gated Na<sup>+</sup> channels, influencing action potential generation. For example, in AQP1 knockout mice, current injection-evoked action potentials show a reduced peak inward current, suggesting impaired Nav1.8 function (Zhang et al., J. Biol. Chem., 2010; doi: 10.1074/jbc.M109.090233). This raises the possibility that the observed effects of AQP1 inhibition (e.g., with TC AQP1-1) could also result from altered Na<sup>+</sup> channel activity, not just impaired CO<sub>2</sub> transport. I suggest that this alternative interpretation be acknowledged and discussed, as the current data do not rule it out.

      While constitutive KO of AQP1 does alter action potential generation in DRGs and an interaction between AQP1 and Nav1.8 has been documented, we do not think that this is a viable alternative interpretation of our data. We have measured the CAP during all our manipulations including the use of TC AQP1-1, and its amplitude is unaltered (see Fig 8 fig supplement 1 and Fig 13D). Our data therefore show that, in the context of our experiments, application of the AQP1 blocker, TC AQP1-1, does not alter Na<sup>+</sup> channel activity. The difference between our data and the evidence from AQP1 knockout may arise from the difference between acute application of an antagonist (a short-term effect without changing protein expression) and constitutive knockout, which is likely to have longer-term effects. We have added some discussion to address this point (last few lines, Page 9).

      (6) Figures 11A and 12C - Add heat map calibration

      In Figures 11A and 12C, the changes in Ca<sup>2+</sup> signals are difficult to interpret. In some areas, color changes appear to occur outside of cellular structures. I recommend including a heat map calibration scale for both figures to facilitate the interpretation of the signal intensity and localization.

      We agree that these data are limited by the technique used, and as mentioned above we now have GCaMP8 data that has better resolution and strengthens our conclusions.

    1. eLife Assessment

      This study presents a useful methodological advance that better enables the simultaneous measurement of gene expression and chromatin accessibility in individual cells. The evidence supporting the improved detection of gene expression is solid. The method has the potential to be more broadly impactful if it were expanded to include orthogonal validation strategies. This method will be of interest to those studying transcription and gene regulation.

    2. Reviewer #1 (Public review):

      In the manuscript entitled "Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells," Soltys and colleagues present easySHARE-seq, a method described as an improvement upon SHARE-seq for the simultaneous measurement of RNA transcripts and chromatin accessibility.

      The authors demonstrate the utility of easySHARE-seq by profiling approximately 20,000 nuclei from the murine liver, successfully annotating cell types and linking cis-regulatory elements to target genes. The authors claim that easySHARE-seq supports longer read lengths potentially enabling better variant discovery or allele-specific signal assessment, though they do not provide direct evidence to support these specific claims.

      A key strength of the protocol is enhanced sequencing efficiency, achieved by shortening the Index 1 read from 99 to 17 nucleotides. This reduction does not come at a significant cost to barcode diversity, retaining approximately 3.5 million combinations. Additionally, the approach allows for the sequencing of a sub-library to assess quality prior to final barcoding and sequencing which seems quite clever.

      While the increase in RNA transcript recovery is substantial, it appears to come at a cost: there is a notable decrease in ATAC fragments per cell compared to the original SHARE-seq (and other platforms). Likely as a result, the dimensionality reduction (UMAP) shows good resolution for RNA profiles but relatively poor resolution for accessibility profiles. Furthermore, the presented data suggests potential ambient RNA contamination; specifically, the detection of Albumin in HSCs and B cells is likely an artifact of the protocol rather than a biological signal.

      Overall, the study is well-presented and represents a promising advance. However, there are significant shortcomings that should be addressed, particularly regarding "leaky" transcript recovery and reduced ATAC performance.

    3. Reviewer #2 (Public review):

      Aims:

      The authors sought to optimize SHARE-seq, a multimodal single-cell method, to improve the simultaneous profiling of gene expression and chromatin accessibility. Their goal was to enhance barcode design for better sequencing efficiency and cost savings, while improving overall data quality. They then applied their optimized method, easySHARE-seq, to study liver sinusoidal endothelial cells (LSECs) to demonstrate its utility in examining gene regulation and spatial zonation.

      Strengths:

      The improved barcode design is an advance, increasing the proportion of sequencing reads dedicated to biological information rather than barcode identification. This modification offers practical benefits in terms of sequencing costs and read length, potentially reducing alignment errors. The method also demonstrates improved RNA detection compared to the original SHARE-seq protocol. The biological applications showcase how simultaneous measurement of both modalities enables analyses that would be practically impossible with single-modality approaches, particularly in examining how chromatin states change along developmental or spatial trajectories.

      Weaknesses:

      There is a notable reduction in chromatin accessibility detection compared to the original SHARE-seq method, likely limiting the use of the method in certain situations.

      Overall:

      The authors achieve their aim of creating an optimized protocol with improved barcode design and enhanced RNA detection. The method represents a useful advance for specific experimental contexts where the trade-offs are appropriate.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In the manuscript entitled "Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells," Soltys and colleagues present easySHARE-seq, a method described as an improvement upon SHARE-seq for the simultaneous measurement of RNA transcripts and chromatin accessibility.

      The authors demonstrate the utility of easySHARE-seq by profiling approximately 20,000 nuclei from the murine liver, successfully annotating cell types and linking cis-regulatory elements to target genes. The authors claim that easySHARE-seq supports longer read lengths potentially enabling better variant discovery or allele-specific signal assessment, though they do not provide direct evidence to support these specific claims.

      A key strength of the protocol is enhanced sequencing efficiency, achieved by shortening the Index 1 read from 99 to 17 nucleotides. This reduction does not come at a significant cost to barcode diversity, retaining approximately 3.5 million combinations. Additionally, the approach allows for the sequencing of a sub-library to assess quality prior to final barcoding and sequencing which seems quite clever.

      While the increase in RNA transcript recovery is substantial, it appears to come at a cost: there is a notable decrease in ATAC fragments per cell compared to the original SHARE-seq (and other platforms). Likely as a result, the dimensionality reduction (UMAP) shows good resolution for RNA profiles but relatively poor resolution for accessibility profiles. Furthermore, the presented data suggests potential ambient RNA contamination; specifically, the detection of Albumin in HSCs and B cells is likely an artifact of the protocol rather than a biological signal.

      Overall, the study is well-presented and represents a promising advance. However, there are significant shortcomings that should be addressed, particularly regarding "leaky" transcript recovery and reduced ATAC performance.

      Recommendations:

      (1) To provide a comprehensive view of the current field, the authors should include Scale Biosciences (Scale Bio) in their discussion of available commercial platforms.

      We added Scale Biosciences to the relevant part in the introduction.

      (2) A head-to-head comparison with the 10x Genomics Multiome platform would be of significant interest to the single-cell genomics community and would better contextualize the performance of easySHARE-seq.

      We agree that a comparison to the 10x Multiome technology would be of interest in the community. Therefore, we included such a dataset profiling murine liver nuclei in the comparison in Figure 1 E&F as well as Suppl. Fig. 1 L&M. The resulting comparison remains consistent: easySHARE-seq compares favourably to other multiomic techniques in RNA-seq data quality (UMIs/cell) but not in ATAC-seq data quality (fragments/cell).

      (3) Optimizing ATAC Performance: I strongly suggest exploring methods to improve ATAC sensitivity. As the authors note, the improvement in RNA recovery may result from fewer processing steps and stronger fixation. It would be valuable to test if decreasing fixation back to 2% (as in the original SHARE-seq) recovers ATAC data quality, and to determine if the fixation level or the number of steps is the key variable in preserving transcripts.

      We thank the reviewer for this suggestion. We agree that knowing the specific step(s) impacting ATAC-seq data quality would be highly valuable. Unfortunately, we are not in a position to perform the additional wet-lab experiments. It remains an area of improvement as we develop the technique further. We can confirm, however, that our early trials showed that the extent of fixation is negatively correlated with ATAC-seq data recovery.

      (4) The authors allude to the possibility of scaling this assay using a barcoded poly(T). Explicit inclusion or demonstration of this capability would dramatically increase interest in this protocol. Perhaps ATAC could be scaled using a barcoded Tn5?

      We thank the reviewer for this suggestion. Since we cannot perform further experiments, we expanded and clarified on upscaling this assay in our Supplementary Notes and referred to them in the text.

      We also added a paragraph specifically discussing the use of barcoded Tn5 in the Supplementary Notes.

      (5) The number of HSCs and B cells expressing Albumin is problematic and suggests significant ambient RNA issues that need to be addressed or computationally corrected.

      We thank the reviewer for pointing out this potential issue. We have used ‘decontX’ to estimate and ‘de-contaminate’ our UMI counts. We have added a histogram of the estimated fraction of contaminated counts per nucleus to Suppl. Fig. 1. We have used the decontaminated counts to re-generate the analysis in Fig. 2 B&C and Suppl. Fig. 2 F. This filtering step did not change the results of these analyses; in fact, it strengthened the results and improved clarity. We have added the relevant information to the Methods section and codebase and discussed the results and implications in the Supplementary Notes, which we briefly summarize here:

      “As reported in Suppl. Fig. 10, decontX identifies mean contaminated counts of 9.6% and median contaminated counts of 1.4%, suggesting that few cells that are heavily contaminated strongly inflate the overall estimation of contaminated counts. This could be due to 1) doublets or b) wrongly assigned cell types. The authors of decontX report contamination values of 1-4% in commercial droplet-based protocols and 11-14% in plate-based protocols, suggesting that easySHARE-seq performs better than other plate-based assays.”

      We again want to thank the reviewer for this suggestion. It has improved the manuscript.

      Reviewer #2 (Public review):

      Aims:

      The authors sought to optimize SHARE-seq, a multimodal single-cell method, to improve the simultaneous profiling of gene expression and chromatin accessibility. Their goal was to enhance barcode design for better sequencing efficiency and cost savings, while improving overall data quality. They then applied their optimized method, easySHARE-seq, to study liver sinusoidal endothelial cells (LSECs) to demonstrate its utility in examining gene regulation and spatial zonation.

      Strengths:

      The improved barcode design is an advance, increasing the proportion of sequencing reads dedicated to biological information rather than barcode identification. This modification offers practical benefits in terms of sequencing costs and read length, potentially reducing alignment errors. The method also demonstrates improved RNA detection compared to the original SHARE-seq protocol. The biological applications showcase how simultaneous measurement of both modalities enables analyses that would be practically impossible with single-modality approaches, particularly in examining how chromatin states change along developmental or spatial trajectories.

      Weaknesses:

      There is a notable reduction in chromatin accessibility detection compared to the original SHARE-seq method, likely limiting the broad use of the method. While the authors are transparent about this tradeoff, additional discussion would be helpful regarding how this affects data interpretation. Comparisons showing consistency between easySHARE-seq and SHARE-seq chromatin accessibility patterns at the single-cell level would strengthen confidence in the method.

      Overall:

      The authors achieve their aim of creating an optimized protocol with improved barcode design and enhanced RNA detection. The method represents a useful advance for specific experimental contexts where the tradeoffs are appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 1F appears identical to Supplementary Figure 1M. This should be corrected if this is in error.

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      The following comments are intended to strengthen the work.

      (1) scATAC-seq Performance and Data Consistency

      While I appreciate the authors' transparency regarding scATAC-seq performance, the extent of underperformance warrants greater emphasis. Additionally, does the average ATAC-seq signal recapitulate previously published results? At the single-cell level, how consistent are easySHARE-seq and SHARE-seq data? I suspect that increased dropout in scATAC-seq may distort consistency between datasets. This should be explicitly discussed in terms of data interpretation.

      We thank the reviewer for this suggestion. We have cross-referenced the open chromatin regions in this study and we summarise the result at the end of the ‘benchmarking’ paragraph. We have further expanded on the limitations in our study in the ATAC-seq data given the lower data quality in the relevant part of the discussion. We should note that a direct comparison between SHARE-seq and this study is challenging due to different sample tissues.

      (2) LSEC Biological Investigations

      The biological investigations could be strengthened (though this may reflect my limited expertise with LSECs).

      (a) Enhancer analysis depth

      While the authors quantify potential enhancers through RNA-ATAC correlations within individual cells and identify genes regulated by multiple enhancers, a deeper exploration of enhancer biology would strengthen the manuscript. Potential questions include: Do genes sharing correlated enhancer activity also show correlated expression? How do enhancer number and strength relate to gene expression levels? How do RNA-ATAC correlations scale with ATAC peak height? Are stronger enhancers more tightly linked to gene expression? Perhaps the authors explored these questions without finding significant patterns, but this should be clarified.

      We thank the reviewer for these suggestions. We performed several analyses aimed at exploring enhancer biology with this dataset. We added a simple comparison of UMIs per gene between genes with at least one associated peak and those without in Suppl. Fig. 3I. We provide the corresponding plot for fragments per peak in Suppl. Fig. 3J. We also explored the relationship between gene expression and chromatin accessibility; here, we found that gene expression levels do not correlate with peak heights of chromatin accessibility (possibly because chromatin accessibility signals were somewhat binary). The corresponding plot has been added to Suppl. Fig. 3K. We added a small paragraph discussing these findings in the main text.

      (b) Correlation magnitude interpretation

      The reported correlation values are extremely small. Does this reflect weak biological linkages or primarily experimental noise? If experimental noise, how does variation in detection per gene influence the confidence in this type of analysis?

      We thank the reviewer for raising this potential issue. We identify a total of 40,957 significant peak-gene associations with a mean Spearman correlation of 0.1 (± 0.056; Suppl. Fig. 3E). The analytical workflow used to identify these gene-peak associations was first described alongside SHARE-seq in Ma et al. For context, they reported significant peak-gene associations to have a mean Spearman correlation of 0.026 (± 0.015; Ma et al. Table S4).

      Generally, we hypothesize that the low correlation values in this type of analysis result from the sparseness of single-cell data, especially in chromatin accessibility. Therefore, the power to detect gene–peak associations increases with cell number (Ma et al., Fig. 3B), and the limited cell numbers in the analysis in this study likely result in an enrichment of the most strongly correlated associations among those detected. We have added a comparison of UMIs per gene for genes with and without a significant gene-peak correlation, illustrating this dynamic (Suppl. Fig. 3I). Furthermore, we have described this relationship and limitation in the relevant part of the results section.

      (c) Zonation analysis framing

      The zonation analysis is compelling, but the authors should more explicitly emphasize that defining pseudotime and examining chromatin state dynamics is only possible because both modalities are measured simultaneously. And more detail on the Monocle3 pseudotime analysis is needed, as it is unclear how this was really done.

      We expanded our description on the pseudotime analysis using Monocle in the relevant section in the Methods. Furthermore, we explicitly point out that this type of analysis relies on simultaneous measurements of both modalities at the end of the results section.

    1. eLife Assessment

      This important study provides new insights into the neuronal dynamics of the locus coeruleus in relation to hippocampal sharp-wave ripples. Using high-temporal-resolution, multi-site electrophysiological recordings in rats, the authors present convincing evidence that ripples and locus coeruleus activity are inversely correlated to levels of arousal and that noradrenaline tone is modulated by hippocampo-cortical coupling. Overall, the work will be of interest to neuroscientists studying large-scale brain coordination and memory processes.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Yang et al. investigates the relationship between multi-unit activity in the putatively noradrenergic locus coeruleus (LC), hippocampal (HP) sharp-wave ripples (SWR), and spindles using multi-site electrophysiology in freely behaving male rats. The study focuses on SWR during quiet wake and non-REM sleep, and their relation to cortical states (identified using EEG recordings in frontal areas) and LC units.

      The manuscript highlights differential modulation of LC units as a function of HP-cortical communication during wake and sleep. They establish that ripples and LC units are inversely correlated to levels of arousal: wake, i.e. higher arousal correlates with higher LC unit activity and lower ripple rates. The authors show that LC neuron activity is strongly inhibited just before SWR detected during wake. During non-REM sleep, they distinguish "isolated" ripples from SWR coupled to spindles and show that inhibition of LC neuron activity is absent before spindle-coupled ripples but not before isolated ripples, suggesting a mechanism where noradrenaline (NA) tone is modulated by HP-cortical coupling. This result has interesting implications for the roles of noradrenaline in the modulation of sleep-dependent memory consolidation, as ripple-spindle coupling is a mechanism favoring consolidation. The authors further show that NA neuronal activity is downregulated before spindles.

      Strengths:

      In continuity with previous work from the laboratory, this work expands our understanding of the activity of neuromodulatory systems in relation to vigilance states and brain oscillations, an area of research that is timely and impactful. The manuscript presents strong results suggesting that NA tone varies differentially depending on coupling of HP SWR with cortical spindles. The authors place their findings back in the context of identified roles of HP ripples and coupling to cortical oscillations for memory formation in a very interesting discussion. The distinction of LC neuron activity between awake, ripple-spindle coupled events and isolated ripples is an exciting result and its relation to arousal and memory opens fascinating lines of research.

      Weaknesses:

      I regretted that the paper fell short of trying to push this line of idea a bit further, for example by contrasting in the same rats the LC unit-HP ripple coupling during exploration of a highly familiar context (as seemingly was the case in their study) versus a novel context, which would increase arousal and trigger memory-related mechanisms. Any kind of manipulation of arousal levels and investigation of the impact on awake vs nonREM sleep LC-HP ripple coordination would considerably strengthen the scope of the study.

      Comments on revised version.

      The authors have added methodological details to the results section after the first round of reviews, improving the manuscript's readability. Some points might still be improved. For example, the authors use a delta/gamma ratio to track cortical states, but there is no methods section corresponding to this metric. The authors write that higher SI corresponds to a lower arousal state that is associated with "more synchronized cortical population activity, higher ripple rate and reduced LC neurons firing", but there are no references or analyses to support this statement, only examples showing changes in SI over a few minutes.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, authors studied the synchrony between ripple events in Hippocampus, cortical spindles and Locus Coeruleus spiking. The results in this study together with the established literature on the relationship of hippocampal ripples with widespread thalamic and cortical waves, guided authors to propose a role for Locus Coeruleus spiking patterns in memory consolidation. The findings provided here, i.e. correlations between LC spiking activity and Hippocampal ripples, could provide basis for future studies probing the directional flow or the necessity of these correlations in the memory consolidation process. Hence, the paper provides enough scientific advance to highlight the elusive yet important role of Norepinephrine circuitry in the memory processes.

      Strengths:

      Authors were able to demonstrate correlations of Locus Coeruleus spikes with hippocampal ripples as well as with cortical spindles. Specific strength of the paper is in the demonstration that the spindles that activate with the ripples are comparatively different in their correlations with Locus Coeruleus than those which do not.

      Weaknesses:

      The claims regarding the roles of these specific interactions were mostly derived from the literature that these processes individually contribute to the memory process, without any evidence of these specific interactions being necessary for memory processes. There are also issues with the description of methods, validation of shuffling procedures and unclear presentation and the interpretation of the findings, which are described in points that follow. I believe addressing these weaknesses might improve and add to the strength of the findings.

      Comments on revised version.

      The authors addressed all of my major concerns during the revision. As a result, the study now provides convincing evidence as well as improved presentation of results, that makes this manuscript important to the broader field of neuroscience, beyond the specific sub-field.

    4. Reviewer #3 (Public review):

      This manuscript examines how locus coeruleus (LC) activity relates to hippocampal ripple events across behavioral states in freely moving rats. Using multi-site electrophysiological recordings, the authors report that LC activity is suppressed prior to ripple events, with the magnitude of suppression depending on ripple subtype. Suppression is stronger during wakefulness than during NREM sleep and least pronounced for ripples coupled to spindles.

      The study is technically sound and addresses a timely and important question regarding how LC activity interacts with hippocampal and thalamocortical network events across vigilance states. While the findings are interesting, they remain observational in nature. Following revision, the manuscript has substantially improved in both presentation and interpretation of the results, and most concerns have been addressed satisfactorily. I therefore only have a few minor considerations that the authors may wish to explore further in the current study or in future work, as these directions could provide additional mechanistic insight and would likely be of considerable interest to the field.

      The authors demonstrate clearly that tonic LC firing rates preceding ripples differ significantly between wake-associated ripples (highest LC firing), isolated ripples during NREM sleep (lower LC firing), and spindle-coupled ripples (lowest LC firing). They also appropriately note that baseline firing differences will naturally influence the magnitude of LC suppression, which they also observe (highest LC reduction for wake ripples, then isolated ripples and last spindle-coupled ripples). However, this aspect could be explored further, as it may provide additional insight into the regulation of spindle-associated ripple events. Since LC activity appears to decline gradually prior to ripple occurrence (Suppl. Figure 2), it would be interesting to test whether this gradual reduction helps organize the emergence of isolated versus spindle-coupled ripples. For example, isolated ripples may occur during the initial phase of LC decline, whereas spindle-coupled ripples may preferentially emerge when LC activity reaches its lowest levels. Such a relationship could also be consistent with the stronger synchronization observed for spindle-ripple coupling.

      Related to this point, it would also be informative to examine whether isolated spindles occur more randomly in time, whereas spindle-associated ripple events appear more temporally clustered. If a single isolated spindle occurs, the associated LC suppression might be more pronounced. In contrast, when multiple spindle-associated ripple events occur in succession, LC activity may already be reduced following the first event, resulting in smaller additional suppression preceding subsequent events. Exploring this possibility could help clarify how LC dynamics shape the temporal emergence of ripple subtypes.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Yang et al. investigates the relationship between multi-unit activity in the putatively noradrenergic locus coeruleus (LC), hippocampal (HP) sharp-wave ripples (SWR), and spindles using multi-site electrophysiology in freely behaving male rats. The study focuses on SWR during quiet wake and non-REM sleep, and their relation to cortical states (identified using EEG recordings in frontal areas) and LC units.

      The manuscript highlights differential modulation of LC units as a function of HP-cortical communication during wake and sleep. They establish that ripples and LC units are inversely correlated to levels of arousal: wake, i.e., higher arousal correlates with higher LC unit activity and lower ripple rates. The authors show that LC neuron activity is strongly inhibited just before SWR is detected during wake. During non-REM sleep, they distinguish "isolated" ripples from SWR coupled to spindles and show that inhibition of LC neuron activity is absent before spindle-coupled ripples but not before isolated ripples, suggesting a mechanism where noradrenaline (NA) tone is modulated by HP-cortical coupling. This result has interesting implications for the roles of noradrenaline in the modulation of sleep-dependent memory consolidation, as ripple-spindle coupling is a mechanism favoring consolidation. The authors further show that NA neuronal activity is downregulated before spindles.

      Strengths:

      In continuity with previous work from the laboratory, this work expands our understanding of the activity of neuromodulatory systems in relation to vigilance states and brain oscillations, an area of research that is timely and impactful. The manuscript presents strong results suggesting that NA tone varies differentially depending on the coupling of HP SWR with cortical spindles. The authors place their findings back in the context of identified roles of HP ripples and coupling to cortical oscillations for memory formation in a very interesting discussion. The distinction of LC neuron activity between awake, ripple-spindle coupled events and isolated ripples is an exciting result, and its relation to arousal and memory opens fascinating lines of research.

      Weaknesses:

      I regretted that the paper fell short of trying to push this line of idea a bit further, for example, by contrasting in the same rats the LC unit-HP ripple coupling during exploration of a highly familiar context (as seemingly was the case in their study) versus a novel context, which would increase arousal and trigger memory-related mechanisms. Any kind of manipulation of arousal levels and investigation of the impact on awake vs non-REM sleep LC-HP ripple coordination would considerably strengthen the scope of the study.

      We agree that conducting specific behavioral tests before electrophysiological recordings, as well as manipulating arousal during the recording session, would strengthen the study. These experiments are planned for future work, and we acknowledged this point in the discussion.

      We added the following text in the Discussion: “Conducting behavioral assays prior to electrophysiological recordings, along with spatially and temporally precise modulation of LC activity during recording sessions, will be essential for achieving a mechanistic understanding of network dynamics and its functional role for memory consolidation in future investigations.”

      The main result shows that LC units are not modulated during non-REM sleep around spindle-coupled ripples (named spRipples, 17.2% of detected ripples); they also show that LC units are modulated around ripple-coupled spindles (ripSpindles, proportion of detected spindles not specified, please add). These results seem in contradiction; this point should be addressed by the authors.

      The detection of coupled events, spindle-coupled ripples (spRipple) and ripple-coupled spindles (ripSpindle), was performed independently, although some overlap cannot be excluded. We found that LC suppression was generally weak around both types of coupled events. Specifically, LC suppression around spRipples and ripSpindles reached significance (exceeding the 95% confidence interval) in 4 sessions (from 3 rats) and 3 sessions (from 2 rats), respectively, out of a total of 20 sessions (from 7 rats).

      We revised the manuscript by providing additional information in the Results section and adding a Supplementary Figure 5 showing a significant correlation (Pearson r = 0.72, p = 0.0003) between the modulation index (MI) for spRipple and ripSpindle.

      Results are displayed per recording session, with 20 sessions total recorded from 7 rats (2 to 8 sessions per rat), which implies that one of the rats accounts for 40% of the dataset. The authors should provide controls and/or data displayed as an average per rat to ensure that results are not skewed by the weight of that single rat in the results.

      High-quality recordings from the LC in behaving rats are technically challenging and relatively rare; therefore, we included all valid datasets in analysis. The average modulation index (MI), calculated per animal and per session, fell within a consistent range (Supplementary Figure 3) despite variability in the number of recording sessions (2–8 sessions per rat).
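
      The weighting concern raised above can be illustrated with a small sketch that contrasts a pooled per-session mean with a mean of per-animal means. The function name, the animal labels, and the modulation index values are all made up for illustration; they are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def pooled_and_per_animal_mean(sessions):
    """sessions: list of (animal_id, value) pairs, one per recording session."""
    by_animal = defaultdict(list)
    for animal_id, value in sessions:
        by_animal[animal_id].append(value)
    # Pooled mean: every session counts equally, so animals with many
    # sessions dominate the estimate.
    pooled = mean([value for _, value in sessions])
    # Per-animal means: each animal contributes a single value.
    per_animal = {a: mean(vals) for a, vals in by_animal.items()}
    across_animals = mean(list(per_animal.values()))
    return pooled, across_animals, per_animal


# Hypothetical modulation indices: one animal with 8 sessions, one with 2.
sessions = [("ratA", -0.4)] * 8 + [("ratB", -0.1)] * 2
pooled, across_animals, per_animal = pooled_and_per_animal_mean(sessions)
print(f"pooled over sessions: {pooled:.2f}, mean of animal means: {across_animals:.2f}")
```

      In this toy case the pooled estimate is pulled toward the animal with more sessions, which is why a per-animal summary is a useful control when session counts are unbalanced.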

      In its current form, the manuscript presents a lack of methodological detail that needs to be addressed, as it clouds the understanding of the analysis and conclusions. For example, the method to account for the influence of cortical state on LC MUA is unclear, both for the exact methods (shuffling of the ripple or spindle onset times) and how this minimizes the influence of cortical states; this should be better described. If the authors wish to analyze unit modulation as a function of cortical state, could they also identify/sort based on cortical states and then look at unit modulation around ripple onset? For the first part of the paper, was an analysis performed on quiet wake, non-REM sleep, or both?

      The LC activity around ripples was modulated at multiple temporal scales. First, we observed a relatively sharp drop in the LC firing rate ~2 s before the ripple onset. When computing peri-ripple LC activity over a longer time window ([–12, 12] sec), we observed a rather slow decrease in the LC firing rate beginning as early as 10 s before the ripple onset (Supplementary Figure 2).

      Considering two temporal scales, we hypothesized that slow modulation of LC activity might be related to fluctuations of the global brain state. We quantified the ongoing cortical state using a synchronization index (SI), calculated as a power ratio (1–4 Hz/30–90 Hz) of the EEG within 4-s windows and computed the corresponding ripple and LC-MUA rates. Figure 3A (in the main manuscript) illustrates that a higher SI (more synchronized cortical population activity) corresponded to a lower arousal state and reduced LC tonic firing; this brain state was associated with a higher ripple activity. As shown in the new Figure 3B, the LC firing rate was negatively correlated with the SI and ripple rate. Thus, slow LC modulation was likely driven by cortical state transitions.
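
      As a rough illustration of the synchronization index described above, the following sketch computes the ratio of 1–4 Hz to 30–90 Hz power in consecutive 4-s windows. The function name, the sampling rate, the Hann taper, the use of non-overlapping windows, and the synthetic test signals are assumptions made for this example; it is not the authors' analysis code.

```python
import numpy as np


def synchronization_index(eeg, fs, win_s=4.0, low=(1.0, 4.0), high=(30.0, 90.0)):
    """Return one low/high band power ratio per non-overlapping window."""
    n = int(round(win_s * fs))          # samples per window
    n_win = len(eeg) // n               # number of complete windows
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    low_mask = (freqs >= low[0]) & (freqs <= low[1])
    high_mask = (freqs >= high[0]) & (freqs <= high[1])
    taper = np.hanning(n)               # assumption: Hann taper to limit leakage
    si = np.empty(n_win)
    for i in range(n_win):
        seg = eeg[i * n:(i + 1) * n]
        seg = (seg - seg.mean()) * taper
        power = np.abs(np.fft.rfft(seg)) ** 2
        si[i] = power[low_mask].sum() / power[high_mask].sum()
    return si


# Synthetic signals (hypothetical): slow-dominated vs fast-dominated activity.
fs = 500.0
t = np.arange(0, 8.0, 1.0 / fs)
rng = np.random.default_rng(0)
slow = np.sin(2 * np.pi * 2.0 * t)      # 2 Hz component
fast = np.sin(2 * np.pi * 50.0 * t)     # 50 Hz component
noise = 0.05 * rng.standard_normal(t.size)
si_sync = synchronization_index(1.0 * slow + 0.2 * fast + noise, fs)
si_desync = synchronization_index(0.2 * slow + 1.0 * fast + noise, fs)
```

      Under these assumptions, the slow-dominated signal yields an index well above 1 and the fast-dominated signal an index well below 1, matching the interpretation that a higher SI reflects more synchronized cortical activity.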

      To correct for the influence of the global brain state on the peri-ripple LC activity, we generated surrogate events by jittering the times of detected ripples. First, we confirmed that the hippocampal LFP triggered on the surrogate events lacked the ripple-specific frequency component (main Figure 3C) and that the SI state did not differ around ripples and surrogate events (main Figure 3D). Plotting the LC activity around surrogate events captured its state-dependent dynamics (Figure 3 or Supplementary Figure 2, orange trace). To extract state-independent peri-ripple LC modulation, we subtracted the state-related LC activity (orange trace) from the ripple-triggered LC activity (blue trace). The resulting trace yielded a corrected estimate of ripple-associated LC activity that was largely free from the confounding influence of cortical state transitions (main Figure 3E).
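
      The jitter-and-subtract logic described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the uniform jitter of up to ±5 s, the averaging over 50 surrogate sets, the bin width, and the synthetic spike train are all assumptions, since the response does not specify these details.

```python
import numpy as np


def peri_event_rate(spike_times, event_times, window=(-2.0, 2.0), bin_s=0.1):
    """Average firing rate (Hz) in time bins around a set of events."""
    n_bins = int(round((window[1] - window[0]) / bin_s))
    edges = np.linspace(window[0], window[1], n_bins + 1)
    counts = np.zeros(n_bins)
    for ev in event_times:
        counts += np.histogram(spike_times - ev, bins=edges)[0]
    return counts / (len(event_times) * bin_s), edges


def state_corrected_rate(spike_times, event_times, max_jitter_s, n_surrogates, rng):
    """Event-triggered rate minus the mean rate around jittered (surrogate) events."""
    observed, edges = peri_event_rate(spike_times, event_times)
    surrogate_rates = []
    for _ in range(n_surrogates):
        # Assumption: each event is shifted by an independent uniform jitter.
        jitter = rng.uniform(-max_jitter_s, max_jitter_s, size=len(event_times))
        surrogate_rates.append(peri_event_rate(spike_times, event_times + jitter)[0])
    return observed - np.mean(surrogate_rates, axis=0), edges


# Synthetic data (hypothetical): ~5 Hz Poisson spiking, silenced for 0.5 s before each event.
rng = np.random.default_rng(1)
duration = 600.0
events = np.arange(5.0, duration, 10.0)
spikes = np.sort(rng.uniform(0.0, duration, rng.poisson(5.0 * duration)))
keep = np.ones(spikes.size, dtype=bool)
for ev in events:
    keep &= ~((spikes >= ev - 0.5) & (spikes < ev))
spikes = spikes[keep]

corrected, edges = state_corrected_rate(spikes, events, max_jitter_s=5.0,
                                        n_surrogates=50, rng=rng)
centers = (edges[:-1] + edges[1:]) / 2
suppressed = (centers > -0.5) & (centers < 0.0)
```

      In this toy example the corrected trace is strongly negative only in the bins immediately preceding the events and stays close to zero elsewhere, which is the signature the subtraction is meant to isolate.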

      In the Results subsection “LC-NE neuron spiking is suppressed around hippocampal ripples”, we reported LC modulation without accounting for the cortical state (main Figure 2). The state-dependent effects were instead examined in the subsequent Results subsection, “LC firing and ripple occurrence are state-dependent and inversely related,” where we report state-corrected LC modulation (main Figure 3). Finally, in the Results subsection “Peri-ripple LC modulation depends on the cortical–hippocampal interaction,” we characterized LC activity around ripples across different cortical states (quiet wake and NREM sleep).

      We revised Methods and Results to provide more methodological details and a rationale for each analysis, as requested.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors studied the synchrony between ripple events in the Hippocampus, cortical spindles, and Locus Coeruleus spiking. The results in this study, together with the established literature on the relationship of hippocampal ripples with widespread thalamic and cortical waves, guided the authors to propose a role for Locus Coeruleus spiking patterns in memory consolidation. The findings provided here, i.e., correlations between LC spiking activity and Hippocampal ripples, could provide a basis for future studies probing the directional flow or the necessity of these correlations in the memory consolidation process. Hence, the paper provides enough scientific advances to highlight the elusive yet important role of Norepinephrine circuitry in the memory processes.

      Strengths:

      The authors were able to demonstrate correlations of Locus Coeruleus spikes with hippocampal ripples as well as with cortical spindles. A specific strength of the paper is in the demonstration that the spindles that activate with the ripples are comparatively different in their correlations with Locus Coeruleus than those that do not.

      Weaknesses:

      The claims regarding the roles of these specific interactions were mostly derived from the literature that these processes individually contribute to the memory process, without any evidence of these specific interactions being necessary for memory processes. There are also issues with the description of methods, validation of shuffling procedures, and unclear presentation and the interpretation of the findings, which are described in the points that follow. I believe addressing these weaknesses might improve and add to the strength of the findings.

      We believe that our responses to Reviewer 1 and Reviewer 2, the corresponding revisions of the manuscript, and the new figures adequately address all issues raised by Reviewer 2.

      Reviewer #3 (Public review):

      Summary:

      This manuscript examines how locus coeruleus (LC) activity relates to hippocampal ripple events across behavioral states in freely moving rats. Using multi-site electrophysiological recordings, the authors report that LC activity is suppressed prior to ripple events, with the magnitude of suppression depending on the ripple subtype. Suppression is stronger during wakefulness than during NREM sleep and is least pronounced for ripples coupled to spindles.

      The study is technically competent and addresses an important question regarding how LC activity interacts with hippocampal and thalamocortical network events across vigilance states.

      Weaknesses:

      The results are interesting, but entirely observational. Also, the study in its current form would benefit from optimization of figure labeling and presentation, and more detailed result descriptions to make the findings fully interpretable. Also, it would be beneficial if the authors could formulate the narrative and central hypothesis more clearly to ease the line of reasoning across sections.

      We improved the presentation of results by incorporating additional figures and expanding the detail in the figure captions. In the main text, we clarified specific hypotheses and provided a rationale underlying each analysis.

      Comments:

      (1) Stronger evidence that recorded units represent noradrenergic LC neurons would reinforce the conclusions. While direct validation may not be possible, showing absolute firing rates (Hz) across quiet wake, active wake, NREM, and REM, and comparing them to published LC values, would help.

      We added the requested data and a Supplementary Figure 1 in the revised manuscript: “The average firing rates of LC single units were 1.70 ± 0.21 Hz during wakefulness, 0.51 ± 0.07 Hz during NREM sleep, and 0.014 ± 0.01 Hz during REM sleep (Supplementary Figure 1). Firing rates differed significantly across arousal states, with the highest activity during wakefulness, reduced activity during NREM sleep, and minimal activity during REM sleep (one-way ANOVA: F(2,38) = 39.8, p < 0.0001). This firing pattern is characteristic of LC-NE neurons and is consistent with existing literature.”

      (2) The analyses rely almost exclusively on z-scored LC firing and short baselines (~4-6 s), which limits biological interpretation. The authors should include absolute firing rates alongside normalized values for peri-ripple and peri-spindle analyses and extend pre-event windows to at least 20-30 s to assess tonic firing evolution. This would clarify whether differences across ripple subtypes arise from ceiling or floor effects in LC activity; if ripples require LC silence, the relative drop will appear larger during high-firing wake states. This limitation should be discussed and, if possible, results should be shown based on unnormalized firing rates.

      We agree with the reviewer that a longer pre-event window provides a clearer estimate of baseline LC activity. However, given that both ripples and spindles are brief oscillatory events, we tested a range of time windows and found that a 12-s interval adequately captures baseline LC activity dynamics. Accordingly, we included plots with extended pre-event windows (−12 to 12 s), as requested.

      In the revised manuscript, we added absolute firing rates for well-isolated LC single units. Because the number of neurons contributing to LC multi-unit activity (LC-MUA) is unknown, we avoided averaging absolute firing rates for this signal. For LC-MUA, we implemented a normalization approach in which firing rates (50-ms bins) around ripples or spindles are scaled to a baseline period preceding the trigger event (−12 to −10 s). Importantly, unlike z-scoring, this normalization method preserves baseline differences across behavioral states. As shown in Author response image 1A and new Figure 5 in the main manuscript, baseline LC firing rates were highest prior to awake ripples and lowest prior to sleep spindles. During ripples occurring in wakefulness, LC activity did not decrease to the levels observed during sleep. In contrast, during NREM sleep, LC activity was downregulated during both ripples and spindles, although it did not reach complete silence around either oscillatory event.

      Author response image 1B illustrates a slow downward drift in the LC firing rate preceding both ripples and spindles. The slow LC dynamics likely reflected gradual transitions toward a more synchronized brain state, which is optimal for ripple generation. In contrast, event-specific LC modulation had faster dynamics (Author response image 1B, highlighted interval) and was largely absent in cases where spRipples and ripSpindles were not associated with LC suppression (Author response image 1C).

      To minimize the influence of global state fluctuations and emphasize event-related dynamics, we therefore presented the main results using state-corrected and z-scored PETHs.

      Please also refer to our response to Reviewer 1 regarding the two temporal scales of LC modulation.

      Author response image 1.

      LC modulation around sleep oscillations. (A) Peri-event LC-MUA during awake and NREM sleep. LC activity and the range of peri-event LC modulation differed across behavioral states; it was overall higher preceding ripples occurring in wakefulness than in NREM sleep, and it was the lowest around sleep spindles. Despite the state-dependent differences in the firing rate, LC modulation was observed around all oscillatory events. During wakefulness, LC activity did not decrease to the levels observed during NREM sleep. During NREM sleep, LC activity was down-regulated around both ripples and spindles, and the LC firing did not completely cease around either oscillatory event. (B) Peri-event LC-MUA around isolated oscillatory events. LC activity exhibited fast peri-event dynamics (highlighted interval) superimposed on slower, state-dependent fluctuations. (C) Peri-event LC-MUA around coupled oscillatory events. Fast peri-event LC modulation was absent, while slow fluctuations were preserved around coupled oscillatory events. For all plots, LC-MUA firing rate was scaled to a pre-event baseline interval [-12 to -10 sec] to preserve baseline differences in LC activity across behavioral states. Bin size: 50 ms. isoRipple – isolated ripple, isoSpindle – isolated spindle, spRipple – spindle-coupled ripple, ripSpindle – ripple-coupled spindle.

      (3) Because spindles often occur in clusters, the timing of ripple occurrence within these clusters could influence LC suppression. Indicate whether this structure was considered or discuss how it might affect interpretation (e.g., first vs. subsequent ripples within a spindle cluster).

      We did not consider spindle clusters; an event was classified as a ripple-coupled spindle if a ripple occurred between spindle onset and offset.

      (4) While the observational approach is appropriate here, causal tests (e.g., optogenetic or chemogenetic manipulation of LC around ripple events and in memory tasks) would considerably strengthen the mechanistic conclusions. At a minimum, a discussion of how such approaches could address current open questions would improve the manuscript.

      We agree that conducting causal tests would strengthen the study. We added the following text in the Discussion: “Conducting behavioral assays prior to electrophysiological recordings, along with spatially and temporally precise modulation of LC activity during recording sessions, will be essential for achieving a mechanistic understanding of network dynamics and its functional role for memory consolidation in future investigations.”

      (5) Please show how "Synchronization Index" (SI) differs quantitatively across behavioral states (wake, NREM, REM) and discuss whether it could serve as a state classifier. This would strengthen interpretations of the correlations between SI, ripple occurrence, and LC activity.

      We plotted the awake state-normalized SIs for awake and NREM sleep. Due to the small number of REM sleep episodes, the SI for REM sleep is not shown. The average SI during NREM sleep was significantly higher than during the awake state, consistent with the well-established dominance of low-frequency (1-4 Hz) oscillatory power and reduced high-frequency (30-90 Hz) power during NREM sleep.
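
      As an illustration of how a power-ratio index of this kind behaves, the following is a minimal sketch only. It assumes that the SI for one EEG window is the ratio of summed spectral power in the 1-4 Hz band to that in the 30-90 Hz band, estimated with a plain periodogram. The function name, the estimator, the window length, and the synthetic signals are all assumptions made for this example and are not taken from the manuscript.

```python
import numpy as np


def synchronization_index(eeg, fs, low_band=(1.0, 4.0), high_band=(30.0, 90.0)):
    """Ratio of low- to high-frequency spectral power in one EEG window.

    The band limits follow the ranges quoted in the response; the periodogram
    estimator is an assumption made for this illustration.
    """
    x = np.asarray(eeg, dtype=float)
    x = x - x.mean()  # remove the DC offset before estimating power
    power = np.abs(np.fft.rfft(x)) ** 2  # unnormalized periodogram
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    low = power[(freqs >= low_band[0]) & (freqs <= low_band[1])].sum()
    high = power[(freqs >= high_band[0]) & (freqs <= high_band[1])].sum()
    if high == 0:
        raise ValueError("no power in the high-frequency band")
    return low / high


# Synthetic 4-s windows sampled at 250 Hz (hypothetical values).
fs = 250.0
t = np.arange(1000) / fs
nrem_like = 2.0 * np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
wake_like = 0.5 * np.sin(2 * np.pi * 2 * t) + 2.0 * np.sin(2 * np.pi * 40 * t)
si_nrem = synchronization_index(nrem_like, fs)
si_wake = synchronization_index(wake_like, fs)
```

      In this toy case the window dominated by slow activity yields the larger index, which is the direction of the NREM versus awake difference described above.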

      Although SI could potentially serve as a behavioral state classifier, we have chosen not to address this point to maintain the focus in the discussion on new results.

      Author response image 2.

      Synchronization index differentiates behavioral states.

      (6) The current use of SI to denote a delta/gamma power ratio is unconventional, as "SI" typically refers to phase-locking metrics. Consider adopting a more standard term, such as delta/gamma power ratio. Similarly, it would be easier to follow if you use common terminology (AUC) to describe the drop in LC-MUA rather than using "MI" and "sub-MI".

      The ranges of delta and gamma bands might vary across studies; therefore, we prefer using SI, as defined here and in our previous publications (Novitskaya et al., 2016; Yang et al., 2019, 2021). We calculated the modulation index (MI) as the area under the curve of the peri-event time histogram within the 1 second preceding ripple onset. To avoid potential confusion with the AUC calculated over the entire signal window, we opted to use MI.
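
      To make the MI definition concrete, here is a minimal sketch, assuming a peri-event time histogram sampled at known bin times and trapezoidal integration over the 1 s before ripple onset. The function name, the integration rule, the inclusive window bounds, and the sample values are assumptions for illustration, not the procedure used in the manuscript.

```python
def modulation_index(peth, bin_times, window=(-1.0, 0.0)):
    """Trapezoidal area under the PETH restricted to the pre-onset window."""
    points = [(t, v) for t, v in zip(bin_times, peth) if window[0] <= t <= window[1]]
    if len(points) < 2:
        raise ValueError("need at least two bins inside the window")
    area = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        area += 0.5 * (v0 + v1) * (t1 - t0)  # trapezoid between adjacent bins
    return area


# Hypothetical z-scored PETH with a dip just before ripple onset (t = 0).
bin_times = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5]
peth = [0.0, 0.0, -1.0, -2.0, -1.0, 0.0]
mi = modulation_index(peth, bin_times)
```

      With this sign convention, suppression before ripple onset gives a negative MI; restricting the integral to the pre-onset window is what distinguishes it from an AUC over the entire signal.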

      (7) The logic in Figure 3 is difficult to follow. The brain state (delta/gamma ratio) appears unchanged relative to surrogate events (3C), while LC activity that is supposedly negatively correlated to delta/gamma changes markedly (3D-E). Could this discrepancy reflect the low temporal resolution (4-s windows) used to calculate delta/gamma when the changes occur on a shorter time scale?

      We appreciate the reviewer’s question. We revised the Results and the Figure 3 legend to clarify this point. The main Figures 3E and 3F show the ‘state-corrected’ peri-ripple LC activity. The purpose of generating ‘surrogate’ events was precisely to capture the component of LC activity dynamics that can be explained by cortical state fluctuations alone. As shown in Supplementary Figure 2, the orange trace represents LC activity aligned to surrogate events and, as the Reviewer noted, shows a clear decrease, yet at a slower time scale. We interpret this surrogate-aligned signal as the LC modulation attributable specifically to cortical state fluctuations. Importantly, shuffled events were associated with similar SIs (cortical state) but lacked the HPC LFP power increase in the ripple range (140-250 Hz), as shown in the main Figures 3D and 3C, respectively. To isolate the peri-event LC dynamics, we subtracted the state-related component (Figure 3, orange trace) from the ripple-triggered LC activity (blue trace). This correction yielded an estimate of ripple-associated LC activity that is largely independent of the confounding influence of ongoing cortical state.
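
      The subtraction step can be sketched as follows. This is an illustration under stated assumptions: both traces are already averaged over events and sampled on the same time bins, and the function name and numbers are hypothetical.

```python
def state_corrected(event_peth, surrogate_peth):
    """Subtract the surrogate-aligned trace from the event-aligned trace, bin by bin."""
    if len(event_peth) != len(surrogate_peth):
        raise ValueError("traces must share the same time bins")
    return [e - s for e, s in zip(event_peth, surrogate_peth)]


# Hypothetical traces: a slow drift shared by both, plus a fast dip
# present only in the ripple-triggered average.
ripple_triggered = [0.0, -0.5, -2.0, -0.5]
surrogate_triggered = [0.0, -0.5, -0.5, -0.5]
corrected = state_corrected(ripple_triggered, surrogate_triggered)
```

      In this toy case the shared slow component cancels and only the fast, event-locked dip remains in the corrected trace.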

      Please see our detailed response to Reviewer 1 regarding the multiple time scales of LC dynamics.

      (8) There are apparent inconsistencies between Figures 4B and 4C-D. In B, it seems that the difference between the 10th and 90th percentile is mostly in higher frequencies, but in C and D, the only significant difference is in the delta band.

      We repeated this analysis, clarified the inconsistency, and revised the Figure 4 legend.

      (9) Because standard sleep scoring is based on EEG and EMG signals, please include an example of sleep scoring alongside the data used for state classification. It would also be relevant to include the delta/gamma power ratio in such an example plot.

      We replaced ‘standard’ with ‘previously established’ in the description of the sleep scoring procedure and added Supplementary Figure 4 showing representative NREM sleep and wake episodes with the corresponding EEG and SI.

      (10) Can variability in modulation index (subMI) across ripple subsets reflect differences in recording quality? Please report and compare mean LC firing rates across subsets to confirm this is not a confounding factor.

      We agree that considering recording quality and unit stability over time as potential confounding factors is important. We therefore carefully evaluated each dataset to ensure the absence of significant drift in the LC firing rate. However, we find that comparing mean LC firing rates across subsets of ripples, as suggested by the Reviewer, is insufficient to control for recording stability, as LC activity varies substantially across behavioral states. At present, we are not aware of a robust method to fully eliminate variability related to recording quality and unit stability over time.

      (11) Figure 6B: If the brown trace represents LC-MUA activity around random time points, why would there be a coinciding negative peak as relative to real sleep spindles? Or is it the subtracted trace?

      We have revised Figure 7 (original Figure 6) and its legend to improve clarity and readability.

      (12) On page 8, lines 207-209, the authors write "Importantly, neither the LC-MUA rate nor SIs differed during a 2-sec time window preceding either group of spindles". It is unclear which data they refer to, but the statement seems to contradict Figure 6E as well as the following sentence: "Across sessions, MI values exceeded 95% CI in 17/20 datasets for isoSpindles and only 3/20 for ripSpindles". This should be clarified.

      We have revised the corresponding text to improve clarity and readability.

      (13) The results in Figures 5C and 6F do not align. It seems surprising that ripple-coupled spindles show a considerably higher LC modulation than spindle-coupled ripples, as these events should overlap. Could the discrepancy be due to Z-score normalization as mentioned above? Please include a discussion of this to help the interpretation of the results.

      In the original manuscript, Figure 6F was mistakenly labelled for ripple-coupled (ripSpindles) and isolated (isoSpindles) spindles. Now it has been corrected.

      Please also see our response to the weaknesses raised by Reviewer 1.

      (14) The text implies that 8 recordings came from one rat and two each from six others. This should be confirmed, and it should be explained how the recordings were balanced and analyzed across animals.

      Since high-quality recordings from LC in behaving animals are challenging and rare, we used all valid sessions. We addressed the same point in our response to the weaknesses raised by Reviewer 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Below are some suggestions for clarification/information that are needed to improve the paper's readability (and the understanding of the analysis and methods).

      (1) The authors describe a consistently negative correlation between cortical EEG synchronization index and ripple rate or LC-MUA, show an example in Figure 3A, and report a range of r values in the text with a mention of p < 0.01. The reported p-value is presumably the highest p-value for the correlations - please specify. Visualization of the results might be improved by adding example correlations (also true for later correlations in Figure 6).

      We revised the result description accordingly and included correlation plots in Figures 3 and 7.

      (2) Description of statistical testing is missing for Figure 3C (nothing in the text or the figure legend); there is also no statistics section in the methods. For Figure 4, the statistics are reported for the Friedman test but not the post-hoc tests. Exact p-value and statistics should be reported for the comparison of LC-MUA rate and SI in the 2 s preceding spindles.

      We have added the requested statistical results and revised the figure legends to provide additional information. We also added a Statistical Analysis section to the Methods.

      Figure 3D (original Fig.3C): “Average Synchronization Index (SI) around ripples and shuffled events. The cortical state preceding shuffled events and ripples was comparable, as confirmed by the absence of significant differences in SI (Wilcoxon signed-rank test; shuffled: Z = -0.20, p = 0.84; ripples: Z = 0.14, p = 0.88). Cortical synchrony increased following both events (shuffled: Z = -3.50, p = 0.00044; ripples: Z = -3.66, p = 0.00026). Similar cortical state dynamics surrounding shuffled events and ripples indicate that the surrogate events adequately capture the cortical state associated with ripple occurrence.”

      Figure 6: Intra-ripple frequency (A) and peak amplitude (B) for different ripple types. Box-whisker plots show the median, the 1st and 3rd quartiles, and min/max. Gray dots show data from individual rats. *** - p < 0.001 for post hoc pairwise comparisons (Wilcoxon signed-rank tests with Holm–Bonferroni correction for multiple comparisons).

      We revised the Results accordingly: “The ripple subtypes differed in the intra-ripple frequency (Friedman test, χ² = 35.62, p < 0.0001; post hoc pairwise comparisons were performed using Wilcoxon signed-rank tests with Holm–Bonferroni correction for multiple comparisons: awRipple vs isoRipple, p = 0.00003; awRipple vs spRipple, p = 0.00004; isoRipple vs spRipple, p = 0.0002), with awRipples being the fastest and spRipples the slowest (Figure 6A). There was no difference in the ripple peak amplitude (Friedman test, χ² = 3.7, p = 0.16; Figure 6B).”
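
      For readers unfamiliar with the Holm–Bonferroni step-down procedure named above, a minimal sketch follows. It implements the standard adjustment on a list of raw p-values; the function name and the example p-values are hypothetical and are deliberately not the values reported in the manuscript, since the response does not state whether those are raw or adjusted.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_bonferroni(raw)
```

      A comparison is declared significant at level alpha when its adjusted p-value is at or below alpha.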

      (3) The method description of ripple-spindle coupling detection is missing.

      We have added the description of ripple-spindle coupling detection in the Methods.

      (4) Based on Figure 6D, the authors report that ripple-coupled spindles are significantly shorter than isolated spindles. What are the measurements reported on lines 206-207, and how do they relate to the averaged spectrograms shown in Figure 6D?

      Spindle duration was calculated as the time between spindle onset and offset (as now described in the Methods and the Figure 7 legend). A spindle was considered ripple-coupled if at least one ripple occurred between the spindle onset and offset. The durations of ripple-coupled and uncoupled spindles were statistically compared (the statistics are reported in the text). In Figure 7E, the peri-event averaged EEG spectrograms are plotted for isolated and ripple-coupled spindles, highlighting the difference in the event duration.
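
      The coupling rule described above can be sketched as an interval check. This is a minimal illustration, assuming that each ripple is represented by a single timestamp and that the onset and offset bounds are inclusive; the function name, labels, and sample times are assumptions for the example.

```python
from bisect import bisect_left


def classify_spindles(spindles, ripple_times):
    """Label each (onset, offset) spindle as 'ripSpindle' or 'isoSpindle'."""
    ripples = sorted(ripple_times)
    labels = []
    for onset, offset in spindles:
        i = bisect_left(ripples, onset)  # first ripple at or after spindle onset
        coupled = i < len(ripples) and ripples[i] <= offset
        labels.append("ripSpindle" if coupled else "isoSpindle")
    return labels


def spindle_durations(spindles):
    """Duration of each spindle as offset minus onset (same units as the input)."""
    return [offset - onset for onset, offset in spindles]


# Hypothetical event times in seconds.
spindles = [(10.0, 11.25), (20.0, 20.75), (30.0, 31.0)]
ripple_times = [5.0, 10.5, 25.0, 31.0]
labels = classify_spindles(spindles, ripple_times)
durations = spindle_durations(spindles)
```

      Sorting the ripple times once and using a binary search keeps the check efficient when a session contains many events.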

      (5) None of the color scales have legends (Figures 2A, B, C, Figure 3D, etc.).

      We have added color scales to all figures.

      (6) Description of what is represented in the box plots is missing.

      We have added the description.

      (7) Figure 4C, D, legend for the color code is missing.

      We have added color scale legends.

      (8) Figure 5A legend, assuming this should read intra-ripple frequency instead of inter-ripple.

      We corrected the typo.

      (9) Figure 5E, while LC units are not modulated before, it could still be informative to overlay the z-scored firing rate on the same graph for comparison.

      Figure 6E (original Figure 5E) shows overlay for awRipples and isoRipples.

      (10) The discussion states a 4s resolution for cortical state quantification (line 237), but the methods mention 2.5s (line 382).

      We corrected this discrepancy.

      (11) Results, p.5, line 138, Methods and materials, p.13, line 423: 30% in result text but 20% in method, please correct.

      We corrected this discrepancy.

      (12) The manuscript cites the biorxiv version of Osorio-Forero et al., but the paper has been published since then; please update.

      We updated this reference.

      (13) Results, p.2, line 70. The average duration of a session is presented in seconds. Minutes or hours would be more meaningful to the reader.

      We consider this suggestion as optional.

      (14) Figure 2C is not referenced.

      We added the reference to Figure 2C.

      (15) Reference missing line 406.

      We added the reference.

      (16) Lines 352-356: There seems to be an error in the sentence (an extra verb, or an "and" missing somewhere).

      We have corrected this sentence.

      (17) Figure 3C "synchronization".

      We corrected this typo.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 94 states that "A significant peri-ripple decrease in LC-SUA"; however, which test and how many samples were used are unclear.

      We revised this text as follows: “A significant peri-ripple (± 6 s) decrease in LC-SUA, detected by the firing suppression exceeding 2 SDs, was observed in 13 of 15 cases (n = 4 rats).”
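
      A 2-SD suppression criterion of this kind can be sketched as below. This is an illustration under explicit assumptions: the mean and SD are taken from a baseline portion of the peri-event trace, and a unit counts as suppressed when its rate in the test window falls below the baseline mean minus 2 SDs. The baseline and test windows, the function name, and the sample rates are hypothetical; the response does not specify them.

```python
from statistics import mean, stdev


def is_suppressed(rates, bin_times, baseline, test_window, n_sd=2.0):
    """True if the rate in the test window drops below baseline mean - n_sd * SD."""
    base = [r for r, t in zip(rates, bin_times) if baseline[0] <= t < baseline[1]]
    test = [r for r, t in zip(rates, bin_times) if test_window[0] <= t <= test_window[1]]
    if len(base) < 2 or not test:
        raise ValueError("need at least two baseline bins and one test bin")
    threshold = mean(base) - n_sd * stdev(base)
    return min(test) < threshold


# Hypothetical peri-ripple firing rates (Hz) in 1-s bins from -6 s to +1 s.
bin_times = [-6, -5, -4, -3, -2, -1, 0, 1]
dipping_unit = [2.0, 2.2, 1.8, 2.0, 2.1, 0.5, 0.4, 1.9]
flat_unit = [2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 1.9]
dip_detected = is_suppressed(dipping_unit, bin_times, (-6, -2), (-2, 1))
flat_detected = is_suppressed(flat_unit, bin_times, (-6, -2), (-2, 1))
```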

      (2) Line 96 states that "we calculated the modulation onset, duration, and magnitude". Please define modulation before presenting the comparisons.

      We now illustrate the extraction of quantitative variables in Figure 2D.

      (3) Line 119 states that "we generated surrogate time series for each session by shuffling ripple onset times" which gives the impression that ripple events were shuffled throughout the sleep; however, the method section states that it was jittered within a specific time window for each event. Please clarify the matter.

      We have substantially revised this section to improve clarity and readability.

      (4) Line 120 states that "Comparisons of SI values before and after ripples and surrogate events confirmed that surrogate events preserved the cortical states in which ripples occurred". Ripple power doesn't seem to be different in pre vs post in the shuffled data (Figure 3B). If ripple timing was randomized, please clarify the observation shown in Figure 3C that the shuffled events had higher SI after than before, as also seen in the real SI data? Please also elaborate what specific groups were significantly different in before vs after bars; data, shuffle, or both?

      We have substantially revised this section to improve clarity and readability.

      (5) Line 113 and Figure 3A: Because both LC activity and HPC ripples were correlated to SI, the direct relationship between LC and HPC independent of SI (a covariate) was not clear. The authors might be able to conduct a partial correlation analysis to show this effect.

      We appreciate this suggestion and added the correlation plots in Figures 3 and 7. After careful consideration, we believe that the suggested partial correlation analysis does not contribute substantially beyond the main findings already presented.

      (6) Figure 5A: Inter-ripple frequency needs definition, not provided in the paper nor in the reference paper. The value (180 Hz) suggests a time interval of around 5 ms, which I fail to understand.

      We apologize for this typo. In Figure 6A (original Fig.5A), intra-ripple frequency is plotted. We have corrected this typo in the text and figure legend.

      (7) Figure 5D: Comparison between aw and sp ripples should also be shown. Please explain the dashed line at 10 (y-axis) a.u.

      Figure 6E (original Fig.5E) shows LC activity around awRipples and isoRipples.

      (8) Figure 5E: Legend states aw and iso ripples, but the caption says NREM sleep. Please clarify this matter.

      We have revised Figure 6 legend (original Figure 5).

      (9) Figure 6B: If the spindle time is permuted randomly, why is LC activity in the permuted data still modulated by the spindle times? Can you test the significance of the modulation index of the shuffled data?

      The LC modulation around shuffled time points was not significant. Figure 7C shows LC modulation dynamics around spindles; the brown trace shows the state-corrected LC-MUA (after subtraction of LC-MUA around shuffled events).

      (10) Line 203: Is the unit in Hz (events per second) correctly calculated or shown? ~15 events per second seems arbitrarily large.

      We corrected the units for the event rate. We report the mean oscillatory frequency of spindles ~15 Hz, not events per second.

      (11) Line 207 states that "neither the LC-MUA rate nor SIs differed during a 2-sec time window preceding either group of spindles"; however, from Figure 6E, the average trace and errors around them (errors need to be stated clearly, for e.g., SEM or SD) show that they are non-overlapping and different. I suspect tests such as the rank-sum test, which test the difference in the central tendencies (as opposed to the KS test, which tests the overall trend in the distribution of the continuous data), might reveal the difference between these values.

      We compared the absolute (not normalized) LC-MUA rate and SI during the 2-sec time window preceding spindle onset and did not find any statistical differences. In Figure 7F, the difference during the ~2 sec before spindle onset is due to the z-score normalization of each trace to its own baseline.

      We revised the Results text to improve clarity.

      (12) Line 209: Modulation seems to be greater in ripp-spindles as shown in Fig. 6E-F, yet the text and the interpretation are the opposite, i.e., iso spindles had greater modulation. Hence, authors might have to provide further clarifications or analyses.

      We corrected the labelling in all plots.

      (13) Line 316: Claims of "suppression of noradrenergic system facilitating the generation of hippocampal ripples and sleep spindles by memory synchrony" are not fully supported by data, as the data seem to be correlational. Also, claims of "preserved LC activity during ripples coinciding with sleep spindles suggest a role for NE in facilitating cross-regional communication underlying memory-related information transfer" lack clarity and contradict the earlier mechanism. Both "suppression" as well as "preservation" of LC neurons are proposed to mechanistically support memory synchrony and/or consolidation in two different brain states (awake and sleep). The authors might need to clarify how both suppression as well as preservation (which I assume is not an activation or positive modulation) of LC neurons can help in memory synchrony or consolidation.

      We revised this part of the Discussion to make it less speculative.

      Reviewer #3 (Recommendations for the authors):

      I would recommend that the authors optimize their figure and result presentation, as the current version of the manuscript is unclear in several places, limiting the interpretation of results.

      We substantially revised the manuscript to improve the results presentation and readability.

      (1) Multiple results are described but not shown quantitatively. Please plot quantifications and statistics (mean ± error and individual values) in relevant figures. For example, the results referenced on p. 4 (l. 113-116), p. 5 (l. 129-133, 143-147), p. 6 (l. 159-161), p. 7 (l. 188-190), and p. 8 (l. 203-207) should be supported by explicit data plots.

      We have revised the manuscript to ensure all results are supported by quantitative and statistical analyses. We revised figures and legends and added new plots showing individual datapoints.

      (2) Improvements in figures and descriptions are needed. Below are some examples I found:

      (a) All figures with color scales lack labeling of the color axis, i.e., measure and unit.

      We have revised the figures accordingly.

      (b) Use precise labeling of axes such as "ripple-band power" and "LC-MUA firing rate", rather than just "power" and "firing rate".

      We have revised the figures accordingly.

      (c) Figure 1: Indicate behavioral state (wake vs. sleep) in the example trace.

      We have indicated the behavioral state (quiet awake) in the figure legend.

      (d) Define "peri-ripple" windows explicitly (e.g., ±6 s or ±30 s).

      We have revised the text and figure legends accordingly.

      (e) Clarify how "modulation magnitude" is calculated (line 96).

      We now illustrate the extraction of quantitative variables in Figure 2D.

      (f) Figure 2C: The white overlaid mean trace lacks Y-axis labeling.

      We have added y-axis labeling.

      (g) Figure 3A: The labeling of "amplitude" is confusing when referring to firing frequency.

      We have corrected the figure labelling.

      (h) Figure 4B: Is the X-axis time from ripple onset?

      We have corrected the figure labelling.

      (i) Figure 4C-D lacks an X-axis or color legend.

      We have added x-axis and color legend.

      (j) Figures 5-6: Include tonic firing rates and time scales.

      We have added the time scales and average firing rates for LC single units to the main text and also show them in Supplementary Figure 1. Because the number of neurons contributing to LC multi-unit activity (LC-MUA) is unknown, we avoided averaging absolute firing rates for this signal. For LC-MUA, we implemented a normalization approach in which firing rates (50-ms bins) around ripples were scaled to a baseline period preceding the trigger event (−12 to −10 s). Importantly, unlike z-scoring, this normalization method preserved baseline differences across behavioral states, as shown in new Figure 5.

      (k) Add tonic firing rate baselines where relevant.

      We have added Supplementary Figure 1 and new Figure 5 showing the difference in the LC baseline firing rate across behavioral states.

      (3) Minor Comments to add more clarity

      (a) Clarify "spike train" selection criteria (Methods, p. 4, line 93).

      We revised the text as follows: “In six out of twenty LC-MUA recordings, we could reliably isolate spikes from a total of 15 single units (LC-SUA, n = 4 rats).”

      (b) Define "EEG transients" (p. 4, line 109) and support with data.

      We revised the text as follows: “Indeed, transient spectral changes in the prefrontal EEG coincided with the occurrence of hippocampal ripples (Figure 2B).”

      (c) You refer to Figure 3E as a histogram (p. 5, line 128), but I believe it shows an average trace.

      We have corrected this typo.

      (d) Standard sleep scoring procedures normally involve EMG measurements (p. 6, line 154).

      We have replaced ‘standard’ with ‘previously established’.

      (e) Explain how surrogate shuffling preserves the distribution of behavioral states.

      We revised the text as follows: “We first verified that hippocampal LFPs (140–250 Hz) triggered on these surrogate events lacked the ripple-specific frequency component (Figure 3C), and that the SI state did not differ between real ripples and surrogate events (Figure 3D).”

      (f) You refer to inter-ripple frequency (p. 6, line 168), which suggests time between ripples. Do you mean the "intra-ripple" or simply ripple frequency?

      We have corrected this typo.

      (g) Ensure all references cited in the text (e.g., p. 12, line 406) are included in the bibliography.

      We have updated the bibliography.

      (h) On p. 10, line 304-305 authors refer to observations related to offline memory consolidation. However, the present study does not contain any behavioral memory data.

      We have revised the Discussion to make it less speculative about the role of the described LC dynamics in offline memory consolidation.

      References

      Novitskaya Y, Sara SJ, Logothetis NK, Eschenko O (2016) Ripple-triggered stimulation of the locus coeruleus during post-learning sleep disrupts ripple/spindle coupling and impairs memory consolidation. Learn Mem 23:238-248.

      Yang M, Logothetis NK, Eschenko O (2019) Occurrence of Hippocampal Ripples is Associated with Activity Suppression in the Mediodorsal Thalamic Nucleus. J Neurosci 39:434-444.

      Yang M, Logothetis NK, Eschenko O (2021) Phasic activation of the locus coeruleus attenuates the acoustic startle response by increasing cortical arousal. Sci Rep 11:1409.

    1. eLife Assessment

      This work presents fundamental findings on the probability of use and access of insecticide-treated nets and evaluates the effectiveness of different distribution strategies in six African countries. The authors propose a sophisticated methodological framework that accounts for many sources of uncertainty, providing compelling strength of evidence.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      This paper aims to improve the accuracy of predictions of the impact of ITN strategies by developing a method to estimate duration of ITN access and use over time on a subnational scale from cross-sectional survey data and the numbers of ITNs received annually. The subnational estimates are then input into a mathematical model to predict clinical cases under different ITN distribution strategies.

      Strengths:

      The approach is novel and addresses a useful and timely topic. It makes use of available routine data, and has considered all of the relevant components of ITN distributions.

      The authors have made revisions, particularly to the methods, appendices and title - leaving the paper easier to follow, and with a clear, consistent aim. The assumptions are clearly stated.

    3. Reviewer #2 (Public review):

      Summary:

      The authors design a custom Bayesian model to estimate the probabilities of access, use and use given access of insecticide-treated nets in six African countries, providing sub-national estimates and inferring the average duration of ITN use and access. An individual-based model was employed to simulate malaria epidemics and estimate the effectiveness of different ITN distribution strategies. The study finds that the mean probability of use or access did not reach 80% (a universal coverage formerly targeted by WHO) for any of the regions even for biennial campaigns, demonstrates that switching from triennial to biennial distribution campaigns increases population use by 7.9%, and evaluates the impact of employing more efficient ITNs on P. falciparum prevalence.

      Strengths:

      The authors developed a data-driven model that accounts for data collection imperfections and sources of uncertainty while differentiating between ITN use and access. They developed a methodology to infer the timing of mass campaigns from publicly available data instead of assuming fixed dates. The probability of use given access allows determining the regions where ITN distribution is least effective. This work can help better inform future interventions by identifying regions where increasing mass campaign frequency or employing better ITNs are most effective. Finally, in addition to insights on ITN access and use for the six countries analyzed, the paper contributes a methodological framework that can likely be extended to other countries.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper aims to improve the accuracy of predictions of the impact of ITN strategies by developing a method to estimate duration of ITN access and use over time on a subnational scale from cross-sectional survey data and the numbers of ITNs received annually. The subnational estimates are then input into a mathematical model to predict clinical cases under different ITN distribution strategies.

      Strengths:

      The approach is novel and addresses a useful and timely topic. It makes use of available routine data, and has considered all of the relevant components of ITN distributions.

      The authors have made revisions, particularly to the methods, appendices and title - leaving the paper easier to follow, and with a clear, consistent aim. The assumptions are clearly stated.

      Weaknesses:

      The weaknesses are shared with other models of a similar complexity - it is not easy for a casual reader to fully understand the model or the implications of the assumptions which were required to be made. That routine data is used is good for availability, but data quality may be an issue in some places.

      Reviewer #2 (Public review):

      Summary:

      The authors design a custom Bayesian model to estimate the probabilities of access, use and use given access of insecticide-treated nets in six African countries, providing sub-national estimates and inferring the average duration of ITN use and access. An individual-based model was employed to simulate malaria epidemics and estimate the effectiveness of different ITN distribution strategies. The study finds that the mean probability of use or access did not reach 80% (a universal coverage formerly targeted by WHO) for any of the regions even for biennial campaigns, demonstrates that switching from triennial to biennial distribution campaigns increases population use by 7.9%, and evaluates the impact of employing more efficient ITNs on P. falciparum prevalence.

      Strengths:

      The authors developed a data-driven model that accounts for data collection imperfections and sources of uncertainty while differentiating between ITN use and access. They developed a methodology to infer the timing of mass campaigns from publicly available data instead of assuming fixed dates. The probability of use given access allows determining the regions where ITN distribution is least effective. This work can help better inform future interventions by identifying regions where increasing mass campaign frequency or employing better ITNs are most effective. Finally, in addition to insights on ITN access and use for the six countries analyzed, the paper contributes a methodological framework that can likely be extended to other countries.

      Weaknesses:

      Since the models employed are rather complex, the methodology description may be hard to follow for some readers. In addition, the models assume many hypotheses, including exponential decay of ITN use/access and narrow prior distributions. It is worth noting that, in the revised version of the manuscript, the authors justified the choice of exponential decay and narrow prior distributions, and made a significant effort to clarify the methodology and the model equations.

      Comments on revised version:

      I appreciate the improvements made to the text. The methodology description is much clearer now. I have no further suggestions.

      We thank the reviewers and editors for their constructive and insightful comments throughout the review process.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      P8 'Improving ITN use' L218 

      The numbers do not seem to add up to me. "...increases across all settings of 14.5% (95% CrI:14.5, 14.6), from 41.7%% to 49.6%. Greater increases are predicted to be seen for ITN use with mean use across all settings increasing from 58.0% to 66.2%, an increase of 19.5% CrI (95% CrI:19.5, 19.6)."

      Thank you for highlighting this. We have reviewed all reported results on mean use, access and use given access. The previous text reported a mixture of absolute and relative % changes, as well as a mixture of raw mean estimates across all regions and population-weighted means across regions. In the extract above we had inadvertently mixed different metrics. Given administrative-one regions can vary notably in population between different countries, we have ensured estimates are now consistently reported as population-weighted means, so that countries with finer-scaled administrative-one regions, such as Burkina Faso, do not artificially bias a raw mean estimate across all sub-national regions. We have also reported % changes as absolute percentage-point increases throughout, rather than relative ones to improve clarity.
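
      As a minimal illustration of the difference between absolute and relative changes, the following Python sketch applies both definitions to the two pairs of means quoted in the reviewer's comment. The function names are hypothetical and are not taken from the manuscript or its code; the sketch makes no claim about which metric produced the originally reported figures.

```python
# Hypothetical helper names; the input values are the means quoted in the reviewer's comment.

def absolute_change_pp(before_pct: float, after_pct: float) -> float:
    """Absolute change, in percentage points."""
    return after_pct - before_pct


def relative_change_pct(before_pct: float, after_pct: float) -> float:
    """Relative change, as a percentage of the starting value."""
    return 100.0 * (after_pct - before_pct) / before_pct


quoted_pairs = {
    "first quoted pair": (41.7, 49.6),
    "second quoted pair": (58.0, 66.2),
}

for label, (before, after) in quoted_pairs.items():
    print(
        f"{label}: {before}% -> {after}% | "
        f"absolute = {absolute_change_pp(before, after):.1f} points, "
        f"relative = {relative_change_pct(before, after):.1f}%"
    )
```

      Under these definitions the same pair of means yields two different numbers, which is why a single convention needs to be stated and applied throughout.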

      Methods p18: There is notation in the text which does not seem to be explained. It is in the appendices, but the appendices should be optional extra information rather than essential for understanding. 

      We have reviewed the text in the main methods to check notation explanations. Following this, we have removed a use of subscript $i$, which is only used in the appendices to explicitly indicate region-specific parameters, and have clarified that lambda is a decay parameter.

      There are assumptions made and these are clearly explained in the text. However, how much the highlighted results rest on the assumptions was not clear, and there was little on this in the discussion. 

      For example, it might seem disappointing that changing from triennial to biennial ITN campaigns would only lead to an increase from 41.7% to 49.6%. The most important assumptions driving this could be clearer. Additionally, after reading I was not sure what the likely consequences of the assumption that ITN are used continuously were.

      We have added text to the Discussion clarifying that the modest predicted increase under biennial campaigns may, in part, be influenced by our assumed exponential loss function, and highlighting that larger increases in mean use could plausibly be predicted under alternative ITN loss functions. However, we have also noted that our mean use estimates are broadly in agreement with the time-series modelled estimates of Bertozzi-Villa et al. (2021), who utilised a sigmoidal/smooth-compact loss function.

      In relation to the assumption of continuous use, we have added additional text in the ‘Historical use, access and retention times’ methods section to clarify that “if ITN use were systematically higher during high-transmission rainy seasons, our assumption of continuous use may underestimate the protective impact of ITNs during these periods”. As stated at the start of that paragraph, the data available from DHS surveys was too infrequent to investigate seasonal fluctuations.

      P14 The text seems to imply that current transmission intensity is the only criterion for decisions about interventions. However, it is likely that the reasons for the current intensity, such as vectorial capacity, historical transmission and interventions should also play a role. The wording could reflect this.

      We have added additional text to clarify that current transmission intensity should not be treated as the only criterion for deprioritisation decisions:

      “However, current incidence should be considered alongside the factors that gave rise to that transmission intensity, with caution exercised when deprioritising mass campaigns in areas where historically higher transmission may currently be suppressed by high ITN access, high use given access, or other interventions.”

      Minor points 

      There are several definite numbers in the first paragraph of the Introduction - these are estimates rather than the absolute truth, but the wording does not acknowledge that there is uncertainty.

      We have made minor edits to clarify that these values are estimates rather than exact quantities. Measures of uncertainty, such as credible intervals were not always possible to source; for example, some of these are median estimates inferred from figures in Bertozzi-Villa et al. (2021).

      L634 typo - logisitic 

      Now corrected.

      L1731 typo https://https://

      Now corrected.

      L881 "access at random" - perhaps not the easiest for non-modelers

      We have re-written this to clarify “when ITNs in a household can provide access to more individuals than the number of users, access is assigned at random to non-users within each household under our framework”.

      Appendix 1, table 1: Using alpha for both age and also overdispersion on use or access is of course valid, but I found it a little confusing.

      To avoid confusion, we have added the following clarification in brackets:

      “Meanwhile, the overdispersion parameter, $\alpha_i^0$ (unrelated to the notation for ITN age), controls the variability of the probability of individual access around the mean”

      I suspect that the model was actually fitted in Stan via the R interface rstan (L589, L1151 and elsewhere).

      We have now clarified this throughout.

    1. eLife Assessment

      This convincing contribution addresses a question of practical importance: when collecting tilt-series data, what is the optimal angular step size between successive tilt images? The work provides valuable practical insights into cryo-ET data acquisition by demonstrating that balancing two competing demands - sufficient dose per individual tilt image and fine angular sampling - is essential to achieve high-quality tomographic reconstructions. The authors demonstrate that tilt-series acquired with finer increments (1-3 degrees) yield superior alignment accuracy and improved template-matching performance.

    2. Reviewer #1 (Public review):

      This work addresses a question of practical importance that had never been systematically analysed in the cryo-ET field: when collecting tilt-series data, what is the optimal angular step size between successive tilt images? Due to the upper limit in electron exposure (100 - 150 e⁻/Ų), this question is important, since finer angular sampling improves attainable reconstruction resolution (Crowther criterion) but reduces the signal-to-noise ratio of each individual image, potentially compromising both image quality and the ability to computationally align successive frames. To address this, the authors designed a thorough benchmarking study comparing five tilt increments (1°, 2°, 3°, 5°, and 10°) while keeping the total dose and tilt range constant. They evaluated the consequences at every stage of the cryo-ET workflow - from raw image quality and tilt-series alignment, through template matching for ribosome detection, to high-resolution subtomogram averaging - with the goal of providing the community with an evidence-based recommendation for data acquisition.
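
      To make the trade-off concrete, here is a minimal Python sketch under stated assumptions: the total dose (120 e⁻/Ų) and the object diameter (300 Å) are hypothetical values chosen for illustration, the dose is assumed to be split equally across tilts, and the Crowther limit is taken in its simple form d = D·Δθ (Δθ in radians) for evenly spaced tilts. None of these choices is taken from the manuscript.

```python
import math

TOTAL_DOSE = 120.0   # e-/A^2, hypothetical value within the 100-150 range mentioned above
TILT_RANGE = 60.0    # degrees either side of zero (a +/-60 degree series)
DIAMETER = 300.0     # Angstrom, hypothetical object diameter


def n_tilts(tilt_range_deg: float, increment_deg: float) -> int:
    """Number of images from -range to +range inclusive."""
    return int(round(2 * tilt_range_deg / increment_deg)) + 1


def dose_per_tilt(total_dose: float, tilt_range_deg: float, increment_deg: float) -> float:
    """Dose per image, assuming the total dose is split equally."""
    return total_dose / n_tilts(tilt_range_deg, increment_deg)


def crowther_resolution(diameter: float, increment_deg: float) -> float:
    """Crowther limit d = D * delta_theta (radians) for evenly spaced tilts."""
    return diameter * math.radians(increment_deg)


for inc in (1, 2, 3, 5, 10):
    print(
        f"{inc:>2} deg: {n_tilts(TILT_RANGE, inc):>3} tilts, "
        f"{dose_per_tilt(TOTAL_DOSE, TILT_RANGE, inc):.2f} e-/A^2 per tilt, "
        f"Crowther limit {crowther_resolution(DIAMETER, inc):.1f} A"
    )
```

      Finer increments lower the Crowther limit but also lower the dose, and hence the signal, available in each image, which is the tension the benchmarking study examines.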

      The manuscript is well written, and the experimental design is carefully thought out. The work provides valuable practical insights into cryo-ET data acquisition by demonstrating that balancing two competing demands - sufficient dose per individual tilt image and fine angular sampling - is essential to achieve high-quality tomographic reconstructions. The identification of a practical optimum at 3° tilt increment is the key contribution of the work. It will be interesting to see in the future whether this optimum shifts for smaller molecular targets, and how emerging tilt interpolation strategies such as cryoTIGER may interact with the choice of experimental angular increment.

      The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be clarified and/or extended, including:

      (1) Line 109: The authors state that the tilt range was kept at ±60° relative to the lamella plane. Assuming a typical lamella pre-tilt of ~10°, the absolute stage tilt would approach its mechanical limit. Two clarifications would be appreciated: (a) What was the average pre-tilt across all lamellae? (b) How many dark tilt images, if any, were excluded during tomogram reconstruction?

      (2) Line 148: "When analysing tomographic volumes, we found that tomograms from data with a smaller increment displayed higher SNR values (see Fig. 2B)." It would be helpful to specify which comparisons are statistically meaningful (e.g. Mann-Whitney U test?). While the difference between 1° and 2° appears pronounced, the differences between 2°, 3°, and 5° seem minimal. From my point of view, reporting the mean SNR values +/- standard deviations for each condition would already indicate some significance. Furthermore, since SNR is expected to depend on lamella thickness, it should be clarified whether the average lamella thickness is comparable across the five datasets.

      (3) Line 167: "Indeed, the variation in maximum resolution correlates with lamella thickness across all datasets (see Fig. 2F)." The reported R² values of 0.30 (1°), 0.38 (2°), 0.66 (3°), 0.61 (5°), and 0.60 (10°) reveal a notably weak linear relationship for the finer tilt increments. It is also difficult to assess whether the lamella thickness distributions are comparable across conditions from the current figures - visually, the 1° dataset appears to be based on thinner lamellae, while the 10° dataset appears to include thicker samples. A histogram of lamella thickness distributions for each condition, provided as supplementary material, would greatly aid interpretation. Given this thickness dependency, reporting mean +/- standard deviation of lamella thickness per condition would be highly appreciated.

      (4) Figure 4: It should be specified which tomogram subsets were used for the Rosenthal-Henderson analysis, whether lamella thickness was taken into account in the subset selection, and whether ribosomes too close to the lamella edges were excluded. Finally, linear fits should be displayed across the full x-axis range for all tilt increments to facilitate direct visual comparison.

      (5) General: Were ribosomes located at the lamella edges excluded from the analysis? As demonstrated in the authors' own prior work (Tuijtel et al., Science Advances, 2024), Ga-FIB milling induces structural damage at the lamella surfaces. To exclude the influence on the STA results, particles near the lamella edges should be removed prior to analysis, and the criteria for this exclusion should be stated explicitly.

      The aim of the authors was to provide the cryo-ET community with an evidence-based recommendation for the choice of tilt increment, and they largely succeeded in this goal. The identification of 3° as a practical optimum - balancing sufficient dose per tilt image for effective per-particle refinement with fine enough angular sampling for accurate tilt-series alignment - is well supported by the data and consistent across the multiple quality metrics employed. The conclusion that coarser increments (5° and 10°) compromise tomogram quality, template matching accuracy, and STA resolution is robust and clearly demonstrated. However, the conclusion rests entirely on a single biological system using ribosomes as the sole molecular target, which are exceptionally favourable due to their abundance, size, and electron contrast. Whether the identified optimum holds for smaller, lower-abundance, or lower-contrast targets remains an open question.

      In future, it would be particularly interesting to test whether emerging tilt interpolation strategies, such as cryoTIGER, which is particularly intriguing, can effectively compensate for coarser experimental angular sampling in post-processing. Here, the optimal experimental increment may shift, and the interaction between these two approaches represents a promising direction for future work. More broadly, as cryo-ET datasets grow larger and public repositories expand, the practical tradeoffs between acquisition time, data storage, and structural quality identified here will become increasingly relevant to the field.

    3. Reviewer #2 (Public review):

      The determination of macromolecular structures directly within their native cellular environment is becoming increasingly routine, making standardized data collection strategies essential. In this manuscript, Tuijtel et al. provide a timely and valuable contribution by benchmarking key acquisition parameters and establishing practical guidelines for in situ cryo-electron tomography (cryo-ET). Critically, the authors present a systematic framework for optimizing data collection to achieve the highest attainable resolution.

      Using Dictyostelium cells as a model system, the authors generate multiple datasets at a constant total dose while varying the tilt increment. They demonstrate that tilt-series acquired with finer increments (1-3 degrees) yield superior alignment accuracy and improved template-matching performance, resulting in higher-quality reconstructions than those collected with coarser increments (5 degrees or above). Furthermore, the authors show that for subtomogram averaging, a 3-degree tilt increment outperforms all other conditions tested, particularly after per-particle refinement as implemented in M.

      Overall, the manuscript is clearly written, and the conclusions are well supported by the data presented. I have no major concerns. There are some minor points that the authors should address, including:

      (1) The phrase "electron optical density distribution" (line 31, Introduction) should be revised to "electrostatic potential" or "Coulomb potential distribution," which more accurately reflects what is measured in cryo-EM/ET.

      (2) The authors state that the maximum tolerable electron dose is approximately 100-150 e⁻/Ų (line 34, Introduction). This is an oversimplification, as bacterial specimens, for example, have been shown to tolerate doses of 200 e⁻/Ų or higher (see Briegel et al., PNAS, 2009; https://www.pnas.org/doi/10.1073/pnas.0905181106#T1). The statement should be revised to reflect this variability.

      (3) Lines 56-57: The authors do not cite their own prior work benchmarking tilt-series acquisition strategies on in vitro samples. This earlier study provides important context and should be referenced and briefly discussed.

    1. Author response:

      Response to Reviewer #1

      Our work builds upon the foundations of what we term the “CM family”, specifically the Connectome Model (CM) introduced by Kovács et al. This was a deliberate choice, as our objectives substantially overlap with those of works in this family. Moreover, we wished to avoid reinventing the wheel—starting instead from a solid body of work with validations we found convincing (thereby inheriting this solidity) and, importantly, addressing the same research community using a “familiar” conceptual language. We therefore wish to clarify how our contributions indeed constitute new conceptual insights into the genomic specification of neural circuitry.

      The function implemented by a neural circuit clearly depends on how information propagates between its nodes and connections; the contribution of synapses—their number and properties—cannot be neglected when understanding, manipulating, or designing such function. To the best of our understanding, in Kovács et al., the primary objects of interest are binary connectomes (presence or absence of synapses) or weighted connectomes where “in the occasion of multiple [genetic] rules contributing to the same link”, “the weight of each link correspond[s] to the number of rules involved”. In Barabási et al., a “relaxed” version of the CM directly provides weights for an artificial neural network without explicitly specifying how each weight might result from the combination of a specific number of synapses and their respective properties. The random variable formalism and the introduction of conductances that we propose precisely add this further—yet important—element of complexity and representational detail: synaptic multiplicity. This extends existing models with the hope of laying the groundwork for what could, in the distant future, become a technology capable of producing neural circuits genetically programmed to implement a defined function.

      Regarding the proposed validation, we acknowledge its limitations, but we clarify that at the time this work was conducted, to the best of our knowledge, no public datasets existed to perform validation as the reviewer envisions. We therefore did the best that was materially feasible: we assumed the biological correctness of the model (also based on the validations accompanying the models upon which ours was built) and verified, through simulation, that it could be used to obtain genetic variables of interest capable of producing neural agents able to solve a pre-specified task—even with the additional constraint of genetic rules derived from experimental data.

      Response to Reviewer #2

      We address the points raised by Reviewer #2 in the following paragraphs.

      Regarding point (1), we agree with the reviewer that considering single-gene expression features is a simplification, especially in the case of chemical synapses. However, as with the CM, our model can also be extended to account for combinatorial rules. One possibility is to add columns to the X matrix, as many as there are gene expression patterns of interest. For each new column, a function would be defined to compute the expression feature from the expression features of the genes involved in the pattern, and this function would be used to populate the values of the new columns. The O matrix would likewise be updated with the corresponding new probabilities. While such an extension is possible, it is important to note that it gives rise to the problem of combinatorial explosion of genetic rules, with the consequent construction of matrices whose dimensionality becomes difficult to handle. Moreover, the biological plausibility of the model would then shift toward how these functions are defined, along with the interpretation of the values contained in the X matrix. Depending on the use case of our model, one possible solution to the combinatorial explosion problem could be to consider only expression patterns valid for synapse formation by extracting this information from available experimental data, thereby restricting the number of rules. We acknowledge that this problem remains open and will require more precise formulations and future work.

      Regarding point (2), Equation (11) can be derived from the assumption that the various synapses between two neurons behave as resistors in parallel. Accepting this, the equivalent conductance Guv, as denoted in the paper, can be expressed as the sum of all conductances between neurons u and v. Moving to the random variable formalism and having defined 𝒢 as the random variable representing the “signed conductance of a synapse randomly selected from the ones that connect neurons u and v”, the equivalent conductance (as a random variable) becomes ℬ·𝒢. Recall that ℬ is the random variable representing the number of synaptic connections between two neurons of interest. At this point, under the further assumption that the random variables ℬ and 𝒢 are independent, the expectation of the equivalent conductance can be calculated as the product of the expected values of ℬ and 𝒢. Equation (11) follows immediately from this. We acknowledge that these assumptions may not correspond to biological reality, but we consider them a reasonable starting point for addressing the problem.
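
      The argument above can be written out as the following sketch, under the assumptions stated in the text (conductances adding in parallel, and independence of ℬ and 𝒢). The symbol $g_k$, denoting the signed conductance of the $k$-th synapse between neurons u and v, is introduced here purely for illustration; the last line is the expectation from which the text says Equation (11) follows, not a reproduction of Equation (11) itself.

```latex
% Requires amsmath and amssymb.
\begin{align}
G_{uv} &= \sum_{k=1}^{\mathcal{B}} g_k
  && \text{(conductances in parallel add)} \\
G_{uv} &\;\longrightarrow\; \mathcal{B}\cdot\mathcal{G}
  && \text{(random-variable formalism)} \\
\mathbb{E}\left[\mathcal{B}\cdot\mathcal{G}\right]
  &= \mathbb{E}\left[\mathcal{B}\right]\,\mathbb{E}\left[\mathcal{G}\right]
  && \text{(independence of $\mathcal{B}$ and $\mathcal{G}$)}
\end{align}
```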

      Finally, we explain the reasons why the baselines suggested by the reviewer are not included in the work. We did not train classical MLPs because the main objective of the work was not to develop new bio-inspired architectures aimed at generically improving the performance of neural networks in RL, and we deemed it an additional source of confusion to propose a comparison that would suggest this direction. The main objective of the work is instead to contribute to the modeling of synaptogenesis and to lay the groundwork for—or advance the state of knowledge of—what will be a future technology that allows us to manipulate it (synaptogenesis). A similar reasoning applies to a potential baseline in which the weight matrix is constructed from Equation (7). Again, the interest is not in verifying that conductances provide a performance advantage, but rather that they are a necessary element for a sufficient level of biological plausibility. Beyond this, the exclusive and direct use of matrix B in the simulation of synaptogenesis introduces a quantization problem as described in the Appendix.

      Response to Reviewer #3

      We believe the concerns raised by the reviewer regarding the weaknesses of the work are legitimate. We wish to emphasize that all claims made in the paper were made in good faith, with the intent to generate enthusiasm for the discipline while avoiding excess or the assertion of anything incorrect or untruthful. Given that the work is inherently interdisciplinary, we recognize that reader expectations depend on their reference community, and we clarify that our primary area of expertise is AI, and that the biological claims were therefore made from this perspective.

    1. eLife Assessment

      This important work employed a recent functional muscle network analysis to evaluate rehabilitation outcomes in post-stroke patients. The research direction is relevant and supported by solid evidence from gross motor function assessment. The framework is a step toward standardized assessment of motor recovery in the rehabilitation process, but future studies could focus on linking functional recovery to muscle interaction biomarkers to provide more physiologically grounded interpretations.

    2. Reviewer #1 (Public review):

      This study addresses an important clinical challenge by proposing muscle network analysis as a tool to evaluate rehabilitation outcomes. The research direction is relevant and the findings warrant further research.

      The revised manuscript included additional methodological details and a supplementary comparison with conventional NMF.

      Comments on latest version:

      No additional comments.